Arabic Speech Recognition System in Noisy Environments
Keywords:
Feature extraction, likelihood evaluation, speech recognition, Hidden Markov Model, word error rate
Abstract
Speech recognition is widely regarded as a promising technology. Major technology companies have placed voice-enabled devices at the heart of their strategies, and as a result speech recognition has become one of the most active areas of research. Although feature extraction and likelihood evaluation techniques have been developed and improved over the last decade, they still lack innovation. In this paper, a multivariate Hidden Markov Model (HMM) is used to investigate the performance of two conventional feature sets, perceptual linear prediction (PLP) and RASTA-PLP, in order to design a robust and reliable Arabic automatic speech recognition (ASR) system. The proposed system was trained and tested on different noisy data sets of human speech. The small-vocabulary, isolated-word data set contains the pronunciations of 24 Arabic words with a Consonant-Vowel Consonant-Vowel Consonant-Vowel (CVCVCV) structure, recorded from 19 native Arabic speakers, each of whom said every word three times (1368 utterances in total). Each utterance is saved as a separate WAV file sampled at 48 kHz with a 32-bit depth. The system was trained on a phonetically rich and balanced list of Arabic words (10 speakers × 24 words × 3 repetitions, 720 utterances in total) and tested on the remaining data (9 speakers × 24 words × 3 repetitions, 648 utterances in total). On the test set, the system achieved a word recognition accuracy of 93.82% using PLP and 95.06% using RASTA-PLP.
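
The corpus figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the numbers of correctly recognized test utterances are not reported in the abstract; they are inferred here by rounding the stated accuracies against the 648 test utterances.

```python
# Corpus figures quoted in the abstract.
WORDS = 24            # distinct Arabic words
REPETITIONS = 3       # times each speaker says each word
TRAIN_SPEAKERS = 10
TEST_SPEAKERS = 9


def utterance_count(speakers, words=WORDS, repetitions=REPETITIONS):
    """Number of recorded utterances for a group of speakers."""
    return speakers * words * repetitions


def implied_correct(accuracy_percent, total):
    """Nearest whole number of correct recognitions for a reported accuracy.

    This is an inference from the rounded percentage, not a reported value.
    """
    return round(accuracy_percent / 100 * total)


train_total = utterance_count(TRAIN_SPEAKERS)    # training utterances
test_total = utterance_count(TEST_SPEAKERS)      # test utterances
corpus_total = train_total + test_total          # whole corpus

plp_correct = implied_correct(93.82, test_total)
rasta_correct = implied_correct(95.06, test_total)

print(f"train={train_total}, test={test_total}, corpus={corpus_total}")
print(f"implied correct: PLP={plp_correct}, RASTA-PLP={rasta_correct}")
```

Under these assumptions the split sums to the stated corpus size, and the gap between the two feature sets corresponds to a handful of test utterances.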