Implementation of a hardware system to assist illegible using a hidden Markov model based speaker independent, continuous speech recognition system for Sinhala language

Samankula, W.G.D.M.

Please use this identifier to cite or link to this item: http://repository.kln.ac.lk/handle/123456789/16886

Title:	Implementation of a hardware system to assist illegible using a hidden Markov model based speaker independent, continuous speech recognition system for Sinhala language
Authors:	Samankula, W.G.D.M.
Keywords:	Sinhala speech recognition; Hidden Markov Model; Feature Extraction; Operate electrical appliances; Microcontroller
Issue Date:	2016
Citation:	Samankula, W.G.D.M. (2016). Implementation of a hardware system to assist illegible using a hidden Markov model based speaker independent, continuous speech recognition system for Sinhala language. M.Phil. Thesis, University of Kelaniya.
Series/Report no.:	TH;1307
Abstract:	In this thesis, a speaker-independent speech recognition system was built to recognize continuous Sinhala speech sentences using the HTK-3.4 toolkit. It is based on a statistical approach, the Hidden Markov Model (HMM). Three hundred sentences were considered for recording. Recordings were made with 50 male and 50 female speakers, and testing was performed with 10 speakers, including both speakers who had and speakers who had not participated in the training. The recognized word sequences are commands for automating home appliances such as lights, televisions and radios, to help differently-abled people operate equipment. Several feature extraction methods, namely Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), Linear Predictive Coding (LPC), Bark Frequency Cepstral Coefficients (BFCC), Linear Prediction Reflection Coefficients (LPREFC), LPC Cepstral Coefficients (LPCEPSTRA), log mel-filter bank channel outputs (FBANK) and linear mel-filter bank channel outputs (MELSPEC), were used with the number of feature parameters varied from 4 to 12, together with log energy coefficients and their first, second and third derivatives, in order to find the optimal number of parameters for each method. Context-independent and context-dependent acoustic models (word-internal and cross-word triphones, and tied-state triphones) were developed. Decision tree state clustering was applied to create the tied-state triphones, and optimal values of the outlier threshold (RO) and the threshold controlling clustering termination (TB) were determined for building the phonetic decision tree. Finally, tied-state triphone based multiple mixture models were applied with 2, 4, 8, 16 and 32 mixtures. These approaches are compared in detail.
The speech recognition system was physically implemented, with access from a PC or laptop, using an Arduino UNO board (ATmega328 microcontroller). The identified command is transferred to the Arduino UNO board over serial communication, and a Radio Frequency (RF) signal is then transmitted by a wireless transceiver module to operate an electrical home appliance.
URI:	http://repository.kln.ac.lk/handle/123456789/16886
Appears in Collections:	MPhil / PhD Theses
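
Illustrative note on the derivative features mentioned in the abstract: the sketch below is not taken from the thesis. It shows, in plain Python, the regression formula commonly documented for HTK-style delta coefficients, d_t = sum_k k (c_{t+k} - c_{t-k}) / (2 sum_k k^2), with the first and last frames replicated at the edges. The function names `deltas` and `add_derivatives`, the window size of 2, and the 13-value static frame (12 coefficients plus log energy) are assumptions chosen for illustration only.

```python
def deltas(frames, window=2):
    """Regression-based time derivative of a sequence of feature vectors.

    frames: list of equal-length lists of floats, one list per frame.
    window: number of frames looked at on each side (assumed value: 2).
    """
    if not frames:
        return []
    denom = 2 * sum(k * k for k in range(1, window + 1))
    last = len(frames) - 1
    dim = len(frames[0])
    result = []
    for t in range(len(frames)):
        row = []
        for i in range(dim):
            acc = 0.0
            for k in range(1, window + 1):
                ahead = frames[min(t + k, last)][i]   # replicate last frame past the end
                behind = frames[max(t - k, 0)][i]     # replicate first frame before the start
                acc += k * (ahead - behind)
            row.append(acc / denom)
        result.append(row)
    return result


def add_derivatives(frames, order=3, window=2):
    """Append first..order-th derivatives to each static feature vector."""
    streams = [frames]
    for _ in range(order):
        # Each derivative order is the delta of the previous stream.
        streams.append(deltas(streams[-1], window))
    # Concatenate static, first, second, ... derivative values per frame.
    return [sum((s[t] for s in streams), []) for t in range(len(frames))]


if __name__ == "__main__":
    # Hypothetical input: 6 frames of 13 static values each.
    static = [[float(t + i) for i in range(13)] for t in range(6)]
    full = add_derivatives(static, order=3)
    print(len(full), "frames of", len(full[0]), "values")
```

Under these assumptions, each derivative order adds as many values as the static vector has, so a 13-value frame with first, second and third derivatives becomes a 52-value frame.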

Files in This Item:

File	Description	Size	Format
1307.pdf		173.22 kB	Adobe PDF
