Development of Frugal Speech Corpus for Low Resource Indian Languages

Soni, Sapna

Please use this identifier to cite or link to this item: http://10.1.7.192:80/jspui/handle/123456789/4867

Full metadata record

DC Field	Value	Language
dc.contributor.author	Soni, Sapna	-
dc.date.accessioned	2014-08-21T10:14:19Z	-
dc.date.available	2014-08-21T10:14:19Z	-
dc.date.issued	2014-06-01	-
dc.identifier.uri	http://hdl.handle.net/123456789/4867	-
dc.description.abstract	A speech corpus (or spoken corpus) is a database of audio files and their text transcriptions. In speech technology, speech corpora are used, among other things, to create acoustic models, which can then be used with a speech recognition engine; the corpus is the central element for training such a model. In linguistics, spoken corpora are used for research into phonetics, conversation analysis, dialectology and other fields. Creating a speech corpus is a laborious, expensive and time-consuming task. Speech files are recorded manually by many speakers. The recordings are then transcribed, that is, the speech is converted to its corresponding text. Typically there are two types of speech corpora. Read speech: speakers are asked to read, for example, book excerpts, broadcast news, lists of words, or sequences of numbers. Spontaneous speech: for example, dialogs (between two or more people), narratives (a person telling a story), map tasks (one person explains a route on a map to another), and appointment tasks (two people try to find a common meeting time based on individual schedules). Building speech recognition applications for resource-deficient languages is a challenge because speech corpora are unavailable. This work proposes a mechanism to develop an inexpensive speech corpus for low resource Indian languages by exploiting existing collections of online speech data to build a frugal speech corpus.	en_US
dc.publisher	Institute of Technology	en_US
dc.relation.ispartofseries	12MICT41;	-
dc.subject	Computer 2012	en_US
dc.subject	Project Report 2012	en_US
dc.subject	Computer Project Report	en_US
dc.subject	Project Report	en_US
dc.subject	12MICT	en_US
dc.subject	12MICT41	en_US
dc.subject	ICT	en_US
dc.subject	ICT 2012	en_US
dc.subject	CE (ICT)	en_US
dc.title	Development of Frugal Speech Corpus for Low Resource Indian Languages	en_US
dc.type	Dissertation	en_US
Appears in Collections:	Dissertation, CE (ICT)

Files in This Item:

File	Description	Size	Format
12MICT41.pdf	12MICT41	3.46 MB	Adobe PDF


IR @ Nirma University