Please use this identifier to cite or link to this item: http://10.1.7.192:80/jspui/handle/123456789/11728
Title: Handwritten Gujarati Character Recognition using Machine Learning Approach
Authors: Sharma, Ankit
Keywords: Theses
IC Theses
Theses IT
Dr. Dipak Adhyaru
Dr. Tanish Zaveri
ITFIC016
ITFIC002
ITFEC008
TT000066
12EXTPHDE93
Issue Date: 2017
Publisher: Institute of Technology
Series/Report no.: TT000066;
Abstract: Handwritten character recognition is an active area of research. Over the past three decades, there has been increasing interest among researchers in problems related to the machine simulation of the human reading process. Optical Character Recognition (OCR) is the tool used to convert printed or handwritten scanned documents into machine-readable text. Handwritten character recognition is a challenging task, and people are striving to convert handwritten literature to a computer-readable format. Recognising handwritten characters is more difficult than recognising printed characters because handwritten characters vary from person to person with respect to individual writing style, size, curves, strokes and thickness of characters. Languages have played a major role in Indian history and they continue to influence the lives of Indians to date. Plentiful research on OCR techniques for Indian languages such as Hindi, Tamil, Bangla, Kannada, Gurumukhi and Malayalam has already been carried out. The development of OCR systems for Gujarati script is still in its infancy and hence there exist many unaddressed, challenging problems for the research community in this domain. This clearly necessitates attention to the task of handwritten Gujarati character recognition. This thesis addresses the issues of handwritten Gujarati character recognition. Gujarati is the mother tongue of the people of Gujarat state in India. All over the world, more than 65 million people use the Gujarati language for communication. As Gujarat is one of the eminent states of India, Gujarati is a well-known and culturally rich language.
Gujarati character recognition, like that of most other Indian languages, poses more difficulties than recognition of western languages for these reasons: (a) the number of classes is higher, (b) the structure of characters in Gujarati script contains curves, holes and strokes, which results in significant variation across the writing styles of different persons, (c) the presence of similar-looking characters, and (d) the unavailability of a standard dataset for experimentation and validation. One of the significant contributions of the proposed work is the development of large and representative datasets for the task of recognising handwritten Gujarati characters and numerals. Benchmark datasets containing 88,000 handwritten Gujarati character images and 14,000 handwritten Gujarati numeral images are developed. Special forms are used for dataset collection, and isolated characters are extracted from these forms. Preprocessing steps including noise removal, size normalization, binarization and thinning are applied to each segmented numeral/character image. Systematic and exhaustive experiments are carried out on these datasets using different kinds of features and their fusion. Zone-based, projection-profile-based and chain-code-based features are employed as individual features. It is also proposed to use the fusion of these features. A few novel features are also proposed to represent handwritten Gujarati characters. These include features extracted based on structural decomposition, zone pattern matching and normalized cross-correlation. Methods based on artificial neural network (ANN), support vector machine (SVM) and naive Bayes (NB) classifiers are used for handwritten Gujarati character and numeral recognition. Among individual features, chain-code-based features provided higher recognition accuracy than the other features: 99.25% and 99.47% with polynomial SVM for the numeral and character datasets respectively.
Among fusion-based features, the fusion of chain-code-based and zoning-based features provided the best results. The proposed structural-decomposition-based features provided the highest accuracy, 99.48% with polynomial SVM for handwritten characters. Experimental results show significant improvement over the state of the art and validate our proposals.
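Illustrative note (not part of the thesis record): the abstract names zone-based and projection-profile features and their fusion without giving details. The sketch below shows one common way such features can be computed from a binarized character image. The grid size, the per-zone ink-density measure, the normalization, and fusion by plain concatenation are all assumptions made for illustration, and the function names are hypothetical; the thesis may define these features differently.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink pixel, 0 = background


def zone_density_features(img: Image, zones: int = 4) -> List[float]:
    """Fraction of ink pixels in each cell of a zones x zones grid."""
    h, w = len(img), len(img[0])
    zh, zw = h // zones, w // zones  # assumes h and w are divisible by zones
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            ink = sum(
                img[r][c]
                for r in range(zr * zh, (zr + 1) * zh)
                for c in range(zc * zw, (zc + 1) * zw)
            )
            feats.append(ink / (zh * zw))
    return feats


def projection_profile_features(img: Image) -> List[float]:
    """Horizontal then vertical ink profiles, each scaled to [0, 1]."""
    h, w = len(img), len(img[0])
    rows = [sum(row) / w for row in img]
    cols = [sum(img[r][c] for r in range(h)) / h for c in range(w)]
    return rows + cols


def fused_features(img: Image, zones: int = 4) -> List[float]:
    """Feature-level fusion by concatenation (an assumed fusion scheme)."""
    return zone_density_features(img, zones) + projection_profile_features(img)


# Toy 4x4 "character": a single vertical stroke in the second column.
toy = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
vector = fused_features(toy, zones=2)
print(len(vector), vector)
```

A vector built this way could then be passed to any of the classifiers the abstract mentions (ANN, SVM, naive Bayes); in practice the input would be a size-normalized image after the preprocessing steps listed above rather than a toy grid.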
URI: http://10.1.7.192:80/jspui/handle/123456789/11728
Appears in Collections: Ph.D. Research Reports

Files in This Item:
File: TT000066.pdf
Description: TT000066
Size: 2.54 MB
Format: Adobe PDF

