Document Data Extraction With Optical Character Recognition

Sanghvi, Vidit

Please use this identifier to cite or link to this item: http://10.1.7.192:80/jspui/handle/123456789/11351

Full metadata record

DC Field	Value	Language
dc.contributor.author	Sanghvi, Vidit	-
dc.date.accessioned	2022-11-07T09:31:46Z	-
dc.date.available	2022-11-07T09:31:46Z	-
dc.date.issued	2022-06-01	-
dc.identifier.uri	http://10.1.7.192:80/jspui/handle/123456789/11351	-
dc.description.abstract	Extracting data in digital form is a necessary capability for companies that process documents. Many companies do this by manually typing content into computers, which is tedious and time-consuming. Optical character recognition (OCR) has become popular in recent years after successfully converting content from scanned and non-scanned images and documents into digital format with good accuracy, so we explore popular approaches with the goal of building an application from scratch to parse the documents we have. We discuss these approaches and their performance; however, the paper focuses mainly on implementing a system that transforms the content and stores it in .xlsx (Excel) format. The main goal of this project is to reduce the time spent processing documents, a task currently done by humans at a goods transportation facility that classifies each document as safe or not safe for transferring the goods. The data consists of native Portable Document Format (PDF) files as well as non-native documents. We face challenges in parsing them, such as handling tabular content and the added overhead of annotation time. Lastly, we analyse the results obtained from each approach.	en_US
dc.publisher	Institute of Technology	en_US
dc.relation.ispartofseries	20MCED09;	-
dc.subject	Computer 2020	en_US
dc.subject	Project Report	en_US
dc.subject	Computer Project Report	en_US
dc.subject	Project Report 2020	en_US
dc.subject	20MCE	en_US
dc.subject	20MCED	en_US
dc.subject	20MCED09	en_US
dc.subject	CE (DS)	en_US
dc.subject	DS 2020	en_US
dc.title	Document Data Extraction With Optical Character Recognition	en_US
dc.type	Dissertation	en_US
Appears in Collections:	Dissertation, CE (DS)

Files in This Item:

File	Description	Size	Format
20MCED09.pdf	20MCED09	2.94 MB	Adobe PDF

IR @ Nirma University