Please use this identifier to cite or link to this item:
http://10.1.7.192:80/jspui/handle/123456789/11351
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Sanghvi, Vidit | - |
dc.date.accessioned | 2022-11-07T09:31:46Z | - |
dc.date.available | 2022-11-07T09:31:46Z | - |
dc.date.issued | 2022-06-01 | - |
dc.identifier.uri | http://10.1.7.192:80/jspui/handle/123456789/11351 | - |
dc.description.abstract | Extracting data in digital form is an essential capability for companies that process documents. Many companies do this by manually typing content into computers, which is time-consuming and tedious. Optical character recognition (OCR) has gained popularity in recent years after successfully converting content from scanned and non-scanned images and documents to digital format with good accuracy, so we explore popular approaches with the goal of building an application from scratch to parse the documents we have. We discuss those approaches and their performance; however, the paper mainly focuses on implementing a system that transforms the content and stores it in .xlsx (Excel) format. The main goal of this project is to reduce the time spent processing documents, a task currently done by humans at a goods transportation facility, where documents are classified as safe or not safe for transferring the goods. The data we have consists of native Portable Document Format (PDF) files as well as non-native documents. We face challenges in parsing them, such as handling tabular content and the added overhead of annotation time. Lastly, we analyse the results obtained from each approach. | en_US |
dc.publisher | Institute of Technology | en_US |
dc.relation.ispartofseries | 20MCED09; | - |
dc.subject | Computer 2020 | en_US |
dc.subject | Project Report | en_US |
dc.subject | Computer Project Report | en_US |
dc.subject | Project Report 2020 | en_US |
dc.subject | 20MCE | en_US |
dc.subject | 20MCED | en_US |
dc.subject | 20MCED09 | en_US |
dc.subject | CE (DS) | en_US |
dc.subject | DS 2020 | en_US |
dc.title | Document Data Extraction With Optical Character Recognition | en_US |
dc.type | Dissertation | en_US |
Appears in Collections: | Dissertation, CE (DS) |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
20MCED09.pdf | 20MCED09 | 2.94 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.