Please use this identifier to cite or link to this item:
http://10.1.7.192:80/jspui/handle/123456789/5872
Title: | An Unsupervised Web Data Extraction System |
Authors: | Patel, Disha |
Keywords: | Computer 2013 Project Report 2013 Computer Project Report Project Report 13MCEI 13MCEI25 INS INS 2013 CE (INS) |
Issue Date: | 1-Jun-2015 |
Publisher: | Institute of Technology |
Series/Report no.: | 13MCEI25 |
Abstract: | The web has become a large collection of unstructured data and documents. To find useful information on the web, users rely on search engines, yet even then they must search within the retrieved documents to find the information they need, so extracting information is difficult. Web data extraction techniques address this problem. To this end, a few unsupervised web data extraction systems were studied, along with the many different techniques available for information extraction. The existing Roadrunner algorithm, which is efficient for data extraction, is surveyed. It still has a few limitations; to overcome them, an approach is proposed that uses the Mining Data Records and Tree Alignment techniques to process the input HTML pages. An experiment comparing the results of the two approaches indicates that the proposed approach overcomes the limitations of Roadrunner. It can therefore be used to extract useful data and obtain the desired results. |
URI: | http://hdl.handle.net/123456789/5872 |
Appears in Collections: | Dissertation, CE (INS) |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
13MCEI25.pdf | 13MCEI25 | 2.15 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.