Please use this identifier to cite or link to this item:
http://10.1.7.192:80/jspui/handle/123456789/4848
Title: | Web Page Classification |
Authors: | Joshi, Rutu |
Keywords: | Computer 2012; Project Report 2012; Computer Project Report; Project Report; 12MCE; 12MCEC; 12MCEC11 |
Issue Date: | 1-Jun-2014 |
Publisher: | Institute of Technology |
Series/Report no.: | 12MCEC11; |
Abstract: | Classification of web pages is essential for improving the quality of web search, for focused crawling, and for developing web directories such as Yahoo and ODP. This paper compares several classification techniques on the task of web page classification: k-nearest neighbours (KNN), Naive Bayes (NB), support vector machine (SVM), classification and regression trees (CART), random forest (RF) and particle swarm optimization (PSO). The impact of different web page representations is also studied; the representations used are Boolean, bag-of-words, and term frequency-inverse document frequency (TFIDF). Experiments are performed on the WebKB and R8 datasets, with accuracy and F-measure as the evaluation measures. The impact of feature selection on classifier accuracy is also demonstrated. |
URI: | http://hdl.handle.net/123456789/4848 |
Appears in Collections: | Dissertation, CE |
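
The three page representations named in the abstract (Boolean, bag-of-words, TFIDF) can be illustrated with a minimal sketch. It is not taken from the report: the whitespace tokenizer, the function names, the toy pages, and the unsmoothed `log(N / df)` weighting are all assumptions chosen for illustration, and the report may use a different TFIDF variant.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real pages would
    # first need HTML stripping, stop-word removal, stemming, etc.
    return text.lower().split()


def build_vocabulary(docs):
    # Sorted list of every distinct term seen across the collection.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def boolean_vector(doc, vocab):
    # 1 if the term occurs in the page at least once, otherwise 0.
    present = set(tokenize(doc))
    return [1 if term in present else 0 for term in vocab]


def bag_of_words_vector(doc, vocab):
    # Raw count of each vocabulary term in the page.
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def tfidf_vectors(docs, vocab):
    # Term count multiplied by log(N / document frequency).
    # Assumption: plain unsmoothed IDF, one common variant among several.
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocab}
    return [
        [count * idf[term]
         for count, term in zip(bag_of_words_vector(doc, vocab), vocab)]
        for doc in docs
    ]


# Hypothetical toy "pages", not drawn from WebKB or R8.
pages = ["course homepage course syllabus", "faculty homepage research"]
vocab = build_vocabulary(pages)
boolean_first = boolean_vector(pages[0], vocab)
counts_first = bag_of_words_vector(pages[0], vocab)
tfidf_all = tfidf_vectors(pages, vocab)
```

In this sketch a term that appears on every page (here `homepage`) receives a TFIDF weight of zero, which is the property that distinguishes TFIDF from the raw counts of the bag-of-words representation.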
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
12MCEC11.pdf | 12MCEC11 | 640.75 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.