Please use this identifier to cite or link to this item: http://10.1.7.192:80/jspui/handle/123456789/4848
Title: Web Page Classification
Authors: Joshi, Rutu
Keywords: Computer 2012
Project Report 2012
Computer Project Report
Project Report
12MCE
12MCEC
12MCEC11
Issue Date: 1-Jun-2014
Publisher: Institute of Technology
Series/Report no.: 12MCEC11
Abstract: Classification of web pages is essential for improving the quality of web search, for focused crawling, and for developing web directories such as Yahoo and ODP. This paper compares several classification techniques for the task of web page classification: k-nearest neighbours (KNN), Naive Bayes (NB), support vector machine (SVM), classification and regression trees (CART), random forest (RF) and particle swarm optimization (PSO). The impact of using different representations of web pages is also studied; the representations used are Boolean, bag-of-words, and term frequency-inverse document frequency (TFIDF). Experiments are performed on the WebKB and R8 datasets, with accuracy and F-measure as the evaluation measures. The impact of feature selection on classifier accuracy is also demonstrated.
URI: http://hdl.handle.net/123456789/4848
Appears in Collections: Dissertation, CE

Files in This Item:
File: 12MCEC11.pdf
Description: 12MCEC11
Size: 640.75 kB
Format: Adobe PDF

