Please use this identifier to cite or link to this item:
http://10.1.7.192:80/jspui/handle/123456789/195
Title: | Model for Adaptive Mail Filtration and Classification with Customization |
Authors: | Lodha, Shweta |
Keywords: | Computer 2006; Project Report 2006; Computer Project Report; Project Report; 06MCE; 06MCE008 |
Issue Date: | 1-Jun-2008 |
Publisher: | Institute of Technology |
Series/Report no.: | 06MCE008 |
Abstract: | The goal of this thesis is to construct an email filter/classifier using several methods. This requires an understanding of the past and present of incoming email, spamming, and spam filtering. The filter is designed and constructed on the basis of present-day best-practice solutions. The thesis has three main sections. The first section introduces the phenomenon of junk e-mail, also called spam. It explains what spam is and gives a brief overview of the history of spam and spam filtering, followed by an explanation of why spam exists and what problems it causes. The different kinds of spam are then introduced, and finally spam filtering methods are reviewed. The second section covers the categorization of emails into various categories using a classification technique called the Vector Space Model, and describes a vector-based algorithm. In this algorithm, features are created from individual sentences in the subject and body of a message by forming all possible word pairings within each sentence. Each feature is assigned a weight according to how strongly it predicts each of several categories. This predictive strength is estimated from how frequently the feature occurs in the collection of emails, together with heuristic rules. The same model can be used for both filtration and classification by varying the threshold; the standard threshold used here is 0.03. The third section presents the implementation details for modeling and analyzing the email filtration and classification in Java. This is followed by custom-built libraries and methods that fetch emails from a server and filter them, demonstrating the practical relevance of the filter. The analysis evaluates the classification technique in terms of both the filtering accuracy achieved and the computational resources required. |
URI: | http://hdl.handle.net/123456789/195 |
Appears in Collections: | Dissertation, CE |
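The abstract outlines a vector-space classifier built on word-pair features, frequency-based weights and a 0.03 threshold. The Java sketch below illustrates one way such a scheme could work; it is not the thesis implementation. Several details are assumptions: sentences are split on `.`, `!` and `?`; tokens are lower-case alphabetic words; a feature is an unordered pair of distinct words from the same sentence; a feature's weight for a category is its count in that category multiplied by the share of its total occurrences that fall in that category (a stand-in for the thesis's predictive-strength estimate, with the heuristic rules omitted); and the score is the cosine similarity between the message vector and the category's weight vector. The class and method names (`PairFeatureClassifier`, `train`, `score`, `classify`) are hypothetical, and the 0.03 value from the abstract may not be on the same scale as this cosine score.

```java
import java.util.*;

/**
 * Hypothetical sketch of a vector-space e-mail classifier using word-pair
 * features. Names and the weighting scheme are illustrative assumptions.
 */
public class PairFeatureClassifier {

    // Threshold quoted in the abstract; applying it to a cosine score is an assumption.
    static final double THRESHOLD = 0.03;

    // category -> (feature -> count in that category's training e-mails)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    // feature -> count across all categories
    private final Map<String, Integer> totals = new HashMap<>();

    /** Builds one feature per unordered pair of distinct words in each sentence. */
    static Map<String, Integer> features(String subject, String body) {
        Map<String, Integer> feats = new HashMap<>();
        String text = subject + ". " + body;
        for (String sentence : text.split("[.!?]+")) {
            // Lower-case, split on non-letters; the sorted set makes "a|b" canonical.
            TreeSet<String> words = new TreeSet<>();
            for (String w : sentence.toLowerCase().split("[^a-z]+")) {
                if (!w.isEmpty()) words.add(w);
            }
            List<String> list = new ArrayList<>(words);
            for (int i = 0; i < list.size(); i++) {
                for (int j = i + 1; j < list.size(); j++) {
                    feats.merge(list.get(i) + "|" + list.get(j), 1, Integer::sum);
                }
            }
        }
        return feats;
    }

    /** Adds a labelled training e-mail to the collection statistics. */
    void train(String category, String subject, String body) {
        Map<String, Integer> cat = counts.computeIfAbsent(category, k -> new HashMap<>());
        features(subject, body).forEach((f, n) -> {
            cat.merge(f, n, Integer::sum);
            totals.merge(f, n, Integer::sum);
        });
    }

    /** Assumed weight: count in category times the category's share of all occurrences. */
    private double weight(String category, String feature) {
        int inCat = counts.get(category).getOrDefault(feature, 0);
        int all = totals.getOrDefault(feature, 0);
        return all == 0 ? 0.0 : inCat * ((double) inCat / all);
    }

    /** Cosine similarity between the message vector and the category weight vector. */
    double score(String category, String subject, String body) {
        Map<String, Integer> msg = features(subject, body);
        double dot = 0, msgNorm = 0, catNorm = 0;
        for (Map.Entry<String, Integer> e : msg.entrySet()) {
            dot += e.getValue() * weight(category, e.getKey());
            msgNorm += (double) e.getValue() * e.getValue();
        }
        for (String f : counts.get(category).keySet()) {
            double w = weight(category, f);
            catNorm += w * w;
        }
        // A zero dot product also covers empty messages and empty categories.
        return (dot == 0) ? 0.0 : dot / (Math.sqrt(msgNorm) * Math.sqrt(catNorm));
    }

    /** Returns every known category whose score reaches the threshold, sorted by name. */
    List<String> classify(String subject, String body) {
        List<String> result = new ArrayList<>();
        for (String c : counts.keySet()) {
            if (score(c, subject, body) >= THRESHOLD) result.add(c);
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        PairFeatureClassifier clf = new PairFeatureClassifier();
        clf.train("spam", "Cheap pills", "Buy cheap pills now. Limited offer today!");
        clf.train("work", "Project meeting", "The project meeting is moved to Monday. Please bring the report.");
        System.out.println(clf.classify("Cheap offer", "Buy pills now"));
    }
}
```

Running `main` prints `[spam]`. The test message shares the pairs `buy|now`, `buy|pills` and `now|pills` with the spam example and none with the work example. Returning every category above the threshold, rather than only the best one, lets the same model serve as a filter (a single "spam" category) or as a multi-category classifier, in line with the abstract's remark that only the threshold changes between the two uses.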
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
06MCE008.pdf | 06MCE008 | 722.31 kB | Adobe PDF | |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.