Enhancing Performance of Pattern Mining Methods

Patel, Sanjay

Please use this identifier to cite or link to this item: http://10.1.7.192:80/jspui/handle/123456789/7197

Full metadata record

DC Field	Value	Language
dc.contributor.author	Patel, Sanjay	-
dc.date.accessioned	2016-11-18T09:06:16Z	-
dc.date.available	2016-11-18T09:06:16Z	-
dc.date.issued	2015-10	-
dc.identifier.uri	http://hdl.handle.net/123456789/7197	-
dc.description.abstract	In today's customer-driven market, a buyer has many options to choose from. The market is powered by innumerable product ranges, quality offers, shops and shopping malls, door-to-door publicity and tele-shopping, and publicity through hoardings and pamphlets. In such a heterogeneous market, vendors try to attract customers through various schemes, incentives, and business tactics. Typically, these schemes are discounts on products and buy-one-get-one-free offers; at times both are combined. It is therefore appropriate to correlate the requirements of the consumer with a suitable catalog of schemes, so as to achieve a win-win situation for consumer and vendor. A series of challenges has recently appeared in the data mining field, generated by its rapid shift in status from academic to applied science and the ensuing needs of real-life applications such as internet search and social networks. With the rapid development of information technology, storage capacity, data collection capacity, and networking, data mining applications are now expanding quickly in all engineering domains and sciences, including nanotechnology and biomedical sciences. Frequent pattern mining, a subpart of association rule mining, plays an important role in many data mining tasks. Association rule mining is a two-step process: the first step finds the frequent patterns, and the second generates rules from those patterns. The second step is straightforward, so most of the research community has concentrated on the first, i.e., generating frequent patterns from huge amounts of data. Efficient algorithms for discovering frequent patterns are crucial in data mining research, because the efficiency of association rule mining depends primarily on the frequent pattern mining method.
Market basket analysis is a well-known example of association rule mining: a correlation is established among the commodities purchased by different customers so that the best discounts can be offered for mutual benefit. Several data structures, such as two-dimensional arrays, trees, tries, and directed graphs, have been used in frequent pattern mining methods. The candidate-generation-and-test approach adopted by Apriori and similar methods is costly and time-consuming. Pattern growth methods adopt a divide-and-conquer approach to decompose both the mining tasks and the databases, but this approach requires more space to complete the mining. The CATS tree extends the idea of the FP-Tree to improve storage compression and allow frequent pattern mining without generating candidate itemsets. Its construction is computationally expensive, because each time a transaction is added it searches for common items and tries to merge the new transaction into an existing tree path. The algorithm also needs to traverse the tree both upwards and downwards to include frequent items. The CanTree algorithm allows mining with a single database scan. Items are arranged in lexicographic or alphabetic order, and the efficiency of the algorithm depends entirely on the order of the items in the transactions. The CP-Tree is also a tree-based algorithm, which works in an insertion phase and a restructuring phase. Directed-graph-based mechanisms are also available, but they are not particularly well suited to frequent pattern mining. Swarm intelligence techniques have shown remarkable performance in solving optimization problems. Ant colony optimization (ACO) and artificial bee colony (ABC) are swarm intelligence techniques. From the literature, it is observed that ACO has been used in classification, clustering, and association rule mining, but not specifically for frequent pattern mining. An undirected graph is the basic requirement for ant colony optimization.
The first system deals with frequent pattern mining and incremental frequent pattern mining using an undirected graph. In this system, the items of a transaction are represented as nodes of the graph, and the edges are labeled with the transaction numbers. The number of parameters on each edge is the major problem of the first system. A prime-number-based framework is proposed in the second system to resolve it. Here each edge carries two parameters: the Frequent itemset Identifier (FID) and the total count (C) between the two nodes. The major problem with this technique is finding the GCD (greatest common divisor) and the factors of large numbers. Ant colony optimization is applied to the prime-number-based framework to reduce the complexity of the system, but the problems remain for very large databases. Hence, a parallel structure, analogous to the artificial bee colony architecture, is proposed, using the Hadoop MapReduce framework. The parallel structure is more efficient than the serial structure when I/O and communication delays are ignored. In addition, the structure of the FP-Growth method is modified through an extension of the method; the resulting structure differs from all structures previously proposed for FP-Growth. The proposed approaches, models, techniques, and enhancements were assessed with both artificial and real-world data sets, and the results confirmed improved performance on the pattern mining task.	en_US
dc.publisher	Institute of Technology	en_US
dc.relation.ispartofseries	TT000038;	-
dc.subject	Theses	en_US
dc.subject	Computer Theses	en_US
dc.subject	Theses IT	en_US
dc.subject	Dr. K. Kotecha	en_US
dc.subject	09EXTPHDE24	en_US
dc.subject	TT000038	en_US
dc.title	Enhancing Performance of Pattern Mining Methods	en_US
dc.type	Thesis	en_US
Appears in Collections:	Ph.D. Research Reports
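
The abstract above mentions a prime-number-based framework that relies on GCD and factorization. The following is a minimal sketch of the general idea of prime encoding for itemsets, not the thesis's actual framework: the FID and count (C) edge parameters are not modeled, and the helper names (assign_primes, encode, support, common_items) and the sample transactions are assumptions made purely for illustration. Each item is mapped to a distinct prime, a transaction becomes the product of its items' primes, itemset containment becomes a divisibility test, and the GCD of two encoded transactions recovers their shared items.

```python
from math import gcd, prod


def assign_primes(items):
    """Map each distinct item to a distinct prime (hypothetical helper)."""
    ordered = sorted(set(items))
    primes = []
    candidate = 2
    while len(primes) < len(ordered):
        # candidate is prime if no smaller prime divides it
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return dict(zip(ordered, primes))


def encode(itemset, prime_of):
    """Encode an itemset as the product of its items' primes."""
    return prod(prime_of[item] for item in set(itemset))


def support(itemset, encoded_transactions, prime_of):
    """Count transactions whose code is divisible by the itemset's code."""
    code = encode(itemset, prime_of)
    return sum(1 for t in encoded_transactions if t % code == 0)


def common_items(code_a, code_b, prime_of):
    """Recover the items shared by two encoded transactions via their GCD."""
    shared = gcd(code_a, code_b)
    return {item for item, p in prime_of.items() if shared % p == 0}


# Illustrative data only; not taken from the thesis.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
]
prime_of = assign_primes(item for t in transactions for item in t)
encoded = [encode(t, prime_of) for t in transactions]

pair_support = support({"bread", "milk"}, encoded, prime_of)
shared = common_items(encoded[0], encoded[1], prime_of)
```

Because each code is a product of primes, the numbers grow quickly as transactions get longer, which is consistent with the difficulty the abstract reports in computing GCDs and factors of large numbers.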

Files in This Item:

File	Description	Size	Format
TT000038.pdf	TT000038	4.76 MB	Adobe PDF


IR @ Nirma University