Please use this identifier to cite or link to this item: http://10.1.7.192:80/jspui/handle/123456789/8384
Title: Hashtag Clustering using Genetic Algorithm
Authors: Gambhava, Nileshkumar
Keywords: Theses
Computer Theses
Theses IT
Dr. K. Kotecha
11EXTPHDE78
TT000079
ITDIR002
Issue Date: 2018
Publisher: Institute of Technology
Series/Report no.: TT000079
Abstract: It is hard to believe that within a decade, social media platforms have revolutionized the world, from the way we interact with friends, family, and acquaintances to the way we do business, and they now play an important role in politics and social issues. Twitter is one of the most influential microblogging platforms of this social media era. Communication through social media networks provides the fastest and richest platform for spreading information and sharing opinions. Intelligent exploration of this information can yield valuable records, since it reflects real-world societal aspects. Tweets, i.e., short messages that users post to interact with the social world, can be used for trend prediction, timeline generation, community detection, etc. Extracting useful information from tweets is challenging because of the huge volume of short, unstructured, noisy tweets. A hashtag, a word or phrase preceded by a hash sign (#), is more relevant for extracting information. Grouping similar hashtags can play a vital role in extracting information from the cluttered world of social media, for several reasons. First, users use different hashtags for the same topic, e.g., #deepnet, #deeplearning, #dl, etc. for deep learning. Second, users put multiple similar hashtags in one tweet to place it in a broader domain or in several similar domains, e.g., #machinelearning #ai #neuralnets. Third, a hashtag can have multiple meanings; for example, #AI may stand for "Artificial Intelligence", "Adobe Illustrator", or "Area of Interest". The problem of grouping similar hashtags is essentially a classical clustering problem. Hashtag clustering is an important technique for extracting information by categorizing tweets into different clusters. Hashtag clustering is a challenging task for two major reasons.
First, the number of clusters is not known in advance; second, domain-related information is not available. A Genetic Algorithm (GA) is an adaptive heuristic search algorithm that mimics the evolutionary process of natural selection and follows the principle of survival of the fittest. We propose a model for hashtag clustering using a GA that addresses these issues. To the best of our knowledge, this is the first attempt to cluster hashtags using a GA. We propose a novel heuristic for initial population generation that produces candidate solutions from different regions of the search space. We outline a GA framework for clustering hashtags and experiment with different sets of genetic parameters. We tested our model on a large set of tweets downloaded from 76 popular Indian media Twitter accounts. Because no other source is available to validate the quality of the results, the results obtained by our model are compared against a crowdsourcing method. Our results are superior to the crowdsourcing results. In addition, user validation of the resulting clusters confirms the accuracy of the proposed model. We demonstrate the applicability of the hashtag clusters generated by our model to event timeline generation. We propose a novel formula, named prominenceRank of a tweet, to select high-impact tweets for generating an event timeline from hashtag clusters. The proposed formula is evaluated heuristically by generating timelines for three major events found in the tweet dataset. The generated timelines show the efficiency of our approach in terms of substantial diversity and relevance, and demonstrate the effectiveness of the proposed prominenceRank-based heuristic.
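The abstract does not specify the thesis's chromosome encoding, fitness function, or initial-population heuristic, so the following is only a minimal sketch of how a GA can cluster hashtags without fixing the number of clusters in advance. Everything in it is an assumption for illustration: the toy `TWEETS`, the Jaccard co-occurrence similarity, the threshold `TAU = 0.2`, the label-based encoding, and the correlation-clustering style fitness (each same-cluster pair contributes `sim - TAU`). The initial population is plain uniform random rather than the thesis's heuristic, and a greedy single-move `local_search` is applied to the best individual at the end, which makes this a simple memetic variant rather than a pure GA.

```python
import random
from itertools import combinations

# Hypothetical toy data: each tweet is reduced to the set of hashtags it contains.
TWEETS = [
    {"#deeplearning", "#dl", "#ai"},
    {"#deeplearning", "#deepnet"},
    {"#dl", "#deepnet", "#ai"},
    {"#cricket", "#ipl"},
    {"#ipl", "#t20"},
    {"#cricket", "#t20", "#ipl"},
]
TAU = 0.2  # assumed threshold: pairs more similar than TAU are rewarded for sharing a cluster


def build_similarity(tweets):
    """Jaccard similarity of hashtags over the sets of tweets they occur in (an assumption)."""
    occurs = {}
    for i, tags in enumerate(tweets):
        for tag in tags:
            occurs.setdefault(tag, set()).add(i)
    tags = sorted(occurs)
    sim = {}
    for a, b in combinations(range(len(tags)), 2):
        sa, sb = occurs[tags[a]], occurs[tags[b]]
        sim[(a, b)] = len(sa & sb) / len(sa | sb)
    return tags, sim


def fitness(labels, sim, tau=TAU):
    """Correlation-clustering objective: each same-cluster pair contributes sim - tau."""
    return sum(s - tau for (a, b), s in sim.items() if labels[a] == labels[b])


def tournament(pop, scores, rng, k=3):
    """Return the fittest of k randomly drawn individuals."""
    best = max(rng.sample(range(len(pop)), k), key=lambda i: scores[i])
    return pop[best]


def crossover(p1, p2, rng):
    """Uniform crossover on label-encoded chromosomes."""
    return [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]


def mutate(labels, n_labels, rng, rate=0.1):
    """Reassign each gene to a random cluster label with probability `rate`."""
    return [rng.randrange(n_labels) if rng.random() < rate else g for g in labels]


def local_search(labels, sim, n_labels):
    """Greedy refinement: move single hashtags to other labels while fitness improves."""
    labels = list(labels)
    best = fitness(labels, sim)
    improved = True
    while improved:
        improved = False
        for i in range(len(labels)):
            for lab in range(n_labels):
                old = labels[i]
                if lab == old:
                    continue
                labels[i] = lab
                score = fitness(labels, sim)
                if score > best + 1e-9:
                    best, improved = score, True
                else:
                    labels[i] = old  # undo a non-improving move
    return labels


def ga_cluster(tweets, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    tags, sim = build_similarity(tweets)
    n = len(tags)  # up to n labels, so the number of clusters k is not fixed in advance
    pop = [[rng.randrange(n) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind, sim) for ind in pop]
        children = [pop[max(range(pop_size), key=lambda i: scores[i])]]  # elitism
        while len(children) < pop_size:
            child = crossover(tournament(pop, scores, rng), tournament(pop, scores, rng), rng)
            children.append(mutate(child, n, rng))
        pop = children
    best = local_search(max(pop, key=lambda ind: fitness(ind, sim)), sim, n)
    clusters = {}
    for tag, lab in zip(tags, best):
        clusters.setdefault(lab, []).append(tag)
    return sorted(sorted(c) for c in clusters.values())


print(ga_cluster(TWEETS))
# → [['#ai', '#deeplearning', '#deepnet', '#dl'], ['#cricket', '#ipl', '#t20']]
```

Because every same-cluster pair is scored against the threshold, the number of clusters emerges from the data instead of being chosen beforehand; in this sketch, raising `TAU` yields more, tighter clusters and lowering it merges them.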
URI: http://10.1.7.192:80/jspui/handle/123456789/8384
Appears in Collections: Ph.D. Research Reports

Files in This Item:
File: TT000079.pdf
Description: TT000079
Size: 15.94 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.