Please use this identifier to cite or link to this item: http://10.1.7.192:80/jspui/handle/123456789/8345
Title: Design and Development of Privacy Preserving Techniques for Data Stream Mining
Authors: Solanki, Pareshkumar Mahendrabhai
Keywords: Theses
Computer Theses
Theses IT
Dr. Sanjay Garg
11EXTPHDE58
ITFCE027
TT000070
Issue Date: 2018
Publisher: Institute of Technology
Series/Report no.: TT000070
Abstract: Data mining is the field of extracting information from large datasets, with diverse application areas such as healthcare, banking and finance, telecommunications, shopping records, and personal data. These applications frequently produce huge volumes of data, which are stored statically and dynamically across the available network. The mined knowledge can take the form of clusters, patterns, rules, and classifications. Sharing such data has been shown to be advantageous for data mining applications. However, these datasets frequently contain individually identifiable information, so releasing them may result in privacy breaches. Preserving privacy while releasing data is a fundamental research area in data security, and it is a major issue when releasing individual-specific sensitive information. Efficiently preserving the data owner’s privacy is a crucial issue when publishing data for analysis. To our knowledge, a dataset is an essential asset for industry, which examines it in order to make decisions. To share data while preserving privacy, the data owner must arrive at a solution that achieves the dual goal of privacy preservation and accuracy of the data mining task, mostly clustering and classification. Data mining can be valuable in many applications, but with insufficient protection the data may be misused for other purposes. It is essential to prevent disclosure not only of individuals’ confidential information but also of critical knowledge. Generally, data owners do not consider it safe to publish datasets for mining because they worry that releasing the data may compromise an individual’s private information. Perturbing and anonymizing datasets before release overcomes this fear, as it guarantees secrecy of personal information.
However, protecting personal information while achieving mining results close to those obtained with the original datasets poses great challenges. The proposed research work seeks solutions to this growing concern. Several algorithms have been proposed that take the characteristics of the dataset into account and either perturb sensitive attribute values or keep sensitive attribute values unchanged and anonymize quasi-identifier values. The data perturbation and anonymization algorithms proposed so far have focused mainly on static data, and very few address data streams. A heuristic-based data perturbation approach has been proposed in which privacy is maximized through computed tuple values for each instance and a user-defined sensitive drift, with minimum information loss. The proposed algorithm has been evaluated to measure information gain and the privacy achieved. Many datasets contain multiple sensitive attributes, so there is a need for both perturbation and anonymization to preserve privacy. Based on this concern, the research work also carries out a detailed analysis of data anonymization alternatives and proposes a heuristic-based, PRIVACYearn-based multi-iterative k-anonymization and perturbation approach for data streams. This approach also seeks the best-fit generalization, which leads to minimum loss of information and better protection of an individual’s privacy. Finally, we have proposed heuristic-based geometric data perturbation for data streams. The developed algorithms for data perturbation and anonymization have been tested on a wide range of standard datasets with frequently used mining algorithms such as K-Means clustering and Naive Bayes classification.
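As an illustration of the geometric data perturbation mentioned in the abstract, the following is a minimal sketch of the generic textbook form (rotation, then translation, then additive Gaussian noise) applied lazily to a stream of 2-D records. It is not the thesis's heuristic, which the abstract does not specify; the function name `geometric_perturb` and the parameters `theta`, `shift`, and `noise_sd` are hypothetical and chosen only for this example.

```python
import math
import random


def geometric_perturb(points, theta, shift, noise_sd=0.0, rng=None):
    """Lazily perturb a stream of 2-D points.

    Each point is rotated by `theta` radians, translated by `shift`,
    and then given independent Gaussian noise with std. dev. `noise_sd`.
    All names here are illustrative assumptions, not the thesis's API.
    """
    rng = rng or random.Random()
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    for x, y in points:
        # Rotation followed by translation preserves pairwise distances.
        rx = cos_t * x - sin_t * y + shift[0]
        ry = sin_t * x + cos_t * y + shift[1]
        # Additive noise trades a little accuracy for extra privacy.
        yield (rx + rng.gauss(0.0, noise_sd), ry + rng.gauss(0.0, noise_sd))


# Example: perturb three records as they arrive from a stream.
stream = iter([(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)])
perturbed = list(geometric_perturb(stream, theta=0.7, shift=(2.0, -1.0),
                                   noise_sd=0.05, rng=random.Random(42)))
```

With `noise_sd=0.0` the transformation is a rigid motion, so Euclidean distances between records are unchanged. This is why distance-based miners such as K-Means can be expected to behave similarly on the perturbed data, while the noise term makes the original values harder to recover.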
URI: http://10.1.7.192:80/jspui/handle/123456789/8345
Appears in Collections: Ph.D. Research Reports

Files in This Item:
File: TT000070.pdf
Description: TT000070
Size: 2.7 MB
Format: Adobe PDF