Privacy-Preserving Classification Of Horizontally Partitioned Data Streams

Radhika, Kotecha

Please use this identifier to cite or link to this item: http://10.1.7.192:80/jspui/handle/123456789/8348

Title:	Privacy-Preserving Classification Of Horizontally Partitioned Data Streams
Authors:	Radhika, Kotecha
Keywords:	Theses; Computer Theses; Theses IT; Dr. Sanjay Garg; 12EXTPHDE88; TT000050; ITFCE027; Classification; Data Streams; Output-Privacy-Preservation; Horizontal Partitioning; Anonymization; Genetic Programming; Ensemble Learning
Issue Date:	2017
Publisher:	Institute of Technology
Series/Report no.:	TT000050;
Abstract:	With technological advancements, several real-world applications generate massive amounts of data. Such data, also known as data streams, arrive continuously at an unprecedented rate and contain valuable knowledge. Due to their effectiveness in supporting decision-making processes and knowledge discovery, data mining techniques have attracted considerable interest and attention from research communities. Extracting patterns from such voluminous data streams requires the development of new algorithms or modifications to traditional data mining algorithms. In recent years, data stream classification has been an active area of research in data stream mining and is the focus of this work. The power of these mining techniques may breach the privacy of the individuals to whom the data refer, and the field of privacy-preserving data mining (PPDM) has emerged in response to this issue. Specifically, PPDM techniques aim to strike a trade-off between efficiency in data mining and exposure (direct or via inference) of sensitive information in the original data. Further, not only the original data but also the data mining output can lead to disclosure of sensitive information. But when the data mining output reveals no private patterns, it can be reliably claimed that the privacy of the underlying data is protected. Specifically, when the final goal is to release the output of data mining (a model), its effectiveness in preserving privacy is of the utmost concern. This research work focuses on preserving output privacy, that is, on preventing inference using the released classifier. However, data stream classification and privacy preservation are two conflicting goals, because the data stream classifier should be ready to predict at any point and has memory limitations, whereas privacy-preserving methods may require multiple scans over the data.
Hence, the crucial issue of privacy-preserving data stream classification (PPDSC) is emerging as a novel research area. This work proposes a systematic method named Diverse and Anonymized HOeffding Tree (DAHOT) to address this issue. The algorithm uses a Hoeffding tree as the base classifier for classifying data streams and a variant of k-anonymity as well as l-diversity principles to preserve the privacy of the output classifier. Further, advancement in networking technologies has triggered mining of distributed data. Different organizations (data holders) want to undertake a joint data mining task to obtain certain global patterns. Such collaboration is essential because of the mutual benefits it brings. However, free sharing of data is restricted due to privacy and security concerns, leading to the need for privacy-preserving distributed data mining. The work focuses on horizontally partitioned (homogeneously distributed) data, as numerous applications fall under this data model. Since the work presented in this thesis targets classification of data streams, the resulting problem is framed as privacy-preserving classification of horizontally partitioned data streams. Applications from diverse domains, such as credit-card fraud detection, disease outbreak detection, and loan approval, are examples of privacy-preserving classification of horizontally partitioned data streams. As a solution, a novel framework is proposed in this thesis, where each participating site (data holder) induces a DAHOT classifier and a third party combines these local classifiers to form a global classifier. No private information is disclosed to the merger site either. Within this framework, a method named DAHOT-GPeCT is proposed that uses Genetic Programming (GP) to induce a global classifier at the merger site from the local DAHOT classifiers induced by the participating parties.
Furthermore, a method named DAHOT-GPeCT-Ensemble, which is an extension of DAHOT-GPeCT, is proposed. DAHOT-GPeCT-Ensemble uses a combination of GP and ensemble learning to obtain a global privacy-preserving classifier from horizontally partitioned data streams.
URI:	http://10.1.7.192:80/jspui/handle/123456789/8348
Appears in Collections:	Ph.D. Research Reports
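
As a rough illustration of the base classifier described in the abstract, the Python sketch below computes the standard Hoeffding bound that a Hoeffding tree uses to decide when enough stream instances have been seen to split a node. The leaf_is_releasable check is purely hypothetical: it assumes a leaf may be published only if it covers at least k instances and at least l distinct class labels, which is one simple reading of k-anonymity and l-diversity and is not claimed to be the rule DAHOT actually applies. All function and parameter names are illustrative.

```python
import math
from collections import Counter


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the split metric, delta is the allowed
    probability of choosing the wrong attribute, n is the number of instances
    observed at the node so far.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Standard Hoeffding-tree test: split once the gap between the two best
    attributes exceeds the bound, so no second pass over the stream is needed."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


def leaf_is_releasable(class_counts: Counter, k: int, l: int) -> bool:
    """Hypothetical output-privacy check (an assumption, not the thesis's rule):
    the leaf must cover at least k instances and at least l distinct classes."""
    total = sum(class_counts.values())
    distinct = sum(1 for count in class_counts.values() if count > 0)
    return total >= k and distinct >= l


if __name__ == "__main__":
    print(should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=1000))
    print(leaf_is_releasable(Counter({"fraud": 6, "legit": 4}), k=5, l=2))
```

The split test only needs running counts per node, which is what lets a stream classifier stay within fixed memory; the leaf check is applied to the model that would be released, not to the raw records.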

Files in This Item:

File	Description	Size	Format
TT000050.pdf	TT000050	3.11 MB	Adobe PDF

IR @ Nirma University