Please use this identifier to cite or link to this item: http://10.1.7.192:80/jspui/handle/123456789/8348
Full metadata record
DC Field | Value | Language
dc.contributor.author | Radhika, Kotecha | -
dc.date.accessioned | 2019-05-09T09:55:58Z | -
dc.date.available | 2019-05-09T09:55:58Z | -
dc.date.issued | 2017 | -
dc.identifier.uri | http://10.1.7.192:80/jspui/handle/123456789/8348 | -
dc.description.abstract | With technological advancements, several real-world applications generate massive amounts of data. Such data, also known as data streams, arrive continuously at an unprecedented rate and contain valuable knowledge. Owing to their effectiveness in supporting decision-making processes and knowledge discovery, data mining techniques have attracted considerable interest and attention from research communities. Extracting patterns from such voluminous data streams requires the development of new algorithms or modifications to traditional data mining algorithms. In recent years, data stream classification has been an active area of research in data stream mining, and it is the focus of this work. It is apparent that the power of these mining techniques may breach the privacy of the individuals to whom the data refer, and the field of privacy-preserving data mining (PPDM) has emerged in response to this issue. Specifically, PPDM techniques aim to strike a trade-off between the efficiency of data mining and the exposure (direct or via inference) of sensitive information in the original data. Further, not only the original data but also the output of data mining can lead to the disclosure of sensitive information. However, when the data mining output reveals no private patterns, it can be reliably claimed that the privacy of the underlying data is protected. In particular, when the final goal is to release the output of data mining (a model), its effectiveness in preserving privacy is of the utmost concern. This research work focuses on preserving output privacy, that is, on preventing inference using the released classifier. However, data stream classification and privacy preservation are two conflicting goals, because a data stream classifier must be ready to predict at any point and has limited memory, whereas privacy-preserving methods may require multiple scans over the data.
Hence, the crucial issue of privacy-preserving data stream classification (PPDSC) is emerging as a novel research area. This work proposes a systematic method named Diverse and Anonymized HOeffding Tree (DAHOT) to address this issue. The algorithm uses the Hoeffding tree as a base classifier for classifying data streams, and a variant of the k-anonymity and l-diversity principles to preserve the privacy of the output classifier. Further, advances in networking technologies have triggered the mining of distributed data. Different organizations (data holders) want to undertake a joint data mining task to obtain certain global patterns. Such collaboration is essential because of the mutual benefits it brings. However, free sharing of data is restricted due to privacy and security concerns, leading to the need for privacy-preserving distributed data mining. The work focuses on horizontally partitioned (homogeneously distributed) data, as numerous applications fall under this data model. Since the work presented in this thesis targets the classification of data streams, the resulting problem is framed as privacy-preserving classification of horizontally partitioned data streams. Applications from diverse domains, such as credit-card fraud detection, disease outbreak detection and loan approval, are examples of privacy-preserving classification of horizontally partitioned data streams. As a solution, a novel framework is proposed in this thesis in which each participating site (data holder) induces a DAHOT classifier and a third party combines these local classifiers to form a global classifier. No private information is disclosed to the merger site either. Within this framework, a method named DAHOT-GPeCT is proposed that uses Genetic Programming (GP) to induce a global classifier at the merger site from the local DAHOT classifiers induced by the participating parties.
Furthermore, a method named DAHOT-GPeCT-Ensemble, an extension of DAHOT-GPeCT, is proposed. DAHOT-GPeCT-Ensemble uses a combination of GP and ensemble learning to obtain a global privacy-preserving classifier from horizontally partitioned data streams. | en_US
dc.publisher | Institute of Technology | en_US
dc.relation.ispartofseries | TT000050; | -
dc.subject | Theses | en_US
dc.subject | Computer Theses | en_US
dc.subject | Theses IT | en_US
dc.subject | Dr. Sanjay Garg | en_US
dc.subject | 12EXTPHDE88 | en_US
dc.subject | TT000050 | en_US
dc.subject | ITFCE027 | en_US
dc.subject | Classification | en_US
dc.subject | Data Streams | en_US
dc.subject | Output-Privacy-Preservation | en_US
dc.subject | Horizontal Partitioning | en_US
dc.subject | Anonymization | en_US
dc.subject | Genetic Programming | en_US
dc.subject | Ensemble Learning | en_US
dc.title | Privacy-Preserving Classification Of Horizontally Partitioned Data Streams | en_US
dc.type | Thesis | en_US
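The abstract names the Hoeffding tree as the base classifier of DAHOT. As general background only (this is not code from the thesis), a Hoeffding tree splits a leaf once the observed gap between the information gain of the best and second-best attributes exceeds the Hoeffding bound epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the gain, delta is the allowed error probability and n is the number of examples seen at the leaf. The minimal sketch below assumes information gain as the split criterion; the function names and the default values of delta and the tie threshold are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with range `value_range` is within epsilon of its mean over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n_classes: int,
                 n_seen: int, delta: float = 1e-7,
                 tie_threshold: float = 0.05) -> bool:
    """Hypothetical helper: decide whether a Hoeffding-tree leaf should split.

    delta and tie_threshold are illustrative defaults, not values from the thesis.
    """
    # Information gain over n_classes classes lies in [0, log2(n_classes)].
    value_range = math.log2(n_classes)
    eps = hoeffding_bound(value_range, delta, n_seen)
    # Split if the best attribute is reliably better than the runner-up,
    # or if epsilon is so small that the two are effectively tied.
    return (best_gain - second_gain > eps) or (eps < tie_threshold)


# With 2 classes and 1000 examples, epsilon is about 0.0898, so the leaf splits.
print(should_split(0.30, 0.10, n_classes=2, n_seen=1000))  # True
# With only 200 examples, epsilon is about 0.2007, so the leaf keeps waiting.
print(should_split(0.30, 0.25, n_classes=2, n_seen=200))   # False
```

Because the bound shrinks as n grows, a leaf only needs to buffer sufficient statistics for the examples it has seen rather than rescan the stream, which is what makes the tree suitable for the memory limits described in the abstract.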
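DAHOT is described as using a variant of k-anonymity and l-diversity; the variant itself is not given in this record. For orientation only, the textbook checks are sketched below: a set of records is k-anonymous if every group of records sharing the same quasi-identifier values has at least k members, and distinct l-diverse if every such group contains at least l distinct sensitive values. The function names, the record layout (Python dictionaries) and the sample attributes are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records (dicts) by their tuple of quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[attr] for attr in quasi_identifiers)
        groups[key].append(rec)
    return groups


def is_k_anonymous(records, quasi_identifiers, k):
    # Every equivalence class must contain at least k records.
    groups = equivalence_classes(records, quasi_identifiers)
    return all(len(group) >= k for group in groups.values())


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l_min):
    # Every equivalence class must contain at least l_min distinct sensitive values.
    groups = equivalence_classes(records, quasi_identifiers)
    return all(len({rec[sensitive] for rec in group}) >= l_min
               for group in groups.values())


# Hypothetical, already-generalized sample records.
rows = [
    {"age": "30-39", "zip": "380**", "disease": "flu"},
    {"age": "30-39", "zip": "380**", "disease": "cancer"},
    {"age": "40-49", "zip": "382**", "disease": "flu"},
    {"age": "40-49", "zip": "382**", "disease": "flu"},
]
qi = ["age", "zip"]
print(is_k_anonymous(rows, qi, k=2))                        # True
print(is_distinct_l_diverse(rows, qi, "disease", l_min=2))  # False
```

The second group is 2-anonymous, yet both of its members share the same disease, so the sensitive value is disclosed anyway; this homogeneity problem is what l-diversity is meant to guard against on top of k-anonymity.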
Appears in Collections: Ph.D. Research Reports

Files in This Item:
File | Description | Size | Format
TT000050.pdf | TT000050 | 3.11 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.