Please use this identifier to cite or link to this item: http://10.1.7.192:80/jspui/handle/123456789/11136
Title: Enhancing Voice Liveness Detection in Speaker Verification systems using Machine Learning approach
Authors: Mankad, Sapan Hareshbhai
Keywords: Theses
Computer Theses
Theses Computer
Dr Sanjay Garg
15EXTPHDE143
ITFCE027
ITFIT008
TT000111
Issue Date: Apr-2021
Publisher: Institute of Technology
Series/Report no.: TT000111
Abstract: Biometric authentication has recently replaced conventional authentication mechanisms of "holding" or "remembering" some evidence to claim one's identity. With the advent of these biometric-enabled operations, the issue of "forgotten password" or "theft" has been resolved, but at the same time, there are some threats to these biometric systems. Over the last decade, there has been increasing interest in developing voice-controlled biometric systems, often termed Automatic Speaker Verification (ASV) systems. Voice biometric systems are vulnerable to four common spoofing attacks, namely impersonation (mimicry), replay (playback), voice conversion, and speech synthesis. Among these, playback spoofing attacks are the easiest to implement from the perspective of the attacker. Being low-effort attacks, they present the maximum threat to voice biometric systems. This thesis addresses this problem and investigates approaches to detect such attacks. The goal of this thesis is to supplement the state of the art in playback spoofing detection in automatic speaker verification systems by presenting countermeasures and antispoofing approaches. In this work, we have attempted to address security issues in voice biometric systems. These systems are most susceptible to playback spoofing attacks due to the availability of smartphones to any end user. Due to this, ASV systems are not yet commercialized significantly. We have used the ASVspoof 2017 dataset for implementation and presented our findings for replay spoofing detection using fusion of short-term spectral features. Results using inverted Mel frequency cepstral coefficients (IMFCC) are promising and point to directions for further research with the help of these potential features. ASV systems are the most susceptible to replay spoofing attacks. Liveness of an input audio sample has to be ensured to counter such attacks.
Experiments are conducted in this thesis with standalone and fused feature representations of audio to assess the performance of the antispoofing systems using spoofing detection equal error rate (EER). Further, the impact of the proposed static IMFCC-based system under mismatched conditions is evaluated, along with other systems, by training and testing it in different environments (with different background conditions). Results show that the proposed system outperforms the other systems used in this study in these experiments. Motivated by the promising results of IMFCCs, a deep investigation into high-frequency regions of the audio signals is carried out on features derived from intrinsic mode functions (IMF) obtained through Empirical Mode Decomposition (EMD). Experiments show promising results for the replay spoofing detection task on the benchmark corpus. An emphasis on the first-IMF-based representation serves as a preprocessing technique to retrieve high-frequency components of an underlying signal. In order to examine the role of recording instruments in detecting recorded speech, a novel multiclass classification based framework using transfer learning has been proposed. A comparison of the spoofing detection system as a binary vs. multiclass classification task has been performed. Data augmentation has been attempted on audio data to supplement the available corpus and to understand the impact of artificially generated data samples on the playback spoofing detection task. This has been accomplished using a conventional oversampling technique and a modern generative adversarial network based technique.
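The abstract reports results as spoofing detection equal error rate (EER). As a rough illustration only, and not the evaluation code used in the thesis, EER can be estimated from detector scores as sketched below. The function name, the toy scores, and the convention that a higher score means "more likely genuine" are all assumptions made for this example.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Estimate the EER by sweeping every observed score as a threshold.

    Assumption: a sample is accepted as genuine when score >= threshold.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap = None
    eer = None
    for t in thresholds:
        # False acceptance rate: spoofed samples wrongly accepted.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: genuine samples wrongly rejected.
        frr = sum(g < t for g in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Hypothetical toy scores, not taken from the thesis or from ASVspoof 2017.
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
toy_eer = equal_error_rate(genuine, spoof)
print(f"EER = {toy_eer:.2%}")
```

A lower EER indicates a better countermeasure. This threshold sweep is a simple estimate; evaluation toolkits often interpolate between operating points instead.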
URI: http://10.1.7.192:80/jspui/handle/123456789/11136
Appears in Collections:Ph.D. Research Reports

Files in This Item:
File: TT000111.pdf
Description: TT000111
Size: 29.78 MB
Format: Adobe PDF

