Please use this identifier to cite or link to this item: http://10.1.7.192:80/jspui/handle/123456789/9349
Title: Machine Learning Techniques for Healthcare 4.0 Framework
Authors: Bhowmick, Preeti
Keywords: EC 2018
Project Report 2018
EC Project Report
EC (Communication)
Communication
Communication 2018
18MECC
18MECC04
Issue Date: 1-Jun-2020
Publisher: Institute of Technology
Series/Report no.: 18MECC04;
Abstract: According to the World Health Organization (WHO), cardiovascular diseases (CVDs) are the leading cause of death globally, taking an estimated 17.9 million lives each year. The problem is a particular challenge for India because doctors are concentrated in large metropolitan areas, healthcare services are of poor quality, healthcare staff are overburdened, and infrastructure is poor. With CVDs on the rise, it becomes necessary to predict them at an early stage and control them with medication. Machine Learning techniques are one way to predict whether an individual has a CVD. This thesis studies which Machine Learning algorithm is most suitable for that prediction. Supervised classification algorithms fit this problem statement because the required output is binary: 0 indicates that the individual does not have a CVD and 1 indicates that the individual has a CVD. The supervised classification algorithms in the scope of this thesis are Logistic Regression, k-Nearest Neighbor (kNN), Decision Tree, Random Forest, Support Vector Machine and Naive Bayes. The first part of the thesis covers an in-depth study of these algorithms, their implementation, and an analysis and comparison of the results to find the one that best suits the problem statement. The algorithms are implemented in Python using the Spyder Integrated Development Environment (IDE) provided with the Anaconda distribution. They are applied to the Framingham Heart Study data set, which was provided by the National Heart, Lung, and Blood Institute of the USA and was found suitable for this thesis because of the extensive study and analysis carried out since 1948 to identify the risk factors for CVDs.
The algorithms were compared on the basis of evaluation parameters such as Confusion Matrix, Accuracy, Error Rate, Specificity, Sensitivity, Precision, False Positive Rate, F1 Score and ROC-AUC value. The results show that Random Forest, an ensemble algorithm, improved the performance of the model compared to the Decision Tree algorithm. Hyperparameter tuning was done to improve the results of the kNN and Decision Tree algorithms. For kNN, k=3 with the Chebyshev distance metric increased the number of True Positive cases from 20 to 28. For Decision Tree, m=14 with the Gini index method increased the number of True Positive cases from 27 to 38. Analysis of the Accuracy and Confusion Matrix of all algorithms leads to the conclusion that, if the data set is skewed towards one class, the Confusion Matrix should be considered along with the Accuracy. The results show that the Naive Bayes algorithm outperformed Logistic Regression, k-Nearest Neighbor, Decision Tree, Random Forest and Support Vector Machine by predicting 42 True Positive cases. The second part of the thesis covers deployment of the Naive Bayes model as a website service. The Naive Bayes model was first deployed in a web application using the Flask web framework (development environment). The Flask application takes input details from the user, matching the input fields of the data set, and predicts the output using the Naive Bayes model in the backend. The Flask application was then deployed to the Microsoft Azure cloud (production environment) to make the website service available to the public as a primary examination of whether an individual is at risk of developing a CVD. Additional information was also added to the website, such as “What is CVD?”, “Symptoms of Heart attack”, “Risk factors and Prevention of CVD” and “WHO Factsheet on CVD”.
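The evaluation parameters named in the abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of that derivation, not the thesis's own code: the function name `classification_metrics` and the example counts are hypothetical and are not results from the thesis. The counts are chosen only to illustrate the abstract's point that, on a data set skewed towards one class, Accuracy alone can look good while few positive cases are actually detected.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive common evaluation metrics from binary confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives. Ratios with a zero denominator
    are reported as 0.0.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    total = tp + fp + tn + fn
    accuracy = ratio(tp + tn, total)
    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "false_positive_rate": ratio(fp, fp + tn),
        "f1_score": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical, illustrative counts for a skewed data set
# (25 positive cases out of 200); NOT figures from the thesis.
metrics = classification_metrics(tp=10, fp=5, tn=170, fn=15)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts the Accuracy is high even though fewer than half of the positive cases are detected, which is why the True Positive count and Sensitivity are worth reading alongside Accuracy.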
URI: http://10.1.7.192:80/jspui/handle/123456789/9349
Appears in Collections: Dissertation, EC (Communication)

Files in This Item:
File: 18MECC04.pdf
Description: 18MECC04
Size: 11.87 MB
Format: Adobe PDF

