Clustering and Time Series Prediction for Spatio-Temporal Geographic Dataset

Agrawal, Kedar Prasad

Please use this identifier to cite or link to this item: http://10.1.7.192:80/jspui/handle/123456789/7189

Full metadata record

DC Field	Value	Language
dc.contributor.author	Agrawal, Kedar Prasad	-
dc.date.accessioned	2016-11-10T08:17:04Z	-
dc.date.available	2016-11-10T08:17:04Z	-
dc.date.issued	2016-07-26	-
dc.identifier.uri	http://hdl.handle.net/123456789/7189	-
dc.description.abstract	Petabytes of data (classical, spatial, temporal, or hybrid) are generated daily from different sources, and work is required so that this voluminous data can be used meaningfully through relevant data mining tasks. With spatio-temporal datasets, data mining becomes more challenging, especially in obtaining good-quality clusters of arbitrary shape and in producing reliable forecasts. Reliable forecasts allow anticipatory action that benefits the masses, for example in land usage, the availability of good and healthy crops or no crops, good rains, floods, or the detection of drought areas. In clustering, issues such as detection of arbitrarily shaped clusters, handling of high-dimensional data, independence from the order of data input, interpretability, the ability to deal with nested clusters, and scalability need to be addressed with utmost care; in forecasting, the issues are handling non-stationarity of time series, non-linear domains, and the selection and tuning of parameters of existing or newly developed techniques. Spatio-Temporal Data Mining (STDM) is the process of extracting implicit knowledge, spatial and temporal relationships, or other patterns not explicitly stored in spatio-temporal databases. Data does not only grow from a static viewpoint; it also evolves spatially and temporally, and this dynamic nature is why the field is becoming an important area of research. In addition, spatio-temporal (ST) data tends to be highly auto-correlated, so the independence assumption made in Gaussian distribution models fails for ST data. The vital issues in spatio-temporal clustering of Earth observation data are obtaining good-quality, arbitrarily shaped clusters and validating them.
The presented research work addresses these issues and presents solutions. To this end, a clustering algorithm named “Spatio-Temporal - Ordering Points to Identify Clustering Structure (ST-OPTICS)” has been developed as a modified version of the existing density-based technique OPTICS. Analysis of the experimental work shows that the quality of the clusters obtained and the run-time efficiency are much better than those of the existing technique, ST-DBSCAN. The results generated by ST-OPTICS have also been hybridized with an agglomerative approach to improve the visualization and interpretation of the obtained clusters. The results have been validated by visualization and by various performance indices, and they show a performance improvement for the ST-OPTICS clustering technique. To improve prediction accuracy, statistical and machine learning models have been fused. A statistical model that integrates Auto Regressive (AR) and Moving Average (MA) components can handle non-stationary time series, but it can deal with only a single time series. A machine learning approach, Support Vector Regression (SVR), can handle dependency among different time series as well as non-linearly separable domains, but it cannot incorporate the past behavior of a time series. This led to combining the two approaches to improve the accuracy of time series prediction, with the focus on minimizing forecast error using residuals, which helps in taking appropriate action for the near future. With this objective, Auto Regressive Integrated Moving Average (ARIMA) models have been hybridized with SVR models. To reduce the number of area-wise models and the time complexity of tuning different parameters, emphasis has been laid on scalability by taking suitable representative samples from each sub-area.
The results obtained show that the proposed hybrid model performs better than the individual models.	en_US
dc.publisher	Institute of Technology	en_US
dc.relation.ispartofseries	TT000036;	-
dc.subject	Theses	en_US
dc.subject	Computer Theses	en_US
dc.subject	Theses IT	en_US
dc.subject	Dr. Sanjay Garg	en_US
dc.subject	11EXTPHDE71	en_US
dc.subject	TT000036	en_US
dc.title	Clustering and Time Series Prediction for Spatio-Temporal Geographic Dataset	en_US
dc.type	Thesis	en_US
Appears in Collections:	Ph.D. Research Reports

Files in This Item:

File	Description	Size	Format
TT000036.pdf	TT000036	4.86 MB	Adobe PDF

IR @ Nirma University