Topic: Handling Missing Data in Spatial Big Data Warehouse: Application to agricultural data

Keywords: Big Data; Spatial Data Warehouse; Data Mining

 

Abstract

More and more spatial data are now available from sensor networks, Web data, open data, etc. (spatial big data). Simulation models require ever larger volumes of data, for example for calibration, but in this context several data quality issues arise that can seriously compromise the success of geo-business intelligence projects. In particular, missing values are usually present in these huge quantities of spatial data, which limits the analysis capabilities available to spatial decision-makers. The goal of this PhD thesis is to provide new methods drawn from data mining, statistics and linear programming to handle missing data. These methods should be scalable and time-efficient so that they can be applied to big data. Parallel and distributed processing for our approach will be developed in the PhD thesis. Our contribution will be validated on various agricultural data related to resource management. Technologies such as Spark/Shark [13] or MapReduce [14] will be targeted in the thesis. The CRI-Auvergne infrastructure will be needed for the development of our approach.