Workshop Data Science @ IBC 2016

June 15, 2016, from 2pm to 5pm: LIRMM, Building 5, Room 1/124 (see map)

1:30pm Welcome coffee

2pm Introduction:
Esther Pacitti
Zenith team, Univ. Montpellier, Inria, LIRMM

Data Science: opportunities and risks
Patrick Valduriez
Zenith team, Inria, Univ. Montpellier, LIRMM

Data has been called the new oil, reflecting the idea that big data can be turned into high-value information and new knowledge. Data analysis has been around for a while, starting with statistics and evolving more recently into exploratory data analysis, data mining and business intelligence. However, the new dimensions of big data (volume, variety, velocity, etc.) make it very hard to process and analyze data and to draw sound conclusions. To address this grand challenge, data science is emerging as a new science that combines computer science, statistics, machine learning, visualization and human-computer interaction to collect, clean, integrate, analyze and visualize big data. The ultimate goal is to create new data products and services, and to train legions of data scientists. In this talk, I will introduce data science, including big data and cloud technologies. I will also illustrate the main opportunities and risks, in particular by telling my favorite stories about the good, the bad and the ugly.

Fast data analytics for time series and other ordered data
Dennis Shasha
New York University and Inria (international chair, Zenith team)

The relational model is based on a single data type and a few operations: unordered tables that can be selected, projected, joined, and aggregated. The restriction to unordered tables is in fact unnecessary for simplicity and needlessly limits the model's expressive power, making it difficult to express queries on ordered data such as time series and other sequence data.
This talk presents a language for expressing ordered queries, along with optimization techniques and performance results. It then presents experiments comparing the system against other popular data analytics systems, including Sybase IQ, Python's pandas library and MonetDB, using a variety of benchmarks, including the ones those systems use themselves. On the same hardware, our system is faster.

Discussion