Comparing genetic data, notably at the genomic scale, from an evolutionary perspective naturally follows their acquisition and initial processing (WP1‐HTS). This comparison is the basis for inferring gene function (in connection with WP3‐Annotation); reconstructing the history of species and populations; elucidating the genetic basis of adaptation (in connection with WP5‐Databases); and understanding the dynamics of molecular evolution.
The state of the art relies on probabilistic modeling and advanced algorithmic techniques (e.g. heuristics with performance guarantees, stochastic approaches). Recent years have seen the development of likelihood‐based inference for population genetics data, using algorithms suited to the small datasets that have been typical of the field over the last twenty years. However, these algorithms are too slow to handle hundreds of loci or more.
Several more descriptive approaches have been reconsidered to tackle this problem, as have methods based on refined summaries of the data. An alternative methodology is to obtain the distributions of summary statistics by simulating the biological processes under different parameter values. However, even the more recent adaptive versions of this methodology remain far from able to analyze very large datasets. The challenge is therefore to scale these tools up to modern genomic data, both at the inter‐specific level (e.g. the 10,000 vertebrate genomes project²) and at the intra‐specific level (e.g. the 1000 human genomes project³).
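To make the simulation‐based methodology concrete, the sketch below shows its simplest form, a rejection scheme of the kind commonly called approximate Bayesian computation (ABC). It is an illustration under stated assumptions, not the method proposed in this work package: the model (a standard neutral coalescent without recombination), the summary statistic (mean number of segregating sites per locus), the uniform prior, the tolerance, and all function and parameter names are hypothetical choices made for this example.

```python
import random


def simulate_segregating_sites(theta, n_samples, rng):
    """Simulate the number of segregating sites at one neutral locus
    under the standard coalescent (constant size, no recombination)."""
    tree_length = 0.0
    for k in range(n_samples, 1, -1):
        # Time during which k lineages persist, in coalescent units.
        t_k = rng.expovariate(k * (k - 1) / 2)
        tree_length += k * t_k
    # Mutations fall on the tree as a Poisson process with rate theta / 2.
    # Draw the Poisson count by counting unit-rate exponential arrivals.
    expected = theta / 2 * tree_length
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed <= expected:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def mean_segregating_sites(theta, n_samples, n_loci, rng):
    """Summary statistic: mean number of segregating sites over loci."""
    total = sum(
        simulate_segregating_sites(theta, n_samples, rng)
        for _ in range(n_loci)
    )
    return total / n_loci


def abc_rejection(observed_summary, n_samples, n_loci,
                  theta_max, tolerance, n_draws, seed=0):
    """Keep the parameter draws whose simulated summary statistic falls
    within `tolerance` of the observed one (assumed uniform prior)."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, theta_max)  # draw from the prior
        simulated = mean_segregating_sites(theta, n_samples, n_loci, rng)
        if abs(simulated - observed_summary) <= tolerance:
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    posterior = abc_rejection(observed_summary=14.0, n_samples=10,
                              n_loci=10, theta_max=15.0,
                              tolerance=1.0, n_draws=5000, seed=1)
    print(len(posterior), sum(posterior) / len(posterior))
```

The accepted values approximate the posterior distribution of the parameter. Every prior draw requires simulating all loci, and most draws are rejected, so the cost grows with both the number of loci and the required precision; this is the scaling difficulty described above.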
Another problem with current approaches is that they largely fail to exploit the potential synergy between phylogenetics and population genetics, despite the significant overlap between the two fields in both research scope and methodology.