The Computational Biology Institute (IBC) aims to develop innovative methods and software to analyze, integrate and contextualize large-scale biological data in the fields of health, agronomy and environment. The lack of scalable computational solutions able to handle this ever-increasing volume of data is the present and future bottleneck that may limit the economic impact of these data. Several branches of research will thus be combined: algorithmics (combinatorial, numerical, highly parallel, stochastic), modeling (discrete, qualitative, quantitative, probabilistic), and data management and information retrieval (integration, workflows, cloud). Concepts and tools will be validated on key applications in fundamental biology (transcriptomics, structure and function of proteins, development and morphogenesis), health (pathogens, cancer, stem cells), agronomy (plant genomics, tropical agriculture), and environment (population dynamics, biodiversity). The project is divided into five complementary work packages that cover the main aspects of processing biological data on a large scale:
- WP1-HTS: Methods for high-throughput sequencing analysis
- WP2-Evolution: Scaling-up evolutionary analyses
- WP3-Annotation: Structural and functional annotation of proteomes
- WP4-Imaging: Integrating cell and tissue imaging with Omics data
- WP5-Databases: Biological data and knowledge integration
IBC is a multidisciplinary project center supported for five years (2012-2017) by the French "Investissements d'Avenir" call and its trustees. IBC currently involves 65 permanent researchers spanning a broad multidisciplinary spectrum, based in one company and fourteen laboratories in Montpellier. IBC aims to become a privileged meeting place for computational biology and bioinformatics researchers, bringing together not only those involved in the original project but also a large community of academic and industry researchers at the regional, national and international levels. Through its activities, IBC will invite world-class researchers to collaborate, organize scientific events, train young researchers, and promote results and exchange information with industrial partners.
Institute for Integrative Biology of the Cell
Université Paris-Sud - CNRS - CEA
Orsay, France
Computational pipelines for NGS data analysis rely on multiple hypotheses and simplifications that lead to a significant loss of information. For instance, a major limiting factor is the mapping step, in which NGS reads are aligned to a reference genome or transcriptome. In RNA-seq analysis, relying on a reference transcriptome amounts to ignoring novel genes, alternative transcripts, and transcripts from repeats or with high levels of mutation or editing. Hundreds of dedicated software tools have been developed to bypass these limitations and retrieve specific event types, with widely diverging results.
We have developed a method for RNA-seq data analysis, DE-kupl (1), in which NGS data are analyzed at the level of raw sequence using k-mers (i.e., subsequences of length k, typically k=31), followed by differential expression analysis. Only k-mers that are differentially represented between two sets of libraries are extracted and analyzed. Because this selection is made directly on the raw reads rather than after mapping to a reference, all biological variation present in the original NGS dataset is theoretically collected, with no prior hypothesis about its origin.
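To make the idea of k-mer decomposition followed by differential filtering concrete, the Python sketch below counts k-mers in each library, normalizes the counts, and keeps k-mers whose mean abundance differs between two conditions. It is not the DE-kupl implementation: the function names (`count_kmers`, `normalize`, `differential_kmers`), the counts-per-million normalization, and the plain log2 fold-change threshold are hypothetical choices that stand in for the dedicated k-mer counting and statistical testing a real pipeline would use. Reverse-complement (canonical) k-mers are also ignored for simplicity.

```python
from collections import Counter
from math import log2


def count_kmers(reads, k):
    """Count every length-k substring across a list of reads (no canonical k-mers)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def normalize(counts):
    """Scale raw k-mer counts to counts per million k-mers in the library (assumed scheme)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: c * 1e6 / total for kmer, c in counts.items()}


def differential_kmers(cond_a, cond_b, k, min_abs_log2fc=1.0, pseudocount=1.0):
    """Return {k-mer: log2 fold change (B over A)} for k-mers passing the threshold.

    cond_a and cond_b are lists of libraries; each library is a list of read strings.
    A simple fold-change filter replaces a proper statistical test (hypothetical).
    """
    libs_a = [normalize(count_kmers(lib, k)) for lib in cond_a]
    libs_b = [normalize(count_kmers(lib, k)) for lib in cond_b]
    all_kmers = set().union(*libs_a, *libs_b)

    result = {}
    for kmer in all_kmers:
        # Mean normalized abundance per condition; absent k-mers count as 0.
        mean_a = sum(lib.get(kmer, 0.0) for lib in libs_a) / len(libs_a)
        mean_b = sum(lib.get(kmer, 0.0) for lib in libs_b) / len(libs_b)
        # The pseudocount avoids division by zero for condition-specific k-mers.
        lfc = log2((mean_b + pseudocount) / (mean_a + pseudocount))
        if abs(lfc) >= min_abs_log2fc:
            result[kmer] = lfc
    return result


if __name__ == "__main__":
    # Toy data: two replicate libraries per condition, differing by one sequence variant.
    cond_a = [["ACGTACGTAC", "GGGGGG"], ["ACGTACGTAC", "GGGGGG"]]
    cond_b = [["ACGTTTGTAC", "GGGGGG"], ["ACGTTTGTAC", "GGGGGG"]]
    for kmer, lfc in sorted(differential_kmers(cond_a, cond_b, k=5).items()):
        print(kmer, round(lfc, 2))
```

In the toy run, k=5 is used instead of k=31 only to keep the reads short. The k-mers spanning the sequence difference between the two conditions are reported with positive or negative log2 fold changes, while `GGGGG`, present at equal abundance in both conditions, is filtered out. No reference is consulted at any point, which is the property the paragraph above relies on.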
We will show how DE-kupl can be applied to various experimental settings and present our plans for future developments, including the discovery of novel biomarkers from clinically annotated DNA-seq or RNA-seq data.
(1) Audoux J, Philippe N, Chikhi R, Salson M, Gallopin M, Gabriel M, Le Coz J, Commes T, Gautheret D. (2017) DE-kupl: Exhaustive capture of biological variation in RNA-seq data through k-mer decomposition. Genome Biol. 18: 243.
IBC is an "Investissements d'Avenir" project managed by the University of Montpellier.