A huge number of new genomes from a wide range of species are expected in the near future. Today, sequencing data are growing much faster than the capacity to analyze them. Given this dramatic growth and the urgent need for new bioinformatics tools, our AXE3 deals with the development of new algorithms and software, data integration, and workflows that implement the complex processing chains required to analyze proteome data.
Proteins from newly sequenced genomes are usually annotated by homology with better-characterized proteins from already annotated species, using tools such as BLAST, HMMER and the InterPro suite to provide functional annotations in standardized frameworks (e.g. Gene Ontology). However, when applied to genomes that are phylogenetically distant from the classic model organisms, this strategy fails to annotate a large fraction of the proteins. This is the case, in particular, for most human pathogens.
We plan to develop approaches for improving the annotation of protein domains. For example, combining the results obtained for all homologues of the same protein is one strategy to increase the sensitivity of the procedure. Another approach will be to use 3D structure information and molecular modeling to assess the likelihood of doubtful domain predictions. Dedicated tools will also be developed to characterize regions with non-globular structures (arrays of tandem repeats and intrinsically unstructured regions). Conventional approaches developed for globular domains have limited success when applied to these regions, and the existing specialized tools leave considerable room for improvement. Finally, we plan to develop approaches for the functional annotation of proteins that integrate structural information (e.g. protein domains and tandem repeats) with other types of data (e.g. gene expression).
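The idea of pooling evidence across homologues can be illustrated with a minimal sketch. It assumes domain hits have already been parsed (for instance from HMMER output) into (family, E-value) pairs for the query protein and for each of its homologues. The voting rule, the thresholds `strict_evalue`, `relaxed_evalue` and `min_support`, and the family names used below are hypothetical choices for illustration only, not the procedure the project will necessarily adopt.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    """One domain match reported for a protein (assumed pre-parsed from a search tool)."""
    family: str    # domain family identifier (hypothetical names in the example)
    evalue: float  # E-value of the match


def combine_homologue_evidence(
    query_hits: list[DomainHit],
    homologue_hits: dict[str, list[DomainHit]],
    strict_evalue: float = 1e-5,   # hypothetical threshold for a hit trusted on its own
    relaxed_evalue: float = 1.0,   # hypothetical threshold for a weak, "rescuable" hit
    min_support: float = 0.5,      # hypothetical fraction of homologues that must agree
) -> set[str]:
    """Return the domain families accepted for the query protein.

    Hypothetical voting rule: a family is kept if the query hit is significant
    on its own, or if it passes the relaxed threshold AND the same family is
    found significantly in at least `min_support` of the homologues.
    """
    # For each family, count how many homologues carry a significant hit.
    support: Counter[str] = Counter()
    for hits in homologue_hits.values():
        significant = {h.family for h in hits if h.evalue <= strict_evalue}
        support.update(significant)  # each homologue votes at most once per family

    n_homologues = len(homologue_hits)
    accepted: set[str] = set()
    for hit in query_hits:
        if hit.evalue <= strict_evalue:
            accepted.add(hit.family)  # strong evidence on the query itself
        elif hit.evalue <= relaxed_evalue and n_homologues > 0:
            # Weak query hit: rescue it only if enough homologues agree.
            if support[hit.family] / n_homologues >= min_support:
                accepted.add(hit.family)
    return accepted


if __name__ == "__main__":
    query = [DomainHit("FamA", 1e-20), DomainHit("FamB", 0.3), DomainHit("FamC", 0.5)]
    homologues = {
        "h1": [DomainHit("FamA", 1e-30), DomainHit("FamB", 1e-8)],
        "h2": [DomainHit("FamA", 1e-25), DomainHit("FamB", 1e-6)],
        "h3": [DomainHit("FamA", 1e-22)],
    }
    print(sorted(combine_homologue_evidence(query, homologues)))  # ['FamA', 'FamB']
```

In this toy example, the weak query hit to `FamB` is retained because two of the three homologues carry a significant `FamB` hit, whereas `FamC` has no support and is rejected. This shows how pooling homologues can raise sensitivity, while the strict threshold still governs hits that lack corroborating evidence.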