Safe Solutions for Gap Filling in Genomic Assemblies

Leena Salmela - Dept. of Computer Science, University of Helsinki, Finland.

Friday 13th April 2018 - 2 pm - IBC (Campus St Priest, Bat 5, room 2.022)

One of the last steps in a genome assembly project is filling the gaps between consecutive contigs in the scaffolds. The gap filling problem asks for an s-t path in an assembly graph whose length matches the gap length estimate. This problem is known to be NP-hard in general. Here we derive a dynamic programming solution that is simpler than those previously known and is pseudo-polynomial in the maximum value of the input range.
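As a rough illustration of why such a problem admits a pseudo-polynomial solution, the following minimal sketch tabulates, for every length up to the upper bound of the gap estimate, the set of vertices reachable from s by a walk of exactly that length. It is not the algorithm from the talk or from Gap2Seq: the function name `gap_fill_lengths`, the adjacency-dictionary format, the toy graph, and the assumption of positive integer edge lengths are all hypothetical choices made for this example.

```python
def gap_fill_lengths(edges, s, t, d_min, d_max):
    """Return the lengths L in [d_min, d_max] for which an s-t walk of
    total length exactly L exists.

    edges maps a vertex to a list of (successor, length) pairs.
    Assumption: every edge length is a positive integer.
    """
    # reach[l] = vertices reachable from s by a walk of total length l
    reach = [set() for _ in range(d_max + 1)]
    reach[0].add(s)
    for l in range(d_max + 1):
        for u in reach[l]:
            for v, w in edges.get(u, []):
                # Positive lengths guarantee l + w > l, so the set being
                # iterated over is never modified.
                if l + w <= d_max:
                    reach[l + w].add(v)
    return [l for l in range(d_min, d_max + 1) if t in reach[l]]


# Hypothetical toy graph: s-a-t has length 5, s-b-t length 2, s-b-a-t length 6.
toy_edges = {
    "s": [("a", 2), ("b", 1)],
    "a": [("t", 3)],
    "b": [("a", 2), ("t", 1)],
}
print(gap_fill_lengths(toy_edges, "s", "t", 4, 6))  # prints [5, 6]
```

The table has d_max + 1 rows and each row is processed with one pass over the outgoing edges of its vertices, so the running time grows with the numeric value of d_max rather than with the number of bits needed to write it down, which is what pseudo-polynomial means here.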

Although several methods have addressed the gap filling problem, only a few have focused on strategies for dealing with multiple gap filling solutions and for guaranteeing reliable results. Such strategies include reporting only unique solutions, or exhaustively enumerating all filling solutions and heuristically creating their consensus. We present a new method for reliable gap filling: filling gaps with those sub-paths common to all gap filling solutions.
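The idea of keeping only what all solutions share can be pictured with a deliberately naive sketch: enumerate every filling by brute force, then intersect their edge sets. This is a simplification for illustration only. It takes exponential time in general, it reports shared edges rather than maximal shared sub-paths, and it is not how the method in the talk computes safe solutions. The names `enumerate_fillings` and `common_edges`, the toy graph, and the assumption of positive integer edge lengths are hypothetical.

```python
def enumerate_fillings(edges, s, t, d_min, d_max):
    """List every s-t walk whose total length lies in [d_min, d_max].

    edges maps a vertex to a list of (successor, length) pairs.
    Assumption: positive integer edge lengths, so the search terminates.
    """
    solutions = []

    def extend(path, length):
        u = path[-1]
        if u == t and d_min <= length <= d_max:
            solutions.append(list(path))
        for v, w in edges.get(u, []):
            if length + w <= d_max:
                path.append(v)
                extend(path, length + w)
                path.pop()

    extend([s], 0)
    return solutions


def common_edges(solutions):
    """Return the edges (u, v) that appear in every solution."""
    if not solutions:
        return set()
    edge_sets = [set(zip(p, p[1:])) for p in solutions]
    return set.intersection(*edge_sets)


# Hypothetical toy graph with two fillings of length between 4 and 6.
toy_graph = {
    "s": [("a", 2), ("b", 1)],
    "a": [("t", 3)],
    "b": [("a", 2), ("t", 1)],
}
fillings = enumerate_fillings(toy_graph, "s", "t", 4, 6)
print(fillings)                # prints [['s', 'a', 't'], ['s', 'b', 'a', 't']]
print(common_edges(fillings))  # prints {('a', 't')}
```

In this toy case the two fillings disagree on how to leave s but agree on the final edge into t, so only that edge would be reported as reliable and the rest of the gap would stay unfilled.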

We implemented our algorithm in a tool called Gap2Seq and compared our exact gap-filling solution experimentally to popular gap-filling tools. Our experiments show that on bacterial and conservative human assemblies we can fill more gaps than previous tools with similar precision.

See https://www.helsinki.fi/en/computer-science or https://www.cs.helsinki.fi/u/lmsalmel/publications.html

A related seminar will be given on the 6th of April 2018 at 2 pm at the LIRMM (same building) by Riku Walve, also from the University of Helsinki (see http://www.lirmm.fr/recherche/equipes/mab/seminaires-de-bioinformatique).

IBC seminars