Faculty supervisor
Project contact
Guido Walter Di Donato
Research area
Computing systems architectures
Keywords (max 3, comma-separated)
Graph, Genomics
Description (max 500 characters)
Graph-based data structures provide a natural mechanism for compactly representing related genomic sequences, and the variations among them, as alternative paths in a directed graph. Consequently, many genome assembly tools use internal graph representations and can output the assembly graph in various formats. However, most genome assembly projects still focus on “classic” contigs and scaffolds rather than on assembly graphs, owing to the lack of suitable tools for analyzing such graphs and assessing their quality. In this context, we are developing GAGET, a tool for evaluating genome assembly graphs based on the alignment of reference sequences to the graphs themselves.
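To illustrate how related sequences become alternative paths, the sketch below builds a hypothetical toy graph containing a single-base variant "bubble" and spells out every source-to-sink path. The segment names (`s1`…`s4`), the functions `enumerate_paths` and `spell`, and the simplified model (forward strand only, no overlaps, acyclic) are assumptions for illustration; real assembly graph formats such as GFA also encode orientation and overlaps.

```python
from typing import Dict, List

# Hypothetical toy graph: node-labelled, forward strand only, no overlaps.
# s2 and s3 are two alleles (A/G) of the same position, forming a bubble.
segments: Dict[str, str] = {"s1": "ACGT", "s2": "A", "s3": "G", "s4": "TTCA"}
links: Dict[str, List[str]] = {"s1": ["s2", "s3"], "s2": ["s4"], "s3": ["s4"], "s4": []}


def enumerate_paths(start: str, links: Dict[str, List[str]]) -> List[List[str]]:
    """Return every path from `start` to a node with no successors (assumes a DAG)."""
    if not links[start]:
        return [[start]]
    paths: List[List[str]] = []
    for nxt in links[start]:
        for tail in enumerate_paths(nxt, links):
            paths.append([start] + tail)
    return paths


def spell(path: List[str], segments: Dict[str, str]) -> str:
    """Concatenate the segment sequences visited along a path."""
    return "".join(segments[s] for s in path)


if __name__ == "__main__":
    for p in enumerate_paths("s1", links):
        print(" -> ".join(p), spell(p, segments))
```

The two paths spell `ACGTATTCA` and `ACGTGTTCA`: two related sequences share their common prefix and suffix and differ only in the bubble, which is what makes the graph representation compact.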
GAGET currently computes a series of quality metrics adapted from the sequence domain to the graph domain (e.g., N50, NG50, GC content) and outputs a report with plots describing the results. The aim of this project is to develop an interactive Graphical User Interface (GUI) for navigating the assembly graph and the reference genome and for visualizing the computed metrics. An additional goal is to improve the accuracy of the current algorithm that selects the best set of compatible local alignments between the reference and the assembly graph, in order to reconstruct the path in the graph that best represents the reference sequence.
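For reference, the standard sequence-domain definitions of the metrics named above can be sketched as follows. How GAGET adapts them to the graph domain is not specified here; this sketch simply assumes they are computed over a list of graph segment lengths and sequences, as if the segments were contigs. The function names are hypothetical.

```python
from typing import Iterable, List, Optional


def nx_length(lengths: Iterable[int], target: float) -> Optional[int]:
    """Length of the segment at which the cumulative sum, over lengths
    sorted in decreasing order, first reaches `target` (None if never)."""
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        if total >= target:
            return length
    return None


def n50(lengths: List[int]) -> Optional[int]:
    """N50: threshold is half of the total assembly length."""
    return nx_length(lengths, sum(lengths) / 2)


def ng50(lengths: List[int], genome_size: int) -> Optional[int]:
    """NG50: threshold is half of the (reference) genome size."""
    return nx_length(lengths, genome_size / 2)


def gc_content(seqs: Iterable[str]) -> float:
    """Fraction of G/C among A, C, G, T bases (other symbols such as N are ignored)."""
    gc = acgt = 0
    for s in seqs:
        s = s.upper()
        gc += s.count("G") + s.count("C")
        acgt += sum(s.count(b) for b in "ACGT")
    return gc / acgt if acgt else 0.0


if __name__ == "__main__":
    lengths = [100, 80, 50, 30, 20]
    print(n50(lengths), ng50(lengths, 400), ng50(lengths, 1000))
    print(gc_content(["ACGT", "GGCC", "AT"]))
```

With segment lengths `[100, 80, 50, 30, 20]` (total 280), N50 is 80; NG50 for a 400 bp genome is 50, and for a 1000 bp genome it is undefined because the segments cover less than half of the genome, which is exactly why NG50 is preferred when comparing assemblies against a known reference size.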
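The selection of compatible local alignments is not detailed in this description. As one simple baseline, the sketch below treats it as weighted interval scheduling: pick the set of alignments with maximum total score such that no two overlap on the reference. This is a swapped-in textbook technique, not GAGET's current algorithm; it deliberately ignores graph connectivity between the chosen segments, and the `Alignment` fields (including `node`) are hypothetical.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class Alignment(NamedTuple):
    ref_start: int  # 0-based, inclusive
    ref_end: int    # 0-based, exclusive
    node: str       # hypothetical: graph segment hit by the alignment
    score: float


def best_compatible_set(alns: List[Alignment]) -> List[Alignment]:
    """Max-score subset of alignments that do not overlap on the reference."""
    alns = sorted(alns, key=lambda a: a.ref_end)
    ends = [a.ref_end for a in alns]
    n = len(alns)
    best = [0.0] * (n + 1)   # best[i]: optimum using the first i alignments
    take = [False] * (n + 1)
    prev = [0] * (n + 1)     # where to continue backtracking if i is taken
    for i in range(1, n + 1):
        a = alns[i - 1]
        # Count earlier alignments ending at or before a.ref_start (compatible).
        j = bisect_right(ends, a.ref_start, 0, i - 1)
        with_a = best[j] + a.score
        if with_a > best[i - 1]:
            best[i], take[i], prev[i] = with_a, True, j
        else:
            best[i] = best[i - 1]
    # Backtrack to recover the chosen alignments in reference order.
    chosen: List[Alignment] = []
    i = n
    while i > 0:
        if take[i]:
            chosen.append(alns[i - 1])
            i = prev[i]
        else:
            i -= 1
    return chosen[::-1]


if __name__ == "__main__":
    alns = [
        Alignment(0, 100, "s1", 90),
        Alignment(80, 200, "s2", 150),
        Alignment(100, 180, "s3", 70),
        Alignment(200, 300, "s4", 100),
    ]
    print([a.node for a in best_compatible_set(alns)])
```

Here the high-scoring alignment to `s2` is rejected because it overlaps both `s1` and `s3`, whose combined score is higher. A graph-aware version would additionally require consecutive chosen segments to be connected by a path in the assembly graph.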