Metagenomics is the study of the collection of genomes in an environment. Environments as diverse as Antarctic lakes, hot springs, or the human gut can be biologically characterized by extracting and sequencing DNA from samples taken from them. Many of these samples are highly complex, which makes them difficult to analyze and characterize. Metagenomics nevertheless allows both the taxonomic and the functional characterization of such samples. These two kinds of characterization also make it possible to compare different habitats for biodiversity assessment.
In this course, students will be introduced to the command-line environment used to analyze high-throughput sequencing (HTS) data. We will describe the initial cleaning steps that must be performed on every HTS dataset, and then use the processed data for the functional and taxonomic characterization of a metagenomic dataset. We will use methods such as mapping to whole-genome databases, de novo assembly, gene annotation, building a non-redundant gene catalogue, and metagenomic species concept identification. Because metabarcoding is widely used for the taxonomic characterization of environments, we will also discuss amplicon sequencing strategies and data analysis. The course will combine theory with hands-on exercises.
Participants should hold a graduate or postgraduate degree in Life Sciences and have basic knowledge of Statistics and Genetics. Prior experience with bioinformatics is not required, but some experience with running commands in R or Linux is a plus.
All participants must bring their own laptop (Windows, Macintosh, or Linux). Participants whose laptops run Windows will be asked to install VirtualBox with Ubuntu (free of charge). Instructions will be sent before the course begins.