Fact sheet 11: NGS analyses

Samar Fatma and Paloma Perez-Bello Gil

Next Generation Sequencing (NGS) refers to DNA sequencing technologies that have revolutionized biological research. A number of NGS platforms are now available, using different and improved sequencing technologies to overcome bottlenecks in whole-genome sequencing. The output of these sequencing machines consists of nucleotide sequences called ‘reads’, each corresponding to all or part of a DNA fragment. Read length varies between sequencing technologies, ranging from ~150 to ~1000 base pairs. Millions of reads are generated for a single input sample, depending on the size of the genome and the technology used. To proceed with the analysis, overlapping reads must be stitched together into longer contiguous sequences called ‘contigs’; this process is called assembly. To handle the immense amount of sequence data, genome assemblers rely on algorithms built on different paradigms. Assembly approaches fall into two main kinds: comparative (reference-based) assembly, in which reads are aligned to an existing reference genome, and de novo assembly, in which the genome is reconstructed without a reference.

To unlock the potential of genome sequences, annotation is needed to extract relevant information, ranging from gene models and functional information to microRNAs and epigenetic modifications. For non-model species, annotation is generally confined to protein-coding sequences. In either case, genome annotation involves a series of steps, each carried out with different bioinformatics tools.
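The idea of stitching overlapping reads into a contig can be illustrated with a toy greedy overlap-merge in Python. This is a minimal sketch, not how production assemblers work: it assumes error-free reads from a single strand, no repeats, and no read fully contained in another, whereas real assemblers use overlap-layout-consensus or de Bruijn graph approaches that cope with sequencing errors, repeats and reverse complements. The function names `overlap` and `greedy_assemble`, the minimum overlap of 3 bases, and the reads themselves are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Only overlaps of at least `min_len` bases count; otherwise return 0.
    """
    start = 0
    while True:
        # Look for the first `min_len` bases of b somewhere in a
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept the hit only if the rest of a (from here on) is a prefix of b
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap.

    Toy assumptions (hypothetical): error-free, single-strand reads with
    no repeats and no read contained inside another.
    """
    contigs = list(reads)
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return contigs  # no overlaps left: remaining sequences are the contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)


# Hypothetical toy reads, deliberately given out of order
reads = ["AGGCTT", "GATTAC", "CTTCA", "TACAGG"]
print(greedy_assemble(reads))  # → ['GATTACAGGCTTCA']
```

Always merging the pair with the longest overlap keeps the sketch short, but this greedy choice can mis-assemble repetitive regions, which is one reason assemblers turn to graph-based formulations.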

Bioinformatic analysis of NGS data is a broad term, covering everything from variation in genomic sequence to the structural and functional analysis of proteins. Because variation in genomic sequences can be associated with traits or diseases, Genome-Wide Association Studies (GWAS) are carried out across a population or a set of individuals. In plants, GWAS provides insights at the gene level by associating phenotypic variation with single nucleotide polymorphisms (SNPs), identifying haplotype blocks, often small ones, that are significantly correlated with quantitative trait variation. Phenotypic changes that are based not on DNA sequence variation but on chemical modifications influencing gene activity and expression are called epigenetic changes. Analogously to GWAS, Epigenome-Wide Association Studies (EWAS) survey a genome-wide set of these epigenetic marks in different individuals and infer associations between epigenetic variation and an identifiable phenotype or trait. For further reading, please refer to [1–6].

1. Behjati S, Tarpey PS. What is next generation sequencing? Arch Dis Child Educ Pract Ed. 2013;98:236–8.

2. Wajid B, Serpedin E. Review of General Algorithmic Features for Genome Assemblers for Next Generation Sequencers. Genomics, Proteomics Bioinforma. 2012;10:58–73. doi:10.1016/j.gpb.2012.05.006.

3. Martin JA, Wang Z. Next-generation transcriptome assembly. Nat Rev Genet. 2011;12:671–82. doi:10.1038/nrg3068.

4. Brachi B, Morris GP, Borevitz JO. Genome-wide association studies in plants: The missing heritability is in the field. Genome Biol. 2011;12.

5. Ekblom R, Wolf JBW. A field guide to whole-genome sequencing, assembly and annotation. Evol Appl. 2014;7:1026–42.

6. Richards CL, Alonso C, Becker C, Bossdorf O, Bucher E, Colomé-Tatché M, et al. Ecological plant epigenetics: Evidence from model and non-model species, and the way forward. Ecol Lett. 2017;20:1576–90.