Fact sheet 15: Methylation extraction and variant calling

Sultan Nilay Can and Bhumika Dubay

 

DMP/DMR Calling in terms of context (CHH, CHG, CG)

Cytosine carbon 5 position (5meC) methylation of DNA is an important epigenetic mechanism mostly found in CpG or CpHpG (H = A,C,T) sequence contexts to control gene expression (transcription), chromosome stability, genomic imprinting and silencing of transposons in plants. Therefore, the production of high quality whole genome methylation maps of single cytosine is an important attitude for understanding of how DNA methylation play a role in regulation of gene expression or the generation of abnormal phenotype. Moreover, we can get an impression about the patterns of methylation establishment.

Whole-genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) have been widely used to measure DNA methylation at a single CpG resolution. These collected methylation information needs to be analyzed and related with a conclusion to assign a biological meaning to it. That’s why some statistical analyses needed to differentiate methylated regions from unmethylated ones. Differentially methylated regions (DMRs) is one important term shows the contiguous genomic regions whose DNA methylation character shows difference between two groups of samples and they are composed of Differentially Methylated Positions (DMPs). DMRs have been used to characterize cell-type or condition specific DNA methylation. The detection of DMRs is a necessary condition to characterize different epigenetic states. For now, this is an important bottleneck in current methylome analysis [7].

The main problem of detecting DMRs has two levels: The first level is to find a genomic region, in the second level, the individuals of two groups are significantly differed in their methylation levels. Some current solutions were provided such as using pooled data and employ some statistical distributions or suitable regression models fitted to single CpG methylation rates. After testing single CpGs for differential methylation (DMC), significant DMCs are merged to DMRs using däfferent methods (. There are several tools able to detect DMRs, but most designed only for mammalian systems and, thus, they were designed to primarily call CpG methylation. The methylation level of each cytosine was calculated as (#of methylated reads ) /  (#of methylated reads + #of unmethylated reads ). [1]

In Arabidopsis, CHG and CHH methylations have different sequence preferences. Most of mammalian and plant DNA methylation is restricted to symmetrical CG sequences, but plants also have significant levels of cytosine methylation in their symmetric context CHG (where H is A, C or T) and even in asymmetric sequence [6].

Recent study [5] results showed that CpG methylation is always symmetrically methylated, whereas non-CpG sites are strand-specifically methylated in introns, SINE elements and LINE elements. Even though hydroxymethylcytosine (hmC) could not be distinguished from methylcytosine by the current bisulfite conversion method showed that hmC is unlikely to occur in non-CpG sites; thus, we do not expect hmC to influence our main conclusions. We also showed that the skew of non-CpG methylation in intron is more pronounced at the boundaries and more significant for highly expressed genes.

Variant Calling from “Bisulfite Sequencing”

SNP identification is important to identify of allele-specific epigenetic events like, gamete specific and genetic imprinting. However, SNP calling from BS-Seq data has been shown to be complicated [1]. One problem is that reads from two genomic strands are not complementary at methylated loci. Two methods were widely used to solve this problem. First option is to  align the reads in a three-letter space; the other is a wildcard algorithm which accounts for the C/T conversions [3]. There exist many software packages use these two methods. Bismark , MethylCoder and BS Seeker are based on the Burrows–Wheeler transform. Bismark and BS Seeker use Bowtie  to align the reads in a three-letter space [4]. BSMAP  uses a wildcard algorithm. Also a tool called as Biscuit is quite common for variant detection from bisulfite data [12].

Two steps are implemented to obtain the final SNP set. These are “Dynamic matrix algorithm” and “Approximate Bayesian modeling” [8][9].

Different statistical models (HMM,Non Linear / Linear, Bimodal etc.)

Every DMR tool has its own algorithm to detect and collect methylated regions and they prefer different statistical algorithms in their method. Also, some of the tools are mostly R executable but they can be executed in bash also. Each algorithm has their own advantages and disadvantages in it [10]. It would be better to take a look at commonly used couple of tools to analyze their algorithms [13].

 

[1] Barturen, G., Rueda, A., Oliver, J. L., & Hackenberg, M. (2013). MethylExtract: High-Quality methylation maps and SNV calling from whole genome bisulfite sequencing data. F1000Research, 2, 217. http://doi.org/10.12688/f1000research.2-217.v2

[2] Catoni, M., Tsang, J. M., Greco, A. P., & Zabet, N. R. (n.d.). DMRcaller: A versatile R/Bioconductor package for detection and visualization of differentially methylated regions in CpG and non-CpG contexts. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/29986099

[3] Chen P.Y., et al. (2010) BS Seeker: precise mapping for bisulfite sequencing. BMC Bioinformatics, 11, 203.

[4] Krueger F., Andrews S.R. (2011) Bismark: a flexible aligner and methylation caller for bisulfite-seq applications. Bioinformatics, 27, 1571–1572.

[5] Langmead B., Salzberg S.L. (2012) Fast gapped-read alignment with Bowtie 2, Nat. Methods, 9, 357–359.

[6] Lister R, O'Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell. 2008;133:523–536.

[7] Luo, S., & Preuss, D. (2003). Strand-biased DNA methylation associated with centromeric regions in Arabidopsis. Proceedings of the National Academy of Sciences of the United States of America, 100(19), 11133–11138. http://doi.org/10.1073/pnas.1831011100

[8] Meissner, A., Gnirke, A., Bell, G. W., Ramsahoye, B., Lander, E. S., & Jaenisch, R. (2005). Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucleic Acids Research, 33(18), 5868–5877. http://doi.org/10.1093/nar/gki901

[9] Pedersen B., et al. (2011) MethylCoder: software pipeline for bisulfite-treated sequences. Bioinformatics, 27, 2435–2436.

[10] Robinson, M. D., Kahraman, A., Law, C. W., Lindsay, H., Nowicka, M., Weber, L. M., & Zhou, X. (2014). Statistical methods for detecting differentially methylated loci and regions. Frontiers in Genetics, 5, 324. http://doi.org/10.3389/fgene.2014.00324

[11] Wilkins J.F. (2005) Genomic imprinting and methylation: epigenetic canalization and conflict. Trends Genet., 21, 356–365.

[12] Xi Y., Li W. (2009) BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics, 10, 232.

[13] https://omictools.com/snp-detection-category