Fact sheet 12: Bisulfite Sequencing Methods

Adam Nunn & Samar Fatma

Although it is by no means the only biological mechanism within the domain of epigenetics, DNA methylation is among the most prevalent and the most studied in the field. But how exactly do we detect differences in methylation within species? One technique that has emerged at the forefront of epigenetic research is bisulfite sequencing: an adaptation of next-generation sequencing technology that resolves genome-wide methylation profiles at single-nucleotide resolution.

The technique, as refined by Lister et al. [1] and Cokus et al. [2], involves treating DNA extracted from test samples with sodium bisulfite, a deaminating agent which mediates the conversion of unmethylated cytosine nucleotides into uracil. Modified cytosine bases (e.g. 5-methylcytosine, 5-hydroxymethylcytosine) are left unaffected by the treatment and remain in their original, unconverted state. As uracil residues are subsequently read as thymine during standard DNA sequencing, these bisulfite-treated samples can be subjected to standard sequencing protocols and used to generate FASTQ reads which carry epigenetic information. Once the reads are generated, the question effectively shifts from a biological one to a computational, algorithmic one.
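The effect of the treatment on a read can be illustrated with a minimal in silico sketch. It assumes complete conversion, considers only a single strand, and uses a hypothetical function name (`bisulfite_convert`) and a made-up example sequence; real data also contain incomplete conversion and sequencing errors.

```python
from typing import Set


def bisulfite_convert(sequence: str, methylated_positions: Set[int]) -> str:
    """Return the sequence as it would be read after bisulfite treatment.

    Assumption: conversion is 100% efficient and only this strand is considered.
    Positions are 0-based indices into the sequence.
    """
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            # Unmethylated cytosine -> uracil, which is sequenced as thymine.
            converted.append("T")
        else:
            # Methylated cytosines and all other bases are left unchanged.
            converted.append(base)
    return "".join(converted)


original = "ACGTCCGA"
# Hypothetical example: only the cytosine at index 1 is methylated.
read = bisulfite_convert(original, methylated_positions={1})
print(read)  # ACGTTTGA
```

The methylated cytosine survives as C, whereas the two unmethylated cytosines appear as T, which is the information later recovered by comparison with the reference.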

In standard sequencing, the next step is usually to align the generated reads to a reference genome assembly. This presents some issues when handling bisulfite data, as thymine residues can no longer be considered entirely independent of cytosine. Read alignment algorithms usually operate on the basis of some kind of scoring scheme, which assigns an overall score to the alignment of two sequences based on the number and position of matches, mismatches, insertions and deletions between nucleotides. The problem is that reference cytosines can legitimately match thymines in bisulfite-treated reads, but not vice versa (with the exception of mutations, e.g. single nucleotide polymorphisms). Existing algorithms are often not built to handle this asymmetry between bases, so the solution is either to adapt these tools in some way or to use algorithms designed specifically for bisulfite data. Several tools now exist in each category, notably Bismark [3] and BWA-meth [4], which adapt the popular standard aligners Bowtie 2 [5] and BWA [6], and software such as Segemehl [7] or ERNE [8], which are capable of interpreting bisulfite reads in their own right.
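The asymmetry described above can be sketched as a mismatch count in which a reference C opposite a read T is tolerated, while a reference T opposite a read C is still penalised. This is an illustrative, ungapped, single-strand toy with a hypothetical function name (`bisulfite_mismatches`); it is not a description of how any of the aligners cited above are implemented.

```python
def bisulfite_mismatches(reference: str, read: str) -> int:
    """Count mismatches between equal-length sequences, bisulfite-aware.

    Assumption: the read derives from the same strand as the reference,
    with no insertions or deletions.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")

    mismatches = 0
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        if ref_base == read_base:
            continue  # ordinary match
        if ref_base == "C" and read_base == "T":
            continue  # consistent with conversion of an unmethylated cytosine
        mismatches += 1  # any other difference, including reference T vs read C
    return mismatches


# A converted read is compatible with its reference...
print(bisulfite_mismatches("ACGTCCGA", "ACGTTTGA"))  # 0
# ...but the reverse comparison is not: T in the reference cannot become C.
print(bisulfite_mismatches("ACGTTTGA", "ACGTCCGA"))  # 2
```

A conventional symmetric scoring scheme would report two mismatches in both directions, which is why unmodified aligners tend to penalise, or fail to place, reads from lightly methylated regions.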

The principles of bisulfite sequencing notwithstanding, another important consideration when designing such an experiment is the chosen strategy for library preparation. In the previous chapter we discussed sequencing depth and coverage; this applies here as we seek to maximize sequencing coverage with regard to the scope of the questions we are looking to answer and the practical limitations of the study, such as cost and time. Will your study investigate genome-wide methylation patterns, or is it enough to focus on a reduced subset of the DNA? Herein we will consider the applications of Reduced-Representation Bisulfite Sequencing (RRBS), Whole-Genome Bisulfite Sequencing (WGBS), and target enrichment methods such as SeqCapEpi. In particular, what implications do such protocols have for the robustness of the data, and what should we adjust for in terms of quality control during the downstream analyses?

By the end of the chapter we will have covered the various technical concerns of bisulfite sequencing, from DNA extraction and library preparation through to the sequencing itself and the downstream extraction of methylated positions. The underlying bioinformatic principles determine whether the data can validly answer the questions posed by the study, so considering them a priori is fundamental to the successful outcome of any such experiment. Finally, we will discuss the hard limitations of bisulfite sequencing and give brief suggestions for alternative methods that might be used to address these issues.

1. Lister R, O'Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, et al. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell. 2008;133:523–36.

2. Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, Haudenschild CD, et al. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature. 2008;452:215–9.

3. Krueger F, Andrews SR. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 2011;27(11):1571–2.

4. Pedersen BS, Eyring K, De S, Yang IV, Schwartz DA. Fast and accurate alignment of long bisulfite-seq reads. arXiv preprint arXiv:1401.1129. 2014.

5. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012;9(4):357.

6. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25(14):1754–60.

7. Otto C, Stadler PF, Hoffmann S. Fast and sensitive mapping of bisulfite-treated sequencing data. Bioinformatics. 2012;28(13):1698–704.

8. Prezza N, Del Fabbro C, Vezzi F, De Paoli E, Policriti A. ERNE-BS5: aligning BS-treated sequences by multiple hits on a 5-letters alphabet. In: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM; 2012. p. 12–19.