Fact sheet 13: Transcriptomics

Bhumika Dubay and Adam Nunn

Transcriptomics is the study of RNAs and their functions in the cell. Because it is versatile in application, it plays a crucial role in epigenetic studies. Like any other omics study, a basic transcriptomics study comprises data acquisition, data preprocessing and data analysis. Data for transcriptomic analyses can be generated in several ways, depending on the application. Microarrays and RNAseq are the two major, widely used methods, but recent advances in RNAseq have made it the preferred method for gene expression profiling [1].

Quantifying expression and determining differential expression between the sample and the control is one of the major steps in transcriptomic studies. Expression is usually measured as the number of reads mapped to each locus in the transcriptome assembly step of RNAseq analyses [2]. Tools that can measure expression include HTSeq, featureCounts, Rcount, maxcounts, FIXSEQ and Cuffquant [2]. Once gene expression has been quantified, it can be compared between conditions, such as drug-treated vs. untreated or methylated vs. unmethylated, to find out which genes are up- or down-regulated in each condition. Differentially expressed genes are identified with tools that count the sequencing reads per gene and compare them between samples. Two of the most commonly used tools for this kind of analysis are DESeq and edgeR, Bioconductor packages that are both based on the negative binomial distribution [2].
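The quantify-then-compare idea can be illustrated with a minimal Python sketch. It is not how DESeq or edgeR work: those packages fit negative binomial models and test for significance, whereas this example only scales raw counts to counts per million (CPM) and computes a log2 fold change. The gene names, sample names, counts and the pseudocount of 1 are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical read counts per gene for one control and one treated sample.
counts = {
    "geneA": {"control": 100, "treated": 400},
    "geneB": {"control": 300, "treated": 300},
    "geneC": {"control": 600, "treated": 300},
}


def counts_per_million(counts, sample):
    """Scale raw counts by the sample's library size (total mapped reads)."""
    library_size = sum(per_gene[sample] for per_gene in counts.values())
    return {
        gene: per_gene[sample] * 1e6 / library_size
        for gene, per_gene in counts.items()
    }


def log2_fold_changes(counts, control="control", treated="treated", pseudocount=1.0):
    """Return log2(treated / control) per gene on CPM-normalised values.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no reads in one of the samples.
    """
    cpm_control = counts_per_million(counts, control)
    cpm_treated = counts_per_million(counts, treated)
    return {
        gene: math.log2(
            (cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount)
        )
        for gene in counts
    }


lfc = log2_fold_changes(counts)
for gene, value in sorted(lfc.items()):
    direction = "up" if value > 0 else "down" if value < 0 else "unchanged"
    print(f"{gene}: log2FC = {value:.2f} ({direction})")
```

A positive value indicates higher expression in the treated sample and a negative value lower expression. With real data, replicates and a statistical test are needed before calling a gene differentially expressed.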

Co-expression networks are data-derived representations of genes that behave in a similar way in the differential expression analysis [3]. They are used to infer which genes are involved in specific pathways, typically on the basis of the Pearson correlation between expression profiles. One example is weighted gene co-expression network analysis (WGCNA), which has been used successfully to identify co-expression modules and intramodular hub genes from RNAseq data.
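As a rough illustration of the correlation step, the following Python sketch links two genes whenever the absolute Pearson correlation of their expression profiles reaches a cutoff. This is a simple hard-threshold, unsigned network, not the weighted approach mentioned above. The expression values and the cutoff of 0.9 are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical expression values for four genes across five samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 3.0, 1.0],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for gene1, gene2 in combinations(sorted(expression), 2):
        r = pearson(expression[gene1], expression[gene2])
        if abs(r) >= threshold:
            edges.append((gene1, gene2, r))
    return edges


edges = coexpression_edges(expression)
for gene1, gene2, r in edges:
    print(f"{gene1} -- {gene2}: r = {r:.2f}")
```

Using the absolute value treats strongly anti-correlated genes as connected too. Dropping `abs` would give a signed network that keeps only positively correlated pairs.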

Gene set enrichment analysis (GSEA) is the next step. It is a method that identifies classes of genes that are over-represented in a large set of genes and may be associated with a specific phenotypic condition. The method uses statistical approaches to identify significantly enriched or depleted groups of genes [4]. Many websites and downloadable programs provide gene sets and run the analyses, such as PlantRegMap, Broad Institute, Enrichr, GeneSCF, DAVID, AmiGO 2 and ToppGene Suite.
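The notion of over-representation can be made concrete with a one-sided hypergeometric test, sketched below in Python. This is the simple over-representation test, used here as an assumption for illustration; it is not the rank-based enrichment score that some GSEA implementations compute. The gene identifiers, the universe of 20 genes and the function names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, selected_size, overlap):
    """P(X >= overlap) when selected_size genes are drawn without replacement
    from a universe that contains set_size members of the gene set."""
    total = comb(universe_size, selected_size)
    upper = min(set_size, selected_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, selected_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def enrichment(selected_genes, gene_set, universe):
    """Return (overlap, p-value) for one gene set, restricted to the universe."""
    universe = set(universe)
    selected = set(selected_genes) & universe
    members = set(gene_set) & universe
    overlap = len(selected & members)
    p_value = hypergeom_enrichment_p(
        len(universe), len(members), len(selected), overlap
    )
    return overlap, p_value


# Hypothetical data: 20 measured genes, a 5-gene pathway, 5 selected genes.
universe = [f"g{i}" for i in range(20)]
pathway = ["g0", "g1", "g2", "g3", "g4"]
selected = ["g0", "g1", "g2", "g3", "g10"]

overlap, p_value = enrichment(selected, pathway, universe)
print(f"overlap = {overlap}, p = {p_value:.4f}")
```

When many gene sets are tested, the resulting p-values should be corrected for multiple testing before any set is reported as enriched.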

Diverse classes of RNA, ranging from small to long non-coding RNAs, have emerged as key regulators of gene expression, genome stability and defense against foreign genetic elements, which is why it is important to include them in transcriptomic studies of epigenetic regulation [5]. The same techniques used for transcriptomic studies of coding RNA can also be used to analyse non-coding RNAs.

Measuring the expression of an organism’s genes in different tissues or conditions, or at different times, gives information on how genes are regulated and reveals details of the organism’s biology. It can also be used to infer the functions of previously unannotated genes. Transcriptome analysis has enabled the study of how gene expression changes in different organisms and has been instrumental in the understanding of human disease. Analysing gene expression in its entirety allows the detection of broad, coordinated trends that cannot be discerned by more targeted assays.

  1. McGettigan PA. Transcriptomics in the RNA-seq era. Curr Opin Chem Biol. 2013;17:4–11.
  2. Greenbaum D, Colangelo C, Williams K, Gerstein M. Comparing protein abundance and mRNA expression levels on a genomic scale. Genome Biol. 2003;4:117. doi:10.1186/gb-2003-4-9-117.
  3. Griffith M, Walker JR, Spies NC, Ainscough BJ, Griffith OL. Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud. PLoS Comput Biol. 2015;11:1–20.
  4. Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D. A combined algorithm for genome-wide prediction of protein function. Nature. 1999;402:83–6.
  5. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci. 2005;102:15545–50. doi:10.1073/pnas.0506580102.
  6. Holoch D, Moazed D. RNA-mediated epigenetic regulation of gene expression. Nat Rev Genet. 2015;16:71–84. doi:10.1038/nrg3863.