RNA-seq is a technology that examines the whole transcriptome at unprecedented levels of sensitivity. There is a wide range of applications for RNA-seq, from expression quantification and the discovery of novel genes and gene isoforms to differential expression and many other types of functional analysis. Hence, researchers can use RNA-seq to answer many of their most compelling questions. It can find new cancer subtypes, discover the transcriptome of never-before-sequenced species, and elucidate tissue-specific gene expression, among many other applications. Given this dizzying array of RNA-seq applications, we want to provide a gentle introduction to the major steps in analyzing RNA-seq data.

As the first step, low quality reads and adaptor sequences need to be removed from the data (a step called trimming). Adaptors are short sequences used to prepare your sample for sequencing, and they need to be removed before analysis. On Basepair, the impact of trimming on read quality is commonly shown in figures like Figure 1. But how is quality measured? Sequencing machines output a quality score (called a Q value) for each base sequenced. Trimming is important because it removes the poor quality parts of a read, or even the entire read if necessary. This increases the amount of useful data for downstream analyses. For a more detailed overview of how trimming impacts data quality, read our blog post on the subject.

Figure 1: Representative plot from Basepair showing the mean quality scores for each position in the read after trimming. "Q30" means the percentage of bases in all reads with a quality score of 30 or greater.

Read Alignment

After trimming, reads are typically aligned to either a reference genome or a reference transcriptome. Alignment is the step that determines which gene a read came from. Basepair provides the popular STAR and TopHat tools. If your species has no reference, another option is to assemble your own transcriptome using Trinity, which is also offered on Basepair.

There are several metrics to pay attention to after alignment is finished. Ideally, reads align uniquely to one place in the genome, and you generally want to see >70% uniquely aligned reads. However, some reads may align to multiple locations. Because it is unclear which gene such a read came from, these reads are removed. Alignment outputs data in the SAM format (Sequence Alignment Map), which is then converted to the compressed BAM format and used for further downstream analyses. Basepair visualizes the read processing steps, from the total starting reads, to how many were trimmed, to how many aligned (Figure 2).

Figure 2: Representative Sankey plot from Basepair showing the number of reads you have at each step, from raw data through trimming and alignment.

The next step is to quantify gene expression. The number of reads that align to each gene is counted using programs such as HTSeq-count or featureCounts. However, raw read counts are not appropriate for comparing expression between samples or between genes, because they do not account for biases such as gene length and the total number of reads in each sample. FPKM (Fragments Per Kilobase of exon model per Million mapped reads) is a common within-sample normalization method. Basepair RNA-seq analysis pipelines provide expression in raw counts, FPKM, and another common normalization method, TPM (Figure 3).

Figure 3: Representative screenshot of the figure and table Basepair provides. (A) Histogram of log-transformed FPKM values. (B) Interactive table of per-gene expression values, which you can sort and search. At the top there is a button that lets you toggle between gene and transcript expression.
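
To make the FPKM and TPM normalizations mentioned above a little more concrete, here is a minimal Python sketch. It is not Basepair's implementation. The function names and the toy gene counts and lengths are hypothetical, and it assumes that the "million mapped reads" total is simply the sum of the counts assigned to the genes listed (real pipelines may define that total differently).

```python
# Minimal sketch of FPKM and TPM from raw per-gene counts.
# Assumptions (not from the article): the function names, the toy data,
# and using the sum of assigned counts as the "mapped reads" total.

def fpkm(counts, lengths_bp):
    """Fragments per kilobase of gene length per million mapped fragments."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped fragments")
    # count / (length in kb) / (total in millions)
    # == count * 1e9 / (length in bp * total)
    return {g: c * 1e9 / (lengths_bp[g] * total) for g, c in counts.items()}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rate = {g: c / (lengths_bp[g] / 1e3) for g, c in counts.items()}
    scale = sum(rate.values())
    if scale == 0:
        raise ValueError("no mapped fragments")
    return {g: r / scale * 1e6 for g, r in rate.items()}


# Toy example: raw counts differ sevenfold, but only because the genes
# differ in length by the same factor.
counts = {"geneA": 100, "geneB": 200, "geneC": 700}
lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 7000}

for gene, value in fpkm(counts, lengths_bp).items():
    print(gene, "FPKM:", round(value, 2))
for gene, value in tpm(counts, lengths_bp).items():
    print(gene, "TPM:", round(value, 2))
```

In this toy data the three genes end up with identical FPKM and TPM values even though their raw counts differ, which is the gene-length bias the normalization is meant to remove. TPM values sum to one million within a sample; FPKM values generally do not.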