Question: Transcript abundance estimation for time-course RNA-seq data
relathman wrote:

Dear community,

I am currently analyzing a set of time-course (~10 time points), stranded, paired-end RNA-seq data, with the particular objective of identifying time-dependent changes in alternative splicing.

However, I am still undecided whether alpine, Salmon, or another method would be better suited for estimating transcript abundance. So far, I have used STAR to align the data to the reference genome for each individual time point. Is there an optimal workflow that you would recommend for read-depth normalization and isoform-specific transcript abundance estimation in time-course data?

Concerning read-depth normalization, I have read recommendations to "downsample" reads to a common depth so that all samples have comparable numbers, which seems reasonable when library sizes are already roughly equal. However, the read numbers of my samples differ substantially (50-125 million reads), and I do not want to throw away so much data.

I would be very grateful to hear your thoughts on the matter.

Answer: Transcript abundance estimation for time-course RNA-seq data
Michael Love wrote:

alpine is mostly a tool for researching bias itself; for production-level quantification, please use Salmon instead. They implement essentially the same GC bias model, but Salmon is higher-quality software and is massively faster and more efficient.

Salmon provides accurate estimates even when coverage depends on fragment GC content, which is one of the most common biases in Illumina sequencing data. I therefore recommend always running Salmon with the --gcBias flag. See the manuscript and the Salmon home page for more details.
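For reference, a typical paired-end run might look like the following (index path, read file names, and output directory are placeholders):

salmon quant -i salmon_index -l A \
    -1 sample_t1_R1.fastq.gz -2 sample_t1_R2.fastq.gz \
    --gcBias -p 8 -o quants/sample_t1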

Read depth is normalized if you load the Salmon output into R using tximport and then follow the steps for downstream analysis; the downstream packages will normalize for depth. Downsampling is not recommended: it throws away information, and parametric models can handle differences in depth with an offset. edgeR, DESeq2, and limma-voom all take this route rather than discarding data.
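For example, a minimal transcript-level import might look like this (the quantification directory and sample table are placeholders; txOut=TRUE keeps estimates at the transcript rather than gene level):

library(tximport)
# one quant.sf file per sample, named by the ids in the sample table
files <- file.path("quants", samples$id, "quant.sf")
txi <- tximport(files, type = "salmon", txOut = TRUE)

The resulting txi object can then be handed to the downstream packages, e.g. via DESeqDataSetFromTximport() in DESeq2, as described in the tximport vignette.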

You can perform testing at the isoform level if you like. As shown in our latest preprint, there are many Bioconductor packages that work well for DTE (differential transcript expression); see Figure 13 there for a comparison.

https://f1000research.com/articles/7-952/v1
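For a time course like yours, one option (a sketch, assuming the txi object from above and a time column in the sample table) is DESeq2's likelihood ratio test, which tests for any change in expression over time:

library(DESeq2)
dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ time)
dds <- DESeq(dds, test = "LRT", reduced = ~ 1)  # full model with time vs. intercept-only
res <- results(dds)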


Many thanks for your fast reply and your explanation. Since Salmon can either map reads itself or work with precomputed read alignments, I am unsure which approach would be best: does quantification with Salmon work "better" with Salmon-mapped reads, or are these two independent steps?

Reply by relathman

Take a look at the Salmon paper.

I think the alignment-free mode was a bit better on simulated data without bias.
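If you do want to quantify your existing STAR alignments, note that Salmon's alignment-based mode expects alignments to the transcriptome (e.g. STAR's --quantMode TranscriptomeSAM output), roughly along these lines (file names are placeholders):

salmon quant -t transcripts.fa -l A -a sample_t1.Aligned.toTranscriptome.out.bam --gcBias -o quants/sample_t1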

Reply by Michael Love

I've followed your advice and used tximport for read-depth normalization of the Salmon TPM values. In the next step, I want to normalize between samples using TMM from edgeR. I am not necessarily interested in a classical DTU/DGE analysis; rather, I want to visualize and compare expression changes over time. Would you recommend scaledTPM or lengthScaledTPM transcript counts (the countsFromAbundance options of tximport) as input for the edgeR TMM normalization?

Many thanks again!

Reply by relathman

I'm more familiar with the DESeq2 normalization functions, so I'll show those. If you have TPMs and want to normalize them so they are more comparable across samples, you can use the median-ratio method:

library(DESeq2)
sf <- estimateSizeFactorsForMatrix(tpm.mat)  # median-ratio size factor per sample
tpm.norm <- t( t(tpm.mat) / sf )             # divide each column by its size factor
Reply by Michael Love

Thanks a lot! Would tpm.mat then be the matrix of the originally estimated transcript counts (unscaled TPMs, i.e. the countsFromAbundance="no" option of tximport) as output by Salmon?

Reply by relathman

Abundances are never modified by tximport, only counts. So it's the abundance matrix, regardless of the countsFromAbundance option.
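Continuing the sketch from above, that would be (txi as returned by tximport):

tpm.mat <- txi$abundance  # TPM matrix; unaffected by countsFromAbundance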

Reply by Michael Love