Question

Salmon, tximport and DESeq2 for differential expression analysis of 3' biased data

sunil.mangalam • 0

@sunilmangalam-8881

Last seen 6.9 years ago

United States

Hi,

I wish to use tximport followed by DESeq2 for differential expression analysis of RNA-Seq data quantified by Salmon (from FASTQ files). The libraries are 3' end biased and have limited isoform-level information (a situation similar to that shown in Figure 1C of the Soneson et al. paper on the tximport package). I am trying to find the best set of parameters for DE analysis. Is it better to use the transcript length offset for normalization in this case?

Thanks

Sunil Sukumaran

Research Associate

Monell Chemical Senses Center

Philadelphia PA

tximport DESeq2 "tximport" deseq2 salmon • 2.9k views

ADD COMMENT • link 6.9 years ago sunil.mangalam • 0

score 0 · Answer 1 · 2017-06-05

Michael Love 41k

@mikelove

Last seen 1 day ago

United States

The normal tximport => DESeq2 pipeline should work fine (using average transcript length offset). It will only have a differential offset across samples when there is evidence of this from the data. There could be transcript length differences that are observable even if the data is 3' biased, e.g. alternative exons with sufficient local coverage.

ADD COMMENT • link 6.9 years ago Michael Love 41k

Hi Michael,

I want to dig into this a bit more. As you mentioned, there is some splicing info in 3' biased data, but the number of reads drops off precipitously as we move towards the 5' end of genes in my data set. When I used featureCounts, normalizing for gene length by FPKM/RPKM really screwed up the analysis and I got very few differentially expressed genes. DESeq2 normalization, which does not factor in gene length, is more meaningful for this data set. I am aware that Salmon models 3' (and 5') bias, but my understanding is that it does so for regular libraries where the bias is only at the very 3' or 5' ends (~200 bp or so). It would be wonderful if Salmon could model the 3' bias resulting from RNA amplification, but I suspect this is not the case. Perhaps it is asking for too much...

The effective length provided by Salmon drops quite a bit for transcripts that are expressed at very low levels, but it is quite close to the full length when they are even only moderately expressed. So would you recommend dropping the effective length normalization altogether and just summing the counts at the gene level?

Thank you.

Sunil

ADD REPLY • link 6.9 years ago sunil.mangalam • 0

Not related to salmon, but why would the (R|F)PKM screw up your differential expression analysis when you used featureCounts?

I mean, if you ran featureCounts to get counts over your gene features, the most natural thing I could think of doing would be to then feed those counts directly into edgeR, edgeR->voom, or DESeq2 ... no (R|F)PKMs in sight ... know what I mean?

Are you saying that the effective length is incredibly variable across samples and this is something you further want to try to correct for?

ADD REPLY • link 6.9 years ago Steve Lianoglou ★ 13k

Agree with Steve: unless there are differences in effective length across samples, it won't affect the tools that use counts (even when using the average transcript length offset from tximport). Whether the effective length is the same as the transcript length or much smaller (or larger), differences across genes/transcripts will all be zeroed out before being imported as an offset for DESeq2. All that remains is the difference for a gene/transcript across samples.

ADD REPLY • link 6.9 years ago Michael Love 41k
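
To illustrate the point in the reply above about length differences being zeroed out across genes, here is a minimal Python sketch. It is not tximport's or DESeq2's actual code. It assumes the offset is formed by dividing each gene's average transcript length by that gene's geometric mean across samples; the gene names and lengths are made up, and `center_lengths` is a hypothetical helper.

```python
import math


def center_lengths(avg_len):
    """Divide each gene's per-sample lengths by its geometric mean across samples.

    avg_len maps a gene name to a list of average transcript lengths,
    one entry per sample (hypothetical input layout).
    """
    centered = {}
    for gene, lengths in avg_len.items():
        geo_mean = math.exp(sum(math.log(x) for x in lengths) / len(lengths))
        centered[gene] = [x / geo_mean for x in lengths]
    return centered


# Made-up lengths for four samples (two per condition).
avg_len = {
    "geneA": [500.0, 500.0, 500.0, 500.0],      # short gene, same length in every sample
    "geneB": [5000.0, 5000.0, 5000.0, 5000.0],  # long gene, same length in every sample
    "geneC": [2000.0, 2000.0, 1000.0, 1000.0],  # average length changes between conditions
}

centered = center_lengths(avg_len)
for gene, values in centered.items():
    print(gene, [round(v, 3) for v in values])
```

Under this assumption, geneA and geneB end up with a factor of 1 in every sample even though one is ten times longer than the other, so the between-gene length difference contributes nothing. Only geneC, whose length differs across samples, keeps a sample-specific factor.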

Yes, I agree... Actually, I got confused with a comparison of a cuffdiff analysis with the STAR-featureCounts-DESeq2 pipeline that I did a while back. At that time I interpreted it as an effect of the FPKM normalization, but perhaps it is more because cuffdiff is very conservative compared to DESeq2. For what it is worth, I will run DESeq2 using the Salmon results with and without length normalization and post a summary here.

Thanks!

ADD REPLY • link 6.9 years ago sunil.mangalam • 0

score 0 · Answer 2 · 2017-06-05

sunil.mangalam • 0

@sunilmangalam-8881

Last seen 6.9 years ago

United States

Thanks.

Sunil

ADD COMMENT • link 6.9 years ago sunil.mangalam • 0