Question: salmon output for DESeq2 analysis
capricyg wrote, 6 months ago:

Hi Michael,

I read your DESeq2 vignette: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

and found that a DESeqDataSet can be constructed either from salmon output (transcript abundance estimates) or from a count matrix.

Have you ever compared the results of these two workflows (salmon->tximport->DESeq2 versus counts->DESeq2) for differential gene calls on the same sequencing dataset?

Thanks.

C

Answer: salmon output for DESeq2 analysis
Michael Love (United States) wrote, 6 months ago:

Yes, such comparisons were made in the tximport publication.


Michael,

Thank you very much for your quick response!

To make sure my understanding is correct, I found the following paper:

https://www.ncbi.nlm.nih.gov/pubmed/26925227

And your conclusion is that "salmon->tximport->DESeq2" is better than "counts->DESeq2"?

Kind regards,

C.

Reply written 6 months ago by capricyg

Yes. The advantages are that it protects against estimation bias from differential transcript usage (DTU), enables certain fragment-level biases to be estimated, and preserves multimapping reads.

Reply written 6 months ago by Michael Love
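The DTU point can be illustrated with a minimal Python sketch. It assumes a toy model (not salmon's actual likelihood) in which the expected number of fragments from an isoform is proportional to molecules × effective length; the gene, its two isoforms, and all numbers are hypothetical. The abundance-weighted average transcript length computed here mirrors the idea behind the average-transcript-length matrix that tximport returns and that DESeq2 can use as an offset through `DESeqDataSetFromTximport`. It is an illustration of the idea, not tximport's code. tximport weights by TPM, which within one sample is proportional to molecule count, so molecules give the same average.

```python
# Toy model (assumption): expected fragments from an isoform are proportional
# to (number of transcript molecules) x (effective length in kb).

def expected_gene_count(molecules, eff_lengths):
    """Expected gene-level fragment count, summed over the gene's isoforms."""
    return sum(m * l / 1000 for m, l in zip(molecules, eff_lengths))


def avg_tx_length(abundance, eff_lengths):
    """Abundance-weighted mean of isoform effective lengths for one gene."""
    total = sum(abundance)
    if total == 0:
        # Real tools need a fallback for unexpressed genes; this sketch just raises.
        raise ValueError("gene has zero abundance in this sample")
    return sum(a * l for a, l in zip(abundance, eff_lengths)) / total


# Hypothetical gene with a short (1 kb) and a long (3 kb) isoform.
lengths = [1000.0, 3000.0]
sample_a = [100.0, 0.0]  # 100 molecules, all from the short isoform
sample_b = [0.0, 100.0]  # same 100 molecules, all from the long isoform (isoform switch)

for name, mols in [("A", sample_a), ("B", sample_b)]:
    count = expected_gene_count(mols, lengths)
    avg_len = avg_tx_length(mols, lengths)
    # Dividing by the average length (in kb) removes the isoform-switch effect.
    print(name, count, avg_len, count / (avg_len / 1000))
```

Sample B has three times the raw gene count of sample A even though the gene's molecule count is unchanged, so a count-only comparison would call a spurious change; after dividing by the per-sample average length, both samples give the same value.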

I actually have a different concern:

Count data usually come from alignment to the genome, whereas salmon quantifies against the transcriptome. I found that the counts summarized by tximport did not really match the counts based on genome alignment...

Reply written 6 months ago by capricyg

What would be the point of tximport if you got the same thing as the genome-based alignment? Put another way, alignment to the genome with subsequent counting and alignment to the transcriptome followed by collapsing to the gene level are both attempts to get at the same thing: the relative amount of transcript in a given sample for each gene. But we don't know how much transcript there is!

The fact that two different methods of estimating some underlying (unobserved) quantity don't necessarily agree doesn't invalidate either of them, because we don't know what the base truth is. If you want to believe that aligning to the genome and then generating counts is 'the right way to do things', then you should do that. If you are persuaded by Mike's paper that you get better results aligning to the transcriptome and then summarizing using tximport, then you should do that instead. But comparing the two and noting that they differ doesn't tell you anything, because the only reason for having a new method is that it differs from what came before.

Reply written 6 months ago by James W. MacDonald
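Collapsing transcript-level estimates to the gene level amounts to summing each gene's transcript counts through a transcript-to-gene table, which is what tximport does for counts when given a `tx2gene` mapping. Below is a minimal Python sketch of that summation; the transcript IDs, gene IDs, and values are hypothetical. salmon's estimated counts (the `NumReads` column of `quant.sf`) can be fractional because ambiguous fragments are shared among compatible transcripts, which is one concrete reason gene totals need not equal the integer counts from a genome aligner plus a read-counting tool.

```python
from collections import defaultdict


def collapse_to_gene(tx_counts, tx2gene):
    """Sum transcript-level estimated counts into gene-level counts.

    tx_counts: transcript ID -> estimated count (may be fractional)
    tx2gene:   transcript ID -> gene ID
    Transcripts missing from tx2gene raise KeyError in this sketch.
    """
    gene_counts = defaultdict(float)
    for tx, count in tx_counts.items():
        gene_counts[tx2gene[tx]] += count
    return dict(gene_counts)


# Hypothetical IDs and estimated counts, for illustration only.
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
num_reads = {"tx1": 10.5, "tx2": 4.25, "tx3": 3.75}

print(collapse_to_gene(num_reads, tx2gene))
```

Here geneA collects 10.5 + 4.25 = 14.75 estimated fragments, a value no integer read-counting pipeline would report.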

Hi James,

As you mentioned, we don't know what the base truth is. Since the outputs differ, I would just like to know whether anyone has ever tested which one makes more sense...

C.

Reply written 6 months ago by capricyg

Yes. Mike did, in the tximport paper that he already mentioned. Have you read it?

Reply written 6 months ago by James W. MacDonald

Good points. Just want to point out that Charlotte Soneson is the first author.

Reply written 6 months ago by Michael Love