Question

Different results using DESeq2

bekah ▴ 40

@bekah-12633

Last seen 6.9 years ago

Hiya,

I have four treatments, each with 5 replicates. When I put all 20 samples into one count matrix and use the contrast argument in DESeq2 to compare two of the treatments, I get different differential expression results from those produced by the automated Trinity DE analysis pipeline, which also uses DESeq2. I assume this is because, when I run DESeq2 manually in R, all 20 samples are taken into account when calculating dds (even though only 10 are specified for the p-value calculation), whereas Trinity pulls out each pairwise comparison and builds a new input matrix containing only those 10 samples. Is that right?

Best wishes,

Rebekah

deseq2 • 2.1k views

ADD COMMENT • link updated 8.9 years ago by Michael Love 43k • written 8.9 years ago by bekah ▴ 40

score 0 · Answer 1 · 2017-03-21

Michael Love 43k

@mikelove

Last seen 19 hours ago

United States

That would result in different parameters and inference, yes. I don't know how Trinity calls DESeq2.

ADD COMMENT • link 8.9 years ago Michael Love 43k
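
To make "different parameters" concrete, here is a minimal sketch in plain Python (not DESeq2's R code, and not its actual estimator). It assumes the negative binomial relation Var = mu + alpha * mu^2 and uses a crude method-of-moments estimate of alpha for a single gene, pooled across whichever groups are included. The counts, the group names A to D, and the function pooled_dispersion are all hypothetical, made up for illustration.

```python
from statistics import mean, variance


def pooled_dispersion(groups):
    """Crude pooled dispersion for ONE gene (illustration only).

    For each group, solve Var = mu + alpha * mu^2 for alpha using the
    sample mean and sample variance, then average the per-group values
    weighted by degrees of freedom. This is NOT DESeq2's estimator.
    """
    weighted_sum = 0.0
    total_dof = 0
    for counts in groups.values():
        mu = mean(counts)
        # Clamp at zero: a sample variance below the mean gives a negative value.
        alpha = max((variance(counts) - mu) / mu**2, 0.0)
        dof = len(counts) - 1
        weighted_sum += dof * alpha
        total_dof += dof
    return weighted_sum / total_dof


# Hypothetical counts for one gene: 4 treatments x 5 replicates.
counts_by_group = {
    "A": [80, 120, 100, 110, 90],
    "B": [200, 240, 160, 220, 180],
    "C": [50, 150, 100, 20, 180],   # a much noisier group
    "D": [60, 140, 100, 120, 80],
}

# "Trinity-style": only the two groups being compared are in the matrix.
pairwise = pooled_dispersion({g: counts_by_group[g] for g in ("A", "B")})

# "Full dataset": all 20 samples contribute to the single dispersion value.
full = pooled_dispersion(counts_by_group)

print(f"A and B only: {pairwise:.4f}")
print(f"all 4 groups: {full:.4f}")
```

With these made-up counts the pooled estimate is about 0.0175 when only A and B are included and about 0.14 when all four groups are included, because the noisier group C pulls the shared value up. DESeq2's real estimator also shrinks towards a fitted trend, so actual numbers would differ, but the dependence on which samples are in the model is the same idea.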

Trinity seems to build a new matrix containing only the data for the pair of treatments being compared and pass that to DESeq2. Is it better to use this method, or to use the entire dataset and the contrast argument in the script?

ADD REPLY • link 8.8 years ago bekah ▴ 40

See the DESeq2 FAQ in the vignette for my answer to this question.

ADD REPLY • link 8.8 years ago Michael Love 43k

Cheers - sorry I had missed that entirely when I read the vignette the first time!

ADD REPLY • link 8.8 years ago bekah ▴ 40

I'm a little confused. Is there a benefit to having a single dispersion value per gene across all samples? That is, is it better to be consistent (and so have a consistent sensitivity when selecting significantly DE genes) if I eventually want to look at the log counts across all samples? Or is the benefit just that it takes less time than running each pairwise comparison separately?

ADD REPLY • link 8.8 years ago bekah ▴ 40

One advantage of having a single dispersion parameter is that you have more samples with which to estimate it, so the estimator has less variance. If the dispersion is similar across groups, estimating a single parameter improves the estimate. However, if the dispersion is very different between groups, the analysis tends to be too conservative, because you overestimate dispersion for some groups due to high dispersion in other groups. Note that dispersion of counts is not the same as variance of counts; dispersion can be thought of as approximately the square of the coefficient of variation.
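
The last point can be checked with a little arithmetic. This is a minimal sketch in plain Python, assuming the negative binomial parameterization Var = mu + alpha * mu^2 (mean mu, dispersion alpha), under which the squared coefficient of variation is 1/mu + alpha. The value alpha = 0.1 and the function names are hypothetical, chosen only for illustration.

```python
def nb_variance(mu, alpha):
    """Variance of a negative binomial count with mean mu and dispersion alpha."""
    return mu + alpha * mu**2


def squared_cv(mu, alpha):
    """Squared coefficient of variation: Var / mu^2, which equals 1/mu + alpha."""
    return nb_variance(mu, alpha) / mu**2


alpha = 0.1  # hypothetical dispersion, held fixed while the mean changes

for mu in (10, 100, 10_000):
    var = nb_variance(mu, alpha)
    cv2 = squared_cv(mu, alpha)
    # The variance grows with the mean; the squared CV approaches alpha.
    print(f"mu={mu:>6}  variance={var:>12.1f}  CV^2={cv2:.4f}")
```

As the mean grows, the variance keeps increasing while the squared CV settles towards alpha. For low counts the 1/mu term still matters, which is why the correspondence is only approximate.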