Question: Are spurious correlations possible in correlating two DESeq2 foldchanges computed using the same reference sample group?
27 days ago by
United States
Cornwell, Adam110 wrote:

I recently looked at a section of an unpublished paper in which the authors, wary of introducing spurious correlations when comparing two sets of DESeq2 foldchanges, ended up computing foldchanges for two sample groups against two different subsets of their control samples. The potential issue is summarized in Wikipedia. In short, the idea is that if there are gene expression sample sets A, B, and C, then correlating the ratios A/C and B/C might show some relationship even if A and B are independent, simply because both ratios were calculated with C as the denominator.
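A quick way to see the effect is a small null simulation. The sketch below is not DESeq2's estimator: it assumes each group's log2 expression is independent Gaussian noise with no true differences, and the function names, group size, and standard deviation are all made up for illustration. It compares sharing one control group C against using separate control subsets for A and B.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation written out by hand to avoid extra dependencies."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_null_lfcs(shared_control, n_genes=5000, n_per_group=3, sd=0.5, seed=1):
    """Simulate log2 fold changes of A and B vs control with no true differences.

    Assumption: per-sample log2 expression is N(0, sd^2) in every group.
    """
    rng = random.Random(seed)

    def group_mean():
        return sum(rng.gauss(0.0, sd) for _ in range(n_per_group)) / n_per_group

    lfc_a, lfc_b = [], []
    for _ in range(n_genes):
        a, b = group_mean(), group_mean()
        c_for_a = group_mean()
        # Either reuse the same control mean or draw an independent control subset.
        c_for_b = c_for_a if shared_control else group_mean()
        lfc_a.append(a - c_for_a)  # log2(A/C) is a difference on the log scale
        lfc_b.append(b - c_for_b)
    return lfc_a, lfc_b


r_shared = pearson(*simulate_null_lfcs(shared_control=True))
r_split = pearson(*simulate_null_lfcs(shared_control=False))
print(f"shared control: r = {r_shared:.2f}")
print(f"split controls: r = {r_split:.2f}")
```

Under these assumptions the shared-control correlation should come out near 0.5 and the split-control correlation near 0, even though A and B are independent by construction.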

Embarrassingly, I hadn't considered this before, although correlating ratios is definitely something I have done. I could certainly see this as an issue with microarray data, where foldchanges are usually computed as the simple ratio of mean group expression. Upon reviewing the foldchange calculation method in DESeq2, it seems like it could also be a problem? Is this the case?

Ratios, foldchanges in particular, are often nice to work with due to their biological interpretability. However, if it's potentially dangerous to correlate them in cases where all sample groups of interest were compared against a single set of control samples, even when working with foldchange estimates from DESeq2, then I'll add that to my mental list of things not to do in bioinformatics.

Sorry for the lack of a data-based example, but I figured this is a mostly theoretical question.


modified 16 days ago • written 27 days ago by Cornwell, Adam110
26 days ago by
Michael Love20k
United States
Michael Love20k wrote:

Yes, under the null of no differences among A, B, and C, the standard MLEs for the log2 fold changes of C vs A and B vs A will be positively correlated, since both share A as the reference. I think an LFC shrinkage method will reduce this correlation somewhat but not entirely: LFCs consistent with 0 for both comparisons will move closer to the origin in a scatter plot of the two sets of LFCs, but I think that there will still be some positive correlation under the null. I wouldn't report a correlation here, nor a correlation test p-value, as the dependence is baked in.
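To see roughly where the baked-in dependence comes from, here is a back-of-the-envelope sketch. The simplifying assumptions are not DESeq2's actual model: for a given gene, treat the group means of log2 expression, \bar{y}_A, \bar{y}_B, \bar{y}_C, as independent, with a common per-sample variance \sigma^2 and hypothetical group sizes n_A, n_B, n_C, and with A as the shared reference.

```latex
\begin{align*}
\widehat{\mathrm{LFC}}_{C/A} &= \bar{y}_C - \bar{y}_A, \qquad
\widehat{\mathrm{LFC}}_{B/A} = \bar{y}_B - \bar{y}_A, \\
\operatorname{Cov}\bigl(\widehat{\mathrm{LFC}}_{C/A},\, \widehat{\mathrm{LFC}}_{B/A}\bigr)
  &= \operatorname{Var}(\bar{y}_A) = \frac{\sigma^2}{n_A}, \\
\operatorname{Corr}\bigl(\widehat{\mathrm{LFC}}_{C/A},\, \widehat{\mathrm{LFC}}_{B/A}\bigr)
  &= \frac{1/n_A}{\sqrt{\left(1/n_A + 1/n_C\right)\left(1/n_A + 1/n_B\right)}}.
\end{align*}
```

With equal group sizes this gives a correlation of 1/2 under the null. A much larger reference group pushes it toward 0, because the shared noise term \bar{y}_A becomes small relative to the noise in the other two groups.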

One thing I'll note: I wouldn't have a problem making a scatter plot with only those LFCs that have a low FDR in both comparisons. Under the null you should get none of these LFC pairs. Given that there is a significant difference between, say, C and A for some gene, seeing whether B happens to be on the same side of A as C, or on the other side (LFC sign change), is interesting.
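A minimal sketch of that filtering step, assuming the LFCs and adjusted p-values from the two comparisons have already been exported into parallel lists. The function name, the 0.05 cutoff, and the toy values are all hypothetical. Missing adjusted p-values are represented as `None`, since DESeq2 can report NA for genes removed by independent filtering.

```python
def sign_concordance(lfc1, padj1, lfc2, padj2, alpha=0.05):
    """Count same-sign and opposite-sign LFC pairs among genes
    that pass the FDR cutoff in BOTH comparisons."""
    same = opposite = 0
    for l1, p1, l2, p2 in zip(lfc1, padj1, lfc2, padj2):
        if p1 is None or p2 is None:
            continue  # skip genes with a missing adjusted p-value
        if p1 < alpha and p2 < alpha:
            if l1 * l2 > 0:
                same += 1
            elif l1 * l2 < 0:
                opposite += 1
    return {"same_sign": same, "opposite_sign": opposite}


# Toy values for five genes (made up for illustration only).
lfc_c_vs_a = [2.1, -1.5, 0.2, 1.8, -2.4]
padj_c_vs_a = [0.001, 0.01, 0.8, 0.03, None]
lfc_b_vs_a = [1.7, 1.2, -0.1, 0.1, -2.0]
padj_b_vs_a = [0.002, 0.04, 0.9, 0.6, 0.001]

summary = sign_concordance(lfc_c_vs_a, padj_c_vs_a, lfc_b_vs_a, padj_b_vs_a)
print(summary)
```

Only genes significant in both comparisons contribute, so the counts describe sign agreement conditional on a real difference from A and do not summarize the null-dominated cloud around the origin.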

modified 26 days ago • written 26 days ago by Michael Love20k
26 days ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson7.0k wrote:

The limma package has a function called genas that is meant to solve exactly this problem: estimating the degree of genuine correlation between fold changes that are expected to be correlated under the null hypothesis as a result of using a common reference. You should check into that function and the associated references that explain the method.

modified 26 days ago • written 26 days ago by Ryan C. Thompson7.0k

Cool, didn’t know about that.

written 26 days ago by Michael Love20k

That's very cool, except that the function mainly references Belinda Phipson's PhD dissertation, and the library link provided in the genas help says "This item is currently not available from this repository" (emphasis original). The Majewski et al. article does not really explain the method, nor does the Ritchie et al. article.

written 26 days ago by Peter Langfelder1.6k

Hmm, it's been a while since I've actually chased down these references. Perhaps one of the limma authors more familiar with it can help?

written 26 days ago by Ryan C. Thompson7.0k

Nice. I suppose that was added after the last time I went through the limma documentation, since I haven't come across it before.

written 25 days ago by Cornwell, Adam110

