Question: Are spurious correlations possible in correlating two DESeq2 foldchanges computed using the same reference sample group?

I recently looked at a section of an unpublished paper in which the authors, wary of introducing spurious correlations when comparing two sets of DESeq2 foldchanges, ended up computing foldchanges for two sample groups against two different subsets of their control samples. The potential issue is summarized in Wikipedia. In short, the idea is that if there are gene expression sample sets A, B, and C, then correlating the ratios A/C and B/C might show some relationship even if A and B are independent, simply because both ratios were calculated with C as the denominator.
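To make the concern concrete, here is a rough sketch of the algebra on the log scale. It assumes a deliberately simplified model that is not DESeq2's actual GLM: for a single gene, a, b, and c stand for the log-scale group means of A, B, and C, taken to be mutually independent with variances \sigma_a^2, \sigma_b^2, and \sigma_c^2 (these symbols are introduced here for illustration only).

```latex
\begin{align*}
\operatorname{Cov}(a - c,\; b - c)
  &= \operatorname{Cov}(a, b) - \operatorname{Cov}(a, c)
     - \operatorname{Cov}(c, b) + \operatorname{Var}(c)
   = \sigma_c^2 \\[4pt]
\operatorname{Corr}(a - c,\; b - c)
  &= \frac{\sigma_c^2}
          {\sqrt{(\sigma_a^2 + \sigma_c^2)\,(\sigma_b^2 + \sigma_c^2)}}
\end{align*}
```

Under these assumptions the correlation is positive whenever \sigma_c^2 > 0, and it equals 1/2 when the three variances are equal, even though a and b are independent.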

Embarrassingly, I hadn't considered this before, although correlating ratios is definitely something I have done. I could certainly see this as an issue with microarray data, where foldchanges are usually computed as the simple ratio of mean group expression. Upon reviewing the foldchange calculation method in DESeq2, it seems like it could also be a problem? Is this the case?

Ratios, and foldchanges in particular, are often nice to work with because they are easy to interpret biologically. However, if it's potentially dangerous to correlate them when all sample groups of interest were compared against a single set of control samples, even when working with foldchange estimates from DESeq2, then I'll add that to my mental list of things not to do in bioinformatics.

Sorry for the lack of a data-based example, but I figured this is a mostly theoretical question.

Thanks.

Tags: deseq2, correlation
written 8 months ago by Cornwell, Adam • modified 7 months ago
Answer: Are spurious correlations possible in correlating two DESeq2 foldchanges compute
Michael Love (United States) wrote, 8 months ago:

Yes, under the null of no differences among A, B, and C, the standard MLEs for the log2 fold changes of C vs A and B vs A (taking A as the shared reference here) will be positively correlated. I think an LFC shrinkage method will reduce this correlation somewhat but not entirely: LFCs consistent with 0 for both comparisons will move closer to the origin in a scatter plot of the two sets of LFCs, but I think there will still be some positive correlation under the null. I wouldn't report a correlation here, nor a correlation test p-value, as the dependence is baked in.
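A toy simulation can illustrate the point. This is only a sketch under loud assumptions: it uses independent normal noise on the log2 scale and simple differences of group means rather than DESeq2's negative binomial model, and the function names, group sizes, and noise level are all made up for illustration. It compares two designs under the null: both comparisons sharing one reference group, and each comparison using its own disjoint subset of reference samples.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_null(n_genes=5000, n_per_group=4, sd=0.5, seed=1):
    """Simulate log2 expression with NO true differences between groups.

    Returns (r_shared, r_split): the correlation of the two log fold
    changes when both use the same reference samples, and when each
    uses a disjoint set of reference samples.
    """
    rng = random.Random(seed)
    shared_1, shared_2, split_1, split_2 = [], [], [], []
    for _ in range(n_genes):
        # Twice as many reference samples, so they can be split in two.
        ref = [rng.gauss(0.0, sd) for _ in range(2 * n_per_group)]
        grp1 = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        grp2 = [rng.gauss(0.0, sd) for _ in range(n_per_group)]

        ref_first = mean(ref[:n_per_group])
        ref_second = mean(ref[n_per_group:])

        # Shared design: both LFCs subtract the same reference mean.
        shared_1.append(mean(grp1) - ref_first)
        shared_2.append(mean(grp2) - ref_first)

        # Split design: each LFC subtracts its own reference subset.
        split_1.append(mean(grp1) - ref_first)
        split_2.append(mean(grp2) - ref_second)
    return pearson(shared_1, shared_2), pearson(split_1, split_2)


r_shared, r_split = simulate_null()
print(f"shared reference: r = {r_shared:.3f}")
print(f"split reference:  r = {r_split:.3f}")
```

With equal group sizes and equal noise, the shared-reference correlation should come out near 0.5 and the split-reference correlation near 0, although nothing differs between the groups. Real DESeq2 estimates, especially shrunken ones, will not follow this toy model exactly.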

One thing I'll note: I wouldn't have a problem making a scatter plot with only those LFCs that have a low FDR in both comparisons. Under the null you should get none of these LFC pairs. Given that there is a significant difference between, say, C and A for some gene, it is interesting to see whether B falls on the same side of A as C does, or on the other side (an LFC sign change).
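The filtering step described above might be sketched as follows. The data layout is an assumption for illustration: each comparison is a dict mapping a gene ID to a (log2 fold change, adjusted p-value) pair, as one might export from a results table, with None standing in for a missing adjusted p-value. The function name, the 0.05 threshold, and the toy values are all hypothetical.

```python
def sign_concordance(results_1, results_2, fdr=0.05):
    """Keep genes with adjusted p-value below `fdr` in BOTH comparisons
    and count how many have LFCs with the same vs. opposite sign.

    results_1, results_2: dict of gene -> (log2_fold_change, padj),
    where padj may be None if it is missing for that gene.
    Returns (kept, n_same, n_opposite); kept is a list of
    (gene, lfc_1, lfc_2) tuples suitable for a scatter plot.
    """
    kept = []
    for gene, (lfc_1, padj_1) in results_1.items():
        if gene not in results_2:
            continue
        lfc_2, padj_2 = results_2[gene]
        # Genes with a missing adjusted p-value are skipped.
        if padj_1 is None or padj_2 is None:
            continue
        if padj_1 < fdr and padj_2 < fdr:
            kept.append((gene, lfc_1, lfc_2))
    # An LFC of exactly 0 is treated as non-positive here.
    n_same = sum(1 for _, a, b in kept if (a > 0) == (b > 0))
    return kept, n_same, len(kept) - n_same


# Made-up toy values, purely for illustration.
c_vs_a = {"g1": (2.1, 0.001), "g2": (-1.5, 0.01), "g3": (0.2, 0.8),
          "g4": (1.8, 0.02), "g5": (1.0, None)}
b_vs_a = {"g1": (1.7, 0.003), "g2": (1.2, 0.04), "g3": (0.1, 0.9),
          "g4": (0.9, 0.3), "g5": (1.1, 0.01)}

kept, n_same, n_opposite = sign_concordance(c_vs_a, b_vs_a)
print(kept, n_same, n_opposite)
```

In the toy data only g1 and g2 pass the filter in both comparisons; g1 keeps its sign and g2 flips. The counts are descriptive only, in keeping with the advice above not to report a correlation or correlation test p-value.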

Answer: Are spurious correlations possible in correlating two DESeq2 foldchanges compute
Ryan C. Thompson (The Scripps Research Institute, La Jolla, CA) wrote, 8 months ago:

The limma package has a function called genas that is meant to solve exactly this problem: estimating the degree of genuine correlation between fold changes that are expected to be correlated under the null hypothesis as a result of using a common reference. You should check into that function and the associated references that explain the method.

That's very cool, except that the function's help page mainly references Belinda Phipson's PhD dissertation, and the library link provided in the genas help says "This item is currently not available from this repository" (emphasis original). The Majewski et al. article does not really explain the method, nor does Ritchie et al.

Hmm, it's been a while since I've actually chased down these references. Perhaps one of the limma authors more familiar with it can help?