Are spurious correlations possible in correlating two DESeq2 foldchanges computed using the same reference sample group?
@cornwell-adam-5680
United States

I recently looked at a section of an unpublished paper in which the authors, wary of introducing spurious correlations when comparing two sets of DESeq2 foldchanges, ended up computing foldchanges for two sample groups against two different subsets of their control samples. The potential issue is summarized in Wikipedia. In short, given gene expression sample sets A, B, and C, correlating the ratios A/C and B/C might show some relationship even if A and B are independent, because both ratios were calculated with C as the denominator.

Embarrassingly, I hadn't considered this before, although correlating ratios is definitely something I have done. I could certainly see this as an issue with microarray data, where foldchanges are usually computed as the simple ratio of mean group expression. Upon reviewing the foldchange calculation method in DESeq2, it seems like it could also be a problem? Is this the case?

Ratios, and foldchanges in particular, are often nice to work with because of their biological interpretability. However, if it's potentially dangerous to correlate them when all sample groups of interest were compared against a single set of control samples, even when working with foldchange estimates from DESeq2, then I'll add that to my mental list of things not to do in bioinformatics.

Sorry for the lack of a data-based example, but I figured this is a mostly theoretical question.

Thanks.

deseq2 correlation • 1.2k views
@mikelove
United States

Yes, under the null of no differences among A, B, and C, the standard MLEs of the log2 fold changes for C vs A and B vs A will be positively correlated, because both share the estimate for A. I think an LFC shrinkage method will reduce this correlation somewhat but not entirely: LFCs consistent with 0 for both comparisons will move closer to the origin in a scatter plot of the two LFCs, but I think there will still be some positive correlation under the null. I wouldn't report a correlation here, nor a correlation test p-value, as the dependence is baked in.
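To make the dependence concrete, here is a minimal simulation sketch. It is not DESeq2: it assumes each gene's log2 expression is simply Gaussian noise with the same mean in all groups (the null), and it takes the "LFC" to be a difference of group means. The function names, group sizes, and seed are all hypothetical choices for illustration. Under these assumptions, with equal group sizes, the two LFCs sharing reference A should correlate at roughly 0.5, while LFCs computed against two disjoint halves of A should be roughly uncorrelated.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def mean(values):
    return sum(values) / len(values)


def simulate_null(n_genes=20000, n_per_group=4, sd=1.0, seed=1):
    """Simulate null log2 expression (ASSUMPTION: Gaussian, not counts).

    Returns (corr_shared, corr_split):
      corr_shared: corr of (B - A, C - A) using all reference samples A
      corr_split:  corr of (B - A1, C - A2) using disjoint halves of A
    """
    rng = random.Random(seed)
    half = n_per_group // 2
    shared_ba, shared_ca, split_ba, split_ca = [], [], [], []
    for _ in range(n_genes):
        a = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        b = mean([rng.gauss(0.0, sd) for _ in range(n_per_group)])
        c = mean([rng.gauss(0.0, sd) for _ in range(n_per_group)])
        # Shared reference: both contrasts subtract the same mean of A.
        shared_ba.append(b - mean(a))
        shared_ca.append(c - mean(a))
        # Split reference: each contrast gets its own half of A.
        split_ba.append(b - mean(a[:half]))
        split_ca.append(c - mean(a[half:]))
    return pearson(shared_ba, shared_ca), pearson(split_ba, split_ca)


corr_shared, corr_split = simulate_null()
print(f"shared reference: r = {corr_shared:.3f}")
print(f"split reference:  r = {corr_split:.3f}")
```

Splitting the reference removes the induced correlation in this toy setting, but each contrast then rests on fewer reference samples, so each LFC is noisier.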

One thing I'll note: I wouldn't have a problem making a scatter plot with only those LFCs that have a low FDR in both comparisons. Under the null you should get none of these LFC pairs. Given that there is a significant difference between, say, C and A for some gene, seeing whether B happens to be on the same side of A as C, or on the other side (an LFC sign change), is interesting.
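A rough sketch of that filtering step, assuming the two results tables have already been exported and joined per gene into tuples of `(lfc_ca, padj_ca, lfc_ba, padj_ba)`. The function name, the tuple layout, and the 0.1 cutoff are hypothetical choices for illustration, and `None` stands in for an NA adjusted p-value.

```python
def sign_concordance(rows, alpha=0.1):
    """Keep genes with adjusted p-value < alpha in BOTH comparisons,
    then count how many LFC pairs agree or disagree in sign.

    rows: iterable of (lfc_ca, padj_ca, lfc_ba, padj_ba); padj may be None.
    Returns (n_kept, n_same_sign, n_opposite_sign).
    """
    kept = [
        (lfc_ca, lfc_ba)
        for lfc_ca, padj_ca, lfc_ba, padj_ba in rows
        if padj_ca is not None and padj_ba is not None
        and padj_ca < alpha and padj_ba < alpha
    ]
    same = sum(1 for x, y in kept if x * y > 0)
    opposite = sum(1 for x, y in kept if x * y < 0)
    return len(kept), same, opposite


# Toy, made-up values purely to show the call.
toy = [
    (1.5, 0.01, 2.0, 0.02),    # significant in both, same sign
    (-1.2, 0.03, 0.8, 0.04),   # significant in both, opposite sign
    (0.3, 0.50, 0.4, 0.01),    # not significant for C vs A: dropped
    (2.0, None, 1.0, 0.01),    # NA adjusted p-value: dropped
]
print(sign_concordance(toy))
```

The kept pairs are the ones to put in the scatter plot; the same-sign and opposite-sign counts summarize what the plot shows without reporting a correlation coefficient.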

@ryan-c-thompson-5618
Scripps Research, La Jolla, CA

The limma package has a function called genas that is meant to solve exactly this problem: estimating the degree of genuine correlation between fold changes that are expected to be correlated under the null hypothesis as a result of using a common reference. You should check into that function and the associated references that explain the method.


Cool, didn’t know about that.


That's very cool, except the function mainly references Belinda Phipson's PhD dissertation, and the library link provided in the genas help says "This item is currently not available from this repository" (emphasis original). The Majewski et al. article does not really explain the method, nor does the Ritchie et al. article.


Hmm, it's been a while since I've actually chased down these references. Perhaps one of the limma authors more familiar with it can help?


Nice. I suppose that was added after the last time I went through the limma documentation, since I haven't come across it before.
