I recently looked at a section of an unpublished paper in which the authors, wary of introducing spurious correlations when comparing two sets of DESeq2 foldchanges, ended up computing foldchanges for two sample groups against two different subsets of their control samples. The potential issue is summarized on Wikipedia. In short, the idea is that, given gene expression sample sets A, B, and C, the ratios A/C and B/C might appear correlated even if A and B are independent, simply because both were calculated with C as the denominator.
Embarrassingly, I hadn't considered this before, although correlating ratios is definitely something I have done. I could certainly see this as an issue with microarray data, where foldchanges are usually computed as the simple ratio of mean group expression. From what I can tell of how DESeq2 calculates foldchanges, it could be a problem there as well. Is this the case?
Ratios, and foldchanges in particular, are often nice to work with due to their biological interpretability. However, if it's potentially dangerous to correlate them when all sample groups of interest were compared against a single set of control samples, even when working with foldchange estimates from DESeq2, then I'll add that to my mental list of things not to do in bioinformatics.
Sorry for the lack of a data-based example, but I figured this is a mostly theoretical question.
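That said, a quick simulation can at least illustrate the concern as I understand it. The sketch below is a toy model, not DESeq2: it assumes the per-gene log2 group means for A, B, and C are independent normal draws with equal variance, and it uses a plain log-ratio rather than DESeq2's model-based estimate. The function name `simulate_shared_denominator` and its parameters are made up for illustration.

```python
import numpy as np


def simulate_shared_denominator(n_genes=5000, sigma=1.0, seed=0):
    """Correlate log-ratios that share a denominator (toy model, not DESeq2)."""
    rng = np.random.default_rng(seed)

    # Assumption: per-gene log2 group means are independent across A, B, and C.
    log_a = rng.normal(0.0, sigma, n_genes)
    log_b = rng.normal(0.0, sigma, n_genes)
    log_c = rng.normal(0.0, sigma, n_genes)

    # Simple log2 foldchanges against the same control group C.
    lfc_a = log_a - log_c
    lfc_b = log_b - log_c

    # Pearson correlation before and after dividing by the shared control.
    r_raw = np.corrcoef(log_a, log_b)[0, 1]
    r_lfc = np.corrcoef(lfc_a, lfc_b)[0, 1]
    return r_raw, r_lfc


if __name__ == "__main__":
    r_raw, r_lfc = simulate_shared_denominator()
    print(f"cor(log A, log B)         = {r_raw:.3f}")
    print(f"cor(log2 A/C, log2 B/C)   = {r_lfc:.3f}")
```

Under these assumptions, the covariance of the two foldchanges equals the variance of log C, so with equal variances the expected correlation is 0.5 even though A and B are unrelated. In real data the shared term would presumably be the estimation noise in the control group rather than its whole expression level, so the size of the effect should depend on how noisy the control estimate is relative to the true foldchanges.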