Question

FDR and mixed FDR in limma fry()

Regeroka • 0

Hi,

I'm analyzing a large number of gene sets in RNAseq data, hoping to find sets that can differentiate between 2 conditions. I'm using fry(), from the limma package.

I understand that three hypotheses are tested for each gene set: upregulated, downregulated, and mixed. To my understanding, the mixed hypothesis is that the genes in the set are differentially expressed between the two conditions in either direction, i.e. the set contains a mix of up- and downregulated genes. I hope my understanding is correct there.

Out of thousands of gene sets, one has an FDR of 0.02, and its FDR.Mixed is rather high; the rest all have an FDR above 0.25. About 10 sets have an FDR.Mixed below 0.01, and for some of those the two-sided (directional) FDR is really high (almost 1). To visualize the results, I tried creating heatmaps of the mean expression patterns of the significant sets, and the separation is not as good as I'd like.

My questions are:

What is a sensible choice of FDR cutoff? Should I consider both FDR and FDR.Mixed?
How should I interpret the two-sided FDR and the mixed FDR? (What does it mean when the two-sided FDR is low but the mixed FDR is high, and the other way around?)
Does it make sense to take a closer look at sets with a significant mixed FDR, and split them further depending on the direction of DE (up, down, not DE)?

Thank you for your help in advance!

limma FDR • 1.8k views

updated 4.9 years ago by Gordon Smyth • written 4.9 years ago by Regeroka

Was already asked on BioStars, without answer: https://www.biostars.org/p/382472/

Comment by Kevin Blighe • 4.9 years ago

score 1 · Accepted Answer · 2019-06-02

Yes, your understanding of the up, down and mixed hypotheses seems to be correct.

If all (or almost all) of the genes in a set should change in the same direction, then you should use the two-sided directional p-value and FDR. This would apply, for example, if you are considering inflammatory genes, and all the genes increase in expression when an inflammatory immune response is in progress.
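As a rough illustration of where the directional and mixed columns appear, here is a minimal sketch on simulated data. The matrix `y`, the group labels, and the set names `all_up` and `random` are made up for the example; only `fry()`, `model.matrix()` and the output column names come from limma and base R.

```r
library(limma)

set.seed(1)

# Simulated log-expression matrix: 1000 genes x 6 samples (3 per condition).
# All values here are hypothetical, for illustration only.
y <- matrix(rnorm(1000 * 6), nrow = 1000,
            dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group  <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# Make the first 20 genes increase together in condition B
y[1:20, 4:6] <- y[1:20, 4:6] + 2

# Gene sets as a named list of row indices
sets <- list(all_up = 1:20, random = 101:120)

# contrast = 2 tests the groupB coefficient (B vs A)
res <- fry(y, index = sets, design = design, contrast = 2)

# Directional results (Direction, PValue, FDR) next to the mixed results
res[, c("NGenes", "Direction", "PValue", "FDR", "PValue.Mixed", "FDR.Mixed")]
```

For a set like `all_up`, where the genes move together, the `Direction`, `PValue` and `FDR` columns are the ones to read.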

Alternatively, if you know the direction of change for each gene, then you should input the directions to fry using gene.weights and, again, you should use the two-sided directional p-value and FDR. For example, if the gene set was obtained from a differential expression analysis of a previous dataset, then you will always know the direction and magnitude of change for each gene.
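A minimal sketch of supplying known directions, assuming a hypothetical set in which a previous study found the first 10 genes up and the next 10 down. The data are simulated, and the names `signed_set`, `Index` and `Weight` are invented for the example. The limma documentation describes passing a set as a data.frame whose first column holds the gene index and whose second column holds directional gene weights; that is the form used here instead of the separate `gene.weights` argument.

```r
library(limma)

set.seed(2)

# Simulated log-expression matrix (hypothetical data)
y <- matrix(rnorm(1000 * 6), nrow = 1000,
            dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group  <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# Genes 1-10 go up and genes 11-20 go down in condition B
y[1:10, 4:6]  <- y[1:10, 4:6]  + 1.5
y[11:20, 4:6] <- y[11:20, 4:6] - 1.5

# Known directions from the (hypothetical) earlier study: +1 = up, -1 = down
signed_set <- data.frame(Index = 1:20, Weight = rep(c(1, -1), each = 10))

# Same genes, with and without the direction information
res_signed   <- fry(y, index = list(signed = signed_set),
                    design = design, contrast = 2)
res_unsigned <- fry(y, index = list(unsigned = 1:20),
                    design = design, contrast = 2)

# Compare the directional p-values of the two calls
c(signed = res_signed$PValue, unsigned = res_unsigned$PValue)
```

With the signs supplied, the up and down genes reinforce rather than cancel each other in the directional test, which is why the directional p-value remains the one to use in this situation.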

If the genes in the set are both positively and negatively associated with the biological process, and you don't know which direction corresponds to each gene, then you would need to use the mixed p-values and FDRs. I don't use the mixed FDRs myself as often as directional p-values because they are hard to interpret. If a set is significant in a mixed sense, then you essentially need to examine DE results for individual genes to see what is happening.
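One possible way to follow up a set that is significant only in the mixed sense is sketched below, again on simulated data with a made-up set called `mixed`. The gene-level step uses the standard limma pipeline (`lmFit()`, `eBayes()`, `topTable()`); the 0.05 cutoff and the up/down/not-DE split are illustrative choices, not a recommendation from the answer.

```r
library(limma)

set.seed(3)

# Simulated log-expression matrix (hypothetical data)
y <- matrix(rnorm(1000 * 6), nrow = 1000,
            dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group  <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# A set with unknown directions: half the genes go up, half go down
y[1:10, 4:6]  <- y[1:10, 4:6]  + 2
y[11:20, 4:6] <- y[11:20, 4:6] - 2
sets <- list(mixed = 1:20, random = 101:120)

res <- fry(y, index = sets, design = design, contrast = 2)
res[, c("NGenes", "Direction", "PValue", "PValue.Mixed")]

# Gene-level DE results, kept in the original row order
fit <- eBayes(lmFit(y, design))
tt  <- topTable(fit, coef = 2, number = Inf, sort.by = "none")

# Restrict to the genes of the set and label each one by direction
tt_set <- tt[rownames(y)[sets$mixed], ]
status <- ifelse(tt_set$adj.P.Val >= 0.05, "NotDE",
                 ifelse(tt_set$logFC > 0, "Up", "Down"))
split(rownames(tt_set), status)
```

Looking at the per-gene log fold changes this way shows whether a mixed-significant set is driven by a few strongly changing genes or by coherent up and down subgroups.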