Question

FDR vs Permutation approach in edgeR

Son Pham ▴ 60

@son-pham-6437

United States

I'm doing a pairwise comparison (the classic two-group comparison in edgeR) and wonder whether I should also run a permutation test. That is, randomly permute the labels between the two groups, find the DE genes for each permutation, and use the results to control the FDR.

I know that edgeR uses the Benjamini-Hochberg method to calculate the FDR, so perhaps permutation is not needed? Could anyone explain how permutation-based FDR control relates to the Benjamini-Hochberg algorithm?

edger • 3.3k views

updated 9.3 years ago by Gordon Smyth • written 9.3 years ago by Son Pham

score 1 · Answer 1 · 2014-12-31

There's no need to do any permutations for FDR control in edgeR. In fact, it seems like a considerable amount of effort would be required to implement a statistically rigorous permutation procedure. For example, do you permute samples before or after empirical Bayes shrinkage? How do you consolidate the estimated number of false discoveries across different permutations? Does the ratio of the estimated number of false positives to the observed number of total discoveries make any statistical sense with respect to the original (i.e., Benjamini-Hochberg) definition of the FDR?

The BH method should be more than adequate for your analysis. I'd advise just sticking with it: it's already implemented in topTags via p.adjust, it's simpler to use than any permutation method, and it's (heavily) tried and tested.
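For readers who want to see what the BH adjustment does to a vector of p-values, here is a minimal sketch in Python. It is intended to mirror what R's p.adjust(p, method = "BH") computes, but it is an illustration and not edgeR's own code. The function name `bh_adjust` and the example p-values are made up for this sketch.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values, ordered from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # ascending rank of this p-value (1 = smallest)
        # Scale by m / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four genes.
pvals = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(pvals)
print(adj)
```

Genes whose adjusted value falls below the chosen threshold (say 0.05) are called DE at that FDR. Note that the only input is the vector of p-values; no relabelling of samples is involved.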

score 0 · Answer 2 · 2014-12-31

The Benjamini and Hochberg algorithm takes p-values as input whereas permutation is intended to generate p-values. The two approaches are quite different and cannot be usefully combined.

As Aaron has said, applying permutation to RNA-seq data in a useful way is quite difficult. Doing it in the obvious way -- simply permuting the labels, getting permutation p-values for each gene and then applying multiple testing -- will get you nowhere.
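One commonly cited reason the obvious approach struggles, which may or may not be the main one intended here, is the coarseness of per-gene permutation p-values when there are only a few replicates. The sketch below computes the smallest p-value an exact label-permutation test can return for a balanced two-group design. It assumes a one-sided test that counts each distinct labelling once; `min_permutation_pvalue` is a name invented for this illustration.

```python
from math import comb


def min_permutation_pvalue(n_a, n_b):
    """Smallest p-value from an exact label-permutation test of n_a vs n_b samples."""
    # Each choice of which n_a samples carry the first label is one permutation.
    # The observed labelling is one of them, so the p-value cannot be
    # smaller than 1 / (number of distinct labellings).
    return 1.0 / comb(n_a + n_b, n_a)


# Floor on the per-gene p-value for a few balanced designs (illustrative sizes).
floors = {(n, n): min_permutation_pvalue(n, n) for n in (2, 3, 5, 10)}
for design, p_min in floors.items():
    print(design, p_min)
```

With 3 versus 3 libraries there are only 20 distinct labellings, so no gene can have a p-value below 1/20 = 0.05. BH-adjusted p-values are never smaller than the raw ones, so under these assumptions no gene could ever be reported at an FDR below 0.05, however strong its differential expression.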

The best attempt to use permutation for RNA-seq data is probably the PoissonSeq package, which is available from CRAN but not from Bioconductor. I am not convinced, however, that permutation can be entirely reliable for RNA-seq data. Permutation assumes that all the samples are exchangeable (a priori equivalent), which cannot be true for RNA-seq libraries because they are sequenced to different depths. PoissonSeq was found to be somewhat anti-conservative in the following study:

http://genomebiology.com/2014/15/2/R29

If you want to read more, see Section 2.11 "Parametric modelling versus permutation methods" of the following article:

http://www.statsci.org/smyth/pubs/limmaPreprint.pdf

This article refers to the limma package, but the same comments apply to edgeR.