FDR vs Permutation approach in edgeR
Son Pham ▴ 60
@son-pham-6437
United States

I'm doing a pairwise comparison (the classic two-group comparison in edgeR) and wonder whether I should also do a permutation test. That is, randomly permute the labels between the two groups, find DE genes in each permutation, and use the results to control the FDR.

I know that edgeR uses the Benjamini-Hochberg method to calculate the FDR, so permutation is probably not needed? Could anyone explain how permutation-based FDR control relates to the Benjamini-Hochberg algorithm?

 

Aaron Lun ★ 28k
@alun
The city by the bay

There's no need to do any permutations for FDR control in edgeR. In fact, it seems like a considerable amount of effort would be required to implement a statistically rigorous permutation procedure. For example, do you permute samples before or after empirical Bayes shrinkage? How do you consolidate the estimated number of false discoveries across different permutations? Does the ratio of the estimated number of false positives to the observed number of total discoveries make any statistical sense, with respect to the original (i.e., Benjamini-Hochberg) definition of the FDR?

The BH method should be more than adequate for your analysis. I'd advise just sticking with it, given that it's already implemented in topTags via p.adjust; it's simpler to use than any permutation method; and it's (heavily) tried and tested.
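
For readers who want to see what the BH adjustment does to a vector of p-values, here is a minimal sketch in plain Python. It is not edgeR's code: the function name bh_adjust and the example p-values are made up for illustration, and the logic follows the standard BH step-up definition (scale each sorted p-value by m/rank, then enforce monotonicity from the largest p-value down).

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Hypothetical helper for illustration; not taken from edgeR.
    """
    m = len(pvalues)
    # Visit p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        # Scale by m / rank, then keep the adjusted values monotone.
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for five genes.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
for p, q in zip(pvals, bh_adjust(pvals)):
    print(f"p = {p:<6} BH-adjusted = {q:.5f}")
```

Genes whose adjusted value falls below the chosen FDR threshold are the ones reported as DE. Note that the input is just the p-values; nothing about how they were generated enters the adjustment.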

@gordon-smyth
WEHI, Melbourne, Australia

The Benjamini and Hochberg algorithm takes p-values as input whereas permutation is intended to generate p-values. The two approaches are quite different and cannot be usefully combined.

As Aaron has said, applying permutation to RNA-seq data in a useful way is quite difficult. Doing it in the obvious way -- simply permuting the labels, getting permutation p-values for each gene and then applying a multiple testing adjustment -- will get you nowhere.
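
One commonly cited reason the obvious approach stalls (an assumption added here, not spelled out in the answer) is that small experiments allow very few distinct relabellings, so a per-gene permutation p-value cannot get small enough to survive adjustment across thousands of genes. The sketch below uses hypothetical helper names and an assumed 20,000 genes; it computes the lower bound 1 / C(n1 + n2, n1) on the permutation p-value and the BH rank a gene at that bound would need in order to pass.

```python
from fractions import Fraction
from math import ceil, comb


def min_permutation_pvalue(n1, n2):
    """Lower bound on a per-gene permutation p-value for group sizes n1, n2.

    There are C(n1 + n2, n1) distinct relabellings and the observed one is
    always among them, so the p-value cannot fall below 1 / C(n1 + n2, n1).
    """
    return Fraction(1, comb(n1 + n2, n1))


def bh_rank_needed(n1, n2, n_genes, alpha=Fraction(1, 20)):
    """Smallest rank k with p_min <= k * alpha / n_genes (the BH criterion).

    A gene sitting at the minimum p-value is only rejected if at least k
    genes have p-values that small. n_genes and alpha are assumptions.
    """
    p_min = min_permutation_pvalue(n1, n2)
    return ceil(p_min * n_genes / alpha)


n_genes = 20000  # assumed number of genes tested
for n in (3, 5, 10):
    p_min = min_permutation_pvalue(n, n)
    k = bh_rank_needed(n, n, n_genes)
    print(f"{n} vs {n}: min p = {float(p_min):.2e}, genes needed at that p = {k}")
```

With 3 samples per group the bound is 1/20, so at a 5% FDR every one of the 20,000 genes would have to sit at the minimum p-value before any could be called DE. Methods that pool null statistics across genes sidestep this particular limit, which is separate from the exchangeability concern discussed next.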

The best attempt to use permutation for RNA-seq data is probably the PoissonSeq package, which is available from CRAN but not from Bioconductor. I am not convinced, however, that permutation can be entirely reliable for RNA-seq data. Permutation assumes that all the samples are exchangeable (a priori equivalent), which cannot be true for RNA-seq libraries because they are sequenced to different depths. PoissonSeq was found to be somewhat anti-conservative in the following study:

 http://genomebiology.com/2014/15/2/R29

If you want to read more, see Section 2.11 "Parametric modelling versus permutation methods" of the following article:

  http://www.statsci.org/smyth/pubs/limmaPreprint.pdf

This article refers to the limma package, but the same comments apply to edgeR.
