limma-voom or edgeR for a paired design with both within- and between-patient comparisons
aec ▴ 90
@aec-9409

Dear all,

I am wondering which is the better approach, edgeR or limma-voom, for this paired design:

 
   condition individual
1    control   patient1
2    control   patient2
3    control   patient3
4    control   patient4
5    control   patient5
6     tumor1   patient6
7     tumor2   patient6
8     tumor1   patient7
9     tumor2   patient7
10    tumor1   patient8
11    tumor2   patient8
12    tumor1   patient9
13    tumor2   patient9
14    tumor1  patient10
15    tumor2  patient10

I need the following comparisons:

a) control vs tumor1 (not paired)

b) control vs tumor2 (not paired)

c) tumor1 vs tumor2 (paired)

I am not sure whether to apply a ~condition design for a) and b) and a ~individual + condition design for c) with edgeR, or to directly apply limma-voom's duplicateCorrelation function, blocking by individual (and so avoid a different design/test for each question). A rough sketch of the two options I have in mind is below.
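For concreteness, a minimal sketch of the two set-ups (here counts and info stand for my count matrix and sample table, and the object names are only illustrative):

library(edgeR)
library(limma)

condition  <- factor(info$condition)    # control, tumor1, tumor2
individual <- factor(info$individual)

# Option 1 (edgeR, a different design per question):
design.ab <- model.matrix(~condition)          # a) and b), unpaired
tum    <- condition != "control"
ind.t  <- droplevels(individual[tum])
cond.t <- droplevels(condition[tum])
design.c <- model.matrix(~ind.t + cond.t)      # c), paired within patient

# Option 2 (limma-voom, one design for all three contrasts):
design.all <- model.matrix(~0 + condition)
# ...followed by voom(), duplicateCorrelation(block=individual), lmFit(), contrasts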

 

Thanks,

 

edger limma-voom paired samples
Aaron Lun ★ 28k
@alun

Unpaired control versus tumour comparisons will require limma with duplicateCorrelation; edgeR can't handle that type of analysis, and it's better to stick with one design wherever possible.


Thanks a lot Aaron, I'll go for limma-voom then.

aec ▴ 90
@aec-9409

Dear Aaron, I got no DE genes for any of the comparisons. Would you be so kind as to check my code?

library(limma)
library(edgeR)

# Build the DGEList and filter lowly expressed genes
y <- DGEList(counts=counts)
A <- rowSums(y$counts)
isexpr <- A > 40
y <- y[isexpr, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)

# One design for all three contrasts; block on individual via duplicateCorrelation
condition <- factor(info$condition)
design <- model.matrix(~0 + condition)
v <- voom(y, design)
cor <- duplicateCorrelation(v, design, block=info$individual)
fit <- lmFit(v, design, block=info$individual, correlation=cor$consensus)
cm <- makeContrasts(
  CONTROLvsTUMOR1 = conditiontumor1 - conditioncontrol,
  CONTROLvsTUMOR2 = conditiontumor2 - conditioncontrol,
  TUMOR1vsTUMOR2  = conditiontumor2 - conditiontumor1,
  levels=design)
fit2 <- contrasts.fit(fit, cm)
fit2 <- eBayes(fit2)

summary(decideTests(fit2))

       CONTROLvsTUMOR1 CONTROLvsTUMOR2 TUMOR1vsTUMOR2
-1                   0               0              0
0                18420           18420          18420
1                    0               0              0

 


I see that you're persisting with filtering on row sums rather than row means. As I told you in a previous post, this approach is not robust to changes in the number of samples in your experimental design. You'll have to remember to change the threshold every time you copy-and-paste this code to use with a different data set. Well, whatever; it's your analysis.
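(For illustration only, a minimal sketch of a mean-based filter that does not need re-tuning when the number of samples changes; the thresholds here are arbitrary examples, not recommendations:)

# Filter on the average count or average log-CPM per gene instead of the sum:
keep <- rowMeans(y$counts) > 5        # mean raw count across samples
# or, on the log-CPM scale:
keep <- aveLogCPM(y) > 1              # average log2-CPM (edgeR)
y <- y[keep, , keep.lib.sizes=FALSE]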

The rest of the code looks fine to me. If you don't get any DE genes... maybe it's because there aren't any DE genes in your experiment (or at least, none with log-fold changes large enough to rise above the high level of noise between individuals). You could try using voomWithQualityWeights instead of voom, to downweight aberrant samples that might be inflating the variance and reducing power. However, this usually doesn't have a massive impact on the DE results.
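A minimal sketch of that substitution, assuming the objects from the code above; everything downstream stays the same:

# voomWithQualityWeights() estimates per-sample weights on top of the usual
# observation-level precision weights, downweighting low-quality samples.
v   <- voomWithQualityWeights(y, design, plot=TRUE)
cor <- duplicateCorrelation(v, design, block=info$individual)
fit <- lmFit(v, design, block=info$individual, correlation=cor$consensus)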

If all else fails, I would suggest accepting a higher FDR threshold (say, 10%) to look for DE genes. This should give you a set that is less reliable than the one that you might have identified at 5%, as the error rate is doubled in the former. However, with luck, it will be a non-empty set, which means that there will at least be something for the experimentalists to follow up on.
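For example, the threshold can be relaxed directly in decideTests() or topTable() (a sketch using the fit from above):

# Count and list genes at a 10% FDR instead of the default 5%:
summary(decideTests(fit2, p.value=0.1))
topTable(fit2, coef="TUMOR1vsTUMOR2", p.value=0.1, number=Inf)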

Finally, post additional information or remarks via the "Add comment" button at the bottom of an existing answer/comment, rather than as a new answer (unless you are, in fact, answering your own question).


Aaron,

You are right, I am changing the filtering every time depending on the experiment, but I was just following the limma documentation; they use row sums in the examples.

From a PCA plot it is very difficult to see outliers in my case, so I do not know which samples are inflating the variance.

I have already tried different FDR cut-offs but I got nothing. All the genes have adjusted p-values close to 1.

Sorry for not using "Add comment" before; from now on I will use it.


I've just re-run it with voomWithQualityWeights, but again I got no significant results.


You wrote:

"I just follow the limma documentation, they use row sums for all the examples"

but actually in all the limma documentation there is only one example (Section 18.1 of the User's Guide) where filtering on row sums is used. In all other examples, for instance Section 18.2 of the User's Guide or the workflow article:

   http://f1000research.com/articles/5-1408

filtering on cpm is used. You might also like to read the edgeR workflow article that has some discussion of filtering:

  http://f1000research.com/articles/5-1438

Filtering considerations are the same whether for edgeR or limma.
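A sketch of the CPM-based filter used in those workflows; requiring about 1 CPM in at least 5 samples (the size of the smallest group in this design) is an illustrative choice, not a fixed rule:

# Keep genes with more than 1 count-per-million in at least 5 samples.
keep <- rowSums(cpm(y) > 1) >= 5
y <- y[keep, , keep.lib.sizes=FALSE]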


To do a proper analysis, you should also examine the various quality control plots described in the workflow papers linked to above.

In my experience, when people unexpectedly don't find any DE genes, it is often because there are serious data quality issues. Such issues can usually be identified by plotting the data and sometimes the analysis can be modified to accommodate them.
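For instance, a few of the standard diagnostic plots from those workflows (a sketch; y, design and fit2 are the objects from the code posted above):

# Do samples cluster by condition, or by something unexpected such as batch?
plotMDS(y, labels=paste(info$condition, info$individual, sep="."))

# Mean-variance trend estimated by voom:
v <- voom(y, design, plot=TRUE)

# Residual standard deviation versus average expression after eBayes():
plotSA(fit2)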


I examined several quality issues/plots and nothing strange appears. If I run edgeR with a paired design for the within-patient comparison and an unpaired design for the between-patient comparisons, I do obtain some DE genes after FDR correction (very few, but some), and checking their biological functions they make sense to me. But now I am confused, because I know it is not the most ideal approach: I fit a different design and comparison every time with edgeR, whereas with limma-voom duplicateCorrelation I calculate everything within one framework.
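(For concreteness, a sketch of the kind of per-comparison edgeR analysis described here, using the classic likelihood-ratio pipeline; this is an illustration, not the original code:)

# Within-patient comparison: block on individual in the design matrix.
tum      <- info$condition != "control"
y.t      <- y[, tum]
ind.t    <- droplevels(factor(info$individual[tum]))
cond.t   <- droplevels(factor(info$condition[tum]))
design.t <- model.matrix(~ind.t + cond.t)
y.t      <- estimateDisp(y.t, design.t)
lrt.t    <- glmLRT(glmFit(y.t, design.t))    # last coefficient: tumor2 vs tumor1

# Between-patient comparisons: condition-only design, no pairing.
design.u <- model.matrix(~0 + factor(info$condition))
colnames(design.u) <- levels(factor(info$condition))
y.u   <- estimateDisp(y, design.u)
fit.u <- glmFit(y.u, design.u)
lrt.u <- glmLRT(fit.u, contrast=makeContrasts(tumor1 - control, levels=design.u))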


The LRT method in edgeR is known to be slightly liberal (see the voom paper) so it will detect a few more genes than limma. The more appropriate comparison is against the QL methods, i.e., glmQLFit and glmQLFTest. Even then, the comparison between limma and edgeR is not entirely fair for the within-patient contrast, because the use of duplicateCorrelation involves some additional assumptions compared to simply blocking on patient in the design matrix. Similarly, for the between-patient contrast, failing to model the correlations between patient samples with edgeR will lead to some liberalness and more detected DE genes. In short, I don't think there's anything here that suggests that limma is doing something wrong, especially if the number of additional DE genes with edgeR is low.
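A sketch of the quasi-likelihood version, assuming the y.t and design.t objects from the sketch above:

# Quasi-likelihood pipeline: replaces glmFit()/glmLRT() and gives
# more rigorous type I error rate control.
y.t   <- estimateDisp(y.t, design.t)
qlfit <- glmQLFit(y.t, design.t)
res   <- glmQLFTest(qlfit)           # tests the last coefficient by default
topTags(res)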


I understand what you mean, but I would at least expect the DE genes from edgeR to have p-values close to significance with limma-voom duplicateCorrelation, and that is not the case. With limma-voom all corrected p-values are close to 1.


You can't compare p-values to adjusted p-values as if they are the same thing. Adjusted P-values (i.e. FDR values) have a very different interpretation.
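A small illustration of the difference, using the Benjamini-Hochberg adjustment that topTable() and decideTests() apply:

p <- c(0.001, 0.01, 0.04, 0.2, 0.6)
p.adjust(p, method="BH")
# 0.005 0.025 ~0.067 0.250 0.600 -- each adjusted value is at least as large as
# the raw p-value, so a "significant" raw p-value need not survive adjustment.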


I meant adjusted p-values in both cases. With duplicateCorrelation all genes have adjusted p-values close to 1, whereas with the edgeR approach I got significant adjusted p-values (< 0.05).


And these genes with low p-values may well be false positives from edgeR, for the reasons I mentioned above. The fact that limma detects no DE genes, and edgeR detects very few genes, indicates that you don't have much power to detect anything that might be there. You can have a look at the results from edgeR if you like, but you should take them with a grain of salt given that we know it's liberal and limma detects nothing at all.
