limma-voom or edgeR for paired designs both within and between comparisons
2
1
Entering edit mode
aec ▴ 90
@aec-9409
Last seen 4.5 years ago

Dear all,

I am wondering what is the better approach either edgeR or limma-voom for this paired design:

 
condition
individual
1
control
patient1
2
control
patient2
3
control
patient3
4
control
patient4
5
control
patient5
6
tumor1
patient6
7
tumor2
patient6
8
tumor1
patient7
9
tumor2
patient7
10
tumor1
patient8
11
tumor2
patient8
12
tumor1
patient9
13
tumor2
patient9
14
tumor1
patient10
15
tumor2
patient10

I need the following comparisons:

a) control vs tumor1 (not paired)

b) control vs tumor2 (not paired)

c) tumor1 vs tumor2 (paired)

Not sure if to apply ~condition design for a) and b)  and ~individual+condition design for c) with edgeR

or directly apply limma-voom duplicateCorrelation function and blocking by individual (avoiding different designs/test for each question).

 

Thanks,

 

edger limma-voom paired samples • 2.9k views
ADD COMMENT
1
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 3 hours ago
The city by the bay

Unpaired control versus tumour comparisons will require limma with duplicateCorrelationedgeR can't handle that type of analysis, and it's better to stick with one design wherever possible.

ADD COMMENT
0
Entering edit mode

Thanks a lot Aaron, I go for limma-voom then.

ADD REPLY
0
Entering edit mode
aec ▴ 90
@aec-9409
Last seen 4.5 years ago

Dear Aaron I got no DE genes for any of the comparisons. Would you be so kind to check my code?

y=DGEList(counts=counts)
A<-rowSums(y$counts)
isexpr<-A>40
y=y[isexpr,keep.lib.size=FALSE]
y=calcNormFactors(y)

condition=factor(info$condition)
design=model.matrix(~0+condition)
v=voom(y,design)
cor=duplicateCorrelation(v,design,block=info$individual)
fit=lmFit(v,design,block=info$individual, correlation=cor$consensus)
cm=makeContrasts(
CONTROLvsTUMOR1= conditiontumor1-conditioncontrol,
CONTROLvsTUMOR2= conditiontumor2-conditioncontrol,
TUMOR1vsTUMOR2= conditiontumor2-conditiontumor1,
levels=design)
fit2=contrasts.fit(fit,cm)
fit2=eBayes(fit2)

summary(decideTests(fit2))

 CONTROLvsTUMOR1 CONTROLvsTUMOR2 TUMOR1vsTUMOR2
-1                 0                   0                    0
0              18420               18420                18420
1                  0                   0                    0

 

ADD COMMENT
1
Entering edit mode

I see that you're persisting with filtering on row sums rather than row means. As I told you in a previous post, this approach is not robust to changes in the number of samples in your experimental design. You'll have to remember to change the threshold every time you copy-and-paste this code to use with a different data set. Well, whatever; it's your analysis.

The rest of the code looks fine to me. If you don't get any DE genes... maybe it's because there aren't any DE genes in your experiment (or at least, none with log-fold changes large enough to rise above the high level of noise between individuals). You could try using voomWithQualityWeights instead of voom, to downweight aberrant samples that might be inflating the variance and reducing power. However, this usually doesn't have a massive impact on the DE results.

If all else fails, I would suggest accepting a higher FDR threshold (say, 10%) to look for DE genes. This should give you a set that is less reliable than the one that you might have identified at 5%, as the error rate is doubled in the former. However, with luck, it will be a non-empty set, which means that there will at least be something for the experimentalists to follow up on.

Finally, post additional information or remarks via the "Add comment" button at the bottom of an existing answer/comment, rather than as a new answer (unless you are, in fact, answering your own question).

ADD REPLY
0
Entering edit mode

Aaron,

You are right I am changing the filtering every time depending on the experiment but I just follow the limma documentation, they use row sums for all the examples.

From a PCA plot it is very difficult to see outliers in my case so I do not know which samples are inflating the variance.

I have already changed different FDR cut-offs but I got none. All the genes have Padj close to 1.

Sorry for not using 'Add comment" before from now on I will use it.

ADD REPLY
0
Entering edit mode

I've just run it with 'voomWithQualityWeights' but also I got no significant results.

ADD REPLY
0
Entering edit mode

You wrote:

"I just follow the limma documentation, they use row sums for all the examples"

but actually in all the limma documentation there is only one example (Section 18.1 of the User's Guide) where filtering on row sums is used. In all other examples, for example Section 18.2 of the User's Guide or the workflow article:

   http://f1000research.com/articles/5-1408

filtering on cpm is used. You might also like to read the edgeR workflow article that has some discussion of filtering:

  http://f1000research.com/articles/5-1438

Filtering considerations are the same whether for edgeR or limma.

ADD REPLY
0
Entering edit mode

To do a proper analysis, you should also examine the various quality control plots described in the workflow papers linked to below.

In my experience, when people unexpectedly don't find any DE genes, it is often because there are serious data quality issues. Such issues can usually be identified by plotting the data and sometimes the analysis can be modified to accommodate them.

ADD REPLY
0
Entering edit mode

I examined several quality issues/plots and nothing strange appears. If I run edgeR with paried design for the within patients comparison and no paired design for the between patients I do obtain some DE genes after FDR correction (very few but some). Checking their biological functions make sense to me. But now I am confused because I know it is not the most ideal approach. I perform a different design and comparison every time with edgeR, whereas with limma-voom duplicateCorrelation I calculate eveything within one framework.

ADD REPLY
0
Entering edit mode

The LRT method in edgeR is known to be slightly liberal (see the voom paper) so it will detect a few more genes than limma. The more appropriate comparison is against the QL methods, i.e., glmQLFit and glmQLFTest. Even then, the comparison between limma and edgeR is not entirely fair for the within-patient contrast, because the use of duplicateCorrelation involves some additional assumptions compared to simply blocking on patient in the design matrix. Similarly, for the between-patient contrast, failing to model the correlations between patient samples with edgeR will lead to some liberalness and more detected DE genes. In short, I don't think there's anything here that suggests that limma is doing something wrong, especially if the number of additional DE genes with edgeR is low.

ADD REPLY
0
Entering edit mode

I understand what you mean but at least I would expect that the DE genes in edgeR have Pv close to significance in limma-voom duplicateCorrelation but it is not the case. With limma-voom all corrected Pv are close to 1.

ADD REPLY
0
Entering edit mode

You can't compare p-values to adjusted p-values as if they are the same thing. Adjusted P-values (i.e. FDR values) have a very different interpretation.

ADD REPLY
0
Entering edit mode

I meant adjusted Pv in both cases. With duplicateCorrelation all genes have adjusted Pv close to 1, whereas for edgeR approach I got significant adjusted Pv (<0.05)

ADD REPLY
0
Entering edit mode

And these genes with low p-values may well be false positives from edgeR, for the reasons I mentioned above. The fact that limma detects no DE genes, and edgeR detects very few genes, indicates that you don't have much power to detect anything that might be there. You can have a look at the results from edgeR if you like, but you should take them with a grain of salt given that we know it's liberal and limma detects nothing at all.

ADD REPLY

Login before adding your answer.

Traffic: 414 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6