Question: edgeR : How to test differential exon expression on case/control study across all time points when accounting for the within subject variability?
0
2.2 years ago by
heikki.sarin0 wrote:

Hi,

So I'm trying to test differential exon usage on case/control study across all time points when accounting for the within subject variability. I know the full and reduced models between which to test with but I'm a bit unsure how to apply it to edgeR.

Models:

Full: ~condition + time + condition:subject.nested + condition:time

Reduced: ~condition + time + condition:subject.nested

So far I made the 1) DGEList, 2) specified the full model to model.matrix 3) calculated calcNormFactors(dge), and estimated 4) estimateDisp(dge, design, robust=TRUE)

But now with:

fit <- glmFit(dge,design)
fit <- glmLRT(fit, design, coef = 35:36)

the the testing I'm a bit unsure how define correctly the LRT test. My goal is to set test so it would test if there's difference between the full and reduced model. If a make the both model.matrixes they are otherwise similar but the two last coefs 35 and 36 are missing from the reduced model.matrix. If I want to test what I described earlier, is it correct if I assign the to last coefs to glmLRT?

Really would appreciate the help.

modified 2.2 years ago • written 2.2 years ago by heikki.sarin0
Answer: edgeR : How to test differential exon usage on case/control study across all tim
4
2.2 years ago by
Gordon Smyth38k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth38k wrote:

Yes, specifying coef=35:36 will test whether coefficients 35 and 36 are zero. There is no need to specify the null design matrix, because edgeR does that for you automatically.

However glmLRT() tests for differential expression, not for differential usage. If you really want to test for differential exon usage (i.e., looking for splice-variants etc), then you should be using diffSpliceDGE(). See Section 2.16 of the User's Guide, or the Pasilla Case Study, or just type ?diffSpliceDGE.

Thank you for your answer. Couple of follow-up notions/questions regarding the issue.

1) So, edgeR kind of "drops" the designated coefs (eg. 35:36 in this case) from the full model.matrix and makes the LRTtest against  that and the full model set in design? Just trying to understand the principles how edgeR works.

2) For some reason with the : fit <- glmLRT(fit, design, coef = 35:36), I get the error message. Error in glmfit$coefficients %*% contrast : non-conformable arguments. Have no idea why is that. What might be reason and how to solve it? 3) Is it possible to use glmQLFit and glmQLFtest for this kind of complex experimental design? If is, how should one specify coefs/contrast so that we could answer to the same questions that we're getting an answer from LRTtest (with the coefs set to 35:36)? Just wondering is this an option if we'd like to choose more conservative test. 4) Mixed up my words earlier. Indeed we're conducting a differential expression analysis for exon-level data now and later after the this we're looking into the differential exon usage analysis with diffSpliceDGE. Appreciate the insight as the edgeR manual/vignette is not in all parts totally self-explanatory. ADD REPLYlink modified 2.2 years ago by Gordon Smyth38k • written 2.2 years ago by heikki.sarin0 1 1) Yes, edgeR conducts the LRT for the designated coefs. LRTs are a well established technique in statistics. Yes, it refits the model without the designated coefficients and compares the likelihoods between the two models. You can think of the LRT as comparing a null design matrix with a full design matrix if you like, and the computation is in fact done that way. But, in the end, the LRT is just a way of testing the null hypothesis that the true coefficients[35:36] are zero. 2) I'm almost 100% sure that it is impossible to get the error that you show from the code you say that you've used. Could it be that you actually called glmLRT() with contrast=35:36 instead of coef=35:36? That would indeed lead to the error you see. 3) Yes, glmQLFit and glmQLFTest can always be used and we do recommend it. The syntax is virtually the same as for glmFit and glmLRT. Just read one of the examples. If you need help with this, please ask a new question (rather than continuing this thread). 4) Could you please edit the title of your question so that it correctly reflects what you are seeking help with? What are you looking for in the edgeR User's Guide that you haven't found? You seem to have found the correct syntax without any problems. ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by Gordon Smyth38k 1)Thank you for making that clear. The concept of LRT is very familiar to me but the main problem was to understand how does glmLRT function and the coef argument have to be set so that I get what I'm trying achieve. As contrast, I've been using also DESeq2 for gene-level testing and there one can specify the full and reduced model both in the function - so there's a little concept difference. 2) I'm 100% certain that I used coef=35:36 and for some reason now also even if I try coef=35 or a single coef it won't work. Also re-did the syntax and run it on R but still gives the same error. >dge <- DGEList(counts=countdata_merged, samples = coldata, group=coldata$condition)
> dge <- calcNormFactors(dge)

> #Lets calculate and plot dispersions
> dge <- estimateDisp(dge, design, robust=TRUE)
> fit <- glmFit(dge,design)
> fitx <- glmLRT(fit, design, coef=35:36)
Error in glmfit\$coefficients %*% contrast : non-conformable arguments

The design is set up like:

x <- model.matrix(~ condition + time + condition:subject.nested + condition:time, coldata)
all.zero <- apply(x, 2, function(x) all(x==0))
all.zero

idx <- which(all.zero)
full <- x[,-idx]
design <- full
design

Weirdly, when I set the same "dge" and "design" into the glmQLFit and glmQLFTest and run it with coef=35:36 there's no problem. And the error disappears and results seem to be reasonable and correct.

3) Been following the edgeR User's Guide on section 4.5 as we have also the exon dataset. However, the model we're using is far more complex (see above) than in the example and that is the main issue why I have trouble applying the example to our dataset and really understanding whether our syntax is correct.

So if use the same syntax in glmQLFTest as in glmLRT and set the coef=35:36 will it give the same answer to the question we're looking for as it would if did the analysis with glmLRT? Or do I have change the coef's for the glmQLFTest if I want to find an answer to the same question (as with glmLRT)? Just wondering that is it possible because glmLRT uses LRT as statistical test but glmQLFTest uses F-test. Can't really find an answer to this on the user guide.

4) Did adjust the title to better match the issue :)

User guide would be really self-explanatory if you have quite simple experiment model but if you're not an expert on statistics etc it's not quite so easy apply the examples more complex experiments.

As I said before, the syntax for the glmQL functions is identical to that for the glm functions. If you type help(glmQLFTest) you will see that it has the same 'coef' and 'contrast' arguments as glmLRT() with exactly the same meaning.

Regarding the error message, you are in effect claiming a bug in edgeR but we cannot reproduce the error you claim. To be be honest, I don't see how the error is possible: the code shown in the error message is only run if 'contrast' has been specified, which you say is not so. The fact that the behaviour seems to be changing for you from one time to the next, even though edgeR itself hasn't changed, does suggest that the problem is something to do with your own code or setup. If you still think that this is a problem with the edgeR package, then please provide a reproducible example that we can run to reproduce the error ourselves, and we will be happy to investigate.