Question

understanding edgeR glmLRT

arubio • 0

Hi, I have some transcriptomics data from a microorganism collected at different locations. I've been trying to perform an LRT using edgeR to test the location effect on transcription. What I want is to make a full model with all the coefficients, and compare it with a null model in which location plays no role. I thought I had the correct formulas but after the test absolutely all the features/tags are significant (with absurdly small FDR), so I must be doing something wrong. A mockup of my code is as follows:

x <- normalized_counts
site <- factor(c("a","a","b","b","c","c","d","d","e","e"))
design <- model.matrix(~ 0 + site)

y <- DGEList(counts=x,group=site)
y <- calcNormFactors(y)
y <- estimateDisp(y,design)
fit <- glmFit(y,design)
lrt <- glmLRT(fit, coef=c(1:ncol(fit$design)))

In case anyone knows the sleuth package, I previously used the lrt implemented there with the following configuration:

so <- sleuth_prep(s2c
                  , ~ site
                  , target_mapping = t2g
                  , aggregation_column = "ens_gene"
                  , transformation_function = function(x) log2(x + 0.5)
)
so <- sleuth_fit(so)
so <- sleuth_fit(so
                 , ~1
                 , 'reduced')
so <- sleuth_lrt(so, 'reduced', 'full')

I wanted to run a similar test using edgeR. I would greatly appreciate any insights.

edger lrt • 3.9k views

updated 4.3 years ago by Gordon Smyth • written 4.3 years ago by arubio

score 2 · Answer 1 · 2019-12-18

Your model matrix (~ 0 + site) computes the mean of each group, and the call to glmLRT with every coefficient selected does an ANOVA-like test of whether any of those coefficients differs from zero. By definition that will be true for most genes, since the mean expression of most genes is arguably different from zero. That is not what you care about. You care about differences between groups, in particular whether expression differs by location.

The model in sleuth appears to be fitting a factor effects model, where there is a baseline and every other coefficient is the difference between a given sample type and that baseline. The test is then an LRT between the full and reduced models.
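
To make the two parameterizations concrete, here is a small base R sketch that uses the same five-site layout as the question. The object names design_means and design_effects are only illustrative.

```r
# Same five-site layout as in the question (two replicates per site).
site <- factor(c("a","a","b","b","c","c","d","d","e","e"))

# Cell means model: one column per site, each coefficient is a group mean.
design_means <- model.matrix(~ 0 + site)
colnames(design_means)

# Factor effects model: the intercept is the baseline (site a) and the
# remaining columns are differences between each other site and site a.
design_effects <- model.matrix(~ site)
colnames(design_effects)
```

Testing columns 2 to 5 of the second design together asks whether any site differs from the baseline, which is the same hypothesis as the full versus reduced comparison in the sleuth code.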

If you want to fit a cell means model in edgeR, you then have to define a contrast (or contrasts) between different coefficients that you want to test. This is all covered in the edgeR User's Guide, and if you haven't read that (like probably several times), you shouldn't be trying to do stuff with edgeR.
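
A minimal sketch of the contrast approach with a cell means model might look like the following. The count matrix x here is simulated negative binomial data with no real site effect (an assumption made only so the example runs on its own), and the contrast names such as b_vs_a are made up for illustration.

```r
library(edgeR)  # also attaches limma, which provides makeContrasts()

set.seed(1)
site <- factor(c("a","a","b","b","c","c","d","d","e","e"))

# Simulated raw counts: 1000 hypothetical genes x 10 samples, no true site effect.
x <- matrix(rnbinom(1000 * 10, mu = 50, size = 10), nrow = 1000)
rownames(x) <- paste0("gene", seq_len(nrow(x)))

# Cell means design: one coefficient per site (sitea, siteb, ..., sitee).
design <- model.matrix(~ 0 + site)

y <- DGEList(counts = x, group = site)
y <- calcNormFactors(y)
y <- estimateDisp(y, design)
fit <- glmFit(y, design)

# Four contrasts, each comparing one site with site a. Testing them jointly
# asks whether any site differs from any other.
con <- makeContrasts(
  b_vs_a = siteb - sitea,
  c_vs_a = sitec - sitea,
  d_vs_a = sited - sitea,
  e_vs_a = sitee - sitea,
  levels = design
)

# ANOVA-like likelihood ratio test on the four contrasts together.
lrt <- glmLRT(fit, contrast = con)
topTags(lrt)
```

Passing a matrix to the contrast argument tests all of its columns at once, so this is a test for any difference between sites rather than a test of whether the group means are zero.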

score 0 · Answer 2 · 2019-12-18

Following on from James' answer, you actually need this:

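# x must be the matrix of raw counts (genes in rows, samples in columns)
site <- factor(c("a","a","b","b","c","c","d","d","e","e"))
design <- model.matrix(~site)    # intercept = site a; other columns = differences from site a

y <- DGEList(counts=x,group=site)
i <- filterByExpr(y)             # keep genes with enough counts to be worth testing
y <- y[i, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
y <- estimateDisp(y,design)
fit <- glmFit(y,design)
lrt <- glmLRT(fit, coef=2:5)     # with five sites, coefficients 2 to 5 are the site effects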

but note that edgeR assumes x to be raw counts, not "normalized counts" (whatever they are!).