GLM Likelihood ratio test
1
1
Entering edit mode
ygowtha • 0
@ygowtha-7405
Last seen 5.4 years ago
United States

Hello everyone

I am Yogender gowtham and I am a graduate student from Clemson University (USA). I am working on understanding the transcriptome of Chinese hamster ovary (CHO) cells using RNA-seq. To perform differential gene expression analysis between my treatments, I am using edgeR software package.

Since I have multiple factors in my treatments, I am using GLM to perform differential gene expression analysis. However I have few questions related to the likelihood ratio test and the likelihood ratio in the output.

1) From what I understand, the likelihood ratio tests compares two models - a full model and a reduced model to see which one fits better. Could someone please tell me what would be the null and alternate hypothesis for this test?

2) Below is a typical output from my pairwise comparison.

 logFC logCPM LR PValue FDR gene23551 3.680215 7.995512 4263.779 0 0

It looks like greater the likelihood ratio, lower is the p-value. Is the LR value a estimate from likelihood ratio test? What exactly is it and how does it relate to p-value?

Thank you

rnaseq edgeR GLM • 4.4k views
2
Entering edit mode

To answer your first question, we need to know what design matrix you've used, and what parameters you used to run glmLRT (namely, what you specified in the coef or contrast arguments).

0
Entering edit mode

Thank you for your reply Aaron. I have five different conditions with two factors: Cell lines (DP12 and DH) and serum level (0, 5 and 10).

The design matrix I used is:

design <- model.matrix(~0+group)
rownames(design) <- samples$file design colnames(design) #[1] "groupDH_10" "groupDH_5" "groupDP12_0" "groupDP12_10" "groupDP12_5" This was used to call the contrasts: designcontrasts<- makeContrasts( DH.10v5 = groupDH_5 - groupDH_10, DP.10v5 = groupDP12_5- groupDP12_10, DH.5vDP5 = groupDP12_5 - groupDH_5, DH.10vDP10 = groupDP12_10 - groupDH_10, DP.10v0 = groupDP12_0 - groupDP12_10, DP.5v0 = groupDP12_0 - groupDP12_5, levels=design) Please let me know if you need more details. Thanks ADD REPLY 0 Entering edit mode Two things: first. you need to show how you called glmLRT in addition to the above code. Second, Based on the names of your contrasts, I'm guessing you probably don't want to test all of them at once, but rather you want to test them individually one at a time, which would require six separate calls to glmLRT ADD REPLY 0 Entering edit mode That is correct. I did six separate calls to glmLRT. For examples: # glmLRT #DHFR 5% vs 10% lrt<- glmLRT(glmfit, contrast = designcontrasts[, "DH.10v5"]) # toptags result <- topTags(lrt, n = Inf, sort.by="PValue") head(result) de <- rownames(result[result$table$FDR<0.01,]) head(result$table)
sum(result$table$FDR<=0.01)#4792
write.table(result\$table, file = "DHFR 5vs10_new.txt", sep="\t")

Could you please explain in detail what the coefficients actually are, with respect to the above example of contrast between DHFR_5 - DHFR_10. Please let me know if you need more details with regards to code.

Thanks

2
Entering edit mode

Each coefficient of your design matrix represents the log-average count for one of your groups. You can figure which ones are which fairly easily from the column names of the matrix.

Now, for the comparison above, the contrast vector of designcontrasts[, "DH.10v5"] probably looks like c(-1, 1, 0, 0, 0). Each entry of the vector corresponds to a coefficient of the design matrix. The specified contrast is effectively saying:

-1 * groupDH_10 + 1 * groupDH_5 + 0 * groupDP12_0 + 0 * groupDP12_10 + 0 * groupDP12_5 = 0

That's your null hypothesis, i.e., that DH_10 is equal to DH_5. Then, glmLRT internally refactors design into an alternative formulation so that, upon dropping the last coefficient of this alternative model, the reduced model is formed in which your null hypothesis is true. The dropped coefficient represents the log-fold change between DH_10 and DH_5.

0
Entering edit mode

Hello Ryan and Aaron

I was reviewing the comments and I noticed your comment on calling the contrast at once. Could you please tell how I could do it all at once, using the above example as reference? And would having six separate calls to glmLRT affect the statistical power?

The way I have been calling them is below:

# DH 10% vs 5%
lrt<- glmLRT(glmfit, contrast = designcontrasts[, "DH.10v5"])
# toptags
result <- topTags(lrt, n = Inf, sort.by="PValue")

#DP12 5% vs 10%
lrt2<- glmLRT(glmfit, contrast = designcontrasts[, "DP.10v5"])
# toptags
result2 <- topTags(lrt2, n = Inf, sort.by="PValue")

#DP12 5% vs 0%
lrt5<- glmLRT(glmfit, contrast = designcontrasts[, "DP.5v0"])
# toptags
result5 <- topTags(lrt5, n = Inf, sort.by="PValue")
1
Entering edit mode

To drop them all at once:

lrt.all <- glmLRT(glmfit, contrast = designcontrasts)

This does an ANOVA-like comparison, where the "global" null hypothesis is that there are no differences between any of the groups. While this will generally detect more DE genes than running them as separate pairwise contrasts - subtle changes across multiple contrasts provide more evidence against the global null, and there's less damage from the multiple testing correction when you just do a single ANOVA compared to 6 separate contrasts - it's not really a fair comparison, because the null hypotheses are different between the two approaches.

Long story short; if you want to find DE between any of your groups, use the ANOVA. If you want to find DE between two specific groups, use the appropriate pairwise comparison. The ANOVA isn't the be-all and end-all of DE detection, as it won't tell you which of the groups are involved in the DE for each gene, just that there is DE "somewhere" in the data. (Technically, you can dig out the differences with post hoc tests, but that's not supported in edgeR. Besides, if you're going to do that, you might as well do the pairwise comparisons to start with.)

4
Entering edit mode
@gordon-smyth
Last seen 32 minutes ago
WEHI, Melbourne, Australia

1) You have tested the hypothesis that one of the coefficients of the linear model is zero vs the alternative that it is not zero. So the full model is the glm you have fitted and the reduced model is the same model but with this coefficient set to zero. In other words, the reduced (null) model has one fewer column in the design matrix than the full (alternative) model.

2) LR is the minus twice the log-likelihood ratio, a quantity that is asymptotically chi-square under the null hypothesis. The p-value is the tail probability of the chi-square statistic, so the larger the LR the smaller the p-value.

0
Entering edit mode

Thank you for your reply Gordon. Could you please also explain which coefficient of the linear model is being test? Is it the normalized counts or the log fold changes?

0
Entering edit mode

The coefficient being tested is whichever coefficient you told it to test. We can't answer that unless you show the code you used.