Question: How to get t-statistic values using edgeR?
0
gravatar for Biologist
6 months ago by
Biologist70
Biologist70 wrote:

Recently, I have seen a post about fgsea for gene set enrichment analysis. In this DESeq results to pathways in 60 Seconds with the fgsea package. In this post they use stat values column from Deseq2 results to rank the genes for gene set enrichment analysis.

With edgeR I use the following code.

fit <- glmQLFit(y, design, robust=TRUE)
contrast.matrix <- makeContrasts(Tumor-Control, levels=design)
qlf <- glmQLFTest(fit, contrast=contrast.matrix)
topTags(qlf)

Coefficient:  -1*Control 1*Tumor 
          logFC   logCPM        F       PValue          FDR
Gene1   3.3152959 3.584387 319.9770 2.413285e-17 3.136064e-13
Gene2   1.2035429 9.099493 269.5468 2.402948e-16 1.270767e-12
Gene3    5.5160446 5.575741 690.8451 2.933667e-16 1.270767e-12
Gene4   3.8715029 2.612542 217.3808 4.068607e-15 1.321789e-11
Gene5   2.5809156 3.008769 186.4712 2.934439e-14 7.626608e-11
Gene6   1.0808460 7.195544 143.2828 7.951168e-13 1.722090e-09
Gene7   1.2751258 5.673212 131.3892 2.285582e-12 4.243019e-09
Gene8   2.5480277 2.417406 127.8459 3.178656e-12 5.163329e-09
Gene9   0.8964683 7.478979 121.9161 5.615874e-12 8.108698e-09
Gene10  1.4907664 3.428103 106.5157 2.754773e-11 3.579828e-08

As I use the glmQLFTest I have the Fstat in the results. But, I want to get t-statistic values column using edgeR to rank genes based on that column and use for Gene set enrichment analysis.

Ofcourse, I can use camera package for this, but I'm interested in ranking genes based on t-statistic and use for GSEA.

How to get the t-statistic using edgeR package?

edger R gsea fgsea t-statistic • 281 views
ADD COMMENTlink modified 6 months ago by Gordon Smyth39k • written 6 months ago by Biologist70
Answer: How to get t-statistic values using edgeR?
2
gravatar for Gordon Smyth
6 months ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

You need:

t.stat <- sign(qlf$table$logFC) * sqrt(qlf$table$F)
z <- zscoreT(t.stat, df=qlf$df.total)

The z-statistic should be input to fgsea rather than the t-statistic because the t-statistics are not all on the same df.

You won't be surprised though that I feel that camera would be better. I haven't tested fgsea but any pre-ranked gsea method ignores correlations between genes and therefore would have to give p-values that are too liberal.

ADD COMMENTlink modified 6 months ago • written 6 months ago by Gordon Smyth39k

Thanks Gordon. I also gave a try with camera now.

For camera Is the following a right way?

pathways.hallmark <- gmtPathways("h.all.v6.2.symbols.gmt")
cam <- camera(y, pathways.hallmark, design, contrast=contrast.matrix, inter.gene.cor=0.01)
head(cam,5)
                                           NGenes Direction  PValue     FDR
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION    130        Up 3.9e-08 2.0e-06
HALLMARK_MYOGENESIS                           116        Up 3.9e-07 9.9e-06
HALLMARK_TNFA_SIGNALING_VIA_NFKB              157        Up 2.5e-06 4.1e-05
HALLMARK_UNFOLDED_PROTEIN_RESPONSE            110      Down 1.4e-03 1.8e-02
HALLMARK_MYC_TARGETS_V2                        57      Down 3.3e-03 3.2e-02

If it is right, can you please tell me how I can get the genes (NGenes) which are enriched in those hallmark pathways. I want their names.

ADD REPLYlink written 6 months ago by Biologist70

Gordon may I know your answer for my previous comment. thanks

ADD REPLYlink written 6 months ago by Biologist70

As ?camera will tell you, Ngenes is the total number of genes in the set. Besides, there is no concept of explicit enrichment in camera. The method doesn't draw a line between differential and non-differential genes in the set, it's using the entire distribution of test statistics. You could, of course, draw this line yourself (e.g., taking all genes in the set that exhibit a significant difference in the specified contrast at a FDR of 5%), but that's not how camera is working internally.

ADD REPLYlink written 6 months ago by Aaron Lun25k
Answer: How to get t-statistic values using edgeR?
0
gravatar for alemutak
6 months ago by
alemutak0
alemutak0 wrote:

1- for two group comparison: logFC/std.error(logFC), you can find the standard error of logFC I guess in the detailed results of edgeR.

2 - OR, qnorm(1-PValue/2)*sign(logFC) --- an approximation

ADD COMMENTlink modified 6 months ago • written 6 months ago by alemutak0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 349 users visited in the last hour