How to get t-statistic values using edgeR?
2
1
Entering edit mode
Biologist ▴ 90
@biologist-9801
Last seen 2.2 years ago

Recently, I have seen a post about fgsea for gene set enrichment analysis. In this DESeq results to pathways in 60 Seconds with the fgsea package. In this post they use stat values column from Deseq2 results to rank the genes for gene set enrichment analysis.

With edgeR I use the following code.

fit <- glmQLFit(y, design, robust=TRUE)
contrast.matrix <- makeContrasts(Tumor-Control, levels=design)
qlf <- glmQLFTest(fit, contrast=contrast.matrix)
topTags(qlf)

Coefficient:  -1*Control 1*Tumor
logFC   logCPM        F       PValue          FDR
Gene1   3.3152959 3.584387 319.9770 2.413285e-17 3.136064e-13
Gene2   1.2035429 9.099493 269.5468 2.402948e-16 1.270767e-12
Gene3    5.5160446 5.575741 690.8451 2.933667e-16 1.270767e-12
Gene4   3.8715029 2.612542 217.3808 4.068607e-15 1.321789e-11
Gene5   2.5809156 3.008769 186.4712 2.934439e-14 7.626608e-11
Gene6   1.0808460 7.195544 143.2828 7.951168e-13 1.722090e-09
Gene7   1.2751258 5.673212 131.3892 2.285582e-12 4.243019e-09
Gene8   2.5480277 2.417406 127.8459 3.178656e-12 5.163329e-09
Gene9   0.8964683 7.478979 121.9161 5.615874e-12 8.108698e-09
Gene10  1.4907664 3.428103 106.5157 2.754773e-11 3.579828e-08

As I use the glmQLFTest I have the Fstat in the results. But, I want to get t-statistic values column using edgeR to rank genes based on that column and use for Gene set enrichment analysis.

Ofcourse, I can use camera package for this, but I'm interested in ranking genes based on t-statistic and use for GSEA.

How to get the t-statistic using edgeR package?

r edger t-statistic fgsea gsea • 1.6k views
6
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

You need:

t.stat <- sign(qlf$table$logFC) * sqrt(qlf$table$F)
z <- zscoreT(t.stat, df=qlf\$df.total)

The z-statistic should be input to fgsea rather than the t-statistic because the t-statistics are not all on the same df.

You won't be surprised though that I feel that camera would be better. I haven't tested fgsea but any pre-ranked gsea method ignores correlations between genes and therefore would have to give p-values that are too liberal.

0
Entering edit mode

Thanks Gordon. I also gave a try with camera now.

For camera Is the following a right way?

pathways.hallmark <- gmtPathways("h.all.v6.2.symbols.gmt")
cam <- camera(y, pathways.hallmark, design, contrast=contrast.matrix, inter.gene.cor=0.01)
NGenes Direction  PValue     FDR
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION    130        Up 3.9e-08 2.0e-06
HALLMARK_MYOGENESIS                           116        Up 3.9e-07 9.9e-06
HALLMARK_TNFA_SIGNALING_VIA_NFKB              157        Up 2.5e-06 4.1e-05
HALLMARK_UNFOLDED_PROTEIN_RESPONSE            110      Down 1.4e-03 1.8e-02
HALLMARK_MYC_TARGETS_V2                        57      Down 3.3e-03 3.2e-02

If it is right, can you please tell me how I can get the genes (NGenes) which are enriched in those hallmark pathways. I want their names.

1
Entering edit mode

As ?camera will tell you, Ngenes is the total number of genes in the set. Besides, there is no concept of explicit enrichment in camera. The method doesn't draw a line between differential and non-differential genes in the set, it's using the entire distribution of test statistics. You could, of course, draw this line yourself (e.g., taking all genes in the set that exhibit a significant difference in the specified contrast at a FDR of 5%), but that's not how camera is working internally.

0
Entering edit mode

0
Entering edit mode
alemutak • 0
@alemutak-14833
Last seen 3.0 years ago

1- for two group comparison: logFC/std.error(logFC), you can find the standard error of logFC I guess in the detailed results of edgeR.

2 - OR, qnorm(1-PValue/2)*sign(logFC) --- an approximation