We have done a simple analysis in edgeR, to find significant DE genes between two groups, followed by GO analysis with goana. We tried to follow protocol/manual. After finding significant GO terms we would like to see which genes are in these GO terms. We found a post describing how to get the DE genes from the GO terms, but following Aaron's(goana limma- extract list of DE genes and genes in the enriched GO terms?) method, we were not successful. We are not sure what is going wrong, and whether this is a bug or whether we are missing something.
Some essential code here:
fit <- glmFit(y, design)
lrt <- glmLRT(fit)
summary(decideTests(lrt)) tissuevein Down 794 NotSig 13056 Up 102
### GO analysis
go <- goana(lrt)
go4 <- topGO(go, ont="BP", sort="Up", n=50)
head(go4) Term Ont N Up Down P.Up P.Down GO:0043252 sodium-independent organic anion transport BP 2 2 0 6.017011e-05 1 GO:0006270 DNA replication initiation BP 8 2 0 1.635777e-03 1 GO:0071881 adenylate cyclase-inhibiting adrenergic receptor signaling pathway BP 1 1 0 7.888509e-03 1 GO:0042891 antibiotic transport BP 1 1 0 7.888509e-03 1 GO:0140074 cardiac endothelial to mesenchymal transition BP 1 1 0 7.888509e-03 1 GO:0071878 negative regulation of adrenergic receptor signaling pathway BP 1 1 0 7.888509e-03 1
sig_toptags3_up <- subset(toptags3$table, FDR < 0.05 & logFC > 0)
head(sig_toptags3_up) genes Entrezgene logFC logCPM LR PValue FDR 28703 PTGS1 5742 7.612927 9.502819 40.68602 1.787619e-10 6.235215e-07 13240 DCN 1634 6.643868 9.932415 36.77281 1.327288e-09 2.314790e-06 25397 NR2F2 7025 7.074398 8.899530 35.37534 2.719074e-09 2.918194e-06 20564 LHX6 26468 7.702296 7.428558 34.05959 5.344974e-09 4.386652e-06 10239 COL3A1 1281 5.889018 9.778522 31.89044 1.631177e-08 9.482577e-06 29575 RIMS1 22999 7.017122 7.891823 29.78443 4.828546e-08 2.041451e-05
## Slightly modified from Aaron's method
# Basic set-up:
DB <- paste("org", "Hs", "eg", "db", sep = ".") require(DB, character.only = TRUE) GO2ALLEGS <- paste("org", "Hs", "egGO2ALLEGS", sep = ".") EG.GO <- AnnotationDbi::toTable(get(GO2ALLEGS)) d <- duplicated(EG.GO[, c("gene_id", "go_id", "Ontology")]) EG.GO <- EG.GO[!d, ]
de.by.go <- split(EG.GO$gene_id, EG.GO$go_id) de.by.go <- lapply(de.by.go, FUN=function(x) { x[x %in% sig_toptags3_up$Entrezgene] })
de.by.go.stack <- stack(de.by.go)
table_sig_GO <- de.by.go.stack[de.by.go.stack$ind %in% rownames(go4),]
table_sig_GO2 <- merge(table_sig_GO, sig_toptags3_up[,1:2], by.x="values", by.y="Entrezgene", all.x=T, all.y=F, sort=F)
GO4_terms <- go4[,1:2] rownames(GO4_terms) <- rownames(go4)
table_sig_GO3 <- merge(table_sig_GO2, GO4_terms, by.x="ind", by.y="row.names", all.x=T, all.y=F, sort=T)
dim(table_sig_GO3) [1] 34 5
head(table_sig_GO3) ind values genes Term Ont 1 GO:0000209 57161 PELI2 protein polyubiquitination BP 2 GO:0000209 79176 FBXL15 protein polyubiquitination BP 3 GO:0000209 246184 CDC26 protein polyubiquitination BP 4 GO:0002250 1191 CLU adaptive immune response BP 5 GO:0002250 2185 PTK2B adaptive immune response BP 6 GO:0002250 3383 ICAM1 adaptive immune response BP
The results show 34 rows (while there are 50 GO terms, and 73 DE genes in these according go4 results), also the GO terms that are present contain not the right number of genes. Sometimes less, but also sometimes more! Is this the only way to retrieve the right genes from goana? Or are we missing something? Any ideas about this?
I'm having a tough time reading your code. Can you distil it into a minimum working example?
Hi Aaron, thanks for your reply. I modified your code in order to get only the significant GO terms in data.frame, and also with gene symbol and GO description. But to give a simple example, I use exactly your code from goana limma- extract list of DE genes and genes in the enriched GO terms?.
In this example GO:0051321 should have 2 significant genes, but none are found, GO:0035108 reports 3 genes (but only 2 are expected), and GO:0002460 has 3 genes just like the topGO results indicate. Can you follow this code now?