I'm trying to carry out GO enrichment analysis of microarray data. I can't work out how to adapt the limma output for use as topGO input. For example: I have a data.frame that contains gene symbols, expression values, t, B and adjusted p-values; basically the results of
trgts <- readTargets("targets4.csv", sep = ";")
rough <- read.maimages(trgts, source = "agilent",
  columns = list(R = "rDyeNormSignal", G = "gDyeNormSignal",
    rIsFeatNonUnifOL = "rIsFeatNonUnifOL", gIsFeatNonUnifOL = "gIsFeatNonUnifOL",
    rIsBGNonUnifOL = "rIsBGNonUnifOL", gIsBGNonUnifOL = "gIsBGNonUnifOL",
    rIsFeatPopnOL = "rIsFeatPopnOL", gIsFeatPopnOL = "gIsFeatPopnOL",
    rIsBGPopnOL = "rIsBGPopnOL", gIsBGPopnOL = "gIsBGPopnOL",
    rIsSaturated = "rIsSaturated", gIsSaturated = "gIsSaturated"),
  other.columns = c("rIsFeatNonUnifOL", "gIsFeatNonUnifOL",
    "rIsBGNonUnifOL", "gIsBGNonUnifOL",
    "rIsFeatPopnOL", "gIsFeatPopnOL", "rIsBGPopnOL",
    "gIsBGPopnOL", "rIsSaturated", "gIsSaturated"),
  annotation = c("accessions", "chr_coord", "Sequence",
    "ProbeUID", "ControlType", "ProbeName", "GeneName", "SystematicName",
    "Description"))
roughbet <- normalizeBetweenArrays(rough, method = "Aquantile")
roughave <- avereps(roughbet, ID = roughbet$genes$ProbeName)
design <- modelMatrix(trgts, ref = "Col0")
fitRC <- lmFit(roughave, design)
fitRC <- eBayes(fitRC)
signifC <- topTable(fitRC, coef = "mut1", lfc = 1, p.value = 0.05, adjust.method = "BH", number = Inf)
# keep only non-control probes
signifCC <- signifC[signifC$ControlType == 0, ]
# The next function annotates the probes from the Agilent database and binds the probe info, including GO IDs.
agilentannC <- function(x) {
  # row of AGIDB2 matching each probe name (first match; NA if the probe is absent)
  x$ID <- match(x$ProbeName, AGIDB2$ID)
  AGIDBcutC <- AGIDB2[x$ID, ]
  XannotateC <<- cbind(x, AGIDBcutC)
}
# the function has to be called to create XannotateC; presumably on the filtered table
agilentannC(signifCC)
CCC <- XannotateC
row.names(CCC) <- CCC$GENE_SYMBOL
mut1data <- PREPAREDC
So I have all this: a table containing the genes of interest, selected by lfc and p-value, with their p-values, logFC, AveExpr and even GO IDs, and I really can't work out how to make a topGOdata object from it! Please help!!
> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
[1] LC_COLLATE=Russian_Russia.1251 LC_CTYPE=Russian_Russia.1251 LC_MONETARY=Russian_Russia.1251
[4] LC_NUMERIC=C LC_TIME=Russian_Russia.1251
attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] limma_3.38.3 Rgraphviz_2.26.0 hgu95av2.db_3.2.3 org.Hs.eg.db_3.7.0 topGO_2.34.0
[6] SparseM_1.77 GO.db_3.7.0 AnnotationDbi_1.44.0 IRanges_2.16.0 S4Vectors_0.20.1
[11] Biobase_2.42.0 graph_1.60.0 BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 bit_1.1-14 lattice_0.20-38 blob_1.1.1 tools_3.5.3 DBI_1.0.0
[7] matrixStats_0.54.0 yaml_2.2.0 bit64_0.9-7 digest_0.6.18 BiocManager_1.30.4 memoise_1.1.0
[13] RSQLite_2.1.1 compiler_3.5.3 pkgconfig_2.0.2
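Since the annotated table in the question already carries GO IDs, one possible first step is to turn that column into the named list that topGO's annFUN.gene2GO reads (gene ID as name, character vector of GO IDs as value). The following is only a rough sketch in base R: the toy table, the column names GENE_SYMBOL and GO_ID, and the "|" separator are assumptions, because the layout of the real annotation table is not shown in the post.

```r
# Hypothetical annotation table: one row per probe, GO IDs joined by "|".
# Column names and the separator are assumptions, not the real AGIDB2 layout.
annot <- data.frame(
  GENE_SYMBOL = c("AT5G15324", "AT1G01010", "AT1G01010", "AT2G41430"),
  GO_ID = c("GO:0008150|GO:0005634", "GO:0003700",
            "GO:0006355|GO:0003700", NA),
  stringsAsFactors = FALSE
)

# Split each GO string into a character vector (an NA entry stays NA)
go_split <- strsplit(annot$GO_ID, "|", fixed = TRUE)

# Group the per-probe vectors by gene, then merge them,
# dropping duplicated GO IDs and missing values
by_gene <- split(go_split, annot$GENE_SYMBOL)
gene2GO <- lapply(by_gene, function(x) {
  ids <- unique(unlist(x))
  ids[!is.na(ids)]
})

# Genes without any GO annotation cannot be used, so remove them
gene2GO <- gene2GO[lengths(gene2GO) > 0]

str(gene2GO)
```

The resulting gene2GO list has one entry per annotated gene, so several probes of the same gene collapse into a single set of GO IDs.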
I go over a working example on Biostars, here: https://www.biostars.org/p/350710/
If your genes are not HGNC symbols, then you can use biomaRt to convert them. topGO also works with Ensembl gene IDs and Entrez identifiers.
Thank you for your reply. But I already have GO IDs from my annotation function, and my experiment uses an Arabidopsis thaliana Agilent microarray, so the identifiers are TAIR gene IDs, like AT5G15324. The problem is that I don't understand how to put my data into topGOdata format.
If I run it this way:
it doesn't work at all.
It would be best if there were some way to construct the topGOdata object manually.
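For what it's worth, the topGO vignette does describe a manual route: pass a named factor with levels 0/1 as allGenes, together with a custom gene-to-GO list via annFUN.gene2GO. A minimal sketch follows, under several assumptions: the toy table stands in for a topTable() call with number = Inf and no lfc or p.value cut-off (topGO needs the whole gene universe, not only the significant genes), the gene IDs are taken to be in a SystematicName column, and make_topgo is a hypothetical helper name. The helper is only defined here, not run, because the toy data are too small for a meaningful test.

```r
# Hypothetical stand-in for the full topTable() output (all non-control genes)
full <- data.frame(
  SystematicName = c("AT1G01010", "AT2G41430", "AT5G15324", "AT3G24650"),
  logFC     = c(2.1, -0.3, -1.6, 1.4),
  adj.P.Val = c(0.001, 0.60, 0.01, 0.20),
  stringsAsFactors = FALSE
)

# topGO needs unique gene names: keep the first row per gene
full <- full[!duplicated(full$SystematicName), ]

# Mark the genes of interest with cut-offs like those used in the question
interesting <- abs(full$logFC) >= 1 & full$adj.P.Val < 0.05

# Named factor with levels 0/1: the names define the gene universe,
# level 1 marks the genes of interest
geneList <- factor(as.integer(interesting), levels = c(0, 1))
names(geneList) <- full$SystematicName

# Hypothetical gene-to-GO list (gene ID -> character vector of GO IDs)
gene2GO <- list(
  AT1G01010 = c("GO:0003700", "GO:0006355"),
  AT5G15324 = c("GO:0008150", "GO:0005634")
)

# Hypothetical helper wrapping the topGOdata constructor
make_topgo <- function(geneList, gene2GO, ontology = "BP") {
  library(topGO)  # attached here so the topGOdata class and annFUN.gene2GO are visible
  new("topGOdata", ontology = ontology, allGenes = geneList,
      annot = annFUN.gene2GO, gene2GO = gene2GO, nodeSize = 10)
}

# With real data and topGO installed, usage could look like:
# GOdata <- make_topgo(geneList, gene2GO)
# res    <- runTest(GOdata, algorithm = "classic", statistic = "fisher")
# GenTable(GOdata, classicFisher = res, topNodes = 10)
```

Because allGenes is already a two-level factor, no geneSel function is needed; geneSel only comes into play when allGenes is a numeric vector of scores such as p-values.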
You may follow this previous example on Biostars: https://www.biostars.org/p/250927/#250936