Question: How do you specify "chip" in printGenes of TopGO for maize data
gravatar for Kaperuza
9 months ago by
Kaperuza0 wrote:


I am using TopGO to analyze different lists of DEGs from maize obtained via analysis of RNA-seq data. 

With the script below I get the list of overrepresented terms for the CC annotations. So far so good.

Now I would like to print out the significant genes associated with each of the overrepresented terms.

I am trying to use printGenes, but run into trouble defining what I should put in “chip” for my maize data.

In the script you will see the error message I get.

The first lines of the input data I am using looks like this:

Data to define the gene Universe maize_v3.agg.nr_CC.txt 

Feature ID    GO cellular component
GRMZM2G104390    GO:0000228, GO:0005737, GO:0005667, GO:0000118
GRMZM2G104394    GO:0009534, GO:0048046, GO:0005783, GO:0009506, GO:0005739, GO:0009505, GO:0005774
GRMZM2G104396    GO:0016602, GO:0044451
GRMZM2G104397    GO:0012505, GO:0071944, GO:0005654, GO:0009507, GO:0005815, GO:0016020



Below you will also find my session Info.

I would appreciate your help figuring this one out.

Thanks in advance and looking forwards to your response



# Analysis of cellular component GO Term enrichment
geneID2GO <-readMappings(file="/Users/Kaperuza/maize_v3.agg.nr_CC.txt")
geneUniverse <- names(geneID2GO)

# Analysis list of DEGs
genesOfInterest <-read.table("/Users/Kaperuza/UPREGULATED.txt", header=FALSE)
genesOfInterest <- as.character(genesOfInterest$V1)
geneList <- factor(as.integer(geneUniverse %in% genesOfInterest))
names(geneList) <- geneUniverse

# Only use those ones which are annotated with at least 10 terms
myGOdata <- new("topGOdata", description="UPREGULATED DESeq2", ontology="CC", allGenes=geneList, annot=annFUN.gene2GO, gene2GO=geneID2GO, nodeSize=10)

sg <- sigGenes(myGOdata)

test.stat <- new("weightCount", testStatistic = GOFisherTest, name = "Fisher test", sigRatio = "ratio")

resultWeight <- getSigGroups(myGOdata, test.stat)

allRes <- GenTable(myGOdata, weight = resultWeight, orderBy = "weight", ranksOf= "weight", topNodes =300) write.csv(, file="/Users/Kaperuza/UPREGULATED_DESeq2_topGO300_CC_Weight.csv”)

#Trying to get the genes corresponding to each goid in the results
gt <- printGenes(myGOdata, whichTerms = allRes$GO.ID, chip = "geneUniverse", numChar = 40)

#Error in get(paste(chip, "ENTREZID", sep = "")) :

# object 'geneUniverseENTREZID' not found

#traceback ()

5: get(paste(chip, "ENTREZID", sep = ""))

4: get(paste(chip, "ENTREZID", sep = ""))

3: .local(object, whichTerms, ...)

2: printGenes(myGOdata, whichTerms = allRes$GO.ID, chip = "geneUniverse",

       numChar = 40)

1: printGenes(myGOdata, whichTerms = allRes$GO.ID, chip = "geneUniverse",

       numChar = 40)

#The same happens if I use “GeneID2GO”




R version 3.5.0 (2018-04-23)

Platform: x86_64-apple-darwin15.6.0 (64-bit)

Running under: macOS High Sierra 10.13.6

Matrix products: default

BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib


[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:

[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  

[9] base     

other attached packages:

[1] topGO_2.32.0         SparseM_1.77         GO.db_3.6.0          AnnotationDbi_1.42.1

[5] IRanges_2.14.10      S4Vectors_0.18.3     Biobase_2.40.0       graph_1.58.0        

[9] BiocGenerics_0.26.0

loaded via a namespace (and not attached):

[1] Rcpp_0.12.17                locfit_1.5-9.1              lattice_0.20-35            

[4] digest_0.6.15               GenomeInfoDb_1.16.0         plyr_1.8.4                 

[7] backports_1.1.2             acepack_1.4.1               RSQLite_2.1.1              

[10] ggplot2_2.2.1               pillar_1.2.3                zlibbioc_1.26.0            

[13] rlang_0.2.1                 lazyeval_0.2.1              rstudioapi_0.7             

[16] data.table_1.11.4           annotate_1.58.0             blob_1.1.1                 

[19] rpart_4.1-13                Matrix_1.2-14               checkmate_1.8.5            

[22] splines_3.5.0               BiocParallel_1.14.1         geneplotter_1.58.0         

[25] stringr_1.3.1               foreign_0.8-70              htmlwidgets_1.2            

[28] RCurl_1.95-4.10             bit_1.1-14                  munsell_0.5.0              

[31] DelayedArray_0.6.1          compiler_3.5.0              pkgconfig_2.0.1            

[34] base64enc_0.1-3             htmltools_0.3.6             nnet_7.3-12                

[37] SummarizedExperiment_1.10.1 tibble_1.4.2                gridExtra_2.3              

[40] htmlTable_1.12              GenomeInfoDbData_1.1.0      Hmisc_4.1-1                

[43] matrixStats_0.53.1          XML_3.98-1.11               bitops_1.0-6               

[46] grid_3.5.0                  xtable_1.8-2                gtable_0.2.0               

[49] DBI_1.0.0                   magrittr_1.5                scales_0.5.0               

[52] stringi_1.2.3               XVector_0.20.0              genefilter_1.62.0          

[55] latticeExtra_0.6-28         Formula_1.2-3               RColorBrewer_1.1-2         

[58] tools_3.5.0                 bit64_0.9-7                 DESeq2_1.20.0              

[61] yaml_2.1.19                 survival_2.42-4             colorspace_1.3-2           

[64] cluster_2.0.7-1             GenomicRanges_1.32.3        memoise_1.1.0              

[67] knitr_1.20


topgo printgenes • 245 views
ADD COMMENTlink modified 9 months ago by James W. MacDonald50k • written 9 months ago by Kaperuza0
Answer: How do you specify "chip" in printGenes of TopGO for maize data
gravatar for James W. MacDonald
9 months ago by
United States
James W. MacDonald50k wrote:

The 'chip' argument is expecting a ChipDb package, which is an R package that contains a SQLite database that maps manufacturer IDs to other things like Gene IDs and GO terms. These packages are normally based on a commercial microarray (hence the 'chip' argument name).

Trying to get this to work would be a bit of an adventure, as it appears you have Maize DB IDs, and the normal infrastructure for Bioconductor packages is to use NCBI Gene IDs or GI numbers. There is a OrgDb on AnnotationHub:

> library(AnnotationHub)

> hub <- AnnotationHub()
updating metadata: retrieving 1 resource
  |======================================================================| 100%

snapshotDate(): 2018-04-30
> query(hub, c("zea mays","orgdb"))
AnnotationHub with 2 records
# snapshotDate(): 2018-04-30
# $dataprovider:
# $species: Zea mays, Zea mays_var._japonica
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH61838"]]'

  AH61838 |              
  AH61839 |

> z <-  hub[["AH61838"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

> z
OrgDb object:
| ORGANISM: Zea mays
| SPECIES: Zea mays
| Taxonomy ID: 4577
| Db type: OrgDb
| Supporting package: AnnotationDbi

And you could hypothetically create a ChipDb package that would be based on this OrgDb, and then re-run your code after re-mapping all your IDs, but that sounds like a drag.

An alternative would be to fork topGO and change how it uses the ChipDb so it can use a data.frame of mappings instead. That might be easier, but would still take a bit of work and might be more trouble than it is worth.

ADD COMMENTlink written 9 months ago by James W. MacDonald50k

Another alternative would be to map your Maize DB IDs to NCBI GI numbers, and then you could use the AnnotationHub OrgDb directly with GOstats, which does include facilities to get the genes that are significant for each GO term. That would probably be much less work.

ADD REPLYlink written 9 months ago by James W. MacDonald50k

Thank you James for your feedback.

The original file of the new GO annotation lists all the GO terms associated to each gene, as shown below. To assign the gene names to my significant results I have to first parse out from this annotation the genes (and their GO terms) for each of my lists of DEGs and then use this subset to assign gene names to the go terms in my TopGO results.

I am still intrigued why this is not possible from inside TopGO...or is it and I just missed a turn?

GRMZM2G127031 GO:0000394
GRMZM2G127031 GO:0002679
GRMZM2G127031 GO:0004674
GRMZM2G127031 GO:0005524
GRMZM2G127031 GO:0005730
GRMZM2G127031 GO:0005856
GRMZM2G127031 GO:0005886
GRMZM2G127031 GO:0007275
GRMZM2G127031 GO:0009506
GRMZM2G127031 GO:0009507
GRMZM2G127031 GO:0009743
GRMZM2G127031 GO:0009863
GRMZM2G127031 GO:0010200
GRMZM2G127031 GO:0010363
GRMZM2G127031 GO:0012505
GRMZM2G127031 GO:0042742
GRMZM2G127031 GO:0046777
GRMZM2G127031 GO:0050832
GRMZM2G127031 GO:0052033





ADD REPLYlink written 8 months ago by Kaperuza0

It's not possible because that's not how the function was written. When people write code they envision a particular use case, and then write the function to fulfill that use case. printGenes was written specifically to extract the gene annotation data from a ChipDb package, and if you give it something different from that, it will fail, because the code isn't written to take any arbitrary input and 'do the right thing' with it. Your question is sort of like asking why you can't open an Excel document with Word. It's because that's not how it works.

ADD REPLYlink written 8 months ago by James W. MacDonald50k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 142 users visited in the last hour