Question

How do you specify "chip" in printGenes of TopGO for maize data

0

Entering edit mode

Kaperuza • 0

@kaperuza-16382

Last seen 5.7 years ago

Germany

Hello!

I am using TopGO to analyze different lists of DEGs from maize obtained via analysis of RNA-seq data.

With the script below I get the list of overrepresented terms for the CC annotations. So far so good.

Now I would like to print out the significant genes associated with each of the overrepresented terms.

I am trying to use printGenes, but run into trouble defining what I should put in “chip” for my maize data.

In the script you will see the error message I get.

The first lines of the input data I am using looks like this:

Data to define the gene Universe maize_v3.agg.nr_CC.txt

Feature ID    GO cellular component
GRMZM2G104390    GO:0000228, GO:0005737, GO:0005667, GO:0000118
GRMZM2G104394    GO:0009534, GO:0048046, GO:0005783, GO:0009506, GO:0005739, GO:0009505, GO:0005774
GRMZM2G104396    GO:0016602, GO:0044451
GRMZM2G104397    GO:0012505, GO:0071944, GO:0005654, GO:0009507, GO:0005815, GO:0016020

List of DEGs UPREGULATED.txt

GRMZM2G167669
GRMZM2G078279
GRMZM2G588623
GRMZM2G179092
GRMZM2G481440
GRMZM2G460594
GRMZM2G010356

Below you will also find my session Info.

I would appreciate your help figuring this one out.

Thanks in advance and looking forwards to your response

Kaperuza

# Analysis of cellular component GO Term enrichment
library("topGO")
geneID2GO <-readMappings(file="/Users/Kaperuza/maize_v3.agg.nr_CC.txt")
geneUniverse <- names(geneID2GO)


# Analysis list of DEGs
genesOfInterest <-read.table("/Users/Kaperuza/UPREGULATED.txt", header=FALSE)
genesOfInterest <- as.character(genesOfInterest$V1)
geneList <- factor(as.integer(geneUniverse %in% genesOfInterest))
names(geneList) <- geneUniverse


# Only use those ones which are annotated with at least 10 terms
myGOdata <- new("topGOdata", description="UPREGULATED DESeq2", ontology="CC", allGenes=geneList, annot=annFUN.gene2GO, gene2GO=geneID2GO, nodeSize=10)

sg <- sigGenes(myGOdata)
str(sg)
numSigGenes(myGOdata)

test.stat <- new("weightCount", testStatistic = GOFisherTest, name = "Fisher test", sigRatio = "ratio")

resultWeight <- getSigGroups(myGOdata, test.stat)

allRes <- GenTable(myGOdata, weight = resultWeight, orderBy = "weight", ranksOf= "weight", topNodes =300) write.csv(as.data.frame(allRes), file="/Users/Kaperuza/UPREGULATED_DESeq2_topGO300_CC_Weight.csv”)

#Trying to get the genes corresponding to each goid in the results
gt <- printGenes(myGOdata, whichTerms = allRes$GO.ID, chip = "geneUniverse", numChar = 40)

#Error in get(paste(chip, "ENTREZID", sep = "")) :

# object 'geneUniverseENTREZID' not found

#traceback ()

5: get(paste(chip, "ENTREZID", sep = ""))

4: get(paste(chip, "ENTREZID", sep = ""))

3: .local(object, whichTerms, ...)

2: printGenes(myGOdata, whichTerms = allRes$GO.ID, chip = "geneUniverse",

numChar = 40)

1: printGenes(myGOdata, whichTerms = allRes$GO.ID, chip = "geneUniverse",

numChar = 40)

#The same happens if I use “GeneID2GO”

sessionInfo()

R version 3.5.0 (2018-04-23)

Platform: x86_64-apple-darwin15.6.0 (64-bit)

Running under: macOS High Sierra 10.13.6



Matrix products: default

BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib



locale:

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8



attached base packages:

[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  

[9] base     



other attached packages:

[1] topGO_2.32.0         SparseM_1.77         GO.db_3.6.0          AnnotationDbi_1.42.1

[5] IRanges_2.14.10      S4Vectors_0.18.3     Biobase_2.40.0       graph_1.58.0        

[9] BiocGenerics_0.26.0



loaded via a namespace (and not attached):

[1] Rcpp_0.12.17                locfit_1.5-9.1              lattice_0.20-35            

[4] digest_0.6.15               GenomeInfoDb_1.16.0         plyr_1.8.4                 

[7] backports_1.1.2             acepack_1.4.1               RSQLite_2.1.1              

[10] ggplot2_2.2.1               pillar_1.2.3                zlibbioc_1.26.0            

[13] rlang_0.2.1                 lazyeval_0.2.1              rstudioapi_0.7             

[16] data.table_1.11.4           annotate_1.58.0             blob_1.1.1                 

[19] rpart_4.1-13                Matrix_1.2-14               checkmate_1.8.5            

[22] splines_3.5.0               BiocParallel_1.14.1         geneplotter_1.58.0         

[25] stringr_1.3.1               foreign_0.8-70              htmlwidgets_1.2            

[28] RCurl_1.95-4.10             bit_1.1-14                  munsell_0.5.0              

[31] DelayedArray_0.6.1          compiler_3.5.0              pkgconfig_2.0.1            

[34] base64enc_0.1-3             htmltools_0.3.6             nnet_7.3-12                

[37] SummarizedExperiment_1.10.1 tibble_1.4.2                gridExtra_2.3              

[40] htmlTable_1.12              GenomeInfoDbData_1.1.0      Hmisc_4.1-1                

[43] matrixStats_0.53.1          XML_3.98-1.11               bitops_1.0-6               

[46] grid_3.5.0                  xtable_1.8-2                gtable_0.2.0               

[49] DBI_1.0.0                   magrittr_1.5                scales_0.5.0               

[52] stringi_1.2.3               XVector_0.20.0              genefilter_1.62.0          

[55] latticeExtra_0.6-28         Formula_1.2-3               RColorBrewer_1.1-2         

[58] tools_3.5.0                 bit64_0.9-7                 DESeq2_1.20.0              

[61] yaml_2.1.19                 survival_2.42-4             colorspace_1.3-2           

[64] cluster_2.0.7-1             GenomicRanges_1.32.3        memoise_1.1.0              

[67] knitr_1.20

TopGo printGenes • 1.7k views

ADD COMMENT • link updated 5.7 years ago by James W. MacDonald 65k • written 5.7 years ago by Kaperuza • 0

score 0 · Answer 1 · 2018-08-22

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 3 hours ago

United States

The 'chip' argument is expecting a ChipDb package, which is an R package that contains a SQLite database that maps manufacturer IDs to other things like Gene IDs and GO terms. These packages are normally based on a commercial microarray (hence the 'chip' argument name).

Trying to get this to work would be a bit of an adventure, as it appears you have Maize DB IDs, and the normal infrastructure for Bioconductor packages is to use NCBI Gene IDs or GI numbers. There is a OrgDb on AnnotationHub:

> library(AnnotationHub)

> hub <- AnnotationHub()
updating metadata: retrieving 1 resource
  |======================================================================| 100%

snapshotDate(): 2018-04-30
> query(hub, c("zea mays","orgdb"))
AnnotationHub with 2 records
# snapshotDate(): 2018-04-30
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Zea mays, Zea mays_var._japonica
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH61838"]]'

            title                               
  AH61838 | org.Zea_mays.eg.sqlite              
  AH61839 | org.Zea_mays_var._japonica.eg.sqlite

> z <-  hub[["AH61838"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

> z
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Zea mays
| SPECIES: Zea mays
| CENTRALID: GID
| Taxonomy ID: 4577
| Db type: OrgDb
| Supporting package: AnnotationDbi

And you could hypothetically create a ChipDb package that would be based on this OrgDb, and then re-run your code after re-mapping all your IDs, but that sounds like a drag.

An alternative would be to fork topGO and change how it uses the ChipDb so it can use a data.frame of mappings instead. That might be easier, but would still take a bit of work and might be more trouble than it is worth.

ADD COMMENT • link 5.7 years ago James W. MacDonald 65k

0

Entering edit mode

Another alternative would be to map your Maize DB IDs to NCBI GI numbers, and then you could use the AnnotationHub OrgDb directly with GOstats, which does include facilities to get the genes that are significant for each GO term. That would probably be much less work.

ADD REPLY • link 5.7 years ago James W. MacDonald 65k

0

Entering edit mode

Thank you James for your feedback.

The original file of the new GO annotation lists all the GO terms associated to each gene, as shown below. To assign the gene names to my significant results I have to first parse out from this annotation the genes (and their GO terms) for each of my lists of DEGs and then use this subset to assign gene names to the go terms in my TopGO results.

I am still intrigued why this is not possible from inside TopGO...or is it and I just missed a turn?

GRMZM2G127031	GO:0000394
GRMZM2G127031	GO:0002679
GRMZM2G127031	GO:0004674
GRMZM2G127031	GO:0005524
GRMZM2G127031	GO:0005730
GRMZM2G127031	GO:0005856
GRMZM2G127031	GO:0005886
GRMZM2G127031	GO:0007275
GRMZM2G127031	GO:0009506
GRMZM2G127031	GO:0009507
GRMZM2G127031	GO:0009743
GRMZM2G127031	GO:0009863
GRMZM2G127031	GO:0010200
GRMZM2G127031	GO:0010363
GRMZM2G127031	GO:0012505
GRMZM2G127031	GO:0042742
GRMZM2G127031	GO:0046777
GRMZM2G127031	GO:0050832
GRMZM2G127031	GO:0052033

ADD REPLY • link 5.7 years ago Kaperuza • 0

0

Entering edit mode

It's not possible because that's not how the function was written. When people write code they envision a particular use case, and then write the function to fulfill that use case. printGenes was written specifically to extract the gene annotation data from a ChipDb package, and if you give it something different from that, it will fail, because the code isn't written to take any arbitrary input and 'do the right thing' with it. Your question is sort of like asking why you can't open an Excel document with Word. It's because that's not how it works.

ADD REPLY • link 5.7 years ago James W. MacDonald 65k