Question: AnnBuilder with customized GO annotation
Xinxia Peng wrote (12.3 years ago):
Hi all,

I am trying to reproduce the example given in 'Basic Functions of AnnBuilder'. Here is what I got for the GO part of the annotation:

         GO
    [1,] "GO:0004060@E"
    [2,] "GO:0004060@E"
    [3,] "NA"
    [4,] "NA"
    [5,] "GO:0008320@NR;GO:0004866@NR;GO:0006886@NR"
    [6,] "NA"
    [7,] "NA"
    [8,] "GO:0004060@E"
    [9,] "NA"

What do the suffixes after a GO term, '@E' or '@NR', mean?

What I am trying to do is build an annotation package for GO enrichment analysis using GOstats. The GO annotation comes from InterProScan. I plan to create a data frame with three columns (probeid, geneid and GO) and then build the annotation package. Any suggestions?

Thanks,
Xinxia
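For reference, the three-column data frame described above could be assembled in base R roughly as follows. This is only a sketch under assumptions: the two input tables and all their IDs are hypothetical placeholders (a probe-to-gene mapping and a gene-to-GO table, for example exported from InterProScan), and the code does not use any AnnBuilder function.

```r
# Hypothetical inputs: placeholder IDs, not real probes or genes.
# probe2gene: probe set -> gene; gene2go: gene -> GO id (e.g. from InterProScan)
probe2gene <- data.frame(
  probeid = c("probe1", "probe2", "probe3", "probe4"),
  geneid  = c("geneA", "geneA", "geneB", "geneC"),
  stringsAsFactors = FALSE
)
gene2go <- data.frame(
  geneid = c("geneA", "geneA", "geneB"),
  GO     = c("GO:0004060", "GO:0008320", "GO:0006886"),
  stringsAsFactors = FALSE
)

# Join on geneid: one row per (probe, gene, GO term).
# all.x = TRUE keeps probes whose gene has no GO term (GO is NA for them).
probeGeneGO <- merge(probe2gene, gene2go, by = "geneid", all.x = TRUE)

# Put the columns in the planned order and sort for readability.
probeGeneGO <- probeGeneGO[order(probeGeneGO$probeid, probeGeneGO$GO),
                           c("probeid", "geneid", "GO")]
rownames(probeGeneGO) <- NULL
print(probeGeneGO)
```

A probe set mapped to a gene with several GO terms appears once per term, which is the usual "long" layout for feeding GO mappings into downstream tools.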
Answer: AnnBuilder with customized GO annotation
John Zhang wrote (12.3 years ago):
> What do these after a GO term mean, '@E' or '@NR'?

They are GO evidence codes. Please read the description available on the GO website for details.

Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084
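If it helps to look at the evidence codes separately from the GO ids, a field such as "GO:0008320@NR;GO:0004866@NR" can be split in base R. This is a sketch that assumes the "GO id@evidence code" format with ";" between entries, as printed in the annotation above; the helper name splitGOField is made up for illustration and is not part of AnnBuilder.

```r
# GO fields as printed in the annotation above, assuming the "@" separator
# ("GO id@evidence code"); "NA" means no GO terms for that probe set.
probes  <- c("36512_at", "35730_at", "32469_at")
goField <- c("GO:0004060@E", "NA",
             "GO:0008320@NR;GO:0004866@NR;GO:0006886@NR")

# Hypothetical helper: split one "GO:id@code;GO:id@code" field into
# GO id and evidence code columns (NULL when there are no GO terms).
splitGOField <- function(field) {
  if (is.na(field) || field %in% c("NA", "")) {
    return(NULL)
  }
  entries <- strsplit(field, ";", fixed = TRUE)[[1]]
  parts <- strsplit(entries, "@", fixed = TRUE)
  data.frame(GO = vapply(parts, `[`, character(1), 1),
             EVIDENCE = vapply(parts, `[`, character(1), 2),
             stringsAsFactors = FALSE)
}

# Long format: one row per (probe set, GO id, evidence code).
pieces <- lapply(seq_along(probes), function(i) {
  df <- splitGOField(goField[i])
  if (is.null(df)) return(NULL)
  data.frame(PROBE = probes[i], df, stringsAsFactors = FALSE)
})
goLong <- do.call(rbind, Filter(Negate(is.null), pieces))
rownames(goLong) <- NULL
print(goLong)
```

Probe sets with "NA" simply contribute no rows, so the result lists only actual GO assignments.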
Thanks. I had thought of that, but I could not find the 'E' used in this example on the GO website: http://www.geneontology.org/GO.evidence.shtml

Another question about GOstats: is the evidence code used in the enrichment analysis?

Here is part of the annotation that I got:

         ENTREZID PROBE        ACCNUM          UNIGENE
    [1,] "10"     "36512_at"   "L32179"        "Hs.2"
    [2,] "10"     "38912_at"   "D90042"        "Hs.2"
    [3,] "1084"   "32468_f_at" "D90278;M16652" "NA"
    [4,] "125"    "35730_at"   "X03350"        "NA"
    [5,] "2"      "32469_at"   "L00693"        "Hs.74561"
    [6,] "63036"  "38936_at"   "M16652"        "NA"
    [7,] "7051"   "32481_at"   "AL031663"      "NA"
    [8,] "9"      "33825_at"   "X68733"        "NA"
    [9,] "NA"     "39368_at"   "AL031668"      "NA"
         GO                                          OMIM
    [1,] "GO:0004060@E"                              "NA"
    [2,] "GO:0004060@E"                              "NA"
    [3,] "NA"                                        "NA"
    [4,] "NA"                                        "NA"
    [5,] "GO:0008320@NR;GO:0004866@NR;GO:0006886@NR" "NA"
    [6,] "NA"                                        "NA"
    [7,] "NA"                                        "NA"
    [8,] "GO:0004060@E"                              "NA"
    [9,] "NA"                                        "NA"

Here is the script that I used while following the example:

    library(AnnBuilder)
    pkgpath <- .find.package("AnnBuilder")
    # test dataset
    pkgdir <- "/nethome/xpeng/linux/analysis/array/scripts/pkgs"
    setwd(pkgdir)
    geneNMap <- matrix(c("32468_f_at", "D90278;M16652",
                         "32469_at", "L00693",
                         "32481_at", "AL031663",
                         "33825_at", "X68733",
                         "35730_at", "X03350",
                         "36512_at", "L32179",
                         "38912_at", "D90042",
                         "38936_at", "M16652",
                         "39368_at", "AL031668"), ncol = 2, byrow = TRUE)
    write.table(geneNMap, file = "geneNMap", sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE)

    # get annotation info.
    makeSrcInfo()
    srcObjs <- list()
    egUrl <- "http://www.bioconductor.org/datafiles/wwwsources"
    ugUrl <- "http://www.bioconductor.org/datafiles/wwwsources/Ths.data.gz"
    eg <- EG(srcUrl = egUrl, parser = file.path(pkgpath, "scripts", "gbLLParser"),
             baseFile = "geneNMap", accession = "Tll_tmpl.gz", built = "N/A",
             fromWeb = TRUE)
    ug <- UG(srcUrl = ugUrl, parser = file.path(pkgpath, "scripts", "gbUGParser"),
             baseFile = "geneNMap", organism = "Homo sapiens", built = "N/A",
             fromWeb = TRUE)
    srcObjs[["eg"]] <- eg
    srcObjs[["ug"]] <- ug
    if(.Platform$OS.type != "windows"){
        llMapping <- parseData(eg, eg@accession)
        colnames(llMapping) <- c("PROBE", "EG")
        ugMapping <- parseData(ug)
        colnames(ugMapping) <- c("PROBE", "UG")
    }
    # This portion only runs after the previous code has been
    # executed under windows
    if(.Platform$OS.type != "windows"){
        llMapping
        ugMapping
    }
    # This portion only runs interactively under Windows (copy/paste)
    base <- matrix(scan("geneNMap", what = "", sep = "\t", quote = "",
                        quiet = TRUE), ncol = 2, byrow = TRUE)
    colnames(base) <- c("PROBE", "ACC")
    merged <- merge(base, llMapping, by = "PROBE", all.x = TRUE)
    merged <- merge(merged, ugMapping, by = "PROBE", all.x = TRUE)
    unified <- AnnBuilder:::resolveMaps(merged, trusted = c("EG", "UG"),
                                        srcs = c("EG", "UG"))
    unified
    read.table(unified, sep = "\t", header = FALSE)
    if(.Platform$OS.type != "windows"){
        # these two do not work for me
        # parser(eg) <- file.path(.path.package("AnnBuilder"), "scripts", "llParser")
        # baseFile(eg) <- unified
        attr(eg, "parser") <- file.path(.path.package("AnnBuilder"), "scripts", "llParser")
        attr(eg, "baseFile") <- unified
        annotation <- parseData(eg, eg@accession, ncol = 14)
        colnames(annotation) <- c("PROBE", "ACCNUM", "ENTREZID", "UNIGENE",
                                  "GENENAME", "SYMBOL", "CHR", "MAP", "PMID",
                                  "GRIF", "SUMFUNC", "GO", "OMIM", "REFSEQ")
    }
    annotation
    gpUrl <- "http://www.bioconductor.org/datafiles/wwwsources/"
    goUrl <- "http://www.bioconductor.org/datafiles/wwwsources/Tgo.xml"
    gp <- GP(srcUrl = gpUrl, organism = "Homo sapiens", fromWeb = TRUE)
    go <- GO(srcUrl = goUrl, fromWeb = TRUE)
    strand <- AnnBuilder:::getChroLocation(srcUrl(gp), AnnBuilder:::gpLinkNGene(TRUE))
    strand
    annotation <- merge(annotation, strand, by = "ENTREZID", all.x = TRUE)
    pkgName <- "myTestPkg"
    pkgPath <- getwd()
    createEmptyDPkg("myTestPkg", getwd(), force = TRUE)
    annotation <- as.matrix(annotation)
    annotation
    AnnBuilder:::writeAnnData2Pkg(annotation, pkgName, pkgPath)
    list.files(file.path(getwd(), "myTestPkg"))
    repList <- AnnBuilder:::getRepList("all", srcObjs)
    repList[["PKGNAME"]] <- pkgName
    AnnBuilder:::writeOrganism(pkgName, pkgPath, "Homo sapiens")
    AnnBuilder:::writeDocs("geneNMap", pkgName, pkgPath, "1.1.0",
                           list(author = "annonymous",
                                maintainer = "annonymous <annonymous@net.com>"),
                           repList, "PKGNAME")
    # clean up
    #unlink(c(unified, XMLOut, "geneNMap", "test.xml", "testByNum.xml"))
    #unlink(file.path(getwd(), "test"), TRUE)
    sessionInfo()

This is the session info:

    R version 2.5.0 (2007-04-23)
    x86_64-unknown-linux-gnu

    locale:
    LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

    attached base packages:
    [1] "tools"     "stats"     "graphics"  "grDevices" "utils"     "datasets"
    [7] "methods"   "base"

    other attached packages:
    AnnBuilder   annotate        XML    Biobase
      "1.14.0"   "1.14.1"    "1.7-3"   "1.14.0"

Best,
Xinxia
Answer: AnnBuilder with customized GO annotation
Seth Falcon wrote (12.3 years ago):
"Xinxia Peng" <xinxia.peng at sbri.org> writes:

> Thanks. I thought about it, but I did not find the 'E' in this example
> from GO website:

That looks like a bug of some sort. The evidence codes should all be three letters, I believe.

> http://www.geneontology.org/GO.evidence.shtml
>
> Another question is for GOstats: is the evidence code involved in the
> enrichment analysis?

No, they are not used at present.

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
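One way to spot suspicious codes like 'E' in a parsed annotation is to compare every extracted code against an explicit list of accepted codes, rather than relying on a rule about code length. This is a sketch: knownCodes below is an illustrative subset typed in by hand and should be checked against the GO evidence code page linked above; the third GO field is a hypothetical entry carrying the 'TAS' code.

```r
# Illustrative subset of GO evidence codes (verify against the GO documentation).
knownCodes <- c("IDA", "IPI", "IMP", "IGI", "IEP", "ISS", "TAS", "NAS",
                "IC", "ND", "IEA")

# GO fields in the "GO id@evidence code" format; the last one is hypothetical.
goField <- c("GO:0004060@E", "GO:0008320@NR;GO:0004866@NR", "GO:0004060@TAS")

# Extract the evidence code after "@" for every semicolon-separated entry.
codes <- unlist(lapply(strsplit(goField, ";", fixed = TRUE),
                       function(entries) sub("^[^@]*@", "", entries)))

# Codes that are not in the accepted list.
unexpected <- unique(codes[!codes %in% knownCodes])
print(unexpected)
```

With this list, 'E' and 'NR' are reported as unexpected while 'TAS' passes; whether a flagged code is a parsing bug or just a code missing from the list still has to be checked by hand.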
In this particular example, the corresponding protein on the GO website is 'ARY2_HUMAN'. It has one associated GO term, GO:0004060, the same one I got from the example, but its evidence code is 'TAS'. I am wondering whether this is related to the data stored on the Bioconductor site, or whether something happened during file parsing or elsewhere.

Xinxia
Answer: AnnBuilder with customized GO annotation
Nianhua Li wrote (12.3 years ago):
Hi Xinxia,

I believe the problem is related to the data. The data were created a couple of years ago. My guess is that we updated the parser to follow changes on the GO website but did not update the data. The data are meant only as an example and will certainly not be consistent with the latest GO data. We will try to update the vignette in the future.

Regarding your task, if I were you I would modify the AnnBuilder code. The relevant code is in the function getAnnData() in the file AnnBuilder/R/ABPkgBuilder.R:

    # Parse gene2go.gz
    parser(srcObjs[["eg"]]) <- getBaseParsers("eggo")
    go <- try(parseData(srcObjs[["eg"]], srcObjs[["eg"]]@go,
                        ncol = 2, mergeKey = FALSE))
    colnames(go) <- c("PROBE", "GO")
    options(show.error.messages = TRUE)
    if(inherits(annotation, "try-error")){
        stop(paste("Parsing Entrez Gene gene2go.gz failed because of:\n\n",
                   annotation))
    }
    if(nrow(go) > 0)
        annotation <- merge(annotation, go, by = "PROBE", all.x = TRUE)
    options(show.error.messages = FALSE)

Remove the first two lines. Create a data frame with two columns: the first column is the probe set id and the second column is the GO id. Assign this data frame to the variable "go". Then you are good to go. When you invoke ABPkgBuilder, pass your probe set to Entrez Gene mapping as the base mapping file and "ll" as baseMapType.

You may also want to insert browser() into the code (say, at the beginning of getAnnData()) to help with debugging. I find it very useful when I work with AnnBuilder.

Hope this helps,
Nianhua
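A possible way to prepare the two-column "go" data frame is sketched below. It assumes you already have a long table with one row per (probe set, GO id), for example derived from InterProScan via your probe-to-gene mapping; the example rows are hypothetical. The sketch collapses it to one row per probe set with semicolon-separated GO ids, mirroring the GO column seen in the example annotation, so that the merge(annotation, go, by = "PROBE", all.x = TRUE) call does not duplicate annotation rows. Whether AnnBuilder also expects an "@evidence" suffix on each id is an assumption not settled here.

```r
# Hypothetical long-format table: one row per (probe set, GO id).
probeGO <- data.frame(
  PROBE = c("32469_at", "32469_at", "36512_at", "38912_at"),
  GO    = c("GO:0008320", "GO:0004866", "GO:0004060", "GO:0004060"),
  stringsAsFactors = FALSE
)

# Collapse to one row per probe set with semicolon-separated GO ids,
# so merging with `annotation` (one row per probe set) adds no duplicate rows.
goList <- tapply(probeGO$GO, probeGO$PROBE,
                 function(ids) paste(unique(ids), collapse = ";"))
go <- data.frame(PROBE = names(goList), GO = as.vector(goList),
                 stringsAsFactors = FALSE)
print(go)
```

If you prefer one row per GO id instead, skip the tapply step and use probeGO directly, at the cost of the merge producing several rows per probe set.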