AnnBuilder with customized GO annotation


Xinxia Peng ▴ 120


Hi All,

I am trying to reproduce the example given in 'Basic Functions of AnnBuilder'. Here is what I got for the GO part of the annotation:

         GO
    [1,] "GO:0004060@E"
    [2,] "GO:0004060@E"
    [3,] "NA"
    [4,] "NA"
    [5,] "GO:0008320@NR;GO:0004866@NR;GO:0006886@NR"
    [6,] "NA"
    [7,] "NA"
    [8,] "GO:0004060@E"
    [9,] "NA"

What do the suffixes after a GO term mean, '@E' or '@NR'?

What I am trying to do is to build an annotation package for GO enrichment analysis using GOstats. The GO annotation is from InterProScan. I plan to create a data frame with three columns (probeid, geneid and GO), then build the annotation package. Any suggestions?

Thanks,
Xinxia

Annotation GO GOstats • 2.2k views

updated 18.8 years ago by Nianhua Li ▴ 870 • written 18.8 years ago by Xinxia Peng ▴ 120


John Zhang ★ 2.9k


> What do these after a GO term mean, '@E' or '@NR'?

They are evidence codes assigned by GO. Please read the description available on the GO web site for details.

> What I am trying to do is to build an annotation package for GO
> enrichment analysis using GOstats. The GO annotation is from
> InterProScan. I plan to create a data frame with three columns: probeid,
> geneid and GO, then build the annotation package. Any suggestions?
>
> Thanks,
> Xinxia

Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084

written 18.8 years ago by John Zhang ★ 2.9k


Thanks. I thought about it, but I did not find the 'E' in this example on the GO website: http://www.geneontology.org/GO.evidence.shtml

Another question is for GOstats: is the evidence code involved in the enrichment analysis?

=======================
Here I repeat part of the annotation that I got:

         ENTREZID PROBE        ACCNUM          UNIGENE
    [1,] "10"     "36512_at"   "L32179"        "Hs.2"
    [2,] "10"     "38912_at"   "D90042"        "Hs.2"
    [3,] "1084"   "32468_f_at" "D90278;M16652" "NA"
    [4,] "125"    "35730_at"   "X03350"        "NA"
    [5,] "2"      "32469_at"   "L00693"        "Hs.74561"
    [6,] "63036"  "38936_at"   "M16652"        "NA"
    [7,] "7051"   "32481_at"   "AL031663"      "NA"
    [8,] "9"      "33825_at"   "X68733"        "NA"
    [9,] "NA"     "39368_at"   "AL031668"      "NA"

         GO                                          OMIM
    [1,] "GO:0004060@E"                              "NA"
    [2,] "GO:0004060@E"                              "NA"
    [3,] "NA"                                        "NA"
    [4,] "NA"                                        "NA"
    [5,] "GO:0008320@NR;GO:0004866@NR;GO:0006886@NR" "NA"
    [6,] "NA"                                        "NA"
    [7,] "NA"                                        "NA"
    [8,] "GO:0004060@E"                              "NA"
    [9,] "NA"                                        "NA"

=======================
Here is the script that I used while following the example:

    library(AnnBuilder);
    pkgpath <- .find.package("AnnBuilder");

    # test dataset
    pkgdir <- "/nethome/xpeng/linux/analysis/array/scripts/pkgs";
    setwd(pkgdir);
    geneNMap <- matrix(c("32468_f_at", "D90278;M16652",
                         "32469_at", "L00693",
                         "32481_at", "AL031663",
                         "33825_at", "X68733",
                         "35730_at", "X03350",
                         "36512_at", "L32179",
                         "38912_at", "D90042",
                         "38936_at", "M16652",
                         "39368_at", "AL031668"),
                       ncol = 2, byrow = TRUE)
    write.table(geneNMap, file = "geneNMap", sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE)

    # get annotation info.
    makeSrcInfo()
    srcObjs <- list()
    egUrl <- "http://www.bioconductor.org/datafiles/wwwsources"
    ugUrl <- "http://www.bioconductor.org/datafiles/wwwsources/Ths.data.gz"
    eg <- EG(srcUrl = egUrl,
             parser = file.path(pkgpath, "scripts", "gbLLParser"),
             baseFile = "geneNMap", accession = "Tll_tmpl.gz",
             built = "N/A", fromWeb = TRUE)
    ug <- UG(srcUrl = ugUrl,
             parser = file.path(pkgpath, "scripts", "gbUGParser"),
             baseFile = "geneNMap", organism = "Homo sapiens",
             built = "N/A", fromWeb = TRUE)
    srcObjs[["eg"]] <- eg
    srcObjs[["ug"]] <- ug
    if(.Platform$OS.type != "windows"){
        llMapping <- parseData(eg, eg@accession)
        colnames(llMapping) <- c("PROBE", "EG")
        ugMapping <- parseData(ug)
        colnames(ugMapping) <- c("PROBE", "UG")
    }
    # This portion only runs after the previous code has been
    # executed under windows
    if(.Platform$OS.type != "windows"){
        llMapping
        ugMapping
    }
    # This portion only runs interactively under Windows (copy/paste)
    base <- matrix(scan("geneNMap", what = "", sep = "\t", quote = "",
                        quiet = TRUE), ncol = 2, byrow = TRUE)
    colnames(base) <- c("PROBE", "ACC")
    merged <- merge(base, llMapping, by = "PROBE", all.x = TRUE)
    merged <- merge(merged, ugMapping, by = "PROBE", all.x = TRUE)
    unified <- AnnBuilder:::resolveMaps(merged, trusted = c("EG", "UG"),
                                        srcs = c("EG", "UG"))
    unified
    read.table(unified, sep = "\t", header = FALSE)
    if(.Platform$OS.type != "windows"){
        # these two do not work for me
        # parser(eg) <- file.path(.path.package("AnnBuilder"), "scripts", "llParser")
        # baseFile(eg) <- unified
        attr(eg, "parser") <- file.path(.path.package("AnnBuilder"),
                                        "scripts", "llParser")
        attr(eg, "baseFile") <- unified
        annotation <- parseData(eg, eg@accession, ncol = 14)
        colnames(annotation) <- c("PROBE", "ACCNUM", "ENTREZID", "UNIGENE",
                                  "GENENAME", "SYMBOL", "CHR", "MAP", "PMID",
                                  "GRIF", "SUMFUNC", "GO", "OMIM", "REFSEQ")
    }
    annotation
    gpUrl <- "http://www.bioconductor.org/datafiles/wwwsources/"
    goUrl <- "http://www.bioconductor.org/datafiles/wwwsources/Tgo.xml"
    gp <- GP(srcUrl = gpUrl, organism = "Homo sapiens", fromWeb = TRUE)
    go <- GO(srcUrl = goUrl, fromWeb = TRUE)
    strand <- AnnBuilder:::getChroLocation(srcUrl(gp),
                                           AnnBuilder:::gpLinkNGene(TRUE));
    strand
    annotation <- merge(annotation, strand, by = "ENTREZID", all.x = TRUE);
    pkgName <- "myTestPkg"
    pkgPath <- getwd()
    createEmptyDPkg("myTestPkg", getwd(), force = TRUE)
    annotation <- as.matrix(annotation)
    annotation
    AnnBuilder:::writeAnnData2Pkg(annotation, pkgName, pkgPath)
    list.files(file.path(getwd(), "myTestPkg"))
    repList <- AnnBuilder:::getRepList("all", srcObjs)
    repList[["PKGNAME"]] <- pkgName
    AnnBuilder:::writeOrganism(pkgName, pkgPath, "Homo sapiens")
    AnnBuilder:::writeDocs("geneNMap", pkgName, pkgPath, "1.1.0",
                           list(author = "annonymous",
                                maintainer = "annonymous <annonymous@net.com>"),
                           repList, "PKGNAME")

    # clean up
    #unlink(c(unified, XMLOut, "geneNMap", "test.xml", "testByNum.xml"))
    #unlink(file.path(getwd(), "test"), TRUE)
    sessionInfo()

=======================
This is the session info:

    R version 2.5.0 (2007-04-23)
    x86_64-unknown-linux-gnu

    locale:
    LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

    attached base packages:
    [1] "tools"     "stats"     "graphics"  "grDevices" "utils"     "datasets"
    [7] "methods"   "base"

    other attached packages:
    AnnBuilder   annotate        XML    Biobase
      "1.14.0"   "1.14.1"    "1.7-3"   "1.14.0"

Best,
Xinxia

written 18.8 years ago by Xinxia Peng ▴ 120


Seth Falcon ★ 7.4k

"Xinxia Peng" <xinxia.peng at sbri.org> writes:

> Thanks. I thought about it, but I did not find the 'E' in this example
> from GO website:

That looks like a bug of some sort. The evidence codes should all be three letters, I believe.

> http://www.geneontology.org/GO.evidence.shtml
>
> Another question is for GOstats: is the evidence code involved in the
> enrichment analysis?

No, they are not used at present.

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org

written 18.8 years ago by Seth Falcon ★ 7.4k
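The suspicion that '@E' is not a valid evidence code can be checked mechanically. The following is a minimal sketch in R and is not part of AnnBuilder: the helper `split_go` and the vector `known_codes` are hypothetical names, and the list of codes is an illustrative, possibly incomplete assumption rather than an authoritative copy of the GO documentation. The input strings are the GO column values shown earlier in this thread.

```r
# GO column values copied from the output shown in this thread.
go_col <- c("GO:0004060@E", NA,
            "GO:0008320@NR;GO:0004866@NR;GO:0006886@NR")

# Assumption: an illustrative, possibly incomplete set of GO evidence codes.
known_codes <- c("IC", "IDA", "IEA", "IEP", "IGI", "IMP", "IPI",
                 "ISS", "NAS", "ND", "NR", "RCA", "TAS")

# Hypothetical helper: split "GOID@CODE;GOID@CODE" strings into one row
# per GO term, with the GO ID and the evidence code in separate columns.
split_go <- function(x) {
  x <- x[!is.na(x) & x != "NA"]
  if (length(x) == 0) {
    return(data.frame(go_id = character(0), evidence = character(0),
                      stringsAsFactors = FALSE))
  }
  entries <- unlist(strsplit(x, ";", fixed = TRUE))
  parts <- strsplit(entries, "@", fixed = TRUE)
  data.frame(go_id = vapply(parts, `[`, character(1), 1),
             evidence = vapply(parts, `[`, character(1), 2),
             stringsAsFactors = FALSE)
}

go_tab <- split_go(go_col)
# Flag evidence codes that are not in the assumed list.
go_tab$recognized <- go_tab$evidence %in% known_codes
print(go_tab)
```

Rows where `recognized` is `FALSE` (here the 'E' entry) are the ones worth tracing back to the source data or the parser.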


In this particular example, the corresponding protein on the GO website is 'ARY2_HUMAN'. It has one associated GO term, GO:0004060, the same one I got from the example, but the evidence code there is 'TAS'. I am wondering whether the discrepancy comes from the data stored on the Bioconductor site, from something that happened during file parsing, or from something else.

Xinxia

written 18.8 years ago by Xinxia Peng ▴ 120


Nianhua Li ▴ 870


Hi, Xinxia,

I believe the problem is related to the data. The data was created a couple of years ago. I guess we updated the parser to follow changes on the GO website but did not update the data. The data is meant to be an example and definitely won't be consistent with the latest GO data. We will try to update the vignette in the future.

Regarding your task, I would modify the AnnBuilder code if I were you. The related code is in the function getAnnData() in the file AnnBuilder/R/ABPkgBuilder.R:

    # Parse gene2go.gz
    parser(srcObjs[["eg"]]) <- getBaseParsers("eggo")
    go <- try(parseData(srcObjs[["eg"]], srcObjs[["eg"]]@go,
                        ncol = 2, mergeKey = FALSE))
    colnames(go) <- c("PROBE", "GO")
    options(show.error.messages = TRUE)
    if(inherits(annotation, "try-error")){
        stop(paste("Parsing Entrez Gene gene2go.gz failed because of:\n\n",
                   annotation))
    }
    if(nrow(go) > 0)
        annotation <- merge(annotation, go, by = "PROBE", all.x = TRUE)
    options(show.error.messages = FALSE)

Remove the first two statements (the parser assignment and the parseData call that creates go). Create a data frame with 2 columns: the first column is the probeset id, the second column is the GO id. Assign this data frame to the variable "go". Then you are good to go. When you invoke ABPkgBuilder, give your probeset to Entrez Gene mapping as the base mapping file and "ll" as baseMapType.

You may also want to insert "browser()" into the code (say, at the beginning of getAnnData()) to help with debugging. I find it very useful when I work with AnnBuilder.

hope this helps
nianhua

written 18.8 years ago by Nianhua Li ▴ 870
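The two-column "go" data frame described in the answer above could be assembled along these lines. This is only a sketch under stated assumptions: the input table `ipr_go` and its probe set names are made up for illustration (standing in for GO assignments derived from InterProScan), and joining multiple GO IDs with ";" simply mirrors the GO column shown earlier in the thread. Whether AnnBuilder also expects an "@evidence" suffix in this column is not confirmed here.

```r
# Hypothetical long-format GO assignments, one row per (probe set, GO ID)
# pair. The probe set names are invented for this example.
ipr_go <- data.frame(
  PROBE = c("probe_1", "probe_1", "probe_2", "probe_3"),
  GO    = c("GO:0004060", "GO:0006886", "GO:0004866", "GO:0004060"),
  stringsAsFactors = FALSE
)

# Collapse to one row per probe set. Assumption: multiple GO IDs are
# joined with ";" as in the annotation output shown in the thread.
go_list <- tapply(ipr_go$GO, ipr_go$PROBE,
                  function(ids) paste(unique(ids), collapse = ";"))

# Two columns named as in the getAnnData() snippet: PROBE and GO.
go <- data.frame(PROBE = names(go_list),
                 GO = as.vector(go_list),
                 stringsAsFactors = FALSE)
print(go)
```

Keeping the column names as "PROBE" and "GO" matters because the merge in the quoted getAnnData() code joins on "PROBE".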
