athPkgBuilder
Tine Casneuf
Hi,

I am trying to use AnnBuilder to build an annotation package for the Arabidopsis ATH1 array. The athPkgBuilder function in AnnBuilder doesn't work properly because one of the files is no longer available from TAIR's FTP server. Of the files that this function looks for:

    fileExt = list(
        estAssign = "Genes/est_mapping/est.Assignment.Locus",
        seqGenes = "Genes/TAIR_sequenced_genes",
        go = "Ontologies/Gene_Ontology/ATH_GO_GOSLIM.20050827.txt",
        aliases = "Genes/gene_aliases.20041105",
        pathway = "Pathways/aracyc_dump_20050412",
        pmid = "Genes/Gene_Anatomy/ATH_Anatomy.20040209.txt"),

the last one, ATH_Anatomy.20040209.txt, is no longer available. Because the PMIDs are also available in the GO file, I removed the pmid parameters from the list and changed the go parameters to these yesterday:

    fileExt = list(.... go = "Ontologies/Gene_Ontology/ATH_GO_GOSLIM.20060805.txt" ...)
    ncols = list(... go = 12...)
    cols2Keep = list( ... go = c(1, 5, 9, 10)... )
    colNames = list(.... go = c("ACCNUM", "GO", "EVID", "PMID"), ...)

but some of the environments in the package didn't build properly, so I can't use those. Does anyone by any chance have an updated, correct version of this function?

Many thanks in advance,
Tine

Nianhua Li
Good question, Tine.

Change the pmid URL to "Ontologies/Plant_Ontology/stru-060505.txt".

FYI, this is the script for building ath1121501 in the current release:

    library(AnnBuilder)

    ath1121501 <- function(pkgPath, version) {
        athPkgBuilder(baseName="ath1121501.GeneBankID",
                      pkgName="ath1121501",
                      pkgPath=pkgPath,
                      version=version,
                      author=list(
                          authors="Ting-Yuan Liu, ChenWei Lin, Seth Falcon, Jianhua Zhang, James W. MacDonald",
                          maintainer="Biocore Data Team <biocannotation at lists.fhcrc.org>"
                      ),
                      fileExt = list(
                          estAssign = "Genes/est_mapping/est.Assignment.Locus",
                          seqGenes = "Genes/TAIR_sequenced_genes",
                          go = "Ontologies/Gene_Ontology/ATH_GO_GOSLIM.20060318.txt",
                          ##aliases = "Genes/gene_aliases.20041105",
                          aliases = "Genes/gene_aliases.20051208",
                          pathway = "Pathways/aracyc_dump_20060214",
                          ##pmid = "Ontologies/TAIR_Ontology/ATH_anatomy.20050119.txt")
                          pmid = "Ontologies/Plant_Ontology/stru-060309.txt")
                      )
    }

Hope it is helpful.

PS: please subscribe to this mailing list. Thanks.

nianhua

Nianhua Li
Dear All,

As promised, I updated athPkgBuilder (AnnBuilder v 1.11.8) and just committed it to the devel svn repository. Here are the changes:

1. Previously the URLs of the data sources were specified in the parameter fileExt. This remains unchanged. But now you can use the function getFileExt to generate a list and feed it directly to the parameter fileExt:

    > getFileExt("AG")
    $base
    [1] "Microarrays/Affymetrix/affy_AG_array_elements-2006-07-14.txt"

    $estAssign
    [1] "Genes/est_mapping/est.Assignment.Locus"

    $seqGenes
    [1] "Genes/TAIR_sequenced_genes"

    $go
    [1] "Ontologies/Gene_Ontology/ATH_GO_GOSLIM.20060815.txt"

    $aliases
    [1] "Genes/gene_aliases.20060620"

    $aracyc
    [1] "Pathways/aracyc_dump_20060214"

    $kegg
    [1] "/ath/ath_gene_map.tab"

    $pmid
    [1] "User_Requests/LocusPublished.08012006.txt"

Function getFileExt takes the chip name as input, either "ATH" or "AG".

2. If you compare the above list with the one athPkgBuilder had before, there are a few changes:

$base: a new slot, the URL of the probe-to-gene mapping file. It is used only when no baseName is given. In other words, users can give their own mapping, or set baseName=NULL (the default) and use TAIR's.

$aracyc: the slot name was $path before. I changed it to $aracyc to clarify that the data comes from AraCyc. The enzyme annotation from AraCyc is stored in environment "ENZYME" in the final package. The pathway annotation is stored in environment "ARACYC" in the final package.

$kegg: a new slot, the URL of KEGG's pathway data. The pathway annotation is stored in environment "PATH" in the final package. Note that environment "PATH" was obtained from AraCyc before, so this is a change. The main reason for the change is that we get pathway data from KEGG for all other annotation packages.

$pmid: a different file from TAIR is used now. Thanks to Tine for the contribution.

3. When a probeset ID matches multiple genes: there is a new parameter "indexby" for function "athPkgBuilder". The value is either "PROBE" (default) or "ACCNUM".

(1) If indexby="PROBE": if a probeset ID matches multiple genes, it is annotated with the character string "multiple" in all annotations (e.g. agACCNUM, agGO, etc.). But there is a new environment "MULTIHIT" (e.g. agMULTIHIT), whose keys are probeset IDs and whose values are AGI locus IDs. All probeset IDs are included. If the probeset matches one gene or none, its value in "MULTIHIT" is NA; otherwise it is a vector of all matching AGI locus IDs.

(2) If indexby="ACCNUM": all annotations are indexed by AGI locus IDs rather than probeset IDs. For example, environment "agGO" uses the AGI locus ID as key and the GO annotation as value. All AGI locus IDs that occur in the probe-to-gene mapping file are included. The environment "ACCNUM" (e.g. agACCNUM) then provides the probe-to-gene mapping: the key is the probeset ID and the value is the AGI locus ID.

4. Other issues:

(1) GO annotation: Thomas suggested getting the GO annotation from GO.org instead of TAIR. I contacted TAIR, and here is the reply:

"The 2 files should be the same at same point in time. The GO database is more up to date because it gets updates from our curation database every night, whereas the TAIR database is updated every 2 weeks, at least for the time being. However, as an exception, last night there was a problem with the update in the GO database and the data there is incomplete. After tonight's update the data in GO should be fine."

It seems the two files are almost the same. Therefore, I prefer not to change them, just because I am lazy :)

(2) CHRLOC: currently obtained from ftp://ftp.arabidopsis.org/home/tair/Genes/est_mapping/est.Assignment.Locus. Maybe we should change the source. But again, I will follow your suggestions.

I didn't update the data packages ath1121501 and ag, because the source data of all annotation packages were supposed to be obtained in April, 2006.

Any feedback or bug reports on the changes are highly appreciated.

Thanks!
nianhua
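
To make the two "indexby" modes described in point 3 concrete, here is a minimal base-R sketch that does not call AnnBuilder at all. The probeset IDs and the probe-to-gene table are made up for illustration, and returning NA in the ACCNUM-style lookup for a probeset with no matching gene is an assumption, since the description above only says what happens for multiple matches.

```r
# Hypothetical probe-to-gene mapping; the probeset IDs are invented for
# illustration and do not come from any real chip.
probe2gene <- data.frame(
  probe = c("p1_at", "p2_at", "p2_at", "p3_at"),
  locus = c("AT1G01010", "AT1G01020", "AT1G01030", NA),
  stringsAsFactors = FALSE
)

# Collect the distinct, non-missing locus IDs for each probeset.
hits <- lapply(split(probe2gene$locus, probe2gene$probe),
               function(x) unique(x[!is.na(x)]))

# indexby = "PROBE": a probeset that matches several genes is annotated with
# the string "multiple". NA for "no match" is an assumption of this sketch.
accnum_by_probe <- vapply(hits, function(x) {
  if (length(x) == 0) {
    NA_character_
  } else if (length(x) == 1) {
    x
  } else {
    "multiple"
  }
}, character(1))

# MULTIHIT: NA for zero or one match, otherwise all matching locus IDs.
multihit <- lapply(hits, function(x) if (length(x) > 1) x else NA)

# indexby = "ACCNUM": annotations are keyed by locus ID instead; these are
# the keys, i.e. every locus ID that occurs in the mapping.
loci <- sort(unique(probe2gene$locus))

# Store MULTIHIT in an environment, mirroring how the packages hold data.
multihit_env <- list2env(multihit, envir = new.env())

print(accnum_by_probe)
print(multihit_env[["p2_at"]])
print(loci)
```

With a real package the equivalent lookups would go through the generated environments (e.g. agMULTIHIT); the sketch only mirrors the rules stated above.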

Nianhua Li
Sorry, here is an example script:

    library(AnnBuilder)

    fileExt <- getFileExt("ATH1")

    ath1121501 <- function(pkgPath, version) {
        athPkgBuilder(
            pkgName="ath1121501",
            pkgPath=pkgPath,
            version=version,
            author=list(
                authors="Ting-Yuan Liu, ChenWei Lin, Seth Falcon, Jianhua Zhang, James W. MacDonald",
                maintainer="Biocore Data Team <biocannotation at lists.fhcrc.org>"
            ),
            fileExt = fileExt
        )
    }

    ath1121501(getwd(), "1.12.2")

To index the annotations by AGI locus ID, add indexby="ACCNUM".

Thanks,
nianhua