Question

AnnBuilder package: problem with gbNRef

Craddock, Richard C. CDC/NCID/VR CTR ▴ 50

@craddock-richard-c-cdcncidvr-ctr-1816

Last seen 9.6 years ago

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20060725/ab39efbb/attachment.pl

updated 17.7 years ago by Nianhua Li ▴ 870 • written 17.7 years ago by Craddock, Richard C. CDC/NCID/VR CTR ▴ 50

score 0 · Answer 1 · 2006-07-25

Sorry to send the original in HTML. Hopefully this works better ....

-cameron

----------------------------------------------------------------------

Hello all,

I am having a similar problem to Weijun's but the posted fixes are not working for me. I am trying to build a "gbNRef" package but the annotation information is missing. I have upgraded to AnnBuilder 1.11.5 from svn, and I have tried both the two and three column base file. I have intermittent problems connecting to the NCBI ftp site, so I copied the necessary files locally.

Here is the code I am using:

    myBase <- "/home/cameron/microarray_data/mwg40ka_basname_named.tdf"
    myBaseType <- "gbNRef"
    mySrcUrls <- getSrcUrl("all", "Homo sapiens")
    mySrcUrls[[7]] <- "file:///home/cameron/microarray_data/annotate"
    mySrcUrls[[2]] <- "file:///home/cameron/microarray_data/annotate/UniGene"
    mySrcUrls[[4]] <- "file:///home/cameron/microarray_data/annotate/KEGG/pathways"
    myDir <- "/home/cameron/microarray_data/mwgAnnotate"

    ABPkgBuilder(
        baseName=myBase,
        srcUrls=mySrcUrls,
        baseMapType=myBaseType,
        pkgName="mwg40kA",
        pkgPath=myDir,
        organism="Homo sapiens",
        version="0.10",
        author=list(authors="R. Cameron Craddock",
                    maintainer="R. Cameron Craddock <cmi5 at cdc.gov>"),
        fromWeb=TRUE)

Here is the output from ABPkgBuilder:

    Attaching package: 'GO'

    The following object(s) are masked from package:AnnBuilder :

        GO

    Read 1 item
    Read 1 item
    Failed to get data from URL: ftp://ftp.genome.ad.jp/pub/kegg/pathways/hsa/hsa00195.gene
    Failed to get data from URL: ftp://ftp.genome.ad.jp/pub/kegg/pathways/hsa/hsa00231.gene
    Failed to get data from URL: ftp://ftp.genome.ad.jp/pub/kegg/pathways/hsa/hsa00253.gene
    ...
    ... ( removed a bunch of others )
    ...
    Failed to get data from URL: ftp://ftp.genome.ad.jp/pub/kegg/pathways/hsa/hsa07217.gene
    [1] "4028 2 2"
    The following data sets have been added to the database and will be removed:
     [1] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAACCNUM.rda"
     [2] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kACHRLENGTHS.rda"
     [3] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kACHRLOC.rda"
     [4] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAENZYME.rda"
     [5] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kALOCUSID.rda"
     [6] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAMAPCOUNTS.rda"
     [7] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAORGANISM.rda"
     [8] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAPATH.rda"
     [9] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAPFAM.rda"
    [10] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAPROSITE.rda"
    [11] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAQCDATA.rda"
    [12] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAQC.rda"

None of the files listed in the above warnings exist at the specified location. I verified this using ncftp.

After ABPkgBuilder finishes I perform the following steps:

    R CMD check mwg40kA/    (only warning is that data directory is empty)
    R CMD build mwg40kA/    (no errors, no warnings)
    R CMD INSTALL mwg30kA_0.10.tar.gz

    * Installing *source* package 'mwg40kA' ...
    ** R
    ** data
    **  moving datasets to lazyload DB
    ** help
     >>> Building/Updating help pages for package 'mwg40kA'
         Formats: text html latex example
      mwg40kA              text html latex
      mwg40kAACCNUM        text html latex example
      mwg40kACHRLENGTHS    text html latex example
      mwg40kACHRLOC        text html latex example
      mwg40kAENZYME        text html latex example
      mwg40kALOCUSID       text html latex example
      mwg40kAORGANISM      text html latex example
      mwg40kAPATH          text html latex example
      mwg40kAPFAM          text html latex example
      mwg40kAPROSITE       text html latex example
      mwg40kAQC            text html latex
      mwg40kAQCDATA        text html latex
    ** building package indices ...
    * DONE (mwg40kA)

This is what I get when I load the library:

    > library(mwg40kA)
    > mwg40kA()

    Quality control information for mwg40kA
    Date built: Created: Tue Jul 25 16:48:06 2006
    Number of probes: 20160
    Probe number missmatch: None
    Probe missmatch: None
    Mappings found for probe based rda files:
         mwg40kAACCNUM found 19760 of 20160
         mwg40kACHRLOC found 0 of 20160
         mwg40kAENZYME found 0 of 20160
         mwg40kALOCUSID found 0 of 20160
         mwg40kAPATH found 0 of 20160
    Mappings found for non-probe based rda files:
         mwg40kACHRLENGTHS found 25
         mwg40kAORGANISM found 1
         mwg40kAPFAM found 0
         mwg40kAPROSITE found 0

The mwg40kAACCNUM environment matches my basefile.

Can anyone suggest a solution to my problem?

Thanks for your help,
Cameron

score 0 · Answer 2 · 2006-07-25

Hi, Richard,

First, to clarify: the patch in AnnBuilder v1.11.5 is useful only when the baseFile is a probe-to-RefSeq mapping. If that is the case for your baseFile, your baseType should be "refseq". Use baseType "gbNRef" only when your baseFile is a probeset ID to GenBank accession mapping.

If your baseFile is a probe-to-GenBank mapping, then AnnBuilder should be able to generate the correct result. In fact, all the annotation packages provided by the bioc core team are generated by AnnBuilder, and their baseType is always "gbNRef".

I guess the reason you didn't get the expected output is that the local mirror of the source data is not set up correctly. There are instructions in AnnBuilder/inst/doc/mirroringDataResources.rst.

If you still have a problem, please include the following information in your next post so that the problem can be reproduced:

(1) A small part of your baseFile, just like what Weijun did.
(2) The file structure of your local mirror (file:///home/cameron/microarray_data/annotate).

thanks
nianhua
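One way to see which of these cases a base file falls into is to tabulate its accession column by form. The following is a minimal sketch in base R, not part of AnnBuilder: `classify_accession` is a hypothetical helper, and it assumes that RefSeq accessions can be recognized by a two-letter prefix followed by an underscore (NM_, NG_, ...) and that every other non-missing ID is a GenBank accession.

```r
# Hypothetical helper (not an AnnBuilder function): label each accession
# as "refseq", "genbank", or "missing" based only on its textual form.
classify_accession <- function(ids) {
  ids <- trimws(as.character(ids))
  # Default assumption: anything non-missing that is not RefSeq is GenBank.
  kind <- rep("genbank", length(ids))
  # RefSeq form: two capital letters, underscore, digits, optional ".version".
  kind[grepl("^[A-Z]{2}_[0-9]+(\\.[0-9]+)?$", ids)] <- "refseq"
  # NA, empty strings, and a literal "<NA>" are all treated as missing.
  kind[is.na(ids) | ids == "" | ids == "<NA>"] <- "missing"
  kind
}

# Example with a handful of IDs of the kind found in a base file.
ids <- c("NG_002679", "NM_014191", "BC000631", NA, "U43148", "CR614804")
kinds <- classify_accession(ids)
print(table(kinds))
```

If the table shows both "refseq" and "genbank" entries, the base file is a mixture, and neither base type may cover every probe on its own.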

score 0 · Answer 3 · 2006-07-26

Hi, Cameron,

Maybe you want to try baseType="refseq". I used the sample baseFile from your email with this script:

==================================================================
library(AnnBuilder)
mySrcUrls <- getSrcUrl("all", "Homo sapiens")
mySrcUrls[[7]] <- "file:///home/cameron/microarray_data/annotate"
mypkg <- function(pkgPath, version) {
    ABPkgBuilder(baseName="mybase.txt",
                 baseMapType="refseq",
                 srcUrls=mySrcUrls,
                 pkgName="mypkg",
                 pkgPath=pkgPath,
                 organism="Homo sapiens",
                 version=version,
                 author=list(
                     authors="R. Cameron Craddock",
                     maintainer="R. Cameron Craddock <email at email.email>"
                 )
    )
}
mypkg(getwd(), "1.0.0")
==================================================================

And here is the result:

==================================================================
> library(mypkg)
> mypkg()

Quality control information for mypkg
Date built: Created: Wed Jul 26 12:18:11 2006
Number of probes: 22
Probe number missmatch: None
Probe missmatch: None
Mappings found for probe based rda files:
     mypkgACCNUM found 21 of 22
     mypkgCHRLOC found 20 of 22
     mypkgCHR found 20 of 22
     mypkgENZYME found 0 of 22
     mypkgGENENAME found 20 of 22
     mypkgGO found 17 of 22
     mypkgLOCUSID found 20 of 22
     mypkgMAP found 19 of 22
     mypkgOMIM found 18 of 22
     mypkgPATH found 5 of 22
     mypkgPMID found 20 of 22
     mypkgREFSEQ found 20 of 22
     mypkgSUMFUNC found 0 of 22
     mypkgSYMBOL found 20 of 22
     mypkgUNIGENE found 20 of 22
Mappings found for non-probe based rda files:
     mypkgCHRLENGTHS found 25
     mypkgGO2ALLPROBES found 269
     mypkgGO2PROBE found 73
     mypkgORGANISM found 1
     mypkgPATH2PROBE found 17
     mypkgPFAM found 15
     mypkgPMID2PROBE found 595
     mypkgPROSITE found 13
==================================================================

What AnnBuilder does for your inputs is:

(1) Use your "mixture of GenBank Accession and Ref Seq" to find the Entrez Gene ID.
(2) Use the Entrez Gene ID to find other annotations.

If your base type is "gbNRef", it uses ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz for the GenBank to Entrez Gene mapping. If your base type is "refseq", it uses ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz for the mapping. You may want to check those files manually to see whether all your input IDs are included. If your input has mixed ID types, then you have to get the Entrez Gene IDs manually.

hope it helps
nianhua
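The manual check suggested here can be sketched in base R. This is only an illustration under stated assumptions: `map_to_gene` and `strip_version` are hypothetical helpers, the column names `probe`, `accession`, and `gene_id` are invented for the example, and the toy table stands in for whatever you read from a local copy of gene2accession.gz or gene2refseq.gz. The gene IDs and version suffixes below are placeholders, not real Entrez Gene data.

```r
# Accessions in the NCBI tables may carry a ".version" suffix that the
# base file lacks, so strip it before comparing.
strip_version <- function(acc) sub("\\.[0-9]+$", "", acc)

# Hypothetical helper: look up each probe's accession in a
# gene2accession-style table and return the matching gene ID (NA if absent).
map_to_gene <- function(base, g2a) {
  key <- strip_version(as.character(g2a$accession))
  query <- strip_version(as.character(base$accession))
  # incomparables = NA keeps missing accessions from matching anything.
  idx <- match(query, key, incomparables = NA)
  data.frame(probe = base$probe,
             accession = base$accession,
             gene_id = g2a$gene_id[idx],
             stringsAsFactors = FALSE)
}

# Toy inputs; gene_id values are placeholders, not real Entrez Gene IDs.
base <- data.frame(probe = c("p1", "p2", "p3"),
                   accession = c("NM_014191", "BC000631", NA),
                   stringsAsFactors = FALSE)
g2a <- data.frame(accession = c("NM_014191.2", "BC000631.1"),
                  gene_id = c(9001L, 9002L),
                  stringsAsFactors = FALSE)

res <- map_to_gene(base, g2a)
unmapped <- res$probe[is.na(res$gene_id)]
print(res)
print(unmapped)
```

Running the same lookup once against each table shows how many probes each base type could resolve, and `unmapped` lists the probes that would need Entrez Gene IDs assigned by hand.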

score 0 · Answer 4 · 2006-07-26

Good morning,

Thank you for your reply, Nianhua. The base file that I have created maps probeset IDs to a mixture of GenBank accessions and RefSeq IDs, thus presumably "gbNRef" is the appropriate base type. I have tried updating my annotations to the latest version supplied by the vendor, and still haven't had any luck.

Here is the file structure that I have created for the local copies of the files:

    /home/cameron/microarray_data/annotate
        contains the EG files
    /home/cameron/microarray_data/annotate/UniGene/Homo_sapiens
        contains the UniGene files
    /home/cameron/microarray_data/annotate/KEGG/pathways
        contains the contents of the KEGG pathway.tar.gz file

It would seem to me that if there were a problem with finding the appropriate data files, I would receive an error message. I have verified that the readURL and loadFromUrl functions work with the URLs I have supplied.

Here is a sample from my basefile:

    > a=read.delim('/home/cameron/microarray_data/mwg40ka_basefile.tdf',sep='\t', head=F)[119:140,]
    > a
                          V1        V2
    119    mwghum40K:A#09699 NG_002679
    120    mwghum40K:A#10779 NM_014191
    121    mwghum40K:A#00108 NM_016258
    122    mwghum40K:A#00228 NM_005462
    123    mwghum40K:A#00481  BC000631
    124    mwghum40K:A#00652 NM_001167
    125    mwghum40K:A#09199 NM_033341
    126    mwghum40K:A#09493 NM_003310
    127 mwgaracontrol#011-r1      <NA>
    128    mwghum40K:A#09703  BT019423
    129    mwghum40K:A#00277  CR614804
    130    mwghum40K:A#00396 NM_002307
    131    mwghum40K:A#00487 NM_004488
    132    mwghum40K:A#00591    U43148
    133    mwghum40K:A#05083 NM_012282
    134    mwghum40K:A#05232 NM_006933
    135    mwghum40K:A#05372 NM_003156
    136    mwghum40K:A#05445  BC016055
    137    mwghum40K:A#10328 NM_014254
    138    mwghum40K:A#10675 NM_003263
    139    mwghum40K:A#10903 NM_003794
    140    mwghum40K:A#10992 NM_002456

Thanks for your help,
-Cameron