Sorry to send the original in html. Hopefully this works better ....
-cameron
----------------------------------------------------------------------
Hello all,
I am having a similar problem to Weijun's, but the posted fixes are not
working for me. I am trying to build a "gbNRef" package, but the
annotation information is missing. I have upgraded to AnnBuilder 1.11.5
from svn, and I have tried both the two- and three-column base files. I
have intermittent problems connecting to the NCBI ftp site, so I copied
the necessary files locally. Here is the code I am using:
myBase <- "/home/cameron/microarray_data/mwg40ka_basname_named.tdf"
myBaseType <- "gbNRef"
mySrcUrls <- getSrcUrl("all", "Homo sapiens")
mySrcUrls[[7]] <- "file:///home/cameron/microarray_data/annotate"
mySrcUrls[[2]] <- "file:///home/cameron/microarray_data/annotate/UniGene"
mySrcUrls[[4]] <- "file:///home/cameron/microarray_data/annotate/KEGG/pathways"
myDir <- "/home/cameron/microarray_data/mwgAnnotate"
ABPkgBuilder(baseName=myBase, srcUrls=mySrcUrls,
    baseMapType=myBaseType, pkgName="mwg40kA", pkgPath=myDir,
    organism="Homo sapiens", version="0.10",
    author=list(authors="R. Cameron Craddock",
        maintainer="R. Cameron Craddock <cmi5 at cdc.gov>"),
    fromWeb=TRUE)
Here is the output from ABPkgBuilder:
Attaching package: 'GO'
The following object(s) are masked from package:AnnBuilder :
GO
Read 1 item
Read 1 item
Failed to get data from URL:
ftp://ftp.genome.ad.jp/pub/kegg/pathways/hsa/hsa00195.gene
Failed to get data from URL:
ftp://ftp.genome.ad.jp/pub/kegg/pathways/hsa/hsa00231.gene
Failed to get data from URL:
ftp://ftp.genome.ad.jp/pub/kegg/pathways/hsa/hsa00253.gene
...
... ( removed a bunch of others )
...
Failed to get data from URL:
ftp://ftp.genome.ad.jp/pub/kegg/pathways/hsa/hsa07217.gene
[1] "4028 2 2"
The following data sets have been added to the database and will be removed:
 [1] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAACCNUM.rda"
 [2] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kACHRLENGTHS.rda"
 [3] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kACHRLOC.rda"
 [4] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAENZYME.rda"
 [5] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kALOCUSID.rda"
 [6] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAMAPCOUNTS.rda"
 [7] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAORGANISM.rda"
 [8] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAPATH.rda"
 [9] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAPFAM.rda"
[10] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAPROSITE.rda"
[11] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAQCDATA.rda"
[12] "/home/cameron/microarray_data/mwgAnnotate/mwg40kA/data/mwg40kAQC.rda"
None of the files listed in the above warnings exist at the specified
location. I verified this using ncftp. After ABPkgBuilder finishes I
perform the following steps:
R CMD check mwg40kA/
(only warning is that data directory is empty)
R CMD build mwg40kA/
(no errors, no warnings)
R CMD INSTALL mwg40kA_0.10.tar.gz
* Installing *source* package 'mwg40kA' ...
** R
** data
** moving datasets to lazyload DB
** help
>>> Building/Updating help pages for package 'mwg40kA'
Formats: text html latex example
mwg40kA text html latex
mwg40kAACCNUM text html latex example
mwg40kACHRLENGTHS text html latex example
mwg40kACHRLOC text html latex example
mwg40kAENZYME text html latex example
mwg40kALOCUSID text html latex example
mwg40kAORGANISM text html latex example
mwg40kAPATH text html latex example
mwg40kAPFAM text html latex example
mwg40kAPROSITE text html latex example
mwg40kAQC text html latex
mwg40kAQCDATA text html latex
** building package indices ...
* DONE (mwg40kA)
This is what I get when I load the library:
> library(mwg40kA)
> mwg40kA()
Quality control information for mwg40kA
Date built: Created: Tue Jul 25 16:48:06 2006
Number of probes: 20160
Probe number missmatch: None
Probe missmatch: None
Mappings found for probe based rda files:
mwg40kAACCNUM found 19760 of 20160
mwg40kACHRLOC found 0 of 20160
mwg40kAENZYME found 0 of 20160
mwg40kALOCUSID found 0 of 20160
mwg40kAPATH found 0 of 20160
Mappings found for non-probe based rda files:
mwg40kACHRLENGTHS found 25
mwg40kAORGANISM found 1
mwg40kAPFAM found 0
mwg40kAPROSITE found 0
The mwg40kAACCNUM environment matches my basefile. Can anyone suggest a
solution to my problem?
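For anyone wanting to repeat the "environment matches my basefile" check, here is a minimal sketch in base R. It assumes the ACCNUM data is an environment keyed by probe ID, as the post describes; the probe names and accessions below are made up, and `accnum_env` is a stand-in rather than the real mwg40kAACCNUM object.

```r
# Toy stand-ins: a two-column base file and an environment shaped like an
# ACCNUM mapping (probe ID -> accession). All values here are made up.
base <- data.frame(
  probe = c("probe1", "probe2", "probe3"),
  acc   = c("NM_000001", "BC000002", NA),
  stringsAsFactors = FALSE
)

accnum_env <- new.env()
for (i in seq_len(nrow(base))) {
  assign(base$probe[i], base$acc[i], envir = accnum_env)
}

# Pull every probe's value out of the environment, in base-file order.
env_values <- unlist(mget(base$probe, envir = accnum_env))

# Count probes that carry a non-missing mapping.
n_found <- sum(!is.na(env_values))

# TRUE when every probe maps to the same accession as in the base file.
same_as_base <- identical(unname(env_values), base$acc)

print(n_found)
print(same_as_base)
```

The same counting idea could be pointed at the other probe-based environments to see which ones are actually empty.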
Thanks for your help,
Cameron
Hi, Richard,
First, to clarify: the patch in AnnBuilder v1.11.5 is useful only when the
baseFile is a probe-to-RefSeq mapping. If that is the case for your baseFile,
your baseType should be "refseq". Use baseType "gbNRef" only when your
baseFile is a probeset ID to GenBank accession mapping.
If your baseFile is a probe-to-GenBank mapping, then AnnBuilder should be able
to generate the correct result. In fact, all the annotation packages provided
by the bioc core team are generated by AnnBuilder, and their baseType is always
"gbNRef". I guess the reason you didn't get the expected output is that you
didn't set up the local mirror of the source data correctly. There are
instructions in AnnBuilder/inst/doc/mirroringDataResources.rst.
If you still have a problem, please include the following information in your
next post so that your problem can be reproduced:
(1) A small part of your baseFile, just like what Weijun did.
(2) The file structure of your local mirror
(file:///home/cameron/microarray_data/annotate).
thanks
nianhua
Hi, Cameron,
Maybe you want to try baseType="refseq". I used the sample baseFile from your
email with this script:
==================================================================
library(AnnBuilder)
mySrcUrls <- getSrcUrl("all", "Homo sapiens")
mySrcUrls[[7]] <- "file:///home/cameron/microarray_data/annotate"
mypkg <- function(pkgPath, version) {
    ABPkgBuilder(baseName="mybase.txt",
                 baseMapType="refseq",
                 srcUrls=mySrcUrls,
                 pkgName="mypkg",
                 pkgPath=pkgPath,
                 organism="Homo sapiens",
                 version=version,
                 author=list(
                     authors="R. Cameron Craddock",
                     maintainer="R. Cameron Craddock <email at email.email>"
                 )
    )
}
mypkg(getwd(), "1.0.0")
==================================================================
And here is the result:
==================================================================
> library(mypkg)
> mypkg()
Quality control information for mypkg
Date built: Created: Wed Jul 26 12:18:11 2006
Number of probes: 22
Probe number missmatch: None
Probe missmatch: None
Mappings found for probe based rda files:
mypkgACCNUM found 21 of 22
mypkgCHRLOC found 20 of 22
mypkgCHR found 20 of 22
mypkgENZYME found 0 of 22
mypkgGENENAME found 20 of 22
mypkgGO found 17 of 22
mypkgLOCUSID found 20 of 22
mypkgMAP found 19 of 22
mypkgOMIM found 18 of 22
mypkgPATH found 5 of 22
mypkgPMID found 20 of 22
mypkgREFSEQ found 20 of 22
mypkgSUMFUNC found 0 of 22
mypkgSYMBOL found 20 of 22
mypkgUNIGENE found 20 of 22
Mappings found for non-probe based rda files:
mypkgCHRLENGTHS found 25
mypkgGO2ALLPROBES found 269
mypkgGO2PROBE found 73
mypkgORGANISM found 1
mypkgPATH2PROBE found 17
mypkgPFAM found 15
mypkgPMID2PROBE found 595
mypkgPROSITE found 13
========================================================
What AnnBuilder does with your inputs is:
(1) Use your "mixture of GenBank Accession and Ref Seq" to find the
Entrez Gene ID.
(2) Use the Entrez Gene ID to find other annotations.
If your base type is "gbNRef", it uses
ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz for the GenBank to Entrez
Gene mapping. If your base type is "refseq", it uses
ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz for the mapping. You may want
to check those files manually to see whether all your input IDs are
included. If your input has mixed ID types, then you have to get the
Entrez Gene IDs manually.
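The manual check suggested above could be sketched roughly as follows. This is only an illustration: `lookup_gene_ids` is a hypothetical helper, the column names `GeneID` and `RNA_nucleotide_accession.version` are an assumption about the layout of gene2accession (check the header of your own copy), and the GeneID values in the toy table are invented.

```r
# Hypothetical helper: report which accessions appear in a
# gene2accession-style table. The column names used here are an
# assumption about the NCBI file layout; verify them against your copy.
lookup_gene_ids <- function(accessions, g2a) {
  # Drop the ".N" version suffix so "NM_014191.2" matches "NM_014191".
  bare <- sub("\\.[0-9]+$", "", g2a[["RNA_nucleotide_accession.version"]])
  idx <- match(accessions, bare)
  # A missing input accession must never count as a match.
  idx[is.na(accessions)] <- NA
  data.frame(
    accession = accessions,
    GeneID    = g2a[["GeneID"]][idx],
    found     = !is.na(idx),
    stringsAsFactors = FALSE
  )
}

# Toy table; the GeneID values are invented for illustration only.
g2a <- data.frame(
  GeneID = c(101L, 102L),
  RNA_nucleotide_accession.version = c("NM_014191.2", "BC000631.1"),
  stringsAsFactors = FALSE
)

result <- lookup_gene_ids(c("NM_014191", "U43148", NA), g2a)
print(result)
```

On the real file, something like `read.delim(gzfile("gene2accession.gz"))` could load the table, though the column names will probably need adjusting and the file is large.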
hope it helps
nianhua
Good morning,
Thank you for your reply, Nianhua. The base file that I have created maps
probeset IDs to a mixture of GenBank Accession and Ref Seq, thus
presumably "gbNRef" is the appropriate base type. I have tried updating
my annotations to the latest version supplied by the vendor, and still
haven't had any luck.
Here is the file structure that I have created for the local copies of
the files:
/home/cameron/microarray_data/annotate contains the EG files
/home/cameron/microarray_data/annotate/UniGene/Homo_sapiens contains
the UniGene files
/home/cameron/microarray_data/annotate/KEGG/pathways contains the
contents of the KEGG pathway.tar.gz file.
It would seem to me that if there were a problem with finding the
appropriate data files, I would receive an error message. I have
verified that the readURL and loadFromUrl functions work with the URLs
I have supplied.
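A quick way to double-check a local mirror like the one described above, independent of AnnBuilder, might look like this. It is a sketch using only base R; `check_mirror` is a hypothetical helper, and the demonstration runs against a temporary directory rather than the real mirror paths.

```r
# Hypothetical helper: turn file:// URLs into local paths and report
# whether each path exists and how many entries it holds.
check_mirror <- function(urls) {
  paths <- sub("^file://", "", urls)
  data.frame(
    url     = urls,
    exists  = file.exists(paths),
    n_files = vapply(paths, function(p) length(list.files(p)),
                     integer(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Demonstration against a temporary directory, not a real mirror.
demo_dir <- file.path(tempdir(), "annotate_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "placeholder.txt"))

report <- check_mirror(c(
  paste0("file://", demo_dir),
  paste0("file://", file.path(demo_dir, "missing"))
))
print(report)
```

Passing the modified entries of mySrcUrls to the same function would show at a glance whether any of the local directories is missing or empty.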
Here is a sample from my basefile:
a=read.delim('/home/cameron/microarray_data/mwg40ka_basefile.tdf',sep='\t',head=F)[119:140,]
> a
                      V1        V2
119    mwghum40K:A#09699 NG_002679
120    mwghum40K:A#10779 NM_014191
121    mwghum40K:A#00108 NM_016258
122    mwghum40K:A#00228 NM_005462
123    mwghum40K:A#00481  BC000631
124    mwghum40K:A#00652 NM_001167
125    mwghum40K:A#09199 NM_033341
126    mwghum40K:A#09493 NM_003310
127 mwgaracontrol#011-r1      <NA>
128    mwghum40K:A#09703  BT019423
129    mwghum40K:A#00277  CR614804
130    mwghum40K:A#00396 NM_002307
131    mwghum40K:A#00487 NM_004488
132    mwghum40K:A#00591    U43148
133    mwghum40K:A#05083 NM_012282
134    mwghum40K:A#05232 NM_006933
135    mwghum40K:A#05372 NM_003156
136    mwghum40K:A#05445  BC016055
137    mwghum40K:A#10328 NM_014254
138    mwghum40K:A#10675 NM_003263
139    mwghum40K:A#10903 NM_003794
140    mwghum40K:A#10992 NM_002456
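Since the excerpt above mixes RefSeq and GenBank accessions, one way to see how many of each the base file holds is to split them by prefix. This is a minimal sketch: it relies on the convention that RefSeq accessions begin with two capital letters and an underscore (NM_, NG_ and so on), and treats everything else as GenBank, which is a simplifying assumption.

```r
# Sample accessions copied from the base file excerpt above.
acc <- c("NG_002679", "NM_014191", "BC000631", NA, "BT019423", "U43148")

# RefSeq accessions start with two capital letters and an underscore;
# anything else that is not missing is labelled GenBank here.
id_type <- ifelse(is.na(acc), NA_character_,
                  ifelse(grepl("^[A-Z]{2}_", acc), "refseq", "genbank"))

# Tally the two ID types, keeping missing entries visible.
print(table(id_type, useNA = "ifany"))
```

Applied to the whole second column, the tally would show how much of the array depends on each of the two mapping files.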
Thanks for your help,
-Cameron