Guest User
Hi,
I'm working with the latest annotation set from Ensembl (ens73), which
is based on the patched GRCh37.p12 assembly. I have retrieved the
transcript set from Ensembl BioMart using
GenomicFeatures::makeTranscriptDbFromBiomart().
One of the things I'd like to do is create a DNAStringSet of sequences
for all the transcripts in my TranscriptDb using the
GenomicFeatures::extractTranscriptsFromGenome() function. This takes a
TranscriptDb and a BSgenome object as input. However, the latest BSgenome
available for human is BSgenome.Hsapiens.UCSC.hg19, which is unpatched.
When I run the command, I get the error:
Error in .getOneSeqFromBSgenomeMultipleSequences(x, names[i],
start[i], :
sequence ^1$ not found
I'm pretty sure this is because the TranscriptDb contains sequences
(patches/scaffolds) that are present in the patched assembly but not
in the base GRCh37 assembly. Additionally, the nomenclature differs
between UCSC and Ensembl (e.g. chr1 vs. 1).
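To make the two suspected causes concrete, here is a minimal base R sketch of how the sequence names could be compared. The name vectors are made-up stand-ins (in a real session they would presumably come from seqlevels() on the TranscriptDb and seqnames() on the BSgenome object), and to_ucsc() is a hypothetical helper that applies only the simplest renaming rule. Since the error above names sequence 1, the naming difference alone may already be enough to trigger it.

```r
# Illustrative sequence names only; not the real contents of either object.
ensembl_names <- c("1", "2", "X", "MT", "HG1000_1_PATCH", "GL000191.1")
ucsc_names    <- c("chr1", "chr2", "chrX", "chrM", "chr1_gl000191_random")

# Hypothetical helper: naive Ensembl -> UCSC renaming.
# Prefixes "chr" and maps the mitochondrial name MT to chrM.
to_ucsc <- function(x) {
  out <- paste0("chr", x)
  out[x == "MT"] <- "chrM"
  out
}

mapped <- to_ucsc(ensembl_names)

# Names that exist in the UCSC-style genome after renaming.
shared <- ensembl_names[mapped %in% ucsc_names]

# Names with no counterpart (patches, differently named scaffolds).
unmatched <- ensembl_names[!(mapped %in% ucsc_names)]

print(shared)
print(unmatched)
```

In this toy example the primary chromosomes match after renaming, while the patch and the scaffold do not, so renaming alone would not cover every transcript.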
I see a few options here. One obvious one would be to stick with UCSC
hg19 and use the UCSC ensGene table, but others in my working group
are using ens73 so this is a suboptimal solution. Is there an updated
BSgenome available for GRCh37.p12, or an easy way to forge one? Have
others encountered this issue?
Thanks!
Tim Johnstone
-- output of sessionInfo():
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grDevices datasets splines tcltk utils parallel stats graphics methods
[10] base
other attached packages:
[1] BSgenome.Hsapiens.UCSC.hg19_1.3.19 BiocInstaller_1.12.0
[3] data.table_1.8.10 Hmisc_3.12-2
[5] Formula_1.1-1 survival_2.37-4
[7] plyr_1.8 gdata_2.13.2
[9] ShortRead_1.20.0 lattice_0.20-24
[11] rtracklayer_1.22.0 Rsamtools_1.14.1
[13] BSgenome.Drerio.UCSC.danRer7_1.3.17 BSgenome_1.30.0
[15] Biostrings_2.30.0 lessR_2.9.7
[17] GenomicFeatures_1.14.0 AnnotationDbi_1.24.0
[19] Biobase_2.22.0 GenomicRanges_1.14.3
[21] XVector_0.2.0 IRanges_1.20.4
[23] BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] biomaRt_2.18.0 bitops_1.0-6 car_2.0-19 cluster_1.14.4
[5] DBI_0.2-7 foreign_0.8-57 grid_3.0.2 gtools_3.1.0
[9] hwriter_1.3 latticeExtra_0.6-26 leaps_2.9 MASS_7.3-29
[13] MBESS_3.3.3 nnet_7.3-7 RColorBrewer_1.0-5 RCurl_1.95-4.1
[17] rpart_4.1-3 RSQLite_0.11.4 stats4_3.0.2 tools_3.0.2
[21] XML_3.95-0.2 zlibbioc_1.8.0
--
Sent via the guest posting facility at bioconductor.org.