Dear all,
I have been using the Biostrings/BSgenome utilities to extract DNA sequences
for Entrez genes. It worked fine until I tried to write the
extracted sequences to a FASTA file. Because writeXStringSet is the
only function for writing FASTA files, and it only works with an
XStringSet object, I need to convert my list of DNAString objects into
an XStringSet object. Unfortunately, the converter/constructor
BStringSet only works on lists with a few DNAString elements. It
produces an error on larger lists, as shown below. I am not sure how to
deal with the issue. Thanks in advance for any suggestions/input!
Heyi
> exonSeq.set=BStringSet(exonSeq.list[1:30])
Error in .Call2("SharedVector_mcopy", dest, dest.offset, src, src.start, :
subscript out of bounds
> exonSeq.set=BStringSet(exonSeq.list[1:25])
> exonSeq.set=BStringSet(exonSeq.list[1:26])
Error in .Call2("SharedVector_mcopy", dest, dest.offset, src, src.start, :
subscript out of bounds
> exonSeq.set=BStringSet(exonSeq.list[26:30])
> exonSeq.set=BStringSet(exonSeq.list[26:40])
Error in .Call2("SharedVector_mcopy", dest, dest.offset, src, src.start, :
subscript out of bounds
> head(exonSeq.list,3)
$`442993`
133057-letter "DNAString" instance
seq: TGAGACGGCTTTTATTCCTGAGCTTCTGCTGCTCAC...AAAGCTGTCATCAATGAAAAAAGGTAAGAGAAAAAC
$`442994`
23917-letter "DNAString" instance
seq: CAGTTCTGACCCACTTCAAGGTTACATCTCCAAGGT...CTTACGATTTTTGCAGATAAAAAATTTATCTGCAAA
$`442995`
21718-letter "DNAString" instance
seq: GTCTTCTCTCCTTGCTGCTCTCAGGTAGGGGCTGGG...GGAAGAAGCAGAATAAAGCAATTTTCCTTGAAGTGA
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] BSgenome.Oaries.NCBI.Oar3.1_1.0 Biobase_2.21.6
[3] BSgenome_1.29.0 Biostrings_2.29.14
[5] GenomicRanges_1.13.35 XVector_0.1.0
[7] IRanges_1.19.19 BiocGenerics_0.7.3
loaded via a namespace (and not attached):
[1] stats4_3.0.1 tools_3.0.1
Hi,
I cannot reproduce this.
I've tried to call BStringSet() on a list of 100 DNAString
objects of 25 million letters each and it worked.
Can you please provide a self-contained reproducible example?
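As a starting point, a self-contained example could look roughly like
the sketch below. It is only an illustration: the random_dna() helper,
the sequence lengths, and the "gene1", "gene2", ... names are made up,
and the sequences are random rather than taken from the BSgenome
package. It uses DNAStringSet() instead of BStringSet() so that the
result keeps the DNA alphabet. Whether this triggers or avoids the
reported error on the affected versions is untested.

```r
library(Biostrings)

set.seed(1)

# Hypothetical helper: build a random DNAString of length n
random_dna <- function(n) {
  letters <- sample(c("A", "C", "G", "T"), n, replace = TRUE)
  DNAString(paste(letters, collapse = ""))
}

# 40 sequences; the first three lengths mirror head(exonSeq.list, 3),
# the remaining ones are arbitrary
seq_lengths <- c(133057, 23917, 21718, rep(5000, 37))
exonSeq.list <- lapply(seq_lengths, random_dna)
names(exonSeq.list) <- paste0("gene", seq_along(exonSeq.list))

# Convert the list of DNAString objects into a DNAStringSet
exonSeq.set <- DNAStringSet(exonSeq.list)

# Write the set to a FASTA file (a temporary file here)
out_file <- tempfile(fileext = ".fasta")
writeXStringSet(exonSeq.set, out_file)
```

If the conversion fails on a list built this way, the output of the
failing call together with sessionInfo() would be enough to
investigate.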
Thanks,
H.
On 10/18/2013 10:46 AM, heyi xiao wrote:
> [...]
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319