Hi,
I'm trying to retrieve the sequences of a certain transcript represented as a GenomicRanges
object.
Here's the data.frame
of that transcript:
> exon.df
seqnames start end strand
NW_004624885.1 3278632 3279702 +
NW_004624885.1 3280386 3280638 +
NW_004624885.1 3280818 3280887 +
NW_004624885.1 3280993 3281162 +
NW_004624885.1 3281232 3281476 +
NW_004624885.1 3282194 3282260 +
NW_004624885.1 3282342 3282400 +
The genome fasta file was read into a DNAStringSet
object using Biostrings::readDNAStringSet
:
> class(genome.fa) [1] "DNAStringSet" attr(,"package") [1] "Biostrings" > head(genome.fa) A DNAStringSet instance of length 6 width seq names [1] 78885850 AAGCCCTAAAAGAAAATAACTTGCAACCTAGACTGTTATATCCAGCAAAATTATCTTTCAAAATTGATGGGAAAATTAGATACTTCCATGA...CCAGCTGGCATCTCCAAACCTTTCCAAACTGCTGTGGACCACTTCCACAGGCCTCTGAACCACCTGGTGAGGTACAGTCGCTGCAGTGACC NW_004624730.1 [2] 46765855 AGGAATGGGGAAAAAATAAGAACCAAGATGCATTATGTAGAGGTACAATTCCCAATGAGAAATGTAATTGTGTTGTTTACCTAAAATGTAC...ATATGAATTTAATAGGGATGGCCGTTATCCTGTAGAATTGGCATTCTTGTAAGTGATGAAAACACACACACACACACACACACACACACAC NW_004624731.1 [3] 47073470 TAATGAACAATGAAAAGATTTTAAAAATTGACTAAAAATAAAAGTTCCATATCAGTGAGTTACAGGTACATTACTTTAATAAAAATGAATC...TTTTGTTGAGCTTTTTAGATGTATAAATGAATGTTTTTCATTAAATTTAGGGAATTTTCTAGTTCTAATAACTGTTAAGCCCCTCTAATGA NW_004624732.1 [4] 43462546 TAAAGTGCAGTTTCTTCTCAGAGAGAGCAGTTAGATCGCTCTCACTGCAGCAGTACTCTGCACTCTGTGAATGTCATGGAACAATATTCAC...GACCAAGTCACTCCCCTCGAGGTCCTGCCCTTGACACCTGCAGGGCACGTGAGTGACACCTCTCGGGGGCAGGTCCAACCTTGGCTCTTGG NW_004624733.1 [5] 42019962 TAACATTATGGTCTCAAGGTGCATCTATTTTCCTACAAATTTCCTACCAAGTTTCCTTCAAGATAATCCTTCTATATGACATAGTAATAAT...AGGTTATTATTTTTATTTCAAAGATGATTAGTGATAGCATTTATTCATAGAAATGCTGTTGGCCATTTCTTTTTTTAAAAATTAATTTTAT NW_004624734.1 [6] 42145720 GGGCCCCTTGTCTATTCCTATGGACTGCCTCGGCCTTCCTGACCCCACCAAGGCGCATGTGTGCGCTGCGTTGGCCAATCCAGGCTCACCC...TGGTTATTGGGATTGGGTCCTATACCTTAGGGTTAGGGTTAGGGTTAGGTGTAGGTTTATGTTTAGGGTTCCAATTAATACACCTGACATT NW_004624735.1
Now, trying to run getSeq
:
BSgenome::getSeq(fa,as(exon.df,"GRanges"))
I get this error:
Error in .Call(.NAME, ..., PACKAGE = PACKAGE) : negative length vectors are not allowed
Is it because the genome fasta file was not read all the way through?
width(genome.fa)[which(names(genome.fa) == "NW_004624885.1")]
returns:
3740447
But then even trying to get the sequence of the last exon:
subseq(query.genome.fa, start=3282342, end=3282400)
returns an error:
Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord, : solving row 167: 'allow.nonnarrowing' is FALSE and the supplied start (3282342) is > refwidth + 1
> sessionInfo()
R version 3.3.2 (2016-10-31) Platform: x86_64-redhat-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 [8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] grid stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] BSgenome_1.42.0 biomaRt_2.30.0 gplots_3.0.1 reshape2_1.4.2 plotrix_3.6-3 Hmisc_3.17-4 Formula_1.2-1 survival_2.40-1 lattice_0.20-34 annotationData_0.1.0 [11] plyr_1.8.4 magrittr_1.5 gtable_0.2.0 gridExtra_2.2.1 plotly_4.7.0 ggplot2_2.2.1.9000 eulerr_2.0.0 dplyr_0.5.0 rtracklayer_1.34.1 data.table_1.9.6 [21] doParallel_1.0.10 iterators_1.0.8 foreach_1.4.3 GenomicRanges_1.26.2 GenomeInfoDb_1.10.0 snpEnrichment_1.7.0 stringi_1.1.5 Biostrings_2.42.1 XVector_0.14.0 IRanges_2.8.1 [31] S4Vectors_0.12.1 BiocGenerics_0.20.0 loaded via a namespace (and not attached): [1] Biobase_2.34.0 httr_1.2.1 tidyr_0.6.3 jsonlite_1.4 viridisLite_0.2.0 splines_3.3.2 gtools_3.5.0 [8] assertthat_0.2.0 latticeExtra_0.6-28 Rsamtools_1.26.1 RSQLite_1.0.0 chron_2.3-47 digest_0.6.12 RColorBrewer_1.1-2 [15] colorspace_1.3-2 htmltools_0.3.6 Matrix_1.2-7.1 XML_3.98-1.4 zlibbioc_1.20.0 purrr_0.2.2.2 scales_0.4.1 [22] gdata_2.17.0 BiocParallel_1.8.1 tibble_1.3.3 SummarizedExperiment_1.2.3 nnet_7.3-12 lazyeval_0.2.0 foreign_0.8-67 [29] tools_3.3.2 stringr_1.2.0 munsell_0.4.3 cluster_2.0.5 AnnotationDbi_1.36.0 snpStats_1.24.0 caTools_1.17.1 [36] rlang_0.1.1 RCurl_1.95-4.8 htmlwidgets_0.8 bitops_1.0-6 codetools_0.2-15 DBI_0.5-1 R6_2.2.0 [43] GenomicAlignments_1.8.4 KernSmooth_2.23-15 Rcpp_0.12.11.1 rpart_4.1-10 acepack_1.4.1