extract subsequence(s) from DNAStringSet with getSeq
1
0
Entering edit mode
@timotheeflutre-6727
Last seen 5.6 years ago
France

I am writing a function which takes 2 inputs, a fasta file (with sequences) and a data.frame (with subsequence coordinates), and returning as output the given subsequences. Inside the function, the fasta file is loaded, the data.frame is converted into a GRanges, and the subsequences are extracted using getSeq.

Here is how it looks like on a quick example:

records <- Biostrings::readDNAStringSet(filepath=in.fa, format="fasta")
sub.info.gr <- GenomicRanges::GRanges(
    seqnames=Rle("chr2"),
    ranges=IRanges::IRanges(start=4890,
                            end=5040))
names(sub.info.gr) <- "sl"
sub.records <- Biostrings::getSeq(x=records, sub.info.gr)

Unfortunately, the last command returns an error:

Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘getSeq’ for signature ‘"DNAStringSet"’

This is surprising because of what is the following command is returning:

library(BSgenome)
Function: getSeq (package Biostrings)
x="BSgenome"
x="FaFile"
x="FaFileList"
x="TwoBitFile"
x="XStringSet"

Once I executed the last command (just above, that is after loading BSgenome), I re-executed the problematic command, and now it returns another error:

sub.records <- Biostrings::getSeq(x=records, sub.info.gr)
Error in .unlist_RL_subscript(i, x) : could not find function "shift"

Here are the package versions I am using:

sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
loaded via a namespace (and not attached)
[1] IRanges_2.4.8 XML_3.98-1.4
[3] Rsamtools_1.22.0 Biostrings_2.38.4
[5] GenomicAlignments_1.6.3 bitops_1.0-6
[7] GenomeInfoDb_1.6.3 futile.options_1.0.0
[9] stats4_3.2.2 zlibbioc_1.16.0
[11] XVector_0.10.0 S4Vectors_0.8.11
[13] futile.logger_1.4.1 lambda.r_1.1.7
[15] BiocParallel_1.4.3 tools_3.2.2
[17] Biobase_2.30.0 BSgenome_1.38.0
[19] RCurl_1.95-4.8 rtracklayer_1.30.4
[21] parallel_3.2.2 compiler_3.2.2
[23] BiocGenerics_0.16.1 GenomicRanges_1.22.4
[25] SummarizedExperiment_1.0.2

Is this error fixed in more recent versions? Which package(s) should I upgrade? Or most importantly, is there another way to do what I want with the versions of the packages I currently have installed?

dnastringset getseq • 5.7k views
ADD COMMENT
3
Entering edit mode
@herve-pages-1542
Last seen 9 hours ago
Seattle, WA, United States

Hi Tim,

You're using BioC 3.2, which is not current. The current release is BioC 3.3 (note that it requires R 3.3). We only support the current version so we highly recommend that you update your installation to BioC 3.3 / R 3.3.

Your code works for me (I'm using BioC 3.3). But yes, you need to load the BSgenome package before you try to run the code because that's where the getSeq method for DNAStringSet objects is defined.

Note that you can coerce a data frame to a GRanges object with as() so a simpler version of your code is:

library(BSgenome)

records <- readDNAStringSet(in.fa, format="fasta")
sub.info <- data.frame(chrom="chr2", start=4890, end=5040)
getSeq(records, as(sub.info, "GRanges"))

See my sessionInfo() below.

Cheers,

H.

> sessionInfo()
R version 3.3.0 beta (2016-03-30 r70404)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] BSgenome_1.40.1      rtracklayer_1.32.1   Biostrings_2.40.2   
[4] XVector_0.12.0       GenomicRanges_1.24.2 GenomeInfoDb_1.8.3  
[7] IRanges_2.6.1        S4Vectors_0.10.2     BiocGenerics_0.18.0 

loaded via a namespace (and not attached):
 [1] XML_3.98-1.4               Rsamtools_1.24.0          
 [3] bitops_1.0-6               GenomicAlignments_1.8.4   
 [5] zlibbioc_1.18.0            BiocParallel_1.6.2        
 [7] tools_3.3.0                Biobase_2.32.0            
 [9] RCurl_1.95-4.8             SummarizedExperiment_1.2.3
ADD COMMENT

Login before adding your answer.

Traffic: 674 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6