Question: What is the best way to subset a DNAStringSet by a GRange ?
gravatar for reuscher.stefan
14 months ago by
reuscher.stefan0 wrote:

Hi All,

I recently had some task which I thought should be easy to accomplish. I wanted to subset a DNAStringSet (a whole genome) using the information in a GRanges object (output of tileGenome()). I searched quite a bit in the help and here but could not find a "bioconductor" way of doing it and ended up for-looping through the ranges.

There is an exampe below.

My question is: Am I missing something or is there really no connection between GRanges and BioStrings ?

Best Regards,



Reproducible example:


testRanges <- GRanges(seqnames = c("seq1","seq1","seq2"),
                    ranges = IRanges(start = c(10,20,10),
                                     width = c(5,5,5))

testSeqs <- DNAStringSet(c("tagtagctagctacgatgcatgctacgatgcatgc",
names(testSeqs) <- c("seq1","seq2")

#what I want
#newTestSeqs <- subseq(testSeqs, by = testRanges)

#what I do
testMask <- data.frame(seq = as.character(seqnames(testRanges)),
                       start = start(testRanges),
                       end = end(testRanges))

newTestSeqs <- DNAStringSet()
for (i in 1:nrow(testMask)){

  temp <- testSeqs[testMask[i,"seq"]]

  temp <- subseq(temp,
                 start = testMask[i,"start"],
                 end = testMask[i,"end"])

  newTestSeqs <- c(newTestSeqs,temp)


ADD COMMENTlink modified 14 months ago by Michael Lawrence10.0k • written 14 months ago by reuscher.stefan0
gravatar for Michael Lawrence
14 months ago by
Michael Lawrence10.0k
United States
Michael Lawrence10.0k wrote:

subseq() acts like substring(): it is vectorized over its input (x), so the seqnames of the GRanges wouldn't apply. Instead, just use this syntax:

ADD COMMENTlink written 14 months ago by Michael Lawrence10.0k

Thanks Michael, that is exactly what is was looking for and could not find on any of the help files, (or I overlooked it).

Since this syntax is not obvious to me, do you mind explaining what is going on here. Suppose I want to isolate one subsequence from a DNAStringSet, I can put in a GRanges, but how can I do this using more low-level objects. For e.g. dataframes I do:

x[row, column]

So for DNAStringSets it must be somethings like:

testSeqs["seq1", 10, 5]

to achieve the same thing as the first range in the example above.

Best Regards,



ADD REPLYlink written 14 months ago by reuscher.stefan0

A DNAStringSet is one dimensional, so it should only take a single subscript. As a list, it is also hierarchical. The GRanges subscripting is essentially a shortcut for a "nested" subscript (like looping x[seqnames], extracting the corresponding range).

ADD REPLYlink written 14 months ago by Michael Lawrence10.0k

A hierarchical, one-dimensional subscript. That's exactly what I need to know to. I will figure it out from here.

Thanks for your great advice.

ADD REPLYlink written 14 months ago by reuscher.stefan0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 102 users visited in the last hour