Question

Extract sequence from DNAStringSet object

5

Entering edit mode

komal.rathi ▴ 120

@komalrathi-9163

Last seen 10 months ago

United States

Hi,

I am using Rsamtools to generate pileup from a list of positions. Rsamtools doesn't give the reference base in the output so I am trying to import my reference fasta and retrieve the reference bases from it. Here is my code:

positions_file <- read.delim('positions.txt',header=F)

head(positions_file)
V1       V2 
10  1156771 
10 37484026 
10 78483209 
10 82960189 
10  9551751 
11 19256468 

fa <- FaFile(file='gr37.fasta')
idx <- scanFaIndex(fa)
refbase <- getSeq(fa,GRanges(positions_file$V1,IRanges(start=as.numeric(positions_file$V2),end=as.numeric(positions_file$V2))))

head(refbase)
A DNAStringSet instance of length 185
width seq                                             names               
[1]     1 C                                               10
[2]     1 C                                               10
[3]     1 T                                               10
[4]     1 A                                               10
[5]     1 G                                               10
...   ... ...
[181]     1 T                                               3
[182]     1 A                                               3
[183]     1 A                                               3
[184]     1 A                                               3
[185]     1 C                                               4

class(refbase)
[1] "DNAStringSet"
attr(,"package")
[1] "Biostrings"

REF <- as.data.frame(refbase)$x # right now I am doing something like this to extract sequences

I can retrieve the width and names using width(refbase) and names(refbase) but I am unable to retrieve the sequences using a single function. I can retrieve it by converting it into a dataframe and extracting that column. Just wanted to know if there is an inbuilt function for that.

dnastringset • 18k views

ADD COMMENT • link updated 8.7 years ago by Martin Morgan 25k • written 8.7 years ago by komal.rathi ▴ 120

7

Entering edit mode

James W. MacDonald 66k

@james-w-macdonald-5106

Last seen 2 hours ago

United States

Use as.character().

ADD COMMENT • link 8.7 years ago James W. MacDonald 66k

score 3 · Accepted Answer · 2015-11-12

3

Entering edit mode

Martin Morgan 25k

@martin-morgan-1513

Last seen 1 day ago

United States

The DNAStringSet is the sequence; work with that. For instance, create a DataFrame() from the result returned by pileup(), (the equivalent of df = DataFrame(pileup())) and add the refbase column to it (DataFrame handles DNAStringSet) - - df$refbase = refbase.

ADD COMMENT • link 8.7 years ago Martin Morgan 25k