Extract sequence from DNAStringSet object
komal.rathi
@komalrathi-9163

Hi,

I am using Rsamtools to generate a pileup from a list of positions. Rsamtools' pileup() doesn't report the reference base in its output, so I am trying to import my reference FASTA and retrieve the reference bases from it. Here is my code:

library(Rsamtools)  # also attaches GenomicRanges and Biostrings

positions_file <- read.delim('positions.txt', header=FALSE)

head(positions_file)
V1       V2 
10  1156771 
10 37484026 
10 78483209 
10 82960189 
10  9551751 
11 19256468 

fa <- FaFile(file='gr37.fasta')
idx <- scanFaIndex(fa)
refbase <- getSeq(fa,GRanges(positions_file$V1,IRanges(start=as.numeric(positions_file$V2),end=as.numeric(positions_file$V2))))

refbase
A DNAStringSet instance of length 185
width seq                                             names               
[1]     1 C                                               10
[2]     1 C                                               10
[3]     1 T                                               10
[4]     1 A                                               10
[5]     1 G                                               10
...   ... ...
[181]     1 T                                               3
[182]     1 A                                               3
[183]     1 A                                               3
[184]     1 A                                               3
[185]     1 C                                               4

class(refbase)
[1] "DNAStringSet"
attr(,"package")
[1] "Biostrings"

REF <- as.data.frame(refbase)$x # right now I am doing something like this to extract sequences

I can retrieve the widths and names with width(refbase) and names(refbase), but I can't find a single function that returns the sequences. I can get them by converting the object to a data frame and extracting that column, but I wanted to know if there is a built-in function for this.
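To make the question reproducible without gr37.fasta, here is a minimal sketch with an invented two-sequence reference. It swaps getSeq() on an FaFile for Biostrings::subseq() on an in-memory DNAStringSet (an assumption for illustration, not what the code above does), but the result has the same shape: a DNAStringSet of width-1 sequences named by chromosome.

```r
library(Biostrings)

# Hypothetical toy reference, named like FASTA records (NOT gr37.fasta)
ref <- DNAStringSet(c("10" = "ACGTACGTAC", "11" = "TTGCAAGGCT"))

# Positions in the same two-column layout as positions.txt (V1 = chrom, V2 = pos)
positions_file <- data.frame(V1 = c(10, 10, 11), V2 = c(2, 7, 4))

# In-memory stand-in for getSeq(FaFile, GRanges): repeat each chromosome's
# sequence by name, then take a width-1 subsequence at the requested position
refbase <- subseq(ref[as.character(positions_file$V1)],
                  start = positions_file$V2, width = 1)

print(refbase)          # DNAStringSet named by chromosome
print(width(refbase))   # all 1
print(names(refbase))   # "10" "10" "11"
```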

dnastringset

@martin-morgan-1513

The DNAStringSet is the sequence; work with it directly. For instance, create a DataFrame() from the result returned by pileup() (the equivalent of df = DataFrame(pileup())) and add refbase to it as a column, since a DataFrame can hold a DNAStringSet: df$refbase = refbase.
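A minimal sketch of this DataFrame approach, assuming S4Vectors and Biostrings are installed. The pileup-like table is invented; its column names are modeled on Rsamtools::pileup() output (seqnames, pos, strand, nucleotide, count). Because pileup() can return more than one row per position (for example one per nucleotide and strand), this sketch matches each row to its reference base by chromosome and position instead of assigning refbase positionally; that matching step is an addition for illustration, not part of the answer above.

```r
library(S4Vectors)
library(Biostrings)

# Queried positions and their reference bases, in the same order (toy values)
positions <- data.frame(chr = c("10", "10", "11"), pos = c(2L, 7L, 4L))
refbase <- DNAStringSet(c("10" = "C", "10" = "G", "11" = "C"))

# Hypothetical pileup()-like result: several rows can share one position
pile <- data.frame(
  seqnames   = factor(c("10", "10", "10", "11")),
  pos        = c(2L, 2L, 7L, 4L),
  strand     = factor(c("+", "-", "+", "+")),
  nucleotide = factor(c("C", "C", "A", "C")),
  count      = c(5L, 3L, 4L, 6L)
)

df <- DataFrame(pile)

# Map each pileup row to its reference base via a "chr:pos" key
idx <- match(paste(df$seqnames, df$pos, sep = ":"),
             paste(positions$chr, positions$pos, sep = ":"))
stopifnot(!anyNA(idx))       # every pileup position must be among the queried ones
df$refbase <- refbase[idx]   # the DataFrame column holds a DNAStringSet

# Example use: flag rows whose observed nucleotide matches the reference
df$is_ref <- as.character(df$nucleotide) == unname(as.character(df$refbase))

print(df)
```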

@james-w-macdonald-5106

Use as.character().
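A small sketch of this, using a toy refbase built in memory (the values are illustrative, not from gr37.fasta). as.character() on a DNAStringSet returns one string per element and, by default, carries names(refbase) along; use.names = FALSE drops them. When every element has width 1, unlist() concatenates them into a single DNAString.

```r
library(Biostrings)

# Toy stand-in for the question's `refbase` (illustrative values)
refbase <- DNAStringSet(c("10" = "C", "10" = "C", "10" = "T", "3" = "A", "4" = "C"))

# One string per element; names(refbase) become the vector's names
REF <- as.character(refbase)

# Same strings without names, e.g. for adding as a data.frame column
REF_plain <- as.character(refbase, use.names = FALSE)

# All bases concatenated into one string
ref_string <- as.character(unlist(refbase))

print(REF)
print(REF_plain)
print(ref_string)
```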

 
