The support.bioconductor.org editor has been updated to markdown! Please see more info at: Tutorial: Updated Support Site Editor

Question: Extract sequence from DNAStringSet object
5
gravatar for komal.rathi
3.3 years ago by
komal.rathi60
United States
komal.rathi60 wrote:

Hi,

I am using Rsamtools to generate pileup from a list of positions. Rsamtools doesn't give the reference base in the output so I am trying to import my reference fasta and retrieve the reference bases from it. Here is my code:

positions_file <- read.delim('positions.txt',header=F)

head(positions_file)
V1       V2 
10  1156771 
10 37484026 
10 78483209 
10 82960189 
10  9551751 
11 19256468 

fa <- FaFile(file='gr37.fasta')
idx <- scanFaIndex(fa)
refbase <- getSeq(fa,GRanges(positions_file$V1,IRanges(start=as.numeric(positions_file$V2),end=as.numeric(positions_file$V2))))

head(refbase)
A DNAStringSet instance of length 185
width seq                                             names               
[1]     1 C                                               10
[2]     1 C                                               10
[3]     1 T                                               10
[4]     1 A                                               10
[5]     1 G                                               10
...   ... ...
[181]     1 T                                               3
[182]     1 A                                               3
[183]     1 A                                               3
[184]     1 A                                               3
[185]     1 C                                               4

class(refbase)
[1] "DNAStringSet"
attr(,"package")
[1] "Biostrings"

REF <- as.data.frame(refbase)$x # right now I am doing something like this to extract sequences

I can retrieve the width and names using width(refbase) and names(refbase) but I am unable to retrieve the sequences using a single function. I can retrieve it by converting it into a dataframe and extracting that column. Just wanted to know if there is an inbuilt function for that.

dnastringset • 7.3k views
ADD COMMENTlink modified 3.3 years ago by Martin Morgan ♦♦ 22k • written 3.3 years ago by komal.rathi60
Answer: Extract sequence from DNAStringSet object
3
gravatar for Martin Morgan
3.3 years ago by
Martin Morgan ♦♦ 22k
United States
Martin Morgan ♦♦ 22k wrote:

The DNAStringSet is the sequence; work with that. For instance, create a DataFrame() from the result returned by pileup(), (the equivalent of df = DataFrame(pileup())) and add the refbase column to it (DataFrame handles DNAStringSet) - - df$refbase = refbase.

ADD COMMENTlink written 3.3 years ago by Martin Morgan ♦♦ 22k
Answer: Extract sequence from DNAStringSet object
6
gravatar for James W. MacDonald
3.3 years ago by
United States
James W. MacDonald49k wrote:

Use as.character().

 

ADD COMMENTlink written 3.3 years ago by James W. MacDonald49k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 239 users visited in the last hour