Hi,
I am using Rsamtools to generate pileup from a list of positions. Rsamtools doesn't give the reference base in the output so I am trying to import my reference fasta and retrieve the reference bases from it. Here is my code:
positions_file <- read.delim('positions.txt',header=F)
head(positions_file)
V1 V2
10 1156771
10 37484026
10 78483209
10 82960189
10 9551751
11 19256468
fa <- FaFile(file='gr37.fasta')
idx <- scanFaIndex(fa)
refbase <- getSeq(fa,GRanges(positions_file$V1,IRanges(start=as.numeric(positions_file$V2),end=as.numeric(positions_file$V2))))
head(refbase)
A DNAStringSet instance of length 185
width seq names
[1] 1 C 10
[2] 1 C 10
[3] 1 T 10
[4] 1 A 10
[5] 1 G 10
... ... ...
[181] 1 T 3
[182] 1 A 3
[183] 1 A 3
[184] 1 A 3
[185] 1 C 4
class(refbase)
[1] "DNAStringSet"
attr(,"package")
[1] "Biostrings"
REF <- as.data.frame(refbase)$x # right now I am doing something like this to extract sequences
I can retrieve the width and names using width(refbase) and names(refbase) but I am unable to retrieve the sequences using a single function. I can retrieve it by converting it into a dataframe and extracting that column. Just wanted to know if there is an inbuilt function for that.
