Rsamtools pileup does not return reference base information
1
0
Entering edit mode
komal.rathi ▴ 120
@komalrathi-9163
Last seen 7 months ago
United States

Hi everyone,

I am using Rsamtools for generating a pileup from my bam files. The following is the code I use:

# read bam file
bf <- BamFile(file = 'sample.bam')

# read positions file
positions_file <- read.delim('positions.txt')

head(positions_file)
V1        V2
 1 151553417
 1 165114162
 1 170522088

# positions to generate pileup
param <- ScanBamParam(which=GRanges(positions_file$V1,IRanges(start=as.numeric(positions_file$V2), end=as.numeric(positions_file$V2))))

# pileup parameters
p_param <- PileupParam(distinguish_strand = TRUE, distinguish_nucleotides = TRUE)

# call pileup
res <- pileup(bf, scanBamParam = param, pileupParam = p_param)

# no reference base output
head(res)
 seqnames       pos strand nucleotide count           which_label
        1 151553417      +          G    18 1:151553417-151553417
        1 151553417      -          G    34 1:151553417-151553417
        1 165114162      +          A    24 1:165114162-165114162
        1 165114162      -          A    19 1:165114162-165114162
        1 170522088      +          G    30 1:170522088-170522088
        1 170522088      -          G    32 1:170522088-170522088

 

So as you can see, if I do not have the reference base information in the positions file, I can't get that information from the pileup function in Rsamtools. Is there a way to get the reference base information? I also noticed you cannot even provide a reference genome, is that true?

Any help would be much appreciated.

r rsamtools • 1.4k views
ADD COMMENT
1
Entering edit mode
@martin-morgan-1513
Last seen 3 hours ago
United States

Yes, that's correct, it does not consult a reference genome. The approach is to query your reference genome over the relevant coordinate, typically using BSgenome::getSeq() on an BSgenome, TwoBit (.2bit), DNAStringSet, or an (indexed, see ?FaFile) FaFile. getSeq typically accepts (see, e.g., ?"getSeq,FaFile-method") a second GRanges argument specifying coordinates; this would be the 'which' argument to ScanBamParam().

> library(BSgenome)
> methods(getSeq)
[1] getSeq,BSgenome-method   getSeq,FaFileList-method getSeq,FaFile-method    
[4] getSeq,TwoBitFile-method getSeq,XStringSet-method

 

ADD COMMENT
0
Entering edit mode

Thank you, that helped!

ADD REPLY

Login before adding your answer.

Traffic: 855 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6