Search
Question: Affect ShortRead readAligned behavior with ScanBamParam's reverseComplement parameter
0
gravatar for jdavison
3.8 years ago by
jdavison10
United States
jdavison10 wrote:

Hi, I thought one can affect how ShortRead's readAligned function handles reverse strand BAM reads: to reverse complement them, or not, with the ScanBamParam reverseComplement.  I'm not able to do that, can you set me straight?  Thank you, see the code below.


library(ShortRead)

b = readAligned('my.bam', type='BAM')
sread(b)
[1]   151 CATTGCTTCCGTTTGTGCTCGATAAAAATAAGAAT...GTCTGGTTCATCATCTGTAAGAATGGCTTCCAGAG

# reverseComplement: A logical(1) vectors. BAM files store reads mapping
#          to the minus strand as though they are on the plus strand.
#          Rsamtools obeys this convention by default
#          (‘reverseComplement=FALSE’), but when this value is set to
#          TRUE returns the sequence and quality scores of reads mapped
#          to the minus strand in the reverse complement (sequence) and
#          reverse (quality) of the read as stored in the BAM file. This
#          might be useful if wishing to recover read and quality scores
#          as represented in fastq files, but is NOT appropriate for
#          variant calling or other alignment-based operations.

param.T = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
     what = ShortRead:::.readAligned_bamWhat())

b.SBP.TRUE = readAligned('my.bam', type='BAM', param=param.T)

sread(b.SBP.TRUE)
[1]   151 CATTGCTTCCGTTTGTGCTCGATAAAAATAAGAAT...GTCTGGTTCATCATCTGTAAGAATGGCTTCCAGAG

param.F = ScanBamParam(simpleCigar = TRUE, reverseComplement = FALSE,
     what = ShortRead:::.readAligned_bamWhat())

b.SBP.FALSE = readAligned('my.bam', type='BAM', param=param.F)
Warning message:
UserArgumentMismatch
  using 'TRUE' for 'bamReverseComplement(param)' 

sread(b.SBP.FALSE)
[1]   151 CATTGCTTCCGTTTGTGCTCGATAAAAATAAGAAT...GTCTGGTTCATCATCTGTAAGAATGGCTTCCAGAG

> version
               _                                                 
platform       x86_64-unknown-linux-gnu                          
arch           x86_64                                            
os             linux-gnu                                         
system         x86_64, linux-gnu                                 
status         Under development (unstable)                      
major          3                                                 
minor          2.0                                               
year           2015                                              
month          01                                                
day            11                                                
svn rev        67421                                             
language       R                                                 
version.string R Under development (unstable) (2015-01-11 r67421)
nickname       Unsuffered Consequences

                          
> sessionInfo()
R Under development (unstable) (2015-01-11 r67421)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] ShortRead_1.25.8         GenomicAlignments_1.3.23 Rsamtools_1.19.26       
 [4] GenomicRanges_1.19.33    GenomeInfoDb_1.3.12      Biostrings_2.35.7       
 [7] XVector_0.7.3            IRanges_2.1.35           S4Vectors_0.5.16        
[10] BiocParallel_1.1.11      BiocGenerics_0.13.4      BiocInstaller_1.17.3    

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2     BatchJobs_1.5       BBmisc_1.8         
 [4] Biobase_2.27.1      bitops_1.0-6        brew_1.0-6         
 [7] checkmate_1.5.1     codetools_0.2-9     DBI_0.3.1          
[10] digest_0.6.8        fail_1.2            foreach_1.4.2      
[13] grid_3.2.0          hwriter_1.3.2       iterators_1.0.7    
[16] lattice_0.20-29     latticeExtra_0.6-26 RColorBrewer_1.1-2 
[19] RSQLite_1.0.0       sendmailR_1.2-1     stringr_0.6.2      
[22] tools_3.2.0         zlibbioc_1.13.0    

 

ADD COMMENTlink modified 3.8 years ago • written 3.8 years ago by jdavison10

Oops, I forgot to note

b = readAligned('my.bam', type='BAM')

strand(b)

[1] -
Levels: + - *

 

Also, I can get what I want with e.g.

bsr = sread(b)

ndx = strand(b)=='-' & !is.na(strand(b))

bsr[ndx] = reverseComplement(bsr[ndx])

 

 

ADD REPLYlink written 3.8 years ago by jdavison10
0
gravatar for Martin Morgan
3.8 years ago by
Martin Morgan ♦♦ 22k
United States
Martin Morgan ♦♦ 22k wrote:

Use GenomicAlignments::readGAlignments() instead; readAligned is really only for legacy formats.

> fl = system.file(package="Rsamtools", "extdata", "ex1.bam")
> readGAlignments(fl, param=ScanBamParam(what="seq"))[9]
GAlignments object with 1 alignment and 1 metadata column:
      seqnames strand       cigar    qwidth     start       end
         <Rle>  <Rle> <character> <integer> <integer> <integer>
  [1]     seq1      -         35M        35        18        52
          width     njunc |                                 seq
      <integer> <integer> |                      <DNAStringSet>
  [1]        35         0 | AAATGTGTGGTTTAACTCGTCCATGGCCCAGCATT
  -------
  seqinfo: 2 sequences from an unspecified genome
> readGAlignments(fl, param=ScanBamParam(what="seq", reverseComplement=TRUE))[9]
GAlignments object with 1 alignment and 1 metadata column:
      seqnames strand       cigar    qwidth     start       end
         <Rle>  <Rle> <character> <integer> <integer> <integer>
  [1]     seq1      -         35M        35        18        52
          width     njunc |                                 seq
      <integer> <integer> |                      <DNAStringSet>
  [1]        35         0 | AATGCTGGGCCATGGACGAGTTAAACCACACATTT
  -------
  seqinfo: 2 sequences from an unspecified genome
ADD COMMENTlink modified 3.8 years ago • written 3.8 years ago by Martin Morgan ♦♦ 22k
0
gravatar for jdavison
3.8 years ago by
jdavison10
United States
jdavison10 wrote:

Sweet, thanks.  And I see you have videos, excellent!

http://bioconductor.org/packages/release/bioc/html/GenomicAlignments.html

 

ADD COMMENTlink written 3.8 years ago by jdavison10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 132 users visited in the last hour