Question: How to get GenomicAlignments to read all reads?
gravatar for mikko.arvas
2.3 years ago by
mikko.arvas0 wrote:


when looking at a bam with a genome viewer I see reads of:

-Mapping quality  over 60
-Insert size 170 kbp
-Cigar 100M
-Primary alingment TRUE

When trying to read them to R:

> sbf <- scanBamFlag(isPaired = NA, isProperPair = NA, isUnmappedQuery = NA,
+             hasUnmappedMate = NA, isMinusStrand = NA, isMateMinusStrand = NA,
+             isFirstMateRead = NA, isSecondMateRead = NA, isSecondaryAlignment  = NA, isNotPassingQualityControls = NA,
+             isDuplicate = NA)
> sbf2 <- ScanBamParam(flag = sbf, mapqFilter=0)
> bam  = readGAlignmentPairs(file="file.bam",param = sbf2)
> bam
GAlignmentPairs object with 0 pairs, strandMode=1, and 0 metadata columns:
   seqnames strand :    ranges --    ranges
      <Rle>  <Rle> : <IRanges> -- <IRanges>
  seqinfo: 84 sequences from an unspecified genome

I see no reads. What am I missing in here?

With scanBamFlag above I try set up things so that everything would be read in. Is there some further parameter I should look into?





R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=fi_FI.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=fi_FI.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=fi_FI.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GenomicAlignments_1.10.0   Rsamtools_1.26.1           Biostrings_2.42.1          XVector_0.14.0             SummarizedExperiment_1.4.0
 [6] Biobase_2.34.0             GenomicRanges_1.26.2       GenomeInfoDb_1.10.2        IRanges_2.8.1              S4Vectors_0.12.1          
[11] BiocGenerics_0.20.0       

loaded via a namespace (and not attached):
[1] lattice_0.20-34    bitops_1.0-6       grid_3.3.1         zlibbioc_1.20.0    Matrix_1.2-8       BiocParallel_1.8.1 tools_3.3.1      

ADD COMMENTlink written 2.3 years ago by mikko.arvas0

It's difficult to know what the issue may be without seeing an example of your data.  Is it possible for you to make a small subset of your bam file available?

I would also try running the function using the default parameters, which are pretty inclusive and should get most reads, e.g.


This should help identify if the problem is the function, the data, or the parameters.  Similarly you can try running readGAlignments() to make sure this isn't an issue with the pairing of your data.

ADD REPLYlink written 2.3 years ago by Mike Smith3.5k

Thank you Mike!

> bam <- readGAlignmentPairs("test.bam")
> cvg <- coverage(bam)
> max(cvg$`6`)
[1] 1


> bam <- readGAlignments("test.bam")
> cvg <- coverage(bam)
> max(cvg$`6`)
[1] 50

This is what I see with Baseplayer Genome browser.

It looks like it is something in the pairing of the reads. Browser sees the pairs but readGAlignmentPairs() misses them.

If somebody is interested to have a look I can send a test file.



ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by mikko.arvas0

readGAlignmentList() might be informative -- it can include unmated reads as well as reads that are ambiguously mated. Identify reads that you think should have been mated. Use ScanBamParam() to access the information used for mating and described in ?readGAlignmentsList to identify what criterion prevents them from being mated.

ADD REPLYlink written 2.3 years ago by Martin Morgan ♦♦ 23k

If the workflow Martin suggests doesn't shed any light I can look at the files for you. Make the files (or subsets) available to 


ADD REPLYlink written 2.3 years ago by Valerie Obenchain6.7k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 178 users visited in the last hour