Hi,
when looking at a bam with a genome viewer I see reads of:
-Mapping quality over 60
-Insert size 170 kbp
-Cigar 100M
-Paired
-Primary alingment TRUE
When trying to read them to R:
> sbf <- scanBamFlag(isPaired = NA, isProperPair = NA, isUnmappedQuery = NA,
+ hasUnmappedMate = NA, isMinusStrand = NA, isMateMinusStrand = NA,
+ isFirstMateRead = NA, isSecondMateRead = NA, isSecondaryAlignment = NA, isNotPassingQualityControls = NA,
+ isDuplicate = NA)
> sbf2 <- ScanBamParam(flag = sbf, mapqFilter=0)
> bam = readGAlignmentPairs(file="file.bam",param = sbf2)
> bam
GAlignmentPairs object with 0 pairs, strandMode=1, and 0 metadata columns:
seqnames strand : ranges -- ranges
<Rle> <Rle> : <IRanges> -- <IRanges>
-------
seqinfo: 84 sequences from an unspecified genome
I see no reads. What am I missing in here?
With scanBamFlag above I try set up things so that everything would be read in. Is there some further parameter I should look into?
Best,
Mikko
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=fi_FI.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=fi_FI.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=fi_FI.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicAlignments_1.10.0 Rsamtools_1.26.1 Biostrings_2.42.1 XVector_0.14.0 SummarizedExperiment_1.4.0
[6] Biobase_2.34.0 GenomicRanges_1.26.2 GenomeInfoDb_1.10.2 IRanges_2.8.1 S4Vectors_0.12.1
[11] BiocGenerics_0.20.0
loaded via a namespace (and not attached):
[1] lattice_0.20-34 bitops_1.0-6 grid_3.3.1 zlibbioc_1.20.0 Matrix_1.2-8 BiocParallel_1.8.1 tools_3.3.1
It's difficult to know what the issue may be without seeing an example of your data. Is it possible for you to make a small subset of your bam file available?
I would also try running the function using the default parameters, which are pretty inclusive and should get most reads, e.g.
This should help identify if the problem is the function, the data, or the parameters. Similarly you can try running
readGAlignments()
to make sure this isn't an issue with the pairing of your data.Thank you Mike!
> bam <- readGAlignmentPairs("test.bam")
> cvg <- coverage(bam)
> max(cvg$`6`)
[1] 1
Wrong.
> bam <- readGAlignments("test.bam")
> cvg <- coverage(bam)
> max(cvg$`6`)
[1] 50
OK!
This is what I see with Baseplayer https://www.cs.helsinki.fi/u/rkataine/BasePlayer/info.html Genome browser.
It looks like it is something in the pairing of the reads. Browser sees the pairs but readGAlignmentPairs() misses them.
If somebody is interested to have a look I can send a test file.
Best,
Mikko
readGAlignmentList()
might be informative -- it can include unmated reads as well as reads that are ambiguously mated. Identify reads that you think should have been mated. UseScanBamParam()
to access the information used for mating and described in?readGAlignmentsList
to identify what criterion prevents them from being mated.If the workflow Martin suggests doesn't shed any light I can look at the files for you. Make the files (or subsets) available to valerie.obenchain@roswellpark.org.
Valerie