Hello everyone,
I am getting inconsistent results from applyPileups and pileLettersAt. I have placed a full example on GitHub, so you can run the code on data that reproduces what I suspect is a "bug".
#Repo with reproducible example
https://github.com/pappewaio/ExampleData
#Here is a short summary of what is happening
The first version uses readGAlignments to read the alignments into R and then pileLettersAt. Here I read 450 reads into R, but after running pileLettersAt I am left with only 413 counts. What could explain this difference? Is it because some reads span the position I am piling letters at but have no sequence information at that position?
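The hypothesis above can be illustrated with a small, simplified model. This Python sketch does not use readGAlignments or pileLettersAt; it assumes that a read "spans" a position when its aligned reference range covers it, and that it "contributes a letter" only when an aligned-base CIGAR operation (M, = or X) covers that position. Under that assumption, a read with a deletion (D) or a skipped region (N, e.g. a splice junction) at the position is counted as overlapping but yields no letter. The reads, positions and function names are hypothetical, and whether the Bioconductor functions treat such reads exactly this way is something to check against their documentation.

```python
import re


# Split a CIGAR string into (length, operation) pairs, e.g. "8M4D8M" -> [(8, "M"), (4, "D"), (8, "M")]
def parse_cigar(cigar):
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def classify_at(pos, start, cigar):
    """Classify a read at 1-based reference position `pos`.

    Returns "letter" if an aligned base (M, =, X) covers pos,
    "gap" if a deletion (D) or skipped region (N) covers pos,
    or None if the read's reference span does not include pos.
    """
    ref = start  # reference position where the next reference-consuming operation begins
    for length, op in parse_cigar(cigar):
        if op in ("M", "=", "X", "D", "N"):  # operations that consume reference positions
            if ref <= pos < ref + length:
                return "letter" if op in ("M", "=", "X") else "gap"
            ref += length
        # I, S, H and P consume no reference positions, so they are skipped
    return None


def count_at(pos, reads):
    """Return (reads spanning pos, reads with an aligned base at pos)."""
    labels = [classify_at(pos, start, cigar) for start, cigar in reads]
    spanning = sum(label is not None for label in labels)
    with_letter = sum(label == "letter" for label in labels)
    return spanning, with_letter


# Hypothetical toy reads: (1-based leftmost aligned position, CIGAR)
reads = [
    (95, "10M"),        # aligned base at position 100
    (90, "8M4D8M"),     # deletion covers positions 98-101
    (80, "15M20N10M"),  # skipped region (e.g. an intron) covers positions 95-114
    (100, "3S7M"),      # soft clip consumes no reference; first aligned base is 100
    (120, "10M"),       # does not reach position 100
]

print(count_at(100, reads))  # prints (4, 2)
```

With these toy reads, four reads span position 100 but only two contribute a base there, which is the same kind of gap as 450 reads versus 413 letter counts.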
The second version uses applyPileups and summarize directly on the BAM file, and here I would expect at least the 413 counts found by pileLettersAt. Instead I only get 385 counts. Why is that?
Until now I have been using version one to get allele counts. Version two is much faster, though, so it would be very nice to switch to it without losing counts.
Looking forward to hearing your thoughts,
Jesper
> sessionInfo()
R Under development (unstable) (2016-01-08 r69888)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=sv_SE.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=sv_SE.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=sv_SE.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=sv_SE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] GenomicAlignments_1.7.20   SummarizedExperiment_1.1.22
 [3] Biobase_2.31.3             Rsamtools_1.23.6
 [5] Biostrings_2.39.12         XVector_0.11.7
 [7] GenomicRanges_1.23.25      GenomeInfoDb_1.7.6
 [9] IRanges_2.5.40             S4Vectors_0.9.44
[11] BiocGenerics_0.17.3

loaded via a namespace (and not attached):
[1] zlibbioc_1.17.1     BiocParallel_1.5.21 bitops_1.0-6