Hi there, I am new to analyzing human genome data. I have 2 files: 1 - table of loci of interest (position and reference nucleotide) and 2 - a BAM for reads mapped to a chromosome.
I need to make a table relating the reads mapped to each of those loci and the base at each position. I have read in the BAM file using scanBam in Rsamtools, then I was writing a complicated script to search the table I generated from that using the loci of interest.
Q1 - I would like to populate a data.table with info from the BAM file, but am having trouble doing this, code attempts below, any help appreciated Q2 - I feel as if there should be other packages that could do this more efficiently, any advice on that.
Thank you for any advice,
bam<-scanBam(bamFile,param=param)
#
## get BAM file in table format
# custom unlist function from Rsamtool
.unlist <- function (x)
{
## do.call(c, ...) coerces factor to integer, which is undesired
x1 <- x[[1L]]
if (is.factor(x1)) {
structure(unlist(x), class = "factor", levels = levels(x1))
} else {
do.call(c, x)
}
}
elts <- setNames(bamWhat(param), bamWhat(param))
lst <- lapply(elts, function(elt) .unlist(lapply(bam, "[[", elt)))
##
df_bam<-(do.call("DataFrame", lst))
# I want this as a data.table so that I can take a file of loci of interest and then search the BAM output for the reads at that position
### none of these below work, and besides I think it will be more efficient to populate the data.table straight from the BAM file
# data.table(unlist(bam[[1]]))
# data.table(lst)
#
> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.12.8 Rsamtools_1.34.1 Biostrings_2.50.2 XVector_0.22.0 GenomicRanges_1.34.0
[6] GenomeInfoDb_1.18.2 IRanges_2.16.0 S4Vectors_0.20.1 BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 zlibbioc_1.28.0 BiocParallel_1.16.6 tools_3.5.2 RCurl_1.98-1.1
[6] compiler_3.5.2 GenomeInfoDbData_1.2.0