Rsamtools and data.table OR advice of packages
0
0
Entering edit mode
sara • 0
@0d527ed6
Last seen 2.8 years ago
Canada

Hi there, I am new to analyzing human genome data. I have 2 files: 1 - table of loci of interest (position and reference nucleotide) and 2 - a BAM for reads mapped to a chromosome.

I need to make a table relating the reads mapped to each of those loci and the base at each position. I have read in the BAM file using scanBam in Rsamtools, then I was writing a complicated script to search the table I generated from that using the loci of interest.

Q1 - I would like to populate a data.table with info from the BAM file, but am having trouble doing this, code attempts below, any help appreciated Q2 - I feel as if there should be other packages that could do this more efficiently, any advice on that.

Thank you for any advice,

bam<-scanBam(bamFile,param=param) 
#

## get BAM file in table format 
# custom unlist function  from Rsamtool
.unlist <- function (x)
   {
     ## do.call(c, ...) coerces factor to integer, which is undesired
       x1 <- x[[1L]]
       if (is.factor(x1)) {
         structure(unlist(x), class = "factor", levels = levels(x1))
         } else {
           do.call(c, x)
           }
       }

elts <- setNames(bamWhat(param), bamWhat(param))
lst <- lapply(elts, function(elt) .unlist(lapply(bam, "[[", elt)))
##
df_bam<-(do.call("DataFrame", lst))
# I want this as a data.table so that I can take a file of loci of interest and then search the BAM output for the reads at that position
### none of these below work, and besides I think it will be more efficient to populate the data.table straight from the BAM file
# data.table(unlist(bam[[1]]))
# data.table(lst)
#

> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.12.8    Rsamtools_1.34.1     Biostrings_2.50.2    XVector_0.22.0       GenomicRanges_1.34.0
[6] GenomeInfoDb_1.18.2  IRanges_2.16.0       S4Vectors_0.20.1     BiocGenerics_0.28.0 

loaded via a namespace (and not attached):
[1] bitops_1.0-6           zlibbioc_1.28.0        BiocParallel_1.16.6    tools_3.5.2            RCurl_1.98-1.1        
[6] compiler_3.5.2         GenomeInfoDbData_1.2.0
data.table Rsamtools NGS • 840 views
ADD COMMENT

Login before adding your answer.

Traffic: 908 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6