How to read in bam file that is output from presplit_map.py using the GenomicAlignment package
1
1
Entering edit mode
mrk.carty ▴ 30
@mrkcarty-7442
Last seen 6.1 years ago
United States

Hi all,

I've been using the diffHiC package for awhile. I would like to read in the bam file, which is the output after running presplit_map.py, using Rsamtools or GenomicAlignment package. I cannot read in the bam file using either of these packages. However, I can read in the bam file using preparePairs function from diffHic. I've looked under the hood of preparePairs, and they are using two functions that are hidden. What are these speciality functions? Are they written in python/R?

Best,

Mark

diffhic • 847 views
1
Entering edit mode
Aaron Lun ★ 27k
@alun
Last seen 1 hour ago
The city by the bay

The hidden functions aren't anything special, they're just standard R functions that are split off from the main function to facilitate re-use elsewhere in the package. The key thing to keep in mind is that the BAM file from presplit_map.py is not position-sorted. Rather, it is name-sorted, which makes sense because Hi-C data is most sensibly interpreted in terms of read pairs. Running scanBam with a which argument in ScanBamParam is likely to fail, as no index file can be constructed for a name-sorted BAM file.

For most purposes, you should be able to work through diffHic without loading in the raw alignments. However, if you really want to do that, I'd suggest loading in chunks of read pairs with the yield argument in the BamFile handle constructor (limiting the yield just avoids running out of memory when the entire file is loaded in at once). You can check out how to do this, based on the diffHic:::.innerPrepLoop function.