Question

diffHic preparePairs() does not identify paired-end reads

0

Entering edit mode

ilyco • 0

@ilyco-9716

Last seen 7.4 years ago

Hi,

I am trying to run diffHic on a single chromosome. I extracted the chromosome 1 alignment data and used the bam file as an input for diffHic. However, at the preparePairs() stage, none of the reads appear as paired.

This is the output from diffHic:

diagnostics <- preparePairs("chr1.bam", hs.param, file="chr1.h5", dedup=TRUE, minq=10)
> diagnostics
$pairs
   total   marked filtered   mapped 
       1        0        0        1 
$same.id
   dangling self.circle 
          0           0 
$singles
[1] 20248727
$chimeras
  total  mapped   multi invalid 
      0       0       0       0

When I checked the bam file with samtools flagstat, I get:

samtools flagstat chr1.bam 
20248729 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
20248729 + 0 mapped (100.00% : N/A)
20248729 + 0 paired in sequencing
10125154 + 0 read1
10123575 + 0 read2
20248729 + 0 properly paired (100.00% : N/A)
20248729 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
1240765 + 0 with mate mapped to a different chr
1240765 + 0 with mate mapped to a different chr (mapQ>=5)

So the reads are paired but preparePairs() does not seem to recognise this. Any suggestions?

Thank you very much!

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)
locale:
[1] C/UTF-8/C/C/C/C
attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     
other attached packages:
[1] diffHic_1.2.2        GenomicRanges_1.22.4 GenomeInfoDb_1.6.3  
[4] IRanges_2.4.7        S4Vectors_0.8.11     BiocGenerics_0.16.1 
loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.32.3       XVector_0.10.0            
 [3] edgeR_3.12.0               zlibbioc_1.16.0           
 [5] GenomicAlignments_1.6.3    BiocParallel_1.4.3        
 [7] lattice_0.20-33            BSgenome_1.38.0           
 [9] tools_3.2.3                grid_3.2.3                
[11] SummarizedExperiment_1.0.2 rhdf5_2.14.0              
[13] Biobase_2.30.0             csaw_1.4.1                
[15] DBI_0.3.1                  lambda.r_1.1.7            
[17] futile.logger_1.4.1        rtracklayer_1.30.2        
[19] futile.options_1.0.0       bitops_1.0-6              
[21] biomaRt_2.26.1             RCurl_1.95-4.7            
[23] RSQLite_1.0.0              limma_3.26.8              
[25] GenomicFeatures_1.22.13    Biostrings_2.38.4         
[27] Rsamtools_1.22.0           locfit_1.5-9.1            
[29] XML_3.98-1.3

diffHic preparePairs paired BAM • 2.0k views

ADD COMMENT • link updated 9.1 years ago by Aaron Lun ★ 28k • written 9.1 years ago by ilyco • 0

score 1 · Answer 1 · 2016-02-14

1

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 3 hours ago

The city by the bay

Check that the BAM file is name sorted. Also, preparePairs identifies read pairs based on having identical alignment names. If that's not the case (e.g., there's /1 or /2 appended to the name) the function will fail.

ADD COMMENT • link 9.1 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thank you very much.

It now recognises pairs even though there is still a large number of singles.

ADD REPLY • link 9.1 years ago ilyco • 0

0

Entering edit mode

Probably because if you've only taken chromosome 1 reads. If the mate isn't present in the same file (e.g., because it lies on another chromosome), then preparePairs will mark the read as a singleton. For all practical purposes, this classification is probably correct, because you can't easily interpret those reads without a mate in a Hi-C experiment.

ADD REPLY • link 9.1 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thank you very much. This was very informative.

ADD REPLY • link 9.1 years ago ilyco • 0

0

Entering edit mode

Thank you very much. This was very informative.

ADD REPLY • link 9.1 years ago ilyco • 0