Hi,
I am trying to run diffHic on a single chromosome. I extracted the chromosome 1 alignment data and used the bam file as an input for diffHic. However, at the preparePairs() stage, none of the reads appear as paired.
This is the output from diffHic:
diagnostics <- preparePairs("chr1.bam", hs.param, file="chr1.h5", dedup=TRUE, minq=10)
> diagnostics
$pairs
total marked filtered mapped
1 0 0 1
$same.id
dangling self.circle
0 0
$singles
[1] 20248727
$chimeras
total mapped multi invalid
0 0 0 0
When I checked the bam file with samtools flagstat, I get:
samtools flagstat chr1.bam20248729 + 0 in total (QC-passed reads + QC-failed reads)0 + 0 secondary0 + 0 supplementary0 + 0 duplicates20248729 + 0 mapped (100.00% : N/A)20248729 + 0 paired in sequencing10125154 + 0 read110123575 + 0 read220248729 + 0 properly paired (100.00% : N/A)20248729 + 0 with itself and mate mapped0 + 0 singletons (0.00% : N/A)1240765 + 0 with mate mapped to a different chr1240765 + 0 with mate mapped to a different chr (mapQ>=5)
So the reads are paired but preparePairs() does not seem to recognise this. Any suggestions?
Thank you very much!
> sessionInfo()R version 3.2.3 (2015-12-10)Platform: x86_64-apple-darwin13.4.0 (64-bit)Running under: OS X 10.10.5 (Yosemite)locale:[1] C/UTF-8/C/C/C/Cattached base packages:[1] stats4 parallel stats graphics grDevices utils datasets[8] methods baseother attached packages:[1] diffHic_1.2.2 GenomicRanges_1.22.4 GenomeInfoDb_1.6.3[4] IRanges_2.4.7 S4Vectors_0.8.11 BiocGenerics_0.16.1loaded via a namespace (and not attached):[1] AnnotationDbi_1.32.3 XVector_0.10.0[3] edgeR_3.12.0 zlibbioc_1.16.0[5] GenomicAlignments_1.6.3 BiocParallel_1.4.3[7] lattice_0.20-33 BSgenome_1.38.0[9] tools_3.2.3 grid_3.2.3[11] SummarizedExperiment_1.0.2 rhdf5_2.14.0[13] Biobase_2.30.0 csaw_1.4.1[15] DBI_0.3.1 lambda.r_1.1.7[17] futile.logger_1.4.1 rtracklayer_1.30.2[19] futile.options_1.0.0 bitops_1.0-6[21] biomaRt_2.26.1 RCurl_1.95-4.7[23] RSQLite_1.0.0 limma_3.26.8[25] GenomicFeatures_1.22.13 Biostrings_2.38.4[27] Rsamtools_1.22.0 locfit_1.5-9.1[29] XML_3.98-1.3

Thank you very much.
It now recognises pairs even though there is still a large number of singles.
Probably because if you've only taken chromosome 1 reads. If the mate isn't present in the same file (e.g., because it lies on another chromosome), then
preparePairswill mark the read as a singleton. For all practical purposes, this classification is probably correct, because you can't easily interpret those reads without a mate in a Hi-C experiment.Thank you very much. This was very informative.
Thank you very much. This was very informative.