Hi,
I am trying to run diffHic on a single chromosome. I extracted the chromosome 1 alignment data and used the bam file as an input for diffHic. However, at the preparePairs() stage, none of the reads appear as paired.
This is the output from diffHic:
diagnostics <- preparePairs("chr1.bam", hs.param, file="chr1.h5", dedup=TRUE, minq=10)
> diagnostics
$pairs
total marked filtered mapped
1 0 0 1
$same.id
dangling self.circle
0 0
$singles
[1] 20248727
$chimeras
total mapped multi invalid
0 0 0 0
When I checked the bam file with samtools flagstat
, I get:
samtools flagstat chr1.bam
20248729 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
20248729 + 0 mapped (100.00% : N/A)
20248729 + 0 paired in sequencing
10125154 + 0 read1
10123575 + 0 read2
20248729 + 0 properly paired (100.00% : N/A)
20248729 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
1240765 + 0 with mate mapped to a different chr
1240765 + 0 with mate mapped to a different chr (mapQ>=5)
So the reads are paired but preparePairs()
does not seem to recognise this. Any suggestions?
Thank you very much!
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)
locale:
[1] C/UTF-8/C/C/C/C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] diffHic_1.2.2 GenomicRanges_1.22.4 GenomeInfoDb_1.6.3
[4] IRanges_2.4.7 S4Vectors_0.8.11 BiocGenerics_0.16.1
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.32.3 XVector_0.10.0
[3] edgeR_3.12.0 zlibbioc_1.16.0
[5] GenomicAlignments_1.6.3 BiocParallel_1.4.3
[7] lattice_0.20-33 BSgenome_1.38.0
[9] tools_3.2.3 grid_3.2.3
[11] SummarizedExperiment_1.0.2 rhdf5_2.14.0
[13] Biobase_2.30.0 csaw_1.4.1
[15] DBI_0.3.1 lambda.r_1.1.7
[17] futile.logger_1.4.1 rtracklayer_1.30.2
[19] futile.options_1.0.0 bitops_1.0-6
[21] biomaRt_2.26.1 RCurl_1.95-4.7
[23] RSQLite_1.0.0 limma_3.26.8
[25] GenomicFeatures_1.22.13 Biostrings_2.38.4
[27] Rsamtools_1.22.0 locfit_1.5-9.1
[29] XML_3.98-1.3
Thank you very much.
It now recognises pairs even though there is still a large number of singles.
Probably because if you've only taken chromosome 1 reads. If the mate isn't present in the same file (e.g., because it lies on another chromosome), then
preparePairs
will mark the read as a singleton. For all practical purposes, this classification is probably correct, because you can't easily interpret those reads without a mate in a Hi-C experiment.Thank you very much. This was very informative.
Thank you very much. This was very informative.