Hi, I am trying to learn (and apply) the tools of the package "diffHic" (Differential analysisi of Hi-C data) to analyze Hi-C datasets and compare them. I am following the user's guide. I am working with a genome that is not available as Biostrings object.
To obtain a restriction-fragmented genome object I run:
genome <- "./genome.fasta"
genome.frag <- cutGenome(genome, "AAGCTT", 4)
genome.param <- pairParam(genome.frag)
Until here everything seems to be fine ( when I run "genome.param" i get as output "Genome contains 5450 restriction fragments across 32 chromosomes ..." which seems to be reasonable and the number of chromosomes is right), but when I run the function that matches the mapping location of each read to a restriction fragment in the reference genome(preparePairs), I got the following error:
diagnostics <- preparePairs("output.fixed.bam", genome.param, file="test.h5", dedup=TRUE, minq=10)
Error in preparePairs("output.sorted.bam", genome.param, file = "test.h5", : missing chromosomes in cut site list
the bam file was obtained in the following way:
bowtie2-build genome.fasta genome # "genome.fasta" is the file with the genome sequence in fasta format
python presplit_map.py -G genome -1 reads_R1.fastq -2 reads_R2.fastq -P 33 --sig "AAGCTAGCTT" -o output.bam # "presplit_map.py" is the python script that is in the diffHic package that does the mapping of Hi-C pair-end reads to the reference genome. samtools fixmate -O bam output.bam output.fixed.bam
I have also tried to follow this pipeline with one genome that is available as Biostrings object ("BSgenome.Celegans.UCSC.ce10"), and a Hi-C dataset available online from this organism, and it worked.. so I think that the problem should be related to the input of my genome as fasta file, but CutGenome function and the storing of the fragments in a pairParam object seems to have worked fine.
Somebody has an idea of what the problem could be and how could I solve it?