I wanted to use diffHic just to perform peak calling on Hi-C data. Do I necessarily have to start from fastq alignment with the script included in the package or I can use an alternative input (e.g. HiC-Pro filtered read pairs)? If so, which function can I use and how have I to format the input file(s)? Thanks in advance.
There are several points at which you can enter the diffHic analysis:
From the FASTQ files, as you've mentioned. This involves aligning the files to the reference genome with presplit_map.py, and then proceeding as described in the user's guide.
From name-sorted paired-end BAM files. These can be directly used in preparePairs.
From restriction fragment assignments. If you have a data frame where each row corresponds to a read pair and specifies the indices of the restriction fragments to which each read in the pair was mapped (and you also have a GRanges object containing the coordinates of those restriction fragments), you can use savePairs to save this information into a HDF5 object for further analysis.
Thank you very much! Is there any chance to obtain an InteractionSet from an interaction matrix (containing read pair counts from merged replicates) to be used as input for the enrichedPairs function? Because I tried to convert it to an InteractionSet object but when I run enrichedPairs I get the following warning:
Warning messages:
1: In .local(object, ...) :
library sizes not found in 'totals', setting to NULL
You need to set the total number of read pair counts in each sample. This would automatically be done for you if you used the output from squareCounts, but you'll have to do it yourself if you make your own InteractionSet. Something like:
iset$totals <- colSums(assay(iset))
... should be sufficient, provided that every interaction with non-zero counts is present in iset (and you didn't count read pairs into more than one interaction). Or, if you already know your sequencing depth from some other source, you can set it to that.
You need to set the total number of read pair counts in each sample. This would automatically be done for you if you used the output from
squareCounts
, but you'll have to do it yourself if you make your ownInteractionSet
. Something like:... should be sufficient, provided that every interaction with non-zero counts is present in
iset
(and you didn't count read pairs into more than one interaction). Or, if you already know your sequencing depth from some other source, you can set it to that.