dba.count binding matrix coordinates
Thanks for this great package. I just have a question about the output of dba.count.

I'm trying to generate a count matrix with dba.count, but noticed a strange mixup with the coordinates. Attached below is a toy example that I hope clearly shows the issue.

consensus_peakset is a GRanges object holding a peak set I retrieved using dba.peakset. I ran dba.count on a few peaks from 2 chromosomes (chr1, chr15). The resulting db$binding matrix has the same start and end coordinates as the input peaks, but the CHR column shows chromosomes 1 and 2. One of these must be incorrect.

I noticed the chromosome name format changed from 'chrX' to just 'X', and I imagine that has something to do with the reordering of factor levels. Am I doing something wrong, or is this a bug?

Thanks again for your help!

The issue is how you are retrieving the binding matrix. Accessing it directly (db$binding) returns an internal data frame. The CHR column in it is an index into a separate set of chromosome names (db$chrmap), not a chromosome "number". So a value of 2 in CHR does not necessarily mean "chr2"; it means the second entry in db$chrmap, which in your example is presumably chr15.
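To make the index-versus-name distinction concrete, here is a minimal base R sketch. The objects chrmap and binding are toy stand-ins with made-up values, not the real db$chrmap and db$binding, and the layout of the internal objects is assumed only as far as it is described above.

```r
# Toy stand-ins for the internal objects; all values are made up for illustration.
chrmap <- c("chr1", "chr15")          # plays the role of db$chrmap
binding <- data.frame(                # plays the role of db$binding
  CHR   = c(1, 1, 2),
  START = c(1000, 5000, 2000),
  END   = c(1500, 5600, 2400)
)

# CHR holds positions in chrmap, so the value 2 means "chr15", not chromosome 2.
binding$chrom <- chrmap[binding$CHR]
print(binding)
```

The lookup chrmap[binding$CHR] is the kind of translation that has to happen before the CHR column can be read as a chromosome name.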

The officially supported way of retrieving the binding matrix is to use dba.peakset() with bRetrieve=TRUE. This will do the chromosome name translation and the matrix will be returned as a GRanges object (or as a dataframe if you set DataType=DBA_DATA_FRAME).
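As a rough sketch of the retrieval described above, the two variants could be wrapped as follows. The helper name retrieve_binding and its as_data_frame argument are hypothetical and only for illustration; db is assumed to be the object returned by dba.count, and the DiffBind package must be installed for the calls to run.

```r
# Hypothetical helper wrapping the supported retrieval call.
# `db` is assumed to be the DBA object returned by dba.count().
retrieve_binding <- function(db, as_data_frame = FALSE) {
  if (as_data_frame) {
    # Return the binding matrix as a data frame with chromosome names translated.
    DiffBind::dba.peakset(db, bRetrieve = TRUE,
                          DataType = DiffBind::DBA_DATA_FRAME)
  } else {
    # Default: return the binding matrix as a GRanges object.
    DiffBind::dba.peakset(db, bRetrieve = TRUE)
  }
}
```

Calling retrieve_binding(db) on your object should then give chromosome names such as chr1 and chr15 rather than the internal indices.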


