dba.count binding matrix coordinates
jingyaq ▴ 10
@jingyaq-18218

Hi,

Thanks for this great package. I just have a question about the output of dba.count.

I'm trying to generate a count matrix with dba.count, but noticed a strange mixup with the coordinates. Attached below is a toy example that I hope clearly shows the issue.

consensus_peakset is a GRanges object of a peak set I retrieved using dba.peakset. I ran dba.count on a few peaks from 2 chromosomes (chr1, chr15). The resulting db$binding matrix has the same start and end coordinates, but the CHR column shows chromosomes 1 and 2. One of these must be incorrect.

I noticed the chromosome name format changed from 'chrX' to just 'X', and I imagine that has something to do with the reordering of factor levels. Am I doing something wrong, or is this a bug?

Thanks again for your help!

DiffBind • 345 views
Rory Stark ★ 4.1k
@rory-stark-5741
CRUK, Cambridge, UK

The issue is how you are retrieving the binding matrix. Accessing it directly (db$binding) returns an internal data frame. The CHR column in it is an index into a separate vector of chromosome names (db$chrmap), not a chromosome "number". So a 2 in that column means the second entry of db$chrmap (presumably chr15 in your example), not chromosome 2, and the index of the chromosome labelled "chr2" is not necessarily the number 2!
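To make the index idea concrete, here is a small base-R sketch. It is a toy stand-in, not DiffBind's actual internals: the objects chrmap and binding and their contents are made up for illustration, mirroring the chr1/chr15 situation in the question.

```r
# Toy stand-in for the layout described above (hypothetical data, not DiffBind code).
chrmap <- c("chr1", "chr15")          # chromosome names, in stored order

binding <- data.frame(
  CHR   = c(1, 1, 2),                 # index into chrmap, NOT a chromosome number
  START = c(1000, 5000, 2000),
  END   = c(1500, 5600, 2400)
)

# Translate each index back into its chromosome name.
binding$CHR_NAME <- chrmap[binding$CHR]
print(binding)
```

The row with CHR equal to 2 maps to "chr15" here, which is why the raw column can look like a mix-up. On a real object the same lookup would presumably be db$chrmap[db$binding[, "CHR"]], but since both are internal fields, the supported accessor is the safer route.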

The officially supported way of retrieving the binding matrix is to use dba.peakset() with bRetrieve=TRUE. This does the chromosome name translation and returns the matrix as a GRanges object (or as a data frame if you set DataType=DBA_DATA_FRAME).
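As a rough sketch of the two retrieval calls just described: the wrapper name retrieve_binding and its as_data_frame argument are hypothetical, added only so the snippet is self-contained. It assumes DiffBind is installed and that db is the DBA object returned by dba.count().

```r
# Hypothetical helper wrapping the supported accessor; only dba.peakset(),
# bRetrieve, DataType and DBA_DATA_FRAME come from DiffBind itself.
retrieve_binding <- function(db, as_data_frame = FALSE) {
  if (as_data_frame) {
    # Data frame with chromosome names already translated
    DiffBind::dba.peakset(db, bRetrieve = TRUE,
                          DataType = DiffBind::DBA_DATA_FRAME)
  } else {
    # Default return type: a GRanges object
    DiffBind::dba.peakset(db, bRetrieve = TRUE)
  }
}

# Usage, assuming db came from dba.count():
# peaks_gr <- retrieve_binding(db)
# peaks_df <- retrieve_binding(db, as_data_frame = TRUE)
```

Either form should show chr1 and chr15 in the chromosome column rather than the internal indices.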
