Question

DiffBind dba.count error

0

Entering edit mode

mdidish ▴ 10

@cbee3b30

Last seen 3 months ago

France

Dear,

I would like to determine summit value for a list of enhancers from a public database.

I use these lines:

# load features table: 3 columns Chr, Start, End
features <- read.table("GeneCards_GeneHancer", header = T)
# create exp.dba object
exp.dba <- dba(sampleSheet = "ATAC_sampleSheet.csv")
# count with summits = T
exp.dba <- dba.count(DBA = exp.dba, peaks = features, minOverlap = 1, summits = T)

I obtain this message error :

Error in if (sum(tokeep) < length(tokeep)) { : 
  missing value where TRUE/FALSE needed

I read in several post that this problem can occurs on WindowsOSX machine and should be solved on Linux machine, but I run on Ubuntu 20.04, R 4.0.3.

Have you any suggestion?

Thanks,
Marc

DiffBind • 978 views

ADD COMMENT • link 3.9 years ago mdidish ▴ 10

score 0 · Answer 1 · 2021-01-20

It would be useful to see what is actually in the features object. My guess is that the table of enhancer regions is not in a valid format. Optimally this is a GRanges object, and the chromosome names must match those in the reference genome used for alignment.

FYI, there is no need to set minOverlap=1 in this call, as this argument is ignored when you supply a consensus dataset using the peaks parameter.

score 0 · Answer 2 · 2021-01-20

0

Entering edit mode

mdidish ▴ 10

@cbee3b30

Last seen 3 months ago

France

Thank you for your reply.

Finally, I just identified the problem: in the table of enhancer regions, some features has a length of 0 (!), with chromStart = chromEnd. After removing these features, dba.count runs without error.

Regarding minOverlap=1, I have to set this argument during my tests where I kept only one sample in ATAC_sampleSheet.csv (to decrease computation time), otherwise I had an error message minOverlap can not be greater than the number of peaksets [1].

But how retrieve summit value for each peak? I read:

summit heights (read pileup) and locations will be calculated for each peak. The values can retrieved using dba.peakset

When i do
peaks.list <- dba.peakset(DBA = exp.dba, bRetrieve = T, DataType = "DBA_DATA_FRAME")
it seems that retrieved values are more of a coverage (total reads number including in each feature) than the summit of the peak?

Best,

ADD COMMENT • link 3.9 years ago mdidish ▴ 10

1

Entering edit mode

I'm checking in a fix for the minOverlap error message, thanks for pointing it out!

You need to set score = DBA_SCORE_SUMMIT_POS to have the scores be the summit position. You can either do this in the original call to dba.count(), or call it a second time (by setting peaks=NULL, it will not re-count):

exp.dba <- dba.count(DBA = exp.dba, peaks = NULL, minOverlap = 1, 
                     summits = TRUE,  score = DBA_SCORE_SUMMIT_POS)
summit.positions <- dba.peakset(exp.dba, bRetrieve=TRUE)