DiffBind dba.count error
2
0
Entering edit mode
mdidish ▴ 10
@cbee3b30
Last seen 2.5 years ago
France

Dear,

I would like to determine summit value for a list of enhancers from a public database.

I use these lines:

# load features table: 3 columns Chr, Start, End
features <- read.table("GeneCards_GeneHancer", header = T)
# create exp.dba object
exp.dba <- dba(sampleSheet = "ATAC_sampleSheet.csv")
# count with summits = T
exp.dba <- dba.count(DBA = exp.dba, peaks = features, minOverlap = 1, summits = T)

I obtain this message error :

Error in if (sum(tokeep) < length(tokeep)) { : 
  missing value where TRUE/FALSE needed

I read in several post that this problem can occurs on WindowsOSX machine and should be solved on Linux machine, but I run on Ubuntu 20.04, R 4.0.3.

Have you any suggestion?

Thanks,
Marc

DiffBind • 820 views
ADD COMMENT
0
Entering edit mode
Rory Stark ★ 5.1k
@rory-stark-5741
Last seen 7 days ago
Cambridge, UK

It would be useful to see what is actually in the features object. My guess is that the table of enhancer regions is not in a valid format. Optimally this is a GRanges object, and the chromosome names must match those in the reference genome used for alignment.

FYI, there is no need to set minOverlap=1 in this call, as this argument is ignored when you supply a consensus dataset using the peaks parameter.

ADD COMMENT
0
Entering edit mode
mdidish ▴ 10
@cbee3b30
Last seen 2.5 years ago
France

Thank you for your reply.

Finally, I just identified the problem: in the table of enhancer regions, some features has a length of 0 (!), with chromStart = chromEnd. After removing these features, dba.count runs without error.

Regarding minOverlap=1, I have to set this argument during my tests where I kept only one sample in ATAC_sampleSheet.csv (to decrease computation time), otherwise I had an error message minOverlap can not be greater than the number of peaksets [1].

But how retrieve summit value for each peak? I read:

  • summit heights (read pileup) and locations will be calculated for each peak. The values can retrieved using dba.peakset

When i do
peaks.list <- dba.peakset(DBA = exp.dba, bRetrieve = T, DataType = "DBA_DATA_FRAME")
it seems that retrieved values are more of a coverage (total reads number including in each feature) than the summit of the peak?

Best,

ADD COMMENT
1
Entering edit mode

I'm checking in a fix for the minOverlap error message, thanks for pointing it out!

You need to set score = DBA_SCORE_SUMMIT_POS to have the scores be the summit position. You can either do this in the original call to dba.count(), or call it a second time (by setting peaks=NULL, it will not re-count):

exp.dba <- dba.count(DBA = exp.dba, peaks = NULL, minOverlap = 1, 
                     summits = TRUE,  score = DBA_SCORE_SUMMIT_POS)
summit.positions <- dba.peakset(exp.dba, bRetrieve=TRUE)
ADD REPLY
0
Entering edit mode

ok thank you for this precision

ADD REPLY

Login before adding your answer.

Traffic: 551 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6