DiffBind - Error in dba.count() when passing mask or peaks

gaia.zaffaroni • 0

Hello,

I have a problem with the dba.count() function.

I have two conditions with 3 replicates each, and when I do

sample = dba(sampleSheet=sampleSheet, peakFormat='bed')
sample=dba.count(sample)

or even

sample=dba.count(sample,minOverlap=6)

Both calls work without problems. However, when I try to find consensus peaksets for the two conditions separately and then count using them, dba.count() fails:

prova=dba.peakset(prova,consensus=DBA_CONDITION,minOverlap=2)
peaks=dba.peakset(prova,prova$masks$Consensus,bRetrieve=T)

prova = dba.count(prova,peaks=peaks) ###NOT WORKING
prova = dba.count(prova,peaks=prova$masks$Consensus)    #NOT WORKING

Error in if (is.unsorted(unique(pv$vectors[, 1]))) { :
  missing value where TRUE/FALSE needed

traceback()
5: pv.vectors(model, mask = mask, minOverlap = minOverlap, bKeepAll = bKeepAll,
       bAnalysis = bAnalysis, attributes = attributes)
4: pv.model(spare)
3: pv.CalledMasks(pv, res, bed)
2: pv.counts(DBA, peaks = peaks, minOverlap = minOverlap, defaultScore = score,
       bLog = bLog, insertLength = fragmentSize, bOnlyCounts = T,
       bCalledMasks = TRUE, minMaxval = filter, bParallel = bParallel,
       bUseLast = bUseLast, bWithoutDupes = bRemoveDuplicates, bScaleControl = bScaleControl,
       filterFun = filterFun, bLowMem = bUseSummarizeOverlaps, readFormat = readFormat,
       summits = summits, minMappingQuality = mapQCth)

What is wrong?

Thanks in advance,

Gaia

score 0 · Answer 1 · 2016-02-29

Rory Stark ★ 5.1k

Hi Gaia-

Three things that may be useful to help in tracking this down:

1. Can you let me know which version of DiffBind you are working with (via sessionInfo())?
2. Run the dba.count() call with bParallel=FALSE; in serial mode it prints out each file as it counts, so we can identify which file is the problem.
3. If you could send me a copy of the "prova" DBA object (or a link where I can download it), I can have a look to see if something is going wrong with the consensus peaks.
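
The first two checks could be run roughly as in the sketch below. `diffbind_version` is a hypothetical helper name, not a DiffBind function; it returns NA if DiffBind is not installed. The commented-out line only shows where the `bParallel` argument mentioned above would go, and it assumes `prova` and `peaks` already exist.

```r
# Hypothetical helper: report the installed DiffBind version, or NA if absent.
diffbind_version <- function() {
  if (requireNamespace("DiffBind", quietly = TRUE)) {
    as.character(utils::packageVersion("DiffBind"))
  } else {
    NA_character_
  }
}

print(diffbind_version())

# Serial counting, so each file is reported as it is processed
# (assumes the objects from the question are in the workspace):
# prova <- dba.count(prova, peaks = peaks, bParallel = FALSE)
```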

Cheers-

Rory

Hi Rory, thanks for your help.

I am using DiffBind_1.16.3. This is what I get with bParallel=FALSE:

prova = dba.count(prova,peaks=prova$masks$Consensus,bParallel=FALSE)
Sample: CHN080-alignedreads.bam125
Sample: CHN081-alignedreads.bam125
Sample: CHN082-alignedreads.bam125
Sample: CHN083-alignedreads.bam125
Sample: CHN084-alignedreads.bam125
Sample: CHN085-alignedreads.bam125
Error in if (is.unsorted(unique(pv$vectors[, 1]))) { :
  missing value where TRUE/FALSE needed

You can find the "prova" object here: https://drive.google.com/file/d/0B5haNz0A0-UVOW11TnlzcGhGTDA/view?usp=sharing

(reply by gaia.zaffaroni, 8.1 years ago)

score 0 · Answer 2 · 2016-02-29

Hi Gaia-

If I understand correctly, you want to use a consensus peakset made up of all the peaks that appear in two out of three replicates of each condition, right?

The trick here is to compute the consensus peaksets in a DBA object separate from the "main" one:

> prova.cons <- dba.peakset(prova,consensus=DBA_CONDITION,minOverlap=2)
> peaks <- dba.peakset(prova.cons,prova.cons$masks$Consensus,bRetrieve=T)

Then use the main DBA object to continue the counting and analysis:

> prova <- dba.count(prova,peaks=peaks) ##SHOULD WORK

If you want a consensus set that includes peaks identified in at least two replicates in either condition (the more usual consensus for this sort of analysis, as peaks that are identified consistently in one condition but not the other are likely to be of interest in a differential binding analysis), you would do:

> peaks <- dba.peakset(prova.cons,prova.cons$masks$Consensus,minOverlap=1,bRetrieve=T)
> prova <- dba.count(prova,peaks=peaks)
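
The steps above can be wrapped into a single helper, shown here as a minimal sketch. The function name `count_with_condition_consensus` and its `min_replicates` argument are hypothetical (they are not part of DiffBind); the DiffBind calls themselves are the ones used in this thread. It assumes DiffBind is installed and that `dba_obj` holds only the original peaksets, with no consensus peaksets already added.

```r
# Hypothetical helper, not part of DiffBind.
count_with_condition_consensus <- function(dba_obj, min_replicates = 2) {
  # Build the per-condition consensus peaksets in a separate object,
  # so that dba_obj itself never contains consensus peaksets.
  cons <- DiffBind::dba.peakset(dba_obj,
                                consensus  = DiffBind::DBA_CONDITION,
                                minOverlap = min_replicates)

  # Retrieve the peaks present in at least one of the consensus peaksets.
  peaks <- DiffBind::dba.peakset(cons,
                                 peaks      = cons$masks$Consensus,
                                 minOverlap = 1,
                                 bRetrieve  = TRUE)

  # Count reads using the untouched main object and the retrieved peaks.
  DiffBind::dba.count(dba_obj, peaks = peaks)
}

# Usage, assuming `prova` was created with dba(sampleSheet = ...):
# prova <- count_with_condition_consensus(prova)
```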

Hope this helps!

Cheers-

Rory

P.S. I've added a clearer error message in the development version to replace the obscure error if someone tries to run dba.count() using a DBA object that contains consensus peakset(s).
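
For anyone puzzled by the wording of the original error: "missing value where TRUE/FALSE needed" is R's generic complaint when an `if` condition evaluates to NA. The base-R sketch below reproduces it with `is.unsorted()`, which returns NA when its input contains a missing value. The vector `ids` is made up for illustration, and whether this is exactly what happens inside DiffBind is an assumption, not something confirmed in this thread.

```r
# Made-up values standing in for a column that contains a missing entry.
ids <- c(1, NA, 2)

# is.unsorted() returns NA here because of the missing value.
res <- is.unsorted(unique(ids))

# Using NA as an `if` condition raises the error; capture its message.
msg <- tryCatch(
  {
    if (res) "unsorted" else "sorted"
  },
  error = function(e) conditionMessage(e)
)

print(res)
print(msg)
```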

score 0 · Answer 3 · 2016-04-19

Hi,

I have the same problem.

I load the dba object with:

dbaObj=dba(sampleSheet = df.dba.ss,bCorPlot=FALSE, minOverlap = 1, attributes=c(DBA_CONDITION,DBA_ID,DBA_CONTROL))

then calculate the consensus peaks

dbaObj=dba.peakset( dbaObj, consensus=DBA_CONDITION,minOverlap = 3)

peaksworth=dba.peakset( dba( dbaObj,mask=dbaObj3$masks$Consensus), bRetrieve=TRUE)

dbaObj.reads=dba.count( dbaObj, peaks=unique(peaksworth),summits = TRUE)

and I get:

Error in if (is.unsorted(unique(pv$vectors[, 1]))) { :
  missing value where TRUE/FALSE needed

5 pv.vectors(model, mask = mask, minOverlap = minOverlap, bKeepAll = bKeepAll, bAnalysis = bAnalysis, attributes = attributes)
4 pv.model(spare)
3 pv.CalledMasks(pv, res, bed)
2 pv.counts(DBA, peaks = peaks, minOverlap = minOverlap, defaultScore = score, bLog = bLog, insertLength = fragmentSize, bOnlyCounts = T, bCalledMasks = TRUE, minMaxval = filter, bParallel = bParallel, bUseLast = bUseLast, bWithoutDupes = bRemoveDuplicates, bScaleControl = bScaleControl, ...
1 dba.count(dbaObj, peaks = unique(peaksworth), summits = TRUE)

Any advice welcome.

Thanks

Kostas