DiffBind - Error in dba.count() when passing mask or peaks
3
0
Entering edit mode
@gaiazaffaroni-9815
Last seen 5.7 years ago

Hello,

I have a problem with the dba.count() function.

I have two conditions with 3 replicates each, and when I do

sample = dba(sampleSheet=sampleSheet, peakFormat='bed')
sample=dba.count(sample)

or even

sample=dba.count(sample,minOverlap=6)

I have no problem. Anyway, when I try to find consensus peaksets for the two condition separately and then count with their overlap, dba.count() is not working.

prova=dba.peakset(prova,consensus=DBA_CONDITION,minOverlap=2)
peaks=dba.peakset(prova,prova$masks$Consensus,bRetrieve=T)

prova = dba.count(prova,peaks=peaks) ###NOT WORKING
prova = dba.count(prova,peaks=prova$masks$Consensus)    #NOT WORKING

Error in if (is.unsorted(unique(pv$vectors[, 1]))) { : missing value where TRUE/FALSE needed traceback() 5: pv.vectors(model, mask = mask, minOverlap = minOverlap, bKeepAll = bKeepAll, bAnalysis = bAnalysis, attributes = attributes) 4: pv.model(spare) 3: pv.CalledMasks(pv, res, bed) 2: pv.counts(DBA, peaks = peaks, minOverlap = minOverlap, defaultScore = score, bLog = bLog, insertLength = fragmentSize, bOnlyCounts = T, bCalledMasks = TRUE, minMaxval = filter, bParallel = bParallel, bUseLast = bUseLast, bWithoutDupes = bRemoveDuplicates, bScaleControl = bScaleControl, filterFun = filterFun, bLowMem = bUseSummarizeOverlaps, readFormat = readFormat, summits = summits, minMappingQuality = mapQCth) What is wrong? Thanks in advance, Gaia diffbind • 2.7k views ADD COMMENT 0 Entering edit mode Rory Stark ★ 4.1k @rory-stark-5741 Last seen 20 days ago CRUK, Cambridge, UK Hi Gaia- Three things that may be useful to help in tracking this down: • Can you let me know which version of DiffBind you are working with (via sessionInfo())? • Run the dba.count() call with bParallel=FALSE; in serial mode it prints out each file as it counts so we can identify which file is the problem. • If you could sen me a copy of the "prova" DBA object (or link where I can download) , I can have a look to see if something is going wrong with the consensus peaks. Cheers- Rory ADD COMMENT 0 Entering edit mode Hi Rory, thanks for your help. I am using DiffBind_1.16.3. This is what I get without parallel: prova = dba.count(prova,peaks=prova$masksConsensus,bParallel=FALSE) Sample: CHN080-alignedreads.bam125 Sample: CHN081-alignedreads.bam125 Sample: CHN082-alignedreads.bam125 Sample: CHN083-alignedreads.bam125 Sample: CHN084-alignedreads.bam125 Sample: CHN085-alignedreads.bam125 Error in if (is.unsorted(unique(pvvectors[, 1]))) { :
missing value where TRUE/FALSE needed

You can find the "prova" object here:https://drive.google.com/file/d/0B5haNz0A0-UVOW11TnlzcGhGTDA/view?usp=sharing

0
Entering edit mode
Rory Stark ★ 4.1k
@rory-stark-5741
Last seen 20 days ago
CRUK, Cambridge, UK

Hi Gaia-

If I understand correctly, you want to use a consensus peaksets made up of all the peaks that appear in two out of three replicates of each condition, right?

The trick here is to compute the consensus peaksets in a separate DBA object than the "main" one:

> prova.cons <- dba.peakset(prova,consensus=DBA_CONDITION,minOverlap=2)
> peaks <- dba.peakset(prova.cons,prova.cons$masks$Consensus,bRetrieve=T)

Then use the main DBA object to continue the counting and analysis:

> prova <- dba.count(prova,peaks=peaks) ##SHOULD WORK

If you want a consensus set that includes peaks identified in at least two replicates in either condition (the more usual consensus for this sort of analysis, as peaks that are identified consistently in one condition but not the either are likely to be of interest in a differential binding analysis), you would do:

> peaks <- dba.peakset(prova.cons,prova.cons$masks$Consensus,minOverlap=1,bRetrieve=T)
> prova <- dba.count(prova,peaks=peaks)

Hope this helps!

Cheers-

Rory

P.S. I've added a more clear error message in the development version to replace the obscure error if someone tries to run dba.count() using a DBA object that contains Consensus peaksets(s).

0
Entering edit mode

Hi Rory,

yes that was my idea. Thanks for pointing out that it is not the usual approach, I will go for the usual consensus then.

For sake of completeness, I went on trying your suggestion, but the same error comes up, even if I use a different DBA object to compute the consensus peaks.

Cheers,

Gaia

0
Entering edit mode

Let me think on this a bit!

0
Entering edit mode
@kmavrommatis-10137
Last seen 5.5 years ago

Hi,

I have the same problem,

I load the dba object with:

dbaObj=dba(sampleSheet = df.dba.ss,bCorPlot=FALSE, minOverlap = 1, attributes=c(DBA_CONDITION,DBA_ID,DBA_CONTROL))

then calculate the consensus peaks

dbaObj=dba.peakset( dbaObj, consensus=DBA_CONDITION,minOverlap = 3)

peaksworth=dba.peakset(  dba( dbaObj,mask=dbaObj3$masks$Consensus),  bRetrieve=TRUE)

dbaObj.reads=dba.count( dbaObj, peaks=unique(peaksworth),summits = TRUE)

and I get:

Error in if (is.unsorted(unique(pv\$vectors[, 1]))) { :    missing value where TRUE/FALSE needed  5 pv.vectors(model, mask = mask, minOverlap = minOverlap, bKeepAll = bKeepAll,      bAnalysis = bAnalysis, attributes = attributes)  4 pv.model(spare)  3 pv.CalledMasks(pv, res, bed)  2 pv.counts(DBA, peaks = peaks, minOverlap = minOverlap, defaultScore = score,      bLog = bLog, insertLength = fragmentSize, bOnlyCounts = T,      bCalledMasks = TRUE, minMaxval = filter, bParallel = bParallel,      bUseLast = bUseLast, bWithoutDupes = bRemoveDuplicates, bScaleControl = bScaleControl,  ...  1 dba.count(dbaObj, peaks = unique(peaksworth),      summits = TRUE) 

Thanks

Kostas

0
Entering edit mode

I am answering my own question for future reference.

As it turns out the problem was the presence of non standard chromosomes in the peaksets.

After I processed the input files and kept only chr1.. chr22, chrX, chrY the program worked without problems.

R version 3.2.2 (2015-08-14)

DiffBind_1.16.3

0
Entering edit mode

It may be worth trying this again with DiffBind 2.0, which should be released in the next few days as part of Bioconductor 3.3. The code for dealing with chromosome names has been completely re-written and will (hopefully) deal with non-standard cases now.

-R