DiffBind - Error in dba.count() when passing mask or peaks
3
0
Entering edit mode
@gaiazaffaroni-9815
Last seen 8.8 years ago

Hello,

I have a problem with the dba.count() function.

I have two conditions with 3 replicates each, and when I do

sample = dba(sampleSheet=sampleSheet, peakFormat='bed')
sample=dba.count(sample)

or even

sample=dba.count(sample,minOverlap=6)

I have no problem. Anyway, when I try to find consensus peaksets for the two condition separately and then count with their overlap, dba.count() is not working.

prova=dba.peakset(prova,consensus=DBA_CONDITION,minOverlap=2)
peaks=dba.peakset(prova,prova$masks$Consensus,bRetrieve=T)

prova = dba.count(prova,peaks=peaks) ###NOT WORKING
prova = dba.count(prova,peaks=prova$masks$Consensus)    #NOT WORKING

Error in if (is.unsorted(unique(pv$vectors[, 1]))) { :
  missing value where TRUE/FALSE needed

traceback()
5: pv.vectors(model, mask = mask, minOverlap = minOverlap, bKeepAll = bKeepAll,
       bAnalysis = bAnalysis, attributes = attributes)
4: pv.model(spare)
3: pv.CalledMasks(pv, res, bed)
2: pv.counts(DBA, peaks = peaks, minOverlap = minOverlap, defaultScore = score,
       bLog = bLog, insertLength = fragmentSize, bOnlyCounts = T,
       bCalledMasks = TRUE, minMaxval = filter, bParallel = bParallel,
       bUseLast = bUseLast, bWithoutDupes = bRemoveDuplicates, bScaleControl = bScaleControl,
       filterFun = filterFun, bLowMem = bUseSummarizeOverlaps, readFormat = readFormat,
       summits = summits, minMappingQuality = mapQCth)

What is wrong?

Thanks in advance,

Gaia

diffbind • 4.9k views
ADD COMMENT
0
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 5 weeks ago
Cambridge, UK

Hi Gaia-

Three things that may be useful to help in tracking this down:

  • Can you let me know which version of DiffBind you are working with (via sessionInfo())?
  • Run the dba.count() call with bParallel=FALSE; in serial mode it prints out each file as it counts so we can identify which file is the problem.
  • If you could sen me a copy of the "prova" DBA object (or link where I can download) , I can have a look to see if something is going wrong with the consensus peaks.

Cheers-

Rory

ADD COMMENT
0
Entering edit mode

Hi Rory, thanks for your help.

I am using DiffBind_1.16.3. This is what I get without parallel:

prova = dba.count(prova,peaks=prova$masks$Consensus,bParallel=FALSE)
Sample: CHN080-alignedreads.bam125
Sample: CHN081-alignedreads.bam125
Sample: CHN082-alignedreads.bam125
Sample: CHN083-alignedreads.bam125
Sample: CHN084-alignedreads.bam125
Sample: CHN085-alignedreads.bam125
Error in if (is.unsorted(unique(pv$vectors[, 1]))) { :
  missing value where TRUE/FALSE needed

You can find the "prova" object here:https://drive.google.com/file/d/0B5haNz0A0-UVOW11TnlzcGhGTDA/view?usp=sharing

ADD REPLY
0
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 5 weeks ago
Cambridge, UK

Hi Gaia-

If I understand correctly, you want to use a consensus peaksets made up of all the peaks that appear in two out of three replicates of each condition, right?

The trick here is to compute the consensus peaksets in a separate DBA object than the "main" one:

> prova.cons <- dba.peakset(prova,consensus=DBA_CONDITION,minOverlap=2)
> peaks <- dba.peakset(prova.cons,prova.cons$masks$Consensus,bRetrieve=T)

Then use the main DBA object to continue the counting and analysis:

> prova <- dba.count(prova,peaks=peaks) ##SHOULD WORK

If you want a consensus set that includes peaks identified in at least two replicates in either condition (the more usual consensus for this sort of analysis, as peaks that are identified consistently in one condition but not the either are likely to be of interest in a differential binding analysis), you would do:

> peaks <- dba.peakset(prova.cons,prova.cons$masks$Consensus,minOverlap=1,bRetrieve=T)
> prova <- dba.count(prova,peaks=peaks)

Hope this helps!

Cheers-

Rory

P.S. I've added a more clear error message in the development version to replace the obscure error if someone tries to run dba.count() using a DBA object that contains Consensus peaksets(s).

ADD COMMENT
0
Entering edit mode

Hi Rory,

yes that was my idea. Thanks for pointing out that it is not the usual approach, I will go for the usual consensus then.

For sake of completeness, I went on trying your suggestion, but the same error comes up, even if I use a different DBA object to compute the consensus peaks.

Cheers,

Gaia

 

ADD REPLY
0
Entering edit mode

Let me think on this a bit!

ADD REPLY
0
Entering edit mode
@kmavrommatis-10137
Last seen 8.6 years ago

Hi,

I have the same problem,

I load the dba object with:

dbaObj=dba(sampleSheet = df.dba.ss,bCorPlot=FALSE, minOverlap = 1, attributes=c(DBA_CONDITION,DBA_ID,DBA_CONTROL))

then calculate the consensus peaks

dbaObj=dba.peakset( dbaObj, consensus=DBA_CONDITION,minOverlap = 3)

peaksworth=dba.peakset(  dba( dbaObj,mask=dbaObj3$masks$Consensus),  bRetrieve=TRUE)

dbaObj.reads=dba.count( dbaObj, peaks=unique(peaksworth),summits = TRUE)

and I get:

Error in if (is.unsorted(unique(pv$vectors[, 1]))) { : 
  missing value where TRUE/FALSE needed 
5 pv.vectors(model, mask = mask, minOverlap = minOverlap, bKeepAll = bKeepAll, 
    bAnalysis = bAnalysis, attributes = attributes) 
4 pv.model(spare) 
3 pv.CalledMasks(pv, res, bed) 
2 pv.counts(DBA, peaks = peaks, minOverlap = minOverlap, defaultScore = score, 
    bLog = bLog, insertLength = fragmentSize, bOnlyCounts = T, 
    bCalledMasks = TRUE, minMaxval = filter, bParallel = bParallel, 
    bUseLast = bUseLast, bWithoutDupes = bRemoveDuplicates, bScaleControl = bScaleControl,  ... 
1 dba.count(dbaObj, peaks = unique(peaksworth), 
    summits = TRUE) 

 

 

Any advice welcome.

Thanks

Kostas

ADD COMMENT
0
Entering edit mode

I am answering my own question for future reference.

As it turns out the problem was the presence of non standard chromosomes in the peaksets.

After I processed the input files and kept only chr1.. chr22, chrX, chrY the program worked without problems.

R version 3.2.2 (2015-08-14)

DiffBind_1.16.3   

 

ADD REPLY
0
Entering edit mode

It may be worth trying this again with DiffBind 2.0, which should be released in the next few days as part of Bioconductor 3.3. The code for dealing with chromosome names has been completely re-written and will (hopefully) deal with non-standard cases now.

-R

ADD REPLY

Login before adding your answer.

Traffic: 712 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6