DiffBind : More fragment sizes than libraries
3
0
Entering edit mode
ZheFrench ▴ 60
@zhefrench-11689
Last seen 3 months ago
France

How to correct this error More fragment sizes than libraries in DiffBind Package ? What does that means ?

diffbind • 2.2k views
ADD COMMENT
1
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 6 weeks ago
Cambridge, UK

This error occurs in a call to dba.count() if the fragmentSize parameter is a vector whose length is greater than the number of libraries (unique BAM files). Either you are specifying a fragmentSize value directly, or the default value has changed. Check:

> myDBA$config$fragmentSize

By default, this is set to 125.

-Rory

ADD COMMENT
0
Entering edit mode

 

# Here I want to keep a subset to make the count

dba_chip_masked <- dba.mask(dba_chip_init, DBA_FACTOR, c("K27AC") , bApply=T)
​


#' ChIPQCexperiment Object is returned so you need to coerce it to a new DBA object

dba_chip <- dba(dba_chip_masked)

print("Use a subset")  

dba.show(dba_chip)

# It seems fragmentSize keep all the samples. not only my subset , it's why i'm getting an error. 

dba_chip$config$fragmentSize
   L13S13    L142S14     L17S17    L182S18     L15S15    L162S16       L7S7 

       141        147        140        141        149        143        163 
      L8S8     L11S11     L12S12       L9S9     L10S10       L1S1       L2S2 
       150        152        155        142        143        149        143 
      L5S5       L6S6       L3S3       L4S4   T1_INPUT   T7_INPUT unT7_INPUT 
       140        137        145        154        135        136        136 

So I will try to do the count before creating the subset with dba.mask. But I did that to leverage the count process,to go faster , to count only the library I will process, that was the idea behind.

 

ADD REPLY
1
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 6 weeks ago
Cambridge, UK

You have identified bugs in both dba.mask() (not building the correct object) and dba() (not masking the fragmentSize vector). I have checked in fixes for these to DiffBind_2.4.5.

When the fixes are available in the next day or two, you don't need to use dba.mask() at all here. You can do this in one step directly using dba():

> dba_chip <- dba(dba_chip_init, mask=dba_chip_init$masks$K27AC)

If dba_chip_init is a ChIPQexperiment object, you can get the masked DBA object in a couple of ways. Either:

> dba_chip <- dba(dba_chip_init, mask=dba_chip_init@DBA$masks$K27AC)

or

> dba_chip_dba <- dba_chip_init@DBA
> dba_chip <- dba(dba_chip_dba, mask=dba_chip_dba@DBA$masks$K27AC)

If you need a workaround immediately, before the fixes are available, you can use:

> dba_chip <- dba(dba_chip_init, mask=dba_chip_init@DBA$masks$K27AC)
> dba_chip$config$fragmentSize <- dba_chip$config$fragmentSize[dba_chip_init@DBA$masks$K27AC]

Cheers-

Rory

 

ADD COMMENT
0
Entering edit mode

For the workaround : 

dba_chip <- dba(dba_chip_init, mask=dba_chip_init@DBA$masks$K27AC)In dba(dba_chip_init, mask = dba_chip_init@DBA$masks$K27AC) :

  Returning new DBA object (not ChIPQCexperiment object)

dba.show(dba_chip, bContrast=T)

[1] "contrast"
Warning message:
In checkQCobj(DBA$ChIPQCobj, res) :
  ChIPQexperiment out of sync -- returning new DBA object

There are warnings but it seems to work.

ADD REPLY
0
Entering edit mode

 I'm not sure what dba_chip_ctr is, how was that derived?

ADD REPLY
0
Entering edit mode
Update : _ctr removed.

dba_chip <- dba(dba_chip_init, mask=dba_chip_init@DBA$masks$K27AC)
dba_chip$config$fragmentSize <- dba_chip$config$fragmentSize[dba_chip_init@DBA$masks$K27AC]

print("contrast")
dba_chip <- dba.contrast(dba_chip,categories=DBA_CONDITION,minMembers=2) 

dba.show(dba_chip, bContrast=T)
dba.show(dba_chip)

 

Warning message:
In dba(dba_chip_init, mask = dba_chip_init@DBA$masks$K27AC) :
  Returning new DBA object (not ChIPQCexperiment object)
[1] "contrast"
Warning message:
In checkQCobj(DBA$ChIPQCobj, res) :
  ChIPQexperiment out of sync -- returning new DBA object
  Group1 Members1 Group2 Members2
1     T1        2     T7        2
2     T1        2   unT7        2
3     T7        2   unT7        2

....

ADD REPLY
0
Entering edit mode

As I am into it I continue to post my trouble in the same thread. 

[1] "analyse for 3 contrasts..."

dba_chip <- dba.analyze(dba_chip,bParallel=T)

UPDATE : bParallel=F correct the bug but still have warning at the end : 

In checkQCobj(DBA$ChIPQCobj, res) :
  ChIPQexperiment out of sync -- returning new DBA object


converting counts to integer mode
converting counts to integer mode
converting counts to integer mode
gene-wise dispersion estimates
gene-wise dispersion estimates
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
mean-dispersion relationship
mean-dispersion relationship
final dispersion estimates
final dispersion estimates
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) : 
  long vectors not supported yet: fork.c:376
Calls: dba.analyze ... <Anonymous> -> mclapply -> lapply -> FUN -> sendMaster
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) : 
  long vectors not supported yet: fork.c:376
Calls: dba.analyze ... <Anonymous> -> mclapply -> lapply -> FUN -> sendMaster
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) : 
  long vectors not supported yet: fork.c:376
Calls: dba.analyze ... <Anonymous> -> mclapply -> lapply -> FUN -> sendMaster
Warning messages:
1: In mclapply(arglist, fn, ..., mc.preschedule = TRUE, mc.allow.recursive = TRUE) :
  all scheduled cores encountered errors in user code
2: In drec$counts <- NULL : Coercing LHS to a list
3: In drec$counts <- NULL : Coercing LHS to a list
4: In drec$counts <- NULL : Coercing LHS to a list
5: In checkQCobj(DBA$ChIPQCobj, res) :
  ChIPQexperiment out of sync -- returning new DBA object

 

ADD REPLY
0
Entering edit mode

By the way, I'm using DiffBind_2.2.12 R 3.3.1 On 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1 so maybe bugs have already been correct in 4.4. Do not know.

ADD REPLY
0
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 6 weeks ago
Cambridge, UK

You will need to update to Bioconductor 3.5 to get the fixes I am making, but I may have a workaround for you.

Basically what is happening relates to issues (mostly warnings) around working directly on the DBA object created within a ChIPQCexperiment object. The workaround is to extract a "clean" DBA object and just work on that in parallel. Here's workaround code that should eliminate the warnings:

> dba_chip_dba <- dba_chip_init@DBA
> dba_chip_dba$ChIPQCobj <- NULL
> dba_chip <- dba(dba_chip_dba, mask=dba_chip_dba$masks$K27AC)

At this point, the dba_chip object will be a pure DiffBind object without any link back to the ChIPQCexperiment object and it should work fine in the version you are using.

-Rory

ADD COMMENT
0
Entering edit mode

Thanks but still : 

Error: trying to get slot "DBA" from an object (class "DBA") that is not an S4 object 
Execution halted

I'm not root on this machine and Bioconductor is 3.4 and can't upgrade more only if I upgrade R itself...That's a bit boring...

But anyway I will go with warnings ...

ADD REPLY
0
Entering edit mode

In this case your dba_chip_init objects looks like it is a DBA object (not a ChIPQCexperiment object). In that case, you can skip the first line and just set dba_chip_init$ChIPQCobj <- NULL and continue on your way:

-R

ADD REPLY
0
Entering edit mode

I sent you a private email with some questions not to overload this thread...Hope you ll take 10 minutes to answer . Thanks !

Also notice something weird thing with :  

dba.report(dba_chip,contrast=1,bCalled=TRUE,bCounts=TRUE,bCalledDetail=TRUE,ext='tsv',file=filename,initString="")

If you want to write file not in the current directory you can't pass absolute path via file because even if you set initString empty, it will add "_" to your absolute path...and there is no dir option , so you are force to write in directory where Rscript is launched.

 

 

 

 

ADD REPLY
1
Entering edit mode

I've checked in a change so that if you set initString=NULL, it will not add the "_", so you can specify a full filepath in the "file" parameter.

ADD REPLY
0
Entering edit mode

The second line should return the first number of the list , no ?

dba.overlap(dba_chip,mode=DBA_OLAP_RATE)
[1] 186450 102988  78376  65514  53238  43346

dba.overlap(dba_chip,mode=DBA_OLAP_RATE,1)
[1] 94888

UPDATE : It returns the number of intervals for the sample in  line 1 from your dba object. In doc it is said that it should return intervals present at least the minimum value you give. no ?

       ID Tissue Factor Condition Treatment Replicate Caller Intervals
1  L13S13  MCF10  K27AC        T1 Tamoxifen         1   macs     94888
2 L142S14  MCF10  K27AC        T1 Tamoxifen         3   macs    100680

From doc :

MODE: Compute overlap rates at different stringency thresholds: dba.overlap(DBA, mask, mode=DBA_OLAP_RATE, minVal)

Finaly minVal is not valid option because I'm using an old version...and in fact it was passing 1 to mask...

 

ADD REPLY
1
Entering edit mode

Not sure what you are trying to do with the second line? the value of 1 will be assigned to the second argument, mask, so this will return the number of peaks in the first sample. It is the same as saying:

dba_first <- dba(dba_chip, mask=1)
dba.overlap(dba_first,mode=DBA_OLAP_RATE)

If you just want the first value, try:

dba.overlap(dba_chip,mode=DBA_OLAP_RATE)[1]

-R

ADD REPLY
0
Entering edit mode

Why total number of peaks per condition plotted in VennDiagram is not the same as showed in the dba object ?

dba_chip_consensus <- dba.peakset(dba_chip, consensus=c(DBA_CONDITION),minOverlap=0.99)
dba.show(dba_chip_consensus)

       ID Tissue Factor Condition Treatment Replicate Caller Intervals
1  L13S13  MCF10  K27AC        T1 Tamoxifen         1   macs     94888
2 L142S14  MCF10  K27AC        T1 Tamoxifen         3   macs    100680
3  L17S17  MCF10  K27AC        T7 Tamoxifen         1   macs    106629
4 L182S18  MCF10  K27AC        T7 Tamoxifen         3   macs     87323
5  L15S15  MCF10  K27AC      unT7 unTreated         1   macs    106930
6 L162S16  MCF10  K27AC      unT7 unTreated         3   macs    109774
7      T1  MCF10  K27AC        T1 Tamoxifen       1-3   macs     71453
8      T7  MCF10  K27AC        T7 Tamoxifen       1-3   macs     69588
9    unT7  MCF10  K27AC      unT7 unTreated       1-3   macs     65616

In the tab the values for each consensus peaks per condition are 71453,69588,65616.

In the plot, when I make the sum, I get 65824 for T1...What is the reason of this difference  ?  

 

ADD REPLY
1
Entering edit mode

See this discussion:

A: Binding site overlaps

-R

ADD REPLY

Login before adding your answer.

Traffic: 1024 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6