Hi Giuseppe-
Sorry about the delay; Gord and I were both off camping in the Highlands,
far away from the intertubes...
While it would help if I could see the whole sample sheet (esp. the
CONDITION and TISSUE values), I suspect that the batches are confounded
with the conditions (such that all the samples in some batches have the
same condition). By default, DiffBind handles the most straightforward
cases where the blocking effect includes samples on both sides of the
primary contrast (such as a matched design like tumor/normal).
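One way to check for this kind of confounding is to cross-tabulate batch
against condition. The sketch below uses base R and a made-up sample sheet;
the column names and values are assumptions for illustration, not the
poster's data.

```r
# Hypothetical sample sheet: 12 samples, 2 conditions, 4 batches.
# All names and values here are illustrative assumptions.
samples <- data.frame(
  SampleID  = paste0("s", 1:12),
  Condition = rep(c("A", "B"), times = c(6, 6)),
  Batch     = rep(c("BATCH_1", "BATCH_2", "BATCH_3", "BATCH_4"), each = 3)
)

# Rows are batches, columns are conditions, cells are sample counts.
tab <- table(samples$Batch, samples$Condition)
print(tab)

# A batch can only inform a blocked comparison if it has samples in
# both condition groups.
usable <- rowSums(tab > 0) == 2
print(usable)
```

If no batch has samples in both condition groups, as in this made-up
layout, the batch effect cannot be separated from the condition effect.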
However, you can control the "block" parameter by supplying a list of
vectors. In your case, each vector would represent a batch, and contain
the numbers of the samples in that batch. The man page for dba.contrast
documents this somewhat and includes a simple example.
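A minimal sketch of building such a list in base R, assuming a
hypothetical vector of batch labels given in sample-sheet order. The
dba.contrast call is left as a comment because it needs DiffBind and a
loaded DBA object (called "data" here, as in the original post).

```r
# Hypothetical batch labels, one per sample, in sample-sheet order.
batch <- rep(c("BATCH_1", "BATCH_2", "BATCH_3", "BATCH_4"), each = 3)

# One vector of sample numbers per batch; names are dropped so the
# result is a plain list of integer vectors.
block_list <- unname(split(seq_along(batch), batch))
str(block_list)

# With DiffBind loaded and a DBA object named `data` (not run here):
# data <- dba.contrast(data, categories = DBA_CONDITION, block = block_list)
```

Each sample number should appear in at most one vector. This only helps
if at least some batches contain samples from both conditions.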
If you want to send along the sample sheet, or even a DBA object with the
metadata, I can take a look and offer suggestions.
Cheers-
Rory
>Date: Tue, 24 Jun 2014 11:34:07 +0200
>From: Giuseppe Gallone <giuseppe.gallone at dpag.ox.ac.uk>
>To: bioconductor at r-project.org
>Subject: Re: [BioC] [DIFFBIND] batch effects and blocking factors
>Message-ID: <53A9460F.9020608 at dpag.ox.ac.uk>
>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>Hi again
>
>Would anyone be willing to help with the issue below?
>
>Best wishes
>Giuseppe
>
>On 18/06/14 20:39, Giuseppe Gallone wrote:
>> Hi
>>
>> I have a group of samples for which I'd like to ascertain if
>> differential binding is detectable based on a "condition" binary
>> variable (stored in DBA_CONDITION).
>>
>> However, these samples have been processed in 4 batches (each batch has
>> at least 3 samples). I would like to run a multifactorial analysis to
>> regress the batch effect first, and then possibly analyse any remaining
>> variance across the DBA_CONDITION contrast of interest.
>>
>> I understand it is possible to run such an analysis using blocking
>> factors in dba.contrast. Let's say I store the 4 batch labels in
>> DBA_TISSUE. The following:
>>
>> data = dba.contrast(data, categories=DBA_CONDITION, block=DBA_TISSUE)
>>
>> returns the following warning messages:
>>
>> Warning messages:
>> 1: Blocking factor invalid for all contrasts:
>> 2: No blocking values are present in both groups
>>
>> and data will not contain blocking factor information.
>>
>> Am I wrong in thinking that multiple contrasts can be used for the
>> "block" argument? If I use only one contrast via mask (for example
>> BATCH_1 VS !BATCH_1) this works correctly:
>>
>> data = dba.contrast(data, categories=DBA_CONDITION,
>> block=data$masks$BATCH_1)
>>
>> however it will only block variance due to this particular contrast,
>> not all of them.
>>
>> A solution is, I suppose, to do a differential analysis on all the
>> contrasts one wishes to block, and identify the one which produces the
>> highest number of variant sites:
>>
>> data = dba.contrast(data, categories=DBA_TISSUE)
>> data = dba.analyze(data)
>> ...
>> #pick the contrast with the highest variance, eg BATCH_4, then do:
>>
>> data = dba.contrast(data, categories=DBA_CONDITION,
>> block=data$masks$BATCH_4)
>>
>> However I was still wondering if there is a way to model all the
>> variance due to the batch effects at once and then look at the residual
>> variance for the real analysis.
>>
>> Thanks!
>> Giuseppe