[DIFFBIND] batch effects and blocking factors
@giuseppe-gallone-6092
Last seen 10.2 years ago
Hi,

I have a group of samples for which I'd like to ascertain whether differential binding is detectable based on a binary "condition" variable (stored in DBA_CONDITION).

However, these samples have been processed in 4 batches (each batch has at least 3 samples). I would like to run a multifactorial analysis to regress out the batch effect first, and then analyse any remaining variance across the DBA_CONDITION contrast of interest.

I understand it is possible to run such an analysis using blocking factors in dba.contrast. Let's say I store the 4 batch labels in DBA_TISSUE. The following:

    data = dba.contrast(data, categories=DBA_CONDITION, block=DBA_TISSUE)

returns the following warning messages:

    Warning messages:
    1: Blocking factor invalid for all contrasts:
    2: No blocking values are present in both groups

and data will not contain blocking factor information.

Am I wrong in thinking that multiple contrasts can be used for the "block" argument? If I use only one contrast via a mask (for example BATCH_1 vs !BATCH_1), this works correctly:

    data = dba.contrast(data, categories=DBA_CONDITION, block=data$masks$BATCH_1)

However, it will only block variance due to this particular contrast, not all of them.

A solution is, I suppose, to do a differential analysis on all the contrasts one wishes to block, and identify the one that produces the highest number of variant sites:

    data = dba.contrast(data, categories=DBA_TISSUE)
    dba.analyze(data)
    ...
    #pick the contrast with the highest variance, eg BATCH_4, then do:
    data = dba.contrast(data, categories=DBA_CONDITION, block=data$masks$BATCH_4)

However, I was still wondering if there is a way to model all the variance due to the batch effects at once and then look at the residual variance for the real analysis.

Thanks!
Giuseppe
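The second warning says that no blocking value is present in both groups. One way to see what that means for a given experiment is to cross-tabulate batch against condition outside DiffBind. The following is only a sketch in base R: the `samples` data frame and its labels are made up for illustration and are not taken from the poster's data. It checks whether each batch contains samples from both conditions, and whether an additive batch + condition design is of full rank (if batch is completely nested within condition, the two effects cannot be separated).

```r
# Hypothetical sample sheet (invented for illustration): 4 batches of 3 samples,
# each batch containing both levels of the binary condition.
samples <- data.frame(
  batch     = rep(c("BATCH_1", "BATCH_2", "BATCH_3", "BATCH_4"), each = 3),
  condition = c("A", "A", "B",
                "A", "B", "B",
                "A", "A", "B",
                "A", "B", "B")
)

# Cross-tabulate batch against condition.
tab <- table(samples$batch, samples$condition)
print(tab)

# A batch is usable as a blocking level only if it appears in both groups.
in_both <- rowSums(tab > 0) == 2

# An additive design is estimable only if its design matrix has full column rank.
design <- model.matrix(~ batch + condition, data = samples)
full_rank <- qr(design)$rank == ncol(design)
```

If `in_both` is FALSE for every batch, or `full_rank` is FALSE, batch and condition are confounded in the design and no choice of blocking factor can disentangle them.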
@giuseppe-gallone-6092
Last seen 10.2 years ago
Hi again,

Would anyone be willing to help with the issue above, which I originally posted on 18/06/14?

Best wishes,
Giuseppe
@gordon-smyth
Last seen 56 minutes ago
WEHI, Melbourne, Australia
Dear Giuseppe,

I can't help with DiffBind syntax, but the dba code you give is running edgeR glm functions in the background. You could use the edgeR functions directly and adjust for batch and blocking factors in the usual way that this is done in edgeR. edgeR allows multiple blocking factors.

Best wishes
Gordon

> Date: Tue, 24 Jun 2014 11:34:07 +0200
> From: Giuseppe Gallone <giuseppe.gallone at dpag.ox.ac.uk>
> To: bioconductor at r-project.org
> Subject: Re: [BioC] [DIFFBIND] batch effects and blocking factors
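As a rough illustration of the suggestion to use edgeR directly, the sketch below fits an additive model with all four batch levels and the condition in a single design matrix, then tests the condition coefficient. It is not the DiffBind workflow: the `counts` matrix is simulated as a stand-in for a peaks-by-samples read-count matrix, and the sample layout, factor levels and object names are assumptions made for the example.

```r
library(edgeR)

set.seed(1)

# Hypothetical layout: 4 batches x 2 conditions x 2 replicates = 16 samples.
batch     <- factor(rep(paste0("B", 1:4), each = 4))
condition <- factor(rep(rep(c("ctrl", "treat"), each = 2), times = 4))

# Simulated counts standing in for a real peaks-by-samples read-count matrix.
counts <- matrix(rnbinom(500 * 16, mu = 50, size = 10), nrow = 500,
                 dimnames = list(paste0("peak", 1:500), paste0("s", 1:16)))

y <- DGEList(counts = counts)
y <- calcNormFactors(y)

# Additive design: all batch levels are modelled at once, and the condition
# effect is estimated after adjusting for them.
design <- model.matrix(~ batch + condition)

y   <- estimateDisp(y, design)
fit <- glmFit(y, design)

# Likelihood ratio test for the condition coefficient only.
lrt <- glmLRT(fit, coef = "conditiontreat")
topTags(lrt)
```

Because batch enters the design as a factor with four levels, there is no need to pick a single batch-vs-rest mask; the same pattern extends to further blocking factors by adding terms to the formula.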