Question: DiffBind Batch Effect
0
gravatar for niu.shengyong
23 months ago by
niu.shengyong0 wrote:

I'm now running ATAC-seq differential analysis by DiffBind with peaks from MACS2. I have 10 samples from batch 3 and 20 samples from batch 5. How could I eliminate batch effect and normalize them in DiffBind? Sorry, I'm quite new in this area. Thanks!

Simon

diffbind atac • 1.2k views
ADD COMMENTlink modified 22 months ago • written 23 months ago by niu.shengyong0
Answer: DiffBind Batch Effect
0
gravatar for Rory Stark
22 months ago by
Rory Stark2.8k
CRUK, Cambridge, UK
Rory Stark2.8k wrote:

If you see a batch effect (eg in the PCA plots), the best way to handle them is to model them. We don't want to adjust the count matrix directly as that violates the assumption of the differential analyses in DESeq2 and edgeR

To model the batch, use the block parameter in dba.contrast(). For example, if the batch numbers were stored in the Replicate field, you would say block=DBA_REPLICATE. Then when you run dba.analyze(), it will do the analysis two ways: as a single-factor analysis, and with the "blocking" factor representing the batch. To use the results from the blocked analysis, you need to specify method=DBA_DESEQ2_BLOCK (or method=DBA_EDGER_BLOCK) when calling dba.report() or any of the plotting functions.

-Rory

ADD COMMENTlink written 22 months ago by Rory Stark2.8k

I have 8 samples from 2 different batches without replicates. How could I use the block argument?

Thanks!

ADD REPLYlink written 22 months ago by niu.shengyong0
1

Hopefully you have some replicates in the sense that multiple samples represent alternative experiments of the same condition.

The first step is to mark the batch using one of the metadata fields. You could use for example Condition, Treatment, or Replicate. Create a column in the sample sheet with one of these names, and for the sample in the first batch set the value to "1" and for the samples in the second batch set them to "2" (or "Batch1"and "Batch2" if you are not using the Replicate field).

For example, if the comparison you want to make is between two conditions, you would set the condition labels for each sample in the Condition column, and the corresponding batch number in the Replicate column. Then, after you have loaded the sample sheet using dba() and computed the read counts for the consensus set using dba.count(), you call dba.contrast() and dba.analyze():

> myDBA <- dba.contrast(myDBA, categories=DBA_CONDITION, block=DBA_REPLICATE)
> myDBA <- dba.analyze(myDBA)
> myDBA

In the contrast listing for myDBA, you should see values for two analyses: DB.DESeq2 and DB.DESEq2.block. The first is re[presents the results for a single-factor analysis, while the second reflects the additional modelling of the batch effect.

To use the results of the multi-factor analysis, you need to specify method=DBA_DESEQ2_BLOCK, eg:

> dba.plotMA(myDBA, method=DBA_DESEQ2_BLOCK)
> DBresults <- dba.report(myDBA, method=DBA_DESEQ2_BLOCK)
> plot(myDBA, contrast=1, method=DBA_DESEQ2_BLOCK)
> dba.plotPCA(myDBA, contrast=1, method=DBA_DESEQ2_BLOCK)

Some other tips:

  • You can set the name of the metadata field you are using to represent he batch in subsequent plots by setting: myDBA$config$replicate <- "Batch"
  • You can use the multi-factor analysis by default (don't need to specify the method= parameter every time) by setting: myDBA$config$AnalysisMethod <- DBA_DESEQ2_BLOCK

-Rory

ADD REPLYlink modified 22 months ago • written 22 months ago by Rory Stark2.8k

Hi Rory,

When I follow your suggestions, it shows the error: "Error in pv.DBAreport(pv=DBA, contrast = contrast, method = method: DESeq2 analysis has not been run for this contrast."

My codes:

ta <- dba(sampleSheet = "diff.csv")
ta <- dba.count(ta, summits=75)
ta = dba.contrast(ta, categories=DBA_CONDITION, minMembers = 2, block=DBA_REPLICATE)
ta <- dba.contrast(ta)
ta = dba.analyze(ta, method=DBA_DESEQ2_BLOCK)
ta.DB = dba.report(ta, contrast=1, method=DBA_DESEQ2_BLOCK, bCounts=TRUE)

It also shows the following output when I type "ta" after the second dba.contrast()

Group1 Members1 Group2 Member2

 Control               3          Test           5

 

Simon

ADD REPLYlink written 22 months ago by niu.shengyong0

I have 2 batches. One is "batch 3" and another one is "batch 5", so I put numerical value 3 or 5 in the Replicate column. 

ADD REPLYlink written 22 months ago by niu.shengyong0

Hi Simon-

My example code had an error (I've fixed it above). The second call to dba.contrast() should be a call to dba.analyze(). So in your code you should remove the second call to dba.contrast() and remove the "method=DBA_DESEQ2_BLOCK" parameter from the call to dba.analyze():

ta <- dba(sampleSheet = "diff.csv")
ta <- dba.count(ta, summits=75)
ta <- dba.contrast(ta, categories=DBA_CONDITION, minMembers = 2, 
                   block=DBA_REPLICATE)
ta <- dba.analyze(ta)
ta.DB <- dba.report(ta, contrast=1, method=DBA_DESEQ2_BLOCK, bCounts=TRUE)
ADD REPLYlink modified 22 months ago • written 22 months ago by Rory Stark2.8k
Answer: DiffBind Batch Effect
0
gravatar for niu.shengyong
22 months ago by
niu.shengyong0 wrote:

Hi Rory,

I get the message after dba.report:

Error in `colnames<-`(`*tmp*`, value = c("Chr", "Start", "End", "Conc",  : 
  'names' attribute [9] must be the same length as the vector [7]

I use the version of 2.2.12 and I knew it may be the version issue. Is there any workaround for it if I still need to use version 2.2.12? Thanks!

ADD COMMENTlink written 22 months ago by niu.shengyong0

If you email the DBA object (ta.DB) to me (or send me a link where I can download it), I can see what is going on. Email is on Bioconductor landing page for DiffBind.

-Rory

ADD REPLYlink written 22 months ago by Rory Stark2.8k

Just emailed to you. Thanks!

ADD REPLYlink written 22 months ago by niu.shengyong0
Answer: DiffBind Batch Effect
0
gravatar for niu.shengyong
22 months ago by
niu.shengyong0 wrote:

It works when I tried in another version; however, there is another error now:

It shows the following error when I try to plot heatmap or PCA plot. "Error in pv.getPlotData(DBA, attributes = attributes, contrast = contrast,  :  Only one site to plot -- need at least 2!".

 

Thanks!

ADD COMMENTlink written 22 months ago by niu.shengyong0

See answer for: DiffBind error Error in pv.getPlotData

ADD REPLYlink written 22 months ago by Rory Stark2.8k

Thanks Rory! After checking the PCA plot and correlation plot, I find there is still quite significant batch effect. Do you have any advice? Thanks!

ADD REPLYlink written 22 months ago by niu.shengyong0

The PCA and correlation plots may be misleading in this case. Modelling the batch effect in the GLM does not change the read counts in any way. So while the model used to identify differentially bound sites takes the batch effect into account, when you plot these sites using the count matrix, they will still exhibit the batch effect.

Short of altering the read counts for the plots (eg by singular value decomposition, as is used in SVA), we can't account for the batch effect in the plots. As DiffBind uses differential analysis methods that rely on raw reads (DESeq2 and edgeR), it avoids such alterations to the count matrix.

-R

ADD REPLYlink written 22 months ago by Rory Stark2.8k

Hi Rory,

 

Thanks for your help! In this case, how could I evaluate if there is still significant batch effect in my dataset after use the "BLOCK" arguments you mentioned? Thanks!

ADD REPLYlink written 22 months ago by niu.shengyong0

Having modelled the batch effect, the resulting statistics should be valid. In particular, I would expect that False Positive statistics to be valid. Yo may have more False Negatives formt he increased variance, but he sites that have low FDR should be good.

-R

ADD REPLYlink written 22 months ago by Rory Stark2.8k

Thanks for your help!

ADD REPLYlink written 21 months ago by niu.shengyong0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 84 users visited in the last hour