DiffBind R package issue
2
0
Entering edit mode
@niushengyong-12910
Last seen 7.6 years ago

I'm now using DiffBind R package to conduct differential analysis for ATAC-seq dataset. 
In this analysis, I want to compare each cell type in two conditions, so I divided the general sheet into enclosed sheets with different cell names and put them into DiffBind analysis (script: diff.R, diff.sh). It shows the following errors: (enclosed diff.sh.e8513532)

diff.R


#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
library(DiffBind)
setwd("/Simon/project/ATAC-seq/DiffBind")
ta <- dba(sampleSheet = basename(args[1]))
ta <- dba.count(ta)
ta = dba.contrast(ta, categories=DBA_CONDITION)   
ta = dba.analyze(ta, method=DBA_DESEQ2)
ta.DB = dba.report(ta, contrast=1, th=.01, method=DBA_DESEQ2, bCounts=TRUE)
fname = basename(args[1])

pdf(file=paste(fname, "_plotCorr.pdf", sep = "",collapse = NULL ))

plot(ta)
dev.off()

diff.sh


#!/bin/bash -l

module load R/3.3.2

for filename in /ATAC-seq/DiffBind/*diffSheet.csv ; do

Rscript diff.R  "$filename"

done

diff.sh.e8513532


32_B Blood  D  1 narrow
33_B Blood  D  1 narrow
36_B Blood  D  1 narrow
37_B Blood  D  1 narrow
38_B Blood  ND  1 narrow
30_B Blood  ND  1 narrow
54_B Blood  D  1 narrow
58_B Blood  ND  1 narrow
63_B Blood  D  1 narrow
converting counts to integer mode
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
Warning message:
No sites above threshold 
Error in pv.getPlotData(DBA, attributes = attributes, contrast = contrast,  : 
  Unable to plot -- no sites within threshold
Calls: dba.plotHeatmap -> pv.getPlotData
Execution halted


 

 

 

 

Thanks!

diffbind atac-seq • 4.1k views
ADD COMMENT
1
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 9 weeks ago
Cambridge, UK

Yes that is indeed a funny looking MA plot. It could be interesting to see the non-normalized version (bNormalized=FALSE in dba.plotMA) to make sure the normalisation process isn't doing something strange, but I doubt it.

It is interesting that there are some clear outliers that are not identified as being significantly DB. specifically the only below the line on the diagonal with log fold changes of -6 through -9 (points on the diagonal are generally regions that have no reads in one sample group [ND] and higher read levels in the second [T2D]). I would think this could be due to very high variance for the read counts for the T2D samples, which you can see if you set bCounts=TRUE in the call to dba.report.

For ATAC, it is worth looking at the ENCODE pipeline, at least through the point of peak calling. You can find it at:

https://www.encodeproject.org/atac-seq/

and a useful document with the exact calling parameters for MACS at:

https://docs.google.com/document/d/1f0Cm4vRyDQDu0bMehHD7P7KOMxTOP-HiNoIvL1VcBt8/edit#heading=h.9ecc41kilcvq

One other idea you can try to stabilize the read calls is to use the "summits" parameter in dba.count. This will identify the locus of greatest read pileup and create a "peak" of uniform size centered on this point. You will be looking for consistent changes in read density in narrower regions within the called ATAC peaks. I'd suggest setting summits=100, which results in 200bp regions (200bp before and after the summit). For ATAC you could even try a smaller value -- summits=75 will look at 150bp regions.

Cheers-

Rory

ADD COMMENT
0
Entering edit mode

I'll try to recall peak with BAMPE argument and redo the DiffBind. Thanks!

ADD REPLY
0
Entering edit mode

Hi Rory,

Another question, how could I eliminate the batch effect between samples and normalize them in DiffBind? Thanks! 

ADD REPLY
0
Entering edit mode

Discussion in issue: A: DiffBind Batch Effect

ADD REPLY
0
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 9 weeks ago
Cambridge, UK

When you run the analysis, it may be the case that no sites are identified as being differentially bound (DB) at a certain threshold.

The warning message "No sites above threshold" is being generated by the call to dba.report(), indicating that there are no sites with FDR<- 0.01. So when you call plot() (which maps to dba.plotHeatmap), there are no DB sites, hence nothing to plot.

You can check this condition as follows:

> ta.DB = dba.report(ta, contrast=1, th=.01, method=DBA_DESEQ2, bCounts=TRUE)
> if(!is.null(ta.DB)) {
>  fname = basename(args[1])
>  pdf(file=paste(fname, "_T2D_ATAC_Pval_plotCorr.pdf", sep = "",collapse = NULL ))
>  plot(ta)
>  dev.off()
> }

To see why nothing is turning up as being DB, it may help to look at MA and volcano plots, and to get a dba.report with th=1 (and bCounts=TRUE) to examine the (normalized) read counts. Sites will only have low FDR if the read counts in the two sample groups are a) in internally consistent (low variance between replicates) and b) consistently different between the sample groups.

Cheers-

Rory

ADD COMMENT
0
Entering edit mode

Hi Rory,

Really appreciate for your reply! I just generate the MA plot as you mentioned and found that there is zero point which has FDR < 0.05. The MA plot looks quite strange and the DB are not significant. Here are some of my questions:

1. The samples come from two different batches. Should I normalize them by myself before conduct DiffBind and how? 

2. DiffBind is designed for CHIP-seq. I'm not sure if the peak calling step for ATAC-seq would influence the results. Here is how I conduct peak calling by MACS:  

macs2 callpeak -t "$filename"  -f BED -g hs -nomodel --nolambda --keep-dup all --call-summits --outdir /ATAC-seq/callPeaks/  -n $(basename "$filename" .bed) -B -q 0.01 --bdg  --shift -100 --extsize 200

Thanks!

ADD REPLY
0
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 660 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6