Question

DiffBind R package issue

0

Entering edit mode

niu.shengyong ▴ 10

@niushengyong-12910

Last seen 8.7 years ago

I'm now using DiffBind R package to conduct differential analysis for ATAC-seq dataset.
In this analysis, I want to compare each cell type in two conditions, so I divided the general sheet into enclosed sheets with different cell names and put them into DiffBind analysis (script: diff.R, diff.sh). It shows the following errors: (enclosed diff.sh.e8513532)

diff.R

#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
library(DiffBind)
setwd("/Simon/project/ATAC-seq/DiffBind")
ta <- dba(sampleSheet = basename(args[1]))
ta <- dba.count(ta)
ta = dba.contrast(ta, categories=DBA_CONDITION)   
ta = dba.analyze(ta, method=DBA_DESEQ2)
ta.DB = dba.report(ta, contrast=1, th=.01, method=DBA_DESEQ2, bCounts=TRUE)
fname = basename(args[1])

pdf(file=paste(fname, "_plotCorr.pdf", sep = "",collapse = NULL ))

plot(ta)
dev.off()

diff.sh

#!/bin/bash -l

module load R/3.3.2

for filename in /ATAC-seq/DiffBind/*diffSheet.csv ; do

Rscript diff.R  "$filename"

done

diff.sh.e8513532

32_B Blood  D  1 narrow
33_B Blood  D  1 narrow
36_B Blood  D  1 narrow
37_B Blood  D  1 narrow
38_B Blood  ND  1 narrow
30_B Blood  ND  1 narrow
54_B Blood  D  1 narrow
58_B Blood  ND  1 narrow
63_B Blood  D  1 narrow
converting counts to integer mode
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
Warning message:
No sites above threshold 
Error in pv.getPlotData(DBA, attributes = attributes, contrast = contrast,  : 
  Unable to plot -- no sites within threshold
Calls: dba.plotHeatmap -> pv.getPlotData
Execution halted

Thanks!

diffbind atac-seq • 4.7k views

ADD COMMENT • link updated 8.8 years ago by Rory Stark ★ 5.2k • written 8.8 years ago by niu.shengyong ▴ 10

score 1 · Answer 1 · 2017-04-27

Yes that is indeed a funny looking MA plot. It could be interesting to see the non-normalized version (bNormalized=FALSE in dba.plotMA) to make sure the normalisation process isn't doing something strange, but I doubt it.

It is interesting that there are some clear outliers that are not identified as being significantly DB. specifically the only below the line on the diagonal with log fold changes of -6 through -9 (points on the diagonal are generally regions that have no reads in one sample group [ND] and higher read levels in the second [T2D]). I would think this could be due to very high variance for the read counts for the T2D samples, which you can see if you set bCounts=TRUE in the call to dba.report.

For ATAC, it is worth looking at the ENCODE pipeline, at least through the point of peak calling. You can find it at:

https://www.encodeproject.org/atac-seq/

and a useful document with the exact calling parameters for MACS at:

https://docs.google.com/document/d/1f0Cm4vRyDQDu0bMehHD7P7KOMxTOP-HiNoIvL1VcBt8/edit#heading=h.9ecc41kilcvq

One other idea you can try to stabilize the read calls is to use the "summits" parameter in dba.count. This will identify the locus of greatest read pileup and create a "peak" of uniform size centered on this point. You will be looking for consistent changes in read density in narrower regions within the called ATAC peaks. I'd suggest setting summits=100, which results in 200bp regions (200bp before and after the summit). For ATAC you could even try a smaller value -- summits=75 will look at 150bp regions.

Cheers-

Rory

score 0 · Answer 2 · 2017-04-26

0

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 12 months ago

Cambridge, UK

When you run the analysis, it may be the case that no sites are identified as being differentially bound (DB) at a certain threshold.

The warning message "No sites above threshold" is being generated by the call to dba.report(), indicating that there are no sites with FDR<- 0.01. So when you call plot() (which maps to dba.plotHeatmap), there are no DB sites, hence nothing to plot.

You can check this condition as follows:

> ta.DB = dba.report(ta, contrast=1, th=.01, method=DBA_DESEQ2, bCounts=TRUE)
> if(!is.null(ta.DB)) {
>  fname = basename(args[1])
>  pdf(file=paste(fname, "_T2D_ATAC_Pval_plotCorr.pdf", sep = "",collapse = NULL ))
>  plot(ta)
>  dev.off()
> }

To see why nothing is turning up as being DB, it may help to look at MA and volcano plots, and to get a dba.report with th=1 (and bCounts=TRUE) to examine the (normalized) read counts. Sites will only have low FDR if the read counts in the two sample groups are a) in internally consistent (low variance between replicates) and b) consistently different between the sample groups.

Cheers-

Rory

ADD COMMENT • link 8.8 years ago Rory Stark ★ 5.2k

0

Entering edit mode

Hi Rory,

Really appreciate for your reply! I just generate the MA plot as you mentioned and found that there is zero point which has FDR < 0.05. The MA plot looks quite strange and the DB are not significant. Here are some of my questions:

1. The samples come from two different batches. Should I normalize them by myself before conduct DiffBind and how?

2. DiffBind is designed for CHIP-seq. I'm not sure if the peak calling step for ATAC-seq would influence the results. Here is how I conduct peak calling by MACS:

macs2 callpeak -t "$filename"  -f BED -g hs -nomodel --nolambda --keep-dup all --call-summits --outdir /ATAC-seq/callPeaks/  -n $(basename "$filename" .bed) -B -q 0.01 --bdg  --shift -100 --extsize 200

Thanks!

ADD REPLY • link 8.8 years ago niu.shengyong ▴ 10

0

Entering edit mode

It's the MA plot . Thanks!

https://drive.google.com/file/d/0ByJ9Aiypgye9czVndENpUXFjTlk/view?usp=sharing

ADD REPLY • link 8.8 years ago niu.shengyong ▴ 10