Question

Differential binding of ATAC-seq data with no replicates

1

BioinfGuru ▴ 20

@yagalbi-11519

Last seen 5 weeks ago

Ireland

Hi,

I have ATAC-seq data for 8 time points in each of 4 conditions, 32 samples in total. The only replicates were technical, and they were pooled early in the pipeline, so there are no biological replicates. I have completed MACS2 peak calling and now want to test for differential binding (DB). I would like to use HTSeq to create the count table and then DESeq2 to analyze for DB.

I have read a number of posts, all answered with good advice from Michael Love (DESeq2 extended to open chromatin analysis: normalization, dispersion fit, and "too many" differences, Replicate for DESeq2 and A: GFOLD file as input for DESeq2), regarding DESeq2, ATAC-seq and replicates, but none that address what to do when there are no biological replicates. Can HTSeq and DESeq2 run with no replicates?

I'm also a bit confused about what needs to be counted. I have the BAM files and the MACS2 output, which I have filtered to remove overlapping peaks (as per guidance). Do I simply count, separately for each sample, the number of reads in the original data that align to each of the remaining non-overlapping, uniquely named peaks? Is that what the count table for each individual sample should consist of? For example:

peak_name    read_count
peak1        452
peak2        34
peak3        458
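As a rough sketch of how per-sample tables like the one above could be combined into a single peaks-by-samples matrix for downstream analysis: the function name, the sample names and the second column of counts are hypothetical, and the result is only meaningful if every sample was counted against the same peak set.

```python
from typing import Dict, List, Tuple


def build_count_matrix(
    per_sample: Dict[str, Dict[str, int]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Combine per-sample {peak_name: count} tables into a peaks x samples matrix."""
    samples = sorted(per_sample)
    # Union of peak names across samples; a peak missing from a sample counts as 0.
    peaks = sorted({p for table in per_sample.values() for p in table})
    matrix = [[per_sample[s].get(p, 0) for s in samples] for p in peaks]
    return peaks, samples, matrix


# Hypothetical sample names; the first sample reuses the counts shown above,
# the second sample's counts are made up for illustration.
tables = {
    "cond1_t0": {"peak1": 452, "peak2": 34, "peak3": 458},
    "cond1_t1": {"peak1": 390, "peak2": 51, "peak3": 12},
}
peaks, samples, matrix = build_count_matrix(tables)
for name, row in zip(peaks, matrix):
    print(name, *row, sep="\t")
```

Each row is one peak and each column one sample, which is the general shape a count-based tool expects as input.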

I will be monitoring this post so let me know if I can clarify question(s) or provide additional information.

Thank you all for your help,

Kenneth

ATAC differential binding analysis no replicates DEseq2 chromatin • 4.5k views

ADD COMMENT • link 7.1 years ago BioinfGuru ▴ 20

score 3 · Answer 1 · 2017-03-18

3

Gordon Smyth 50k

@gordon-smyth

Last seen 11 hours ago

WEHI, Melbourne, Australia

Actually the approach of calling peaks separately in each sample would lead to overestimation of significance in a DB analysis even if you did have replicates. See

https://doi.org/10.1093/nar/gku351

for an explanation of why this is so.

ADD COMMENT • link 7.1 years ago Gordon Smyth 50k

0

Thank you Gordon. That's my next read.

EDIT: With 8 time points per condition, should I be pooling all 8 into a single file and calling peaks on that file?

ADD REPLY • link 7.1 years ago BioinfGuru ▴ 20

0

No, you would still be over-stating the differences between conditions. To follow the advice in our NAR article, you would pool all 32 libraries together for peak calling.
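One way to pool all 32 libraries for a single peak-calling run might look like the sketch below, which only builds the MACS2 command line. It relies on MACS2 accepting several files after -t and pooling them. The BAM file names, the run name and the output directory are hypothetical, and ATAC-specific options (genome size, shift and extension settings) are deliberately left out and would need to be chosen for the real data.

```python
import shlex
from typing import List, Sequence


def pooled_macs2_command(
    bam_files: Sequence[str],
    name: str = "pooled",
    outdir: str = "macs2_pooled",
) -> List[str]:
    """Build one `macs2 callpeak` command that pools every BAM file given."""
    if not bam_files:
        raise ValueError("need at least one BAM file")
    return [
        "macs2", "callpeak",
        "-t", *bam_files,      # several treatment files are pooled by MACS2
        "-f", "BAM",
        "-n", name,
        "--outdir", outdir,
    ]


# Hypothetical file names: 4 conditions x 8 time points = 32 libraries.
bams = [f"cond{c}_t{t}.bam" for c in range(1, 5) for t in range(1, 9)]
cmd = pooled_macs2_command(bams)
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once MACS2 is installed and the file names match the real data.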

ADD REPLY • link 7.1 years ago Gordon Smyth 50k

0

Hi Gordon,

I really need to flesh this out a bit. It is quite a headache. So...

1) For downstream processing I need a consensus peak set, so ALL 32 samples must be merged. I then call peaks on the merged data to identify a consensus set of peaks. Then, for each sample, I count the number of 5’ read ends located within each peak region of the consensus peak set. Differential analysis is then carried out by comparing the counts across the individual samples.

2) To merge samples before peak calling, I assume I can just pass multiple files to MACS2 (-t A B C) rather than running bedtools merge.

3) Considering time, and the fact that I have already called the peaks for each sample individually, could I not just merge the peak files to get a consensus peak set instead, as suggested by Devon Ryan and Sukhdeep Singh?
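The counting step described in point 1 could be sketched roughly as follows. This is a pure-Python illustration, not how HTSeq or any other counting tool is implemented. It assumes the 5’ end positions have already been extracted from each BAM file (which would need a BAM-reading library), that peaks are non-overlapping, and that coordinates are 0-based with an exclusive end. All names and coordinates are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Peak = Tuple[str, int, int, str]  # chrom, start (0-based), end (exclusive), name


def count_five_prime_ends(
    peaks: Iterable[Peak], ends: Iterable[Tuple[str, int]]
) -> Dict[str, int]:
    """Count 5' end positions that fall inside each (non-overlapping) peak."""
    by_chrom: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
    counts: Dict[str, int] = {}
    for chrom, start, end, name in peaks:
        by_chrom[chrom].append((start, end, name))
        counts[name] = 0

    # Sort the peaks on each chromosome so a binary search can find candidates.
    starts: Dict[str, List[int]] = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts[chrom] = [s for s, _, _ in intervals]

    for chrom, pos in ends:
        if chrom not in starts:
            continue  # read on a chromosome with no peaks
        i = bisect_right(starts[chrom], pos) - 1  # last peak starting at or before pos
        if i >= 0:
            _, end, name = by_chrom[chrom][i]
            if pos < end:
                counts[name] += 1
    return counts


# Hypothetical consensus peaks and 5' end positions for one sample.
consensus = [
    ("chr1", 100, 200, "peak1"),
    ("chr1", 500, 650, "peak2"),
    ("chr2", 0, 50, "peak3"),
]
five_prime = [("chr1", 100), ("chr1", 199), ("chr1", 200),
              ("chr1", 600), ("chr2", 10), ("chr3", 5)]
counts = count_five_prime_ends(consensus, five_prime)
print(counts)
```

Running the same function once per sample against the same consensus peak set gives one column of counts per sample.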

Kenneth.

ADD REPLY • link 7.1 years ago BioinfGuru ▴ 20

0

May I ask if you came to an answer to this question?

ADD REPLY • link 6.7 years ago ATpoint ★ 4.0k

score 1 · Answer 2 · 2017-03-17

1

Michael Love 41k

@mikelove

Last seen 4 hours ago

United States

Honestly, I don't think it's much use to run DESeq2 without replicates. The software allows for this analysis with large caveats (see the relevant section in ?results) that such an analysis is only vaguely exploratory. But you could just as well plot the log ratios of peak counts across conditions and skip running DESeq2.
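The log-ratio alternative could be sketched along these lines. The scaling by total counts in peaks and the pseudocount of 0.5 are assumptions made for illustration (this is not DESeq2's normalization), and the second sample's counts are made up.

```python
import math
from typing import Dict


def log2_ratios(
    counts_a: Dict[str, int],
    counts_b: Dict[str, int],
    pseudocount: float = 0.5,
) -> Dict[str, float]:
    """Per-peak log2(B / A) after scaling each sample to counts per million."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each sample needs at least one counted read")
    ratios = {}
    for peak, count_a in counts_a.items():
        # The pseudocount keeps the ratio finite when a peak has zero reads.
        cpm_a = (count_a + pseudocount) / total_a * 1e6
        cpm_b = (counts_b.get(peak, 0) + pseudocount) / total_b * 1e6
        ratios[peak] = math.log2(cpm_b / cpm_a)
    return ratios


# Hypothetical counts for the same three peaks in two conditions.
cond_a = {"peak1": 452, "peak2": 34, "peak3": 458}
cond_b = {"peak1": 390, "peak2": 51, "peak3": 503}
ratios = log2_ratios(cond_a, cond_b)
for peak, value in ratios.items():
    print(f"{peak}\t{value:+.2f}")
```

The resulting values can be plotted against mean abundance for a purely exploratory view. Without replicates there is no estimate of variability, so large ratios at low-count peaks should be read with caution.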

ADD COMMENT • link 7.1 years ago Michael Love 41k

0

Thank you Michael. I was quite alarmed when I was informed there were no replicates, but there is nothing I can do about it now except remind them to include a bioinformatician during the planning stage.

ADD REPLY • link 7.1 years ago BioinfGuru ▴ 20