Question

Mostly increased counts in differentially accessible peaks using DESeq2

0

Entering edit mode

Ricardo Iván • 0

@e3765820

Last seen 8 months ago

United States

Hello,

Looking for peace of mind really. We routinely perform bulk ATAC across multiple primary human cell types (fibros, immune cells). We recently did bulk ATAC-seq on human mammary epithelial cells and performed differential analysis with DESeq2. The design is 2 biological replicates of a 5 time-point time series. The counting step was done with summarizeOverlaps against an IDR-cleaned list of MACS3-detected peaks. We then contrasted each time point relative to the reference at Day 0. I have never seen this before, but DESeq2 only found mostly peaks with increased counts at one time point (Day 7), with close to 0 peaks with reduced counts. I have never seen such an unbalanced distribution (see MA plot and heatmap attached) heatmap . What is more interesting, is that when comparing against normalized signal tracks (by Deeptools), I can see several peaks with decreased counts relative to the Day 0 reference that DESeq2 did not catch (days going from top to bottom) browser . My padj cut-off is 0.05, which I think is still pretty relaxed. Has anyone seen something like this before? Not sure if I need to tweak something in the DESeq2 parameters or I should use something like featureCounts for the counting step.

Cheers for any insight,


# include your problematic code here with any corresponding output 
# please also include the results of running the following in an R session 

sessionInfo( )

ATAC-seq DESeq2 Oddresults • 1.0k views

ADD COMMENT • link updated 8 months ago by ATpoint ★ 4.9k • written 9 months ago by Ricardo Iván • 0

0

Entering edit mode

It is hard to comment here. My suggestion is:

label the IGV track, it's unclear which lane is which sample
normalize the track with the DESeq2 size factors (or has this been done already)
show the unshrunken MA-plot directly from results()
show a PCA
show actual DESeq2 code

ADD REPLY • link 8 months ago ATpoint ★ 4.9k

score 0 · Answer 1 · 2024-10-09

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 12 days ago

United States

The difference between tracks and DESeq2 (or any count method in Bioc) is that tracks use simple scaling that is not robust to large shifts, whereas the count tools compute the median ratio (or some other trimming/robust technique). This is not a perfect remedy, but the fact that you have a bimodal pattern in the LFC is worth extra care. Just dividing by millions of mapped reads won't work on this dataset for quantitative comparisons. I would do a plotMA but label the promoters of housekeeping genes, or some other a priori information you have about what may be stable here.

ADD COMMENT • link 8 months ago Michael Love 43k

0

Entering edit mode

Hi Michael,

Thanks for your reply. After reviewing the PCAs, maybe the issue that despite the replicates looking similar, there is still a lot of within-time point variability. PCA ATAC We will try adding two more replicates see if things get better.

Cheers,

ADD REPLY • link 8 months ago Ricardo Iván • 0

1

Entering edit mode

Keep in mind that if you add replication then you should do it for all conditions so you can meaningfully regress the inevitable batch effect.

ADD REPLY • link 8 months ago ATpoint ★ 4.9k