Hi, I am analyzing ATAC-Seq data to test for differentially open chromatin among two genotypes of mice (n = 5 per group). Following Lun & Smyth 2014, I combined reads from all 10 samples into a single BAM, called peaks (Genrich), and counted reads per peak region for each sample (featureCounts). I then ran limma following common guidance (example code below), and the resulting MA plot shows many more significantly negative logFCs (~4300) than positive logFCs (~900), especially at high mean logCPM.
To evaluate whether this pattern was a result of something specific in limma, I also tried edgeR and DESeq2 and get the same pattern (not shown). Is this something to be concerned about, and if so, do you have any suggestions to remedy it? Perhaps quantile normalization?
A related question is whether we can conclude that these patterns are due to inadequate data processing or if they reflect a true biological signal. We do not have any a priori reason to believe that openness would tend to decline in this comparison, nor do we have any reason to believe that there would be few or many significantly differentially open regions (we observe ~5k significant of 80k total peak regions in these tests).
Thank you for your help!
cnt = DGEList(counts.all) cnt$samples$genotype <- genotype design <- model.matrix(~0+genotype) keep.exprs <- filterByExpr(cnt, design) cnt <- cnt[keep.exprs, , keep.lib.sizes=FALSE] #Alternative filtering that also leads to same pattern ultimately #keep <- aveLogCPM(cnt) > 1 #cnt <- cnt[keep, , keep.lib.sizes=FALSE] cnt <- calcNormFactors(cnt, method = "TMM") v <- voom(cnt, design, plot=TRUE) contr.matrix <- makeContrasts( AvB = genotypeA-genotypeB, levels = colnames(design)) vfit = lmFit(v, v$design) vfit = contrasts.fit(vfit, contr.matrix) efit = eBayes(vfit, robust=TRUE) eres <- decideTests(efit, p.value = 0.05) plotMD(efit, column=1, status=abs(eres[,1]), main=colnames(efit), ylab="log2(fold-change)", xlab="Mean log(CPM)", legend=FALSE, cex=0.5) abline(h=0, col="darkgrey")