Hi,
I am using the Diffbind package to do the differential binding analysis for my ATAC-seq library. I followed the instruction to use blocking methods for my library, because I assumed there were substantial batch effects. Since I didn't know which one of DESeq2_block and Edger_block would generate the best result for me, I tried my library with both of the normalization methods. I characterized log(fold change) > 1 and p-value <0.05 as "positively enriched peaks", and log(fold change) < -1 and p-value<0.05 as "negatively enriched peaks". However, DESeq2_block generated 1530 positively enriched peaks and 556 negatively enriched peaks, while EdgeR_block generated 606 positively enriched peaks and 1811 negatively enriched peaks. The 606 positively enriched peaks from EdgeR_block is almost a subset of 1530 positively enriched peaks generated by DESeq2, and no negatively enriched peaks were characterized as positively enriched peaks in the other methods. Therefore I have the following questions:
1) What is the best way to make a judgement call between these two blocking methods?
2) I was advised to look at the normalized data through the "ReportingTools" library, but this library requires a DESeqDataSet or DESeqResults as input. How do I extract the DESeqDataSet from DBA object properly? I tried to follow the manual by typing: atac.analyze$contrasts[[1]]$DESeq2$block$DEdata, but it does not seem to have $DEdata.
3) If I also want to extract the EdgeR object from DBA what should I do?
Thank you so much!
Ergang
My code:
atac.list <- dba(sampleSheet=filename)
atac.count <- dba.count(atac.list)
atac.contrast <- dba.contrast(atac.count, categories=DBA_CONDITION, block=DBA_REPLICATE, minMembers = 2)
atac.analyze <- dba.analyze(atac.contrast, method=DBA_ALL_METHODS)
###try to see the normalized signal values for a peak
dds <-atac.analyze$contrasts[[1]]$DESeq2$block$DEdata
des2Report <- HTMLReport(shortName = "Differential_analysis_screening_library",
title = 'differential analysis of screening library using DESeq2', reportDirectory ="./DESeq2/reports")
publish(dds,des2Report, pvalueCutoff=0.05,annotation.db=NULL,
factor = colData(dds)$facs, reportDir="./DESeq2/reports")
finish(des2Report)
To expand on Rory's answer: I should note that it is easy to achieve library size normalization in edgeR, simply by not performing TMM normalization, i.e., not running
calcNormFactors
. There's nothing mandatory about runningcalcNormFactors
in order to use edgeR, it just happens to be what we do to normalize RNA-seq data. In fact, you can (and I often do) put other normalization factors in there, optimized for a given type of data, e.g., Hi-C or ChIP-seq.Hey Rory,
Thank you so much for your help!
I made three MA plots according to your suggestion. I think it is a little bit tricky to tell if the non-normalized plot is above/below the zero line. Do you mind taking a look of the three graphs? I can email those to you.
Best,
Ergang
Yes, I can look, send them to me (or post them here).