This question may have been asked before, I have searched the mailing list and I can not find an answer. The question is about the correct way of setting the SizeFactors() in DESeq2 in 3 situations. I would like to double check with you. Although the R code that I do list below refers to DESeq2, the question applies equally to edgeR or to limma / voom. I would appreciate having your suggestions.
I am working on 4 ChIP-seq datasets (2 DMSO samples and 2 TREAT samples). The input is a list of PEAKS with raw counts in all these 4 conditions (2 DMSO, and 2 TREAT). I am doing the analysis in 3 ways :
<> normalization to # Spike-ins (The Spike-ins numerical values have been estimated). I am setting the SizeFactors as the ratio between each Spike-in # , divided by the minimum # of all Spike-ins. Is it the correct way of doing it ?
spikeins=c(68924, 80282, 52023, 47090) spikeins_norm = spikeins / min(spikeins) sizeFactors(dds) = spikeins_norm dds <- estimateDispersions(dds) dds <- nbinomLRT(dds, reduced= ~1) res <- results(dds, name = "group_TZO_vs_DMSO")
<> normalization to Total # or Aligned Reads (in BAM files). I am setting the Size Factors() in the same way as I do it for Spike-Ins (Certainly, considering the Total # of Aligned Reads is not a recommended way to set the SizeFactors(), I am just doing it for comparison purposes).
counts_BAM=c(1489249, 1568028, 1616202, 1647090) counts_BAM_norm = counts_BAM / min(counts_BAM) sizeFactors(dds) = counts_BAM_norm dds <- estimateDispersions(dds) dds <- nbinomLRT(dds, reduced= ~1) res <- results(dds, name = "group_TZO_vs_DMSO")
<> normalization to # Reads in the Peaks. In this case, I follow the standard R workflow :
dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) dds <- nbinomWaldTest(dds)
Please let me know what you think. Thanks,