Dear all,
This question may have been asked before, I have searched the mailing list and I can not find an answer. The question is about the correct way of setting the SizeFactors() in DESeq2 in 3 situations. I would like to double check with you. Although the R code that I do list below refers to DESeq2, the question applies equally to edgeR or to limma / voom. I would appreciate having your suggestions.
I am working on 4 ChIP-seq datasets (2 DMSO samples and 2 TREAT samples). The input is a list of PEAKS with raw counts in all these 4 conditions (2 DMSO, and 2 TREAT). I am doing the analysis in 3 ways :
<> normalization to # Spike-ins (The Spike-ins numerical values have been estimated). I am setting the SizeFactors as the ratio between each Spike-in # , divided by the minimum # of all Spike-ins. Is it the correct way of doing it ?
spikeins=c(68924, 80282, 52023, 47090)
spikeins_norm = spikeins / min(spikeins)
sizeFactors(dds) = spikeins_norm
dds <- estimateDispersions(dds)
dds <- nbinomLRT(dds, reduced= ~1)
res <- results(dds, name = "group_TZO_vs_DMSO")
<> normalization to Total # or Aligned Reads (in BAM files). I am setting the Size Factors() in the same way as I do it for Spike-Ins (Certainly, considering the Total # of Aligned Reads is not a recommended way to set the SizeFactors(), I am just doing it for comparison purposes).
counts_BAM=c(1489249, 1568028, 1616202, 1647090)
counts_BAM_norm = counts_BAM / min(counts_BAM)
sizeFactors(dds) = counts_BAM_norm
dds <- estimateDispersions(dds)
dds <- nbinomLRT(dds, reduced= ~1)
res <- results(dds, name = "group_TZO_vs_DMSO")
<> normalization to # Reads in the Peaks. In this case, I follow the standard R workflow :
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
Please let me know what you think. Thanks,
Bogdan
Referring to limma and edgeR, I could add that the R code that has been posted on the web pages (listed below) works and gives results that make sense.
Any suggestions about the R code in limma/voom ?