I am using RUVSeq and DESeq2 to analyze an RNA-seq dataset from a solid tumor cohort (n = 150). I followed the vignette: https://bioconductor.org/packages/release/bioc/vignettes/RUVSeq/inst/doc/RUVSeq.html. To find the best k, I ran a loop over k, plotted the p-value distribution for each value, and performed a KS test. However, I could not get an ideal p-value distribution.
I then found that some of the W factors are correlated with the size factors estimated by DESeq2. Therefore, I set the normalizationFactors of the DESeq2 object to 1, and then I got an ideal p-value distribution.
Can I set the normalizationFactors of the DESeq2 object to 1 after using RUVg?
Here is the script.
library(RUVSeq)  # RUVg; also attaches EDASeq (plotRLE, plotPCA, offst) and Biobase (pData)
library(DESeq2)

K <- 1:19
plist <- list()
for (k in K) {
  # Estimate k factors of unwanted variation from the empirical control genes
  set2 <- RUVg(x = set, cIdx = empirical, k = k)

  # Diagnostic plots for this value of k
  pdf(file = paste0("pdf/plotRLE_RUVgk", k, ".pdf"), width = 10, height = 5)
  plotRLE(set2, outline = FALSE, ylim = c(-4, 4), col = colors[Group])
  dev.off()
  pdf(file = paste0("pdf/plotPCA_RUVgk", k, ".pdf"), width = 10, height = 10)
  plotPCA(set2, col = colors[Group], cex = 1.2, labels = FALSE)
  dev.off()

  # Design formula: every column of pData(set2), in reverse-sorted order
  colData <- pData(set2)
  vari <- rev(sort(colnames(colData)))
  design_formula <- as.formula(paste("~", paste(vari, collapse = "+")))

  dds <- DESeqDataSetFromMatrix(countData = counts(set2),
                                colData = colData,
                                design = design_formula)

  # Convert the offsets stored in set2 into DESeq2 normalization factors
  normFactors <- exp(-1 * offst(set2))
  normFactors <- normFactors / exp(rowMeans(log(normFactors)))
  normalizationFactors(dds) <- normFactors
  dds <- DESeq(dds)
}
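The p-value diagnostic mentioned in the question (a histogram plus a KS test) is not part of the script as posted. A minimal sketch of what such a check might look like is below. The p-values here are simulated stand-ins, not results from this dataset, and the mixture proportions are arbitrary assumptions; in practice the vector would come from something like the pvalue column of results(dds).

```r
set.seed(42)

# Simulated stand-in for a vector of per-gene p-values (assumption):
# 90% null genes (uniform) and 10% genes with signal (p-values near zero).
pvals <- c(runif(9000), rbeta(1000, 0.5, 10))

# Histogram of the p-values; a flat shape with a peak near zero is the usual goal
hist(pvals, breaks = 50, main = "p-value distribution", xlab = "p-value")

# One-sample KS test against the uniform distribution on [0, 1]
ks <- ks.test(pvals, "punif")
print(ks$statistic)
print(ks$p.value)
```

Note that a KS test against the uniform is expected to reject whenever there are truly differentially expressed genes, so on its own it may not be a reliable criterion for choosing k.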
Thanks for your reply.
"
normFactors <- exp(-1 * offst(set2))
normFactors <- normFactors / exp(rowMeans(log(normFactors)))
normalizationFactors(dds) <- normFactors
"
Because offst(set2) is a matrix containing only zeros, the code above is equivalent to "normalizationFactors(dds) = 1".
Because I found that some of the W factors (obtained by RUVSeq) are correlated with the size factors estimated by DESeq2, I asked DESeq2 not to estimate size factors by setting "normalizationFactors(dds) = 1", to avoid adjusting for the size factors twice.
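The arithmetic behind that statement can be checked in base R without any Bioconductor packages. The sketch below uses a small made-up zero matrix in place of offst(set2) (the dimensions and names are assumptions for illustration) and, for contrast, a made-up nonzero offset matrix.

```r
# Stand-in for offst(set2): an all-zero genes x samples matrix (hypothetical size)
offset_zero <- matrix(0, nrow = 5, ncol = 4,
                      dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))

normFactors <- exp(-1 * offset_zero)                          # exp(0) = 1 everywhere
normFactors <- normFactors / exp(rowMeans(log(normFactors)))  # divide by row geometric mean (also 1)
all_one <- all(normFactors == 1)
print(all_one)

# For contrast: with a nonzero offset the factors differ from 1,
# but each row still has geometric mean 1 after the second step.
set.seed(1)
offset_nonzero <- matrix(rnorm(20, sd = 0.3), nrow = 5, ncol = 4)
nf2 <- exp(-1 * offset_nonzero)
nf2 <- nf2 / exp(rowMeans(log(nf2)))
row_geo_means <- exp(rowMeans(log(nf2)))
print(row_geo_means)
```

With normalization factors of exactly 1, DESeq2 fits the model to the raw counts with no library-size correction of its own, so any depth adjustment would have to come from the W covariates in the design.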
I moved your post from an "answer" to a "comment".
I don't know the offst() function: what scale or direction it uses, or whether you are computing the normalization factors on raw counts (in which case it makes sense that you are dealing with size factors twice). So, as I'm not sure about your code above, I would instead recommend the approach I linked to, which is what we recommend for using RUV factors in DESeq2.
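For readers without the link at hand, one pattern documented in the rnaseqGene workflow is to leave DESeq2's own size factor estimation in place and add the RUV factors to the design as covariates. The sketch below follows that pattern on simulated data and may or may not match the linked recommendation exactly. The simulated counts, the choice of the first 200 genes as "control" genes, and k = 1 are all assumptions made only so that the example runs.

```r
library(DESeq2)
library(RUVSeq)

set.seed(1)
# Simulated counts with a two-level 'condition' factor (stand-in for real data)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical negative-control genes; in practice these would be chosen empirically
controls <- rownames(dds)[1:200]

# Build the EDASeq object, normalize between samples, and estimate one RUV factor
set <- newSeqExpressionSet(counts(dds))
set <- betweenLaneNormalization(set, which = "upper")
set2 <- RUVg(set, controls, k = 1)

# Add the factor to colData and to the design; size factors are estimated as usual
dds$W1 <- pData(set2)$W_1
design(dds) <- ~ W1 + condition
dds <- DESeq(dds)
res <- results(dds)
summary(res)
```

Here normalizationFactors is never set, so sequencing depth is handled once by DESeq2's size factors and the W covariate only has to absorb the remaining unwanted variation.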