Hi,
I would like to create a matrix of normalized, "batch-free" values for downstream analysis (not differential expression analysis, but machine learning classification). I would also like to account for potential "unwanted variation" in the count data.
My idea is to combine functions from the DESeq2 (the variance-stabilizing transformation, 'vsd', in particular) and RUVSeq packages.
Let's say I obtained 4 continuous factors of unwanted variation (W_1, W_2, W_3, W_4) from the RUVg function implemented in RUVSeq. I was thinking of following one of these two approaches:
Appr. 1):
    library(DESeq2)
    library(limma)  # provides removeBatchEffect()
    dds <- DESeqDataSetFromMatrix(countData = rowCountsOfExpressedGenes,
                                  colData = colData,
                                  design = ~ conditions)
    dds <- DESeq(dds)  # to estimate all DESeq2 parameters
    vsd <- varianceStabilizingTransformation(dds)
    # 'set' is the SeqExpressionSet returned by RUVg; W_1..W_4 are stored in its phenoData
    covar <- cbind(pData(set)$W_1, pData(set)$W_2, pData(set)$W_3, pData(set)$W_4)
    vsd_nobatch <- removeBatchEffect(assay(vsd),
                                     design = model.matrix(~ conditions),
                                     covariates = covar)
This is similar to the approach described in the answer "A: How do I extract read counts from DESeq2".
OR
Appr. 2)
    # assumes W_1..W_4 have been added as columns of colData
    dds <- DESeqDataSetFromMatrix(countData = rowCountsOfExpressedGenes,
                                  colData = colData,
                                  design = ~ W_1 + W_2 + W_3 + W_4 + conditions)
    dds <- DESeq(dds)
    vsd_nobatch <- varianceStabilizingTransformation(dds, blind = FALSE)
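For context, here is a minimal sketch of how the factors W_1..W_4 used in both approaches might be obtained with RUVg. The simulated counts, the sample names, and the choice of the first 50 genes as negative controls are purely illustrative assumptions; with real data, the control genes would have to be chosen with care.

```r
library(RUVSeq)  # also loads EDASeq

set.seed(1)
# Simulated stand-in data (hypothetical): 200 genes x 8 samples
counts <- matrix(rnbinom(200 * 8, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:8)))
conditions <- factor(rep(c("A", "B"), each = 4))

# Assumption: these genes are unaffected by 'conditions' (negative controls)
controlGenes <- rownames(counts)[1:50]

set <- newSeqExpressionSet(counts,
                           phenoData = data.frame(conditions,
                                                  row.names = colnames(counts)))
set <- betweenLaneNormalization(set, which = "upper")  # upper-quartile normalization
set <- RUVg(set, controlGenes, k = 4)                  # estimate 4 factors

# The estimated factors are appended to the phenoData as W_1..W_4
covar <- as.matrix(pData(set)[, paste0("W_", 1:4)])
```

The resulting 'covar' matrix has one row per sample and could be passed as 'covariates' in Appr. 1, or its columns added to colData for Appr. 2.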
My questions are:
1) Can I combine the factors of unwanted variation obtained with RUVSeq (which uses singular value decomposition) with the DESeq2 normalization?
2) If yes, should I use Appr 1 or Appr 2 or another approach?
Thanks in advance,
Mattia.
Thanks a lot, Michael, for your quick and very useful reply.
Mattia.