Question

RUVSeq and VSD integration

Mattia ▴ 10

@mattia-9769

Milano

Hi,

I would like to create a matrix of normalized, "batch-free" values for downstream analysis (machine learning classification, not differential expression analysis). I would also like to account for potential "unwanted variation" in the count data.

My idea is to combine functions from the DESeq2 package (the variance stabilizing transformation, 'vsd', in particular) and the RUVSeq package.

Let's say I obtained 4 continuous factors of unwanted variation (W_1, W_2, W_3, W_4) from the RUVg function implemented in RUVSeq. I was thinking of following one of these two approaches:

Appr. 1):

library(DESeq2)
library(limma) # provides removeBatchEffect()

dds <- DESeqDataSetFromMatrix(countData = rowCountsOfExpressedGenes, colData = colData, design = ~ conditions)
dds <- DESeq(dds) # to estimate all DESeq2 parameters
vsd <- varianceStabilizingTransformation(dds)
# factors of unwanted variation stored by RUVg in the phenoData of 'set'
covar <- cbind(set@phenoData@data$W_1, set@phenoData@data$W_2, set@phenoData@data$W_3, set@phenoData@data$W_4)
vsd_nobatch <- removeBatchEffect(assay(vsd), design = model.matrix(~ conditions), covariates = covar)

This is similar to A: How do I extract read counts from DESeq2

OR

Appr. 2)

dds <- DESeqDataSetFromMatrix(countData = rowCountsOfExpressedGenes, colData = colData, design = ~ W_1 + W_2 + W_3 + W_4 + conditions)
dds <- DESeq(dds)
vsd_nobatch <- varianceStabilizingTransformation(dds, blind = FALSE)

My questions are:

1) Can I integrate factors of unwanted variation obtained with RUVSeq (Singular Value Decomposition technique) with DESeq normalization?

2) If yes, should I use Appr 1 or Appr 2 or another approach?

Thanks in advance,

Mattia.

deseq2 ruvseq covariates continuous vsd • 1.9k views

updated 8.1 years ago by Michael Love 41k • written 8.1 years ago by Mattia ▴ 10

score 1 · Answer 1 · 2016-03-25

varianceStabilizingTransformation() with blind=FALSE doesn't use the design to remove mean shifts associated with covariates. It's a bit hard to explain (I try to explain it in more depth in the vignette section on transformations), but the design is only used to estimate dispersions. The global trend of dispersion is then used to formulate the transformation, which is applied to size-factor-adjusted counts. This is nothing like removeBatchEffect(), which removes mean shifts associated with the covariates.

So I would recommend Approach 1 if you want to remove effects associated with W_1 to W_4 from variance stabilized data for downstream machine learning / classification tasks.
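
To make "removing mean shifts associated with the covariates" a bit more concrete, here is a minimal base-R sketch on simulated data. It is not limma's actual code, only the idea as I understand it: fit a linear model per gene containing both the design and the (centered) covariates, then subtract the fitted covariate terms while leaving the condition effect in place. All object names (expr, W, gene_loading, expr_clean, ...) and all simulated values are made up for illustration, and only two covariates are used to keep it short.

```r
set.seed(1)
n_genes <- 50
n_samples <- 12

# Hypothetical experimental design: two conditions, six samples each
conditions <- factor(rep(c("A", "B"), each = 6))

# Hypothetical factors of unwanted variation (stand-ins for RUVg's W_1, W_2)
W <- matrix(rnorm(n_samples * 2), ncol = 2,
            dimnames = list(NULL, c("W_1", "W_2")))

# Simulated log-scale expression (genes x samples):
# baseline + condition effect + unwanted variation + noise
gene_loading <- matrix(rnorm(n_genes * 2), ncol = 2)
cond_effect <- rnorm(n_genes)
expr <- 8 +
  outer(cond_effect, as.numeric(conditions == "B")) +
  gene_loading %*% t(W) +
  matrix(rnorm(n_genes * n_samples, sd = 0.1), nrow = n_genes)

# Fit design and centered covariates jointly, one regression per gene
design <- model.matrix(~ conditions)
W_centered <- scale(W, center = TRUE, scale = FALSE)
X <- cbind(design, W_centered)
coefs <- lm.fit(X, t(expr))$coefficients  # rows: coefficients, cols: genes

# Keep only the covariate coefficients and subtract their fitted contribution
beta_W <- coefs[-seq_len(ncol(design)), , drop = FALSE]
expr_clean <- expr - t(W_centered %*% beta_W)
```

Because the design is included in the fit, the condition effect is protected: refitting the same model on expr_clean gives the same condition coefficients, and the coefficients for W_1 and W_2 are zero up to numerical error. The variance stabilizing transformation does nothing of this kind, whatever the value of blind.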