varianceStabilizingTransformation for clustering, deconvolution
bruce.moran (@brucemoran-8388)

Hi,

I have been using varianceStabilizingTransformation() for normalisation of gene expression data to allow its use in deconvolution and clustering methods. However, having looked more closely, I now see that splitting samples based on grouping (e.g. tissue type, or other phenotypic classification) and then conducting the transformation creates a different dataset than when the transformation is conducted on the entire set.

I have tried including the group information in the design matrix and setting blind=FALSE for the VST, but I still get large variation between groups. This is likely because I am looking at tumour vs. normal tissue, and across disease types as well.

My question is: should I be splitting by group/disease prior to VST? I had initially done this, but am questioning that decision now.

By way of example (N.B. I run estimateSizeFactors() and estimateDispersions() on the real data):

library(DESeq2)

## toy example: two genes, 10 tumour and 10 normal samples,
## with tumour counts drawn from a higher range than the normals
counts <- t(data.frame("GENEX" = c(sample.int(2000, 10, replace = TRUE),
                                   sample.int(200, 10, replace = TRUE)),
                       "GENEY" = c(sample.int(1000, 10, replace = TRUE),
                                   sample.int(100, 10, replace = TRUE))))
colnames(counts) <- c(paste0("T_", 1:10), paste0("N_", 1:10))
conds <- data.frame("sampleID" = c(paste0("T_", 1:10), paste0("N_", 1:10)),
                    "Type" = c(rep("Tumour", 10), rep("Normal", 10)))
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = conds,
                              design = ~ Type)
## VST on all samples, then keep only the normal columns ...
vst.all <- assay(varianceStabilizingTransformation(dds[, 1:20]))[, 11:20]
## ... versus VST on the normal samples alone
vst.norm <- assay(varianceStabilizingTransformation(dds[, 11:20]))
vst.all; vst.norm
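
On a realistically sized count matrix (the two-gene toy above will not support a dispersion fit and is only meant to show the structure), a quick assumed check of how different the two results are:

## element-wise difference between VST-on-all (normal columns) and VST-on-normals-only
summary(as.vector(vst.all - vst.norm))
max(abs(vst.all - vst.norm))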
deseq2 deconvolution normalization limma
@mikelove

The VST is a function, like f(x) = log(x+1), which is applied to the normalized counts. However, it takes the global dispersion into account, so the result differs depending on whether or not it knows the design. The dispersion appears much higher if you do not allow for differences across groups. Does this answer your question?
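
As a minimal sketch (not part of the original reply; it reuses the dds object from the question, on a full-sized dataset), the practical difference is which dispersion trend the transformation uses:

## blind = TRUE re-estimates the dispersion trend with an intercept-only design,
## so tumour/normal differences inflate it; blind = FALSE reuses the trend
## estimated under design = ~ Type
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
vsd.blind  <- varianceStabilizingTransformation(dds, blind = TRUE)
vsd.design <- varianceStabilizingTransformation(dds, blind = FALSE)
dispersionFunction(dds)   # the fitted trend that blind = FALSE reuses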

Yes, sorry, I keep referring to VST as normalisation when it is a transformation. Your reply does address the central issue of global dispersion and its effect on the transformation.

What I really want to get is an opinion as to which is 'more appropriate':

  1. create a single dds object which contains all samples, with blind=FALSE; theoretically more samples give a better estimate of dispersion, and as you say the design can be used to account for group dispersion.

  2. subset on groups (e.g. normal, disease types) and create multiple dds objects; there is then absolutely no influence on dispersion from other groups which are essentially different.

N.B. this is specifically for deconvolution and clustering analysis, for which I believe VST to be appropriate.

Appreciate any thoughts on this.

I recommend (1).

(2) is actually more problematic if you are concerned about influence, because you are applying f(x) to some samples and g(x) to other samples.
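
For what it's worth, a sketch of option (1) as an assumed workflow (reusing counts and conds from the example above; this code is not from the reply itself):

## one dds holding all samples, with the group in the design;
## blind = FALSE so the same fitted transformation is applied to every sample
dds <- DESeqDataSetFromMatrix(countData = counts, colData = conds, design = ~ Type)
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
mat <- assay(vsd)
## the single transformed matrix then feeds clustering / deconvolution, e.g.
hc <- hclust(dist(t(mat)))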


Great, didn't think of it that way, thanks.
