varianceStabilizingTransformation for clustering, deconvolution
bruce.moran (@brucemoran-8388)

Hi,

I have been using varianceStabilizingTransformation() for normalisation of gene expression data to allow its use in deconvolution and clustering methods. However, having looked more closely, I now see that splitting samples based on grouping (e.g. tissue type, or other phenotypic classification) and then conducting the transformation creates a different dataset than when the transformation is conducted on the entire set.

I have tried including the group information in the design matrix and setting blind=FALSE for the VST, but I still get large variation between groups. This is likely because I am looking at tumour vs. normal tissue, and across disease types as well.

My question is: should I be splitting by group/disease prior to VST? I had initially done this, but am questioning that decision now.

By way of example (N.B. I run estimateSizeFactors() and estimateDispersions() on the real data):

library(DESeq2)

## toy example: two genes, 10 tumour and 10 normal samples,
## with tumour counts drawn from a higher range than the normals
counts <- t(data.frame("GENEX" = c(sample.int(2000, 10, replace = TRUE),
                                   sample.int(200, 10, replace = TRUE)),
                       "GENEY" = c(sample.int(1000, 10, replace = TRUE),
                                   sample.int(100, 10, replace = TRUE))))
colnames(counts) <- c(paste0("T_", 1:10), paste0("N_", 1:10))
conds <- data.frame("sampleID" = c(paste0("T_", 1:10), paste0("N_", 1:10)),
                    "Type" = c(rep("Tumour", 10), rep("Normal", 10)))
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = conds,
                              design = ~ Type)
## VST on all samples, then keep only the normal columns ...
vst.all <- assay(varianceStabilizingTransformation(dds[, 1:20]))[, 11:20]
## ... versus VST on the normal samples alone
vst.norm <- assay(varianceStabilizingTransformation(dds[, 11:20]))
vst.all; vst.norm
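
On a realistically sized count matrix (the two-gene toy above will not support a dispersion fit and is only meant to show the structure), a quick assumed check of how different the two results are:

## element-wise difference between VST-on-all (normal columns) and VST-on-normals-only
summary(as.vector(vst.all - vst.norm))
max(abs(vst.all - vst.norm))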
deseq2 deconvolution normalization limma
@mikelove

The VST is a function, like f(x) = log(x+1), which is applied to the normalized counts. However, it takes the global dispersion into account, so the result differs depending on whether or not it knows the design. The dispersion appears much higher if you do not allow for differences across groups. Does this answer your question?
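
As a minimal sketch (not part of the original reply; it reuses the dds object from the question, on a full-sized dataset), the practical difference is which dispersion trend the transformation uses:

## blind = TRUE re-estimates the dispersion trend with an intercept-only design,
## so tumour/normal differences inflate it; blind = FALSE reuses the trend
## estimated under design = ~ Type
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
vsd.blind  <- varianceStabilizingTransformation(dds, blind = TRUE)
vsd.design <- varianceStabilizingTransformation(dds, blind = FALSE)
dispersionFunction(dds)   # the fitted trend that blind = FALSE reuses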

Yes, sorry, I keep referring to VST as normalisation when it is a transformation. Your reply does address the central issue of global dispersion and its effect on the transformation.

What I really want to get is an opinion as to which is 'more appropriate':

  1. create a single dds object which contains all samples, with blind=FALSE; theoretically more samples give a better estimate of dispersion, and as you say the design can be used to account for group dispersion.

  2. subset on groups (e.g. normal, disease types) and create multiple dds objects; there is then absolutely no influence on dispersion from other groups which are essentially different.

N.B. this is specifically for deconvolution and clustering analysis, for which I believe VST to be appropriate.

Appreciate any thoughts on this.

I recommend (1).

(2) is actually more problematic if you are concerned about influence, because you are applying f(x) to some samples and g(x) to other samples.
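
For what it's worth, a sketch of option (1) as an assumed workflow (reusing counts and conds from the example above; this code is not from the reply itself):

## one dds holding all samples, with the group in the design;
## blind = FALSE so the same fitted transformation is applied to every sample
dds <- DESeqDataSetFromMatrix(countData = counts, colData = conds, design = ~ Type)
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
mat <- assay(vsd)
## the single transformed matrix then feeds clustering / deconvolution, e.g.
hc <- hclust(dist(t(mat)))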


Great, didn't think of it that way, thanks.
