I have been using
varianceStabilizingTransformation() to normalise gene expression data for use in deconvolution and clustering methods. However, on closer inspection I now see that splitting samples by group (e.g. tissue type, or another phenotypic classification) and then transforming each subset produces a different dataset than transforming the entire set at once.
I have tried including the group information in the design matrix, and setting
blind=FALSE with VST, but I still see large differences between groups. This is likely because I am comparing tumour vs. normal tissue, and also across disease types.
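For clarity, this is roughly how I am applying the transformation on the full set with the design carried through (a sketch; `dds` stands in for a DESeqDataSet already built with `design = ~ Type`, as in the example further down):

```r
library(DESeq2)

# blind = FALSE makes VST use the design (~ Type) when estimating
# the mean-dispersion trend, rather than treating all samples as one group
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
mat <- assay(vsd)  # transformed values, samples in columns
```

Even with blind = FALSE, the dispersion trend is fitted on all samples jointly, which is why I suspect the tumour/normal mix still influences the result.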
My question is: should I be splitting by group/disease prior to VST? I had initially done this, but am now questioning the decision.
By way of example (N.B. I use
estimateDispersions() on the real data):
    counts <- t(data.frame(
      "GENEX" = c(sample.int(2000, 10, replace = TRUE), sample.int(200, 10, replace = TRUE)),
      "GENEY" = c(sample.int(1000, 10, replace = TRUE), sample.int(100, 10, replace = TRUE))
    ))
    colnames(counts) <- c(paste0("T_", 1:10), paste0("N_", 1:10))
    conds <- data.frame(
      "sampleID" = c(paste0("T_", 1:10), paste0("N_", 1:10)),
      "Type" = c(rep("Tumour", 10), rep("Normal", 10))
    )
    dds <- DESeqDataSetFromMatrix(countData = counts, colData = conds, design = ~ Type)

    # VST on all 20 samples, then keep only the normal columns
    vst.all <- assay(varianceStabilizingTransformation(dds[, 1:20]))[, 11:20]
    # VST on the 10 normal samples alone
    vst.norm <- assay(varianceStabilizingTransformation(dds[, 11:20]))
    vst.all; vst.norm