I have a specific question about data transformation and the appropriate way to compare
groups of samples in gene expression data. Starting from raw RNA-Seq gene counts, I applied
the variance stabilizing transformation (VST) from the DESeq2 R package and used the
transformed values for several clustering methods. This gave me a specific set of genes
whose expression patterns, as seen in heatmaps, interestingly separate my samples into
groups that correspond to the studied phenotypes.
My next goal is to create complementary pairwise boxplots for some specific predefined
cluster groups, using a subset of these genes, to provide additional evidence of a
significant difference in the relative expression of each gene between these groups.
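A minimal base-R sketch of the per-gene comparison described above, assuming the transformed values (for example the VST matrix restricted to the selected genes) are available as a genes-by-samples matrix. The function compare_groups(), the gene and sample names, the group labels and the simulated values are hypothetical placeholders, not real data and not DESeq2 functions. It draws one boxplot per gene, uses a Welch t-test for two groups or a classic one-way ANOVA (oneway.test with var.equal = TRUE) for more than two, and reports Benjamini-Hochberg adjusted p-values because several genes are tested.

```r
# Hedged sketch: per-gene boxplots plus a t-test (2 groups) or a one-way ANOVA
# (>2 groups) on an already-transformed matrix (genes in rows, samples in columns).
# All names and the simulated values below are hypothetical placeholders.

compare_groups <- function(expr, group, make_plots = TRUE) {
  group <- droplevels(factor(group))
  stopifnot(ncol(expr) == length(group), nlevels(group) >= 2)
  pvals <- sapply(rownames(expr), function(g) {
    y <- as.numeric(expr[g, ])
    if (make_plots) {
      boxplot(y ~ group, main = g, xlab = "group", ylab = "transformed expression")
    }
    if (nlevels(group) == 2) {
      t.test(y ~ group)$p.value                         # Welch two-sample t-test
    } else {
      oneway.test(y ~ group, var.equal = TRUE)$p.value  # classic one-way ANOVA F-test
    }
  })
  # Several genes are tested, so also report Benjamini-Hochberg adjusted p-values.
  data.frame(gene = names(pvals), p = unname(pvals),
             p.adj = p.adjust(pvals, method = "BH"), row.names = NULL)
}

set.seed(42)
group <- factor(rep(c("C1", "C2", "C3"), each = 4))  # hypothetical cluster groups
expr <- matrix(rnorm(3 * 12, mean = 8, sd = 0.5), nrow = 3,
               dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:12)))
res <- compare_groups(expr, group, make_plots = FALSE)
print(res)
```

Setting make_plots = TRUE draws the boxplots on the active graphics device; passing only two groups (e.g. expr[, 1:8] with group[1:8]) switches the test to the t-test.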
Thus, my questions are the following:
1) Are VST-transformed RNA-Seq counts appropriate for creating these additional boxplots,
in order to compare the group means for each selected gene? Moreover, to add p-values
and significance levels, would a simple test, such as a t-test (or a one-way ANOVA for
more than 2 groups), be fine in this case?
2) Or are VST-transformed counts not appropriate for comparing means, even for a
very small number of genes, so that I should use a different transformation?
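For example, I could estimate size factors from my count matrix:
xx <- estimateSizeFactorsForMatrix(counts = matrix.count)
and then take log2 of the size-factor-normalized counts plus a pseudocount of 1, which is
what normTransform(f = log2, pc = 1) computes for a DESeqDataSet:
xx2 <- log2(t(t(matrix.count) / xx) + 1)
Would this be more appropriate?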
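A hedged base-R sketch of the alternative in question 2, written out by hand so the two steps are visible. median_of_ratios() is a simplified re-implementation of the median-of-ratios idea behind estimateSizeFactorsForMatrix(), not the DESeq2 code itself, and shifted_log2() mirrors the documented normTransform computation f(normalized counts + pc) with f = log2 and pc = 1. Both function names and the toy count matrix are hypothetical.

```r
# Hedged sketch: median-of-ratios size factors, then log2(normalized counts + 1).
# Simplified re-implementation for illustration; not the DESeq2 source.

median_of_ratios <- function(counts) {
  log_geo_means <- rowMeans(log(counts))  # -Inf for genes with any zero count
  apply(counts, 2, function(cnts) {
    keep <- is.finite(log_geo_means) & cnts > 0
    exp(median(log(cnts[keep]) - log_geo_means[keep]))
  })
}

shifted_log2 <- function(counts, size_factors, pc = 1) {
  # divide each column (sample) by its size factor, then add the pseudocount
  log2(sweep(counts, 2, size_factors, "/") + pc)
}

# Toy counts (hypothetical): sample s2 is sequenced exactly twice as deeply as s1,
# and gene g4 has a zero count, so it is ignored when estimating size factors.
counts <- matrix(c(10, 20, 30, 0,
                   20, 40, 60, 5),
                 ncol = 2, dimnames = list(paste0("g", 1:4), c("s1", "s2")))
sf <- median_of_ratios(counts)
logged <- shifted_log2(counts, sf)
print(sf)
print(logged)
```

In this toy case the size factors are 1/sqrt(2) and sqrt(2), so genes g1 to g3 get identical log2 values in both samples once depth is removed.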