Is the only difference between vst() and varianceStabilizingTransformation() the first estimation step, in which vst() fits the dispersion trend on a subset of genes while varianceStabilizingTransformation() uses all of them? Is that why vst() is the faster method?
When the number of genes/features is low, running vst() with the default parameters gives the error message

"less than 'nsub' rows with mean normalized count > 5"

I wonder whether executing the following essentially makes vst() equivalent to varianceStabilizingTransformation():
# vsd <- varianceStabilizingTransformation(dds)  # is this the same as the following?
nsub_input <- sum(rowMeans(counts(dds, normalized = TRUE)) > 5)
vsd <- vst(dds, blind = TRUE, nsub = min(nsub_input, 1000))  # blind = TRUE is also the default of varianceStabilizingTransformation()
I am also curious whether varianceStabilizingTransformation() preserves more information from the data (i.e., is more "accurate") when there are more than 1000 genes available for subsetting in vst(), in situations where I can tolerate a slower calculation (as long as it finishes in a reasonable time). I know vst() is robust, but I would like to know what the best practice is. I also read the discussion here, but would like to confirm that my understanding is correct. Thanks!
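For context, one way to check empirically how much the subsetting matters is to run both transformations on the same object and compare the outputs. This is only a hedged sketch on simulated data: makeExampleDESeqDataSet() is a real DESeq2 helper, but the gene count (n = 2000) and the expectation that enough genes pass the mean-count filter are illustrative assumptions.

```r
library(DESeq2)

set.seed(1)
# Simulated dataset; n = 2000 genes, 6 samples (illustrative choices)
dds <- makeExampleDESeqDataSet(n = 2000, m = 6)
dds <- estimateSizeFactors(dds)

# Guard against the "less than 'nsub' rows" error, as in the question
nsub_input <- sum(rowMeans(counts(dds, normalized = TRUE)) > 5)

# vst() fits the dispersion trend on at most 'nsub' genes
vsd_fast <- vst(dds, blind = TRUE, nsub = min(nsub_input, 1000))

# varianceStabilizingTransformation() fits on all genes
vsd_full <- varianceStabilizingTransformation(dds, blind = TRUE)

# How different are the two transformed matrices?
summary(as.vector(abs(assay(vsd_fast) - assay(vsd_full))))
```

If the differences summarized here are negligible for your data, the faster vst() should be a safe default; if not, that is a concrete reason to prefer the full transformation.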
Kevin, thanks for the answer. May I ask whether you have any suggestions regarding the second part of my question: which method is preferred when I do not mind the calculation taking a little longer? Indeed, I often find the time difference is subtle.