Hello Michael Love, I am working with RNA-seq, 16S, proteomics and metabolomics datasets, and I was wondering whether it is correct to apply the vst() / varianceStabilizingTransformation function to a small metabolomics dataset (30 variables). I found a similar question in a post on biostats, which suggested posting it here to ask your opinion, but I am no longer able to find that post. I am integrating all the datasets with MOFA+, and I would like to know whether vst() is also applicable to metabolomics data (treated as counts).
Thanks
Thanks Michael Love. Here is the plotDispEsts(dds) plot. What I am not sure about is how to interpret it: is that a lot of dispersion or not much? The shrinkage is not massive for some metabolites and a bit larger for others. When should I be concerned about the dispersion and the shrinkage, and how can I address it? Are the blue dots the values I am going to get after the varianceStabilizingTransformation?
This looks fine to me, and I think varianceStabilizingTransformation will be useful here.
What you are seeing in this plot is not shrinkage applied to the data by the VST. All that happens here is that the red line is used to understand the variance/mean dependence, and that dependence is used to calculate the formula for the VST function. The VST is similar to log2(x), but it avoids inflating the variance of the data when x -> 0.
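To make the comparison with log2(x) concrete, here is a minimal base-R sketch. It assumes a negative binomial with a single constant dispersion alpha, for which a closed-form variance-stabilizing transformation exists. The helper name vst_const_disp and the value alpha = 0.1 are hypothetical choices for illustration; DESeq2 derives its own transformation from the fitted dispersion trend (the red line), so its values will generally differ from these.

```r
# Closed-form VST for a negative binomial with constant dispersion `alpha`,
# rescaled so that it approaches log2(x) for large x.
# NOTE: `vst_const_disp` and alpha = 0.1 are illustrative assumptions,
# not the function DESeq2 fits to a real dataset.
vst_const_disp <- function(x, alpha) {
  (2 * asinh(sqrt(alpha * x)) - log(alpha) - log(4)) / log(2)
}

x <- c(0, 1, 10, 100, 1e4, 1e6)
comparison <- data.frame(
  x        = x,
  log2_x   = log2(x),                        # -Inf at x = 0
  vst_like = vst_const_disp(x, alpha = 0.1)  # finite at x = 0
)
print(comparison)
```

For large x the two columns agree closely, while at x = 0 the transformed value stays finite instead of diverging the way log2(x) does.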
Thank you so much! Last question: to get VST-transformed RNA-seq data to plug into MOFA+, do I have to do the same steps as above (estimateDispersions and varianceStabilizingTransformation), or does vst() already do these steps internally?
To get the data, use the steps above, then assay(vsd). vst is a fast version of VST, but you don't (can't) apply it here because you only have a few features.
Yes, I understand that vst() is not for the few variables in my metabolomics dataset; for those I do what you mentioned in the first reply. My question is about the RNA-seq data. Regarding what you refer to as the steps above: basically it is the DESeq() function that does the estimation and fitting, and I extract the result with assay() after running vst(). Many thanks, Michael Love
Oh sorry, I missed "RNA-seq". For that type of data you can just do this:
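The snippet that accompanied this reply is not preserved in the thread. A minimal sketch of what the RNA-seq case could look like is given here, assuming the DESeq2 package is installed. The simulated dataset from makeExampleDESeqDataSet and the object name rna_vst are stand-ins chosen for illustration, not the poster's data.

```r
library(DESeq2)

# Simulated stand-in for a real RNA-seq DESeqDataSet
# (assumption: 2000 genes, 12 samples).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)

# vst() estimates the dispersion trend on a subset of genes
# (nsub = 1000 by default), so it needs at least that many rows.
# This is why it is not suitable for a 30-feature dataset.
vsd <- vst(dds, blind = TRUE)

# Extract the transformed values as a genes-by-samples matrix.
rna_vst <- assay(vsd)
dim(rna_vst)
```

With only a few features, varianceStabilizingTransformation() would be called directly instead of vst(), as discussed earlier in the thread.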
This is another dataset, which I transformed with the varianceStabilizingTransformation() function. Should I be concerned about the big difference between the gene-est and the final fit for one of the metabolites (bottom left corner)? When I do a PCA with the transformed data, PC1 explains more than 70% of the variance but does not separate the groups according to biology. When I do the PCA before the transformation, I do not see anything strange: PC1 explains just above 30% (more reasonable) and separates the biological groups. Could the low value be introducing a bias? How can I be sure that the PCA with the transformed data is correct and not biased? Thanks
That bottom point is not an issue for the transformation and doesn’t affect the amount of variance explained by PC1. What you’re seeing is more that the features with high counts are associated with condition.
Could it also be because of the scaling of the PCA? I thought that after the vst() transformation the data were already more or less scaled (on the log range). Or is it that after vst() some variables still have large scale differences, if the initial differences in scale were very big?
The data is scaled (this deals with differences in sequencing depth) and transformed (the vst is approximately log2 for large counts). These two address different aspects of the data.
There isn't a correct answer really -- on the variance stabilized scale PC1 represents something different than PC1 of the counts. PC1 of the counts prioritizes the features with the highest counts. You're essentially finding that the few top features separate the samples by your condition of interest.
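The point about PC1 of the counts can be illustrated with a small base-R simulation. This is only a sketch under made-up assumptions: one high-count feature that differs between two groups, 29 low-count features that do not, Poisson noise, and log2(x + 1) standing in for a log-like transformation. None of the numbers come from the datasets discussed here.

```r
set.seed(1)
n_per_group <- 10
group <- rep(c("A", "B"), each = n_per_group)

# One high-count feature associated with the group (hypothetical means).
high <- rpois(2 * n_per_group, lambda = ifelse(group == "A", 10000, 20000))

# 29 low-count features unrelated to the group.
low <- matrix(rpois(2 * n_per_group * 29, lambda = 50), nrow = 2 * n_per_group)

# Samples in rows, features in columns, as prcomp() expects.
counts <- cbind(high, low)
colnames(counts) <- c("high_feature", paste0("low_", 1:29))

pca_raw <- prcomp(counts, center = TRUE, scale. = FALSE)
pca_log <- prcomp(log2(counts + 1), center = TRUE, scale. = FALSE)

# Feature with the largest absolute loading on PC1 of the raw counts.
top_raw <- names(which.max(abs(pca_raw$rotation[, "PC1"])))

# Fraction of total variance captured by PC1 on each scale.
var_explained_raw <- pca_raw$sdev[1]^2 / sum(pca_raw$sdev^2)
var_explained_log <- pca_log$sdev[1]^2 / sum(pca_log$sdev^2)

print(top_raw)
print(c(raw = var_explained_raw, log = var_explained_log))
```

On the raw scale the single high-count feature dominates PC1, while on the log scale its weight is comparable to that of the other features, so PC1 summarizes something different in each case.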
Thanks Michael