Question

VST transformation (DESeq2) for metabolomics data

Joan Miró ▴ 80

Hello Michael Love, I am working with RNA-seq, 16S, proteomics and metabolomics datasets, and I was wondering whether it is correct to apply the vst() / varianceStabilizingTransformation() function to a small metabolomics dataset (30 variables). I found a similar question in a post on biostats that suggested posting it here to ask your opinion, but I can no longer find that post. I am integrating all the datasets with MOFA+, and I would like to know whether vst() is applicable to metabolomics data (treated as counts) as well. Thanks

vst Metabolomics DESeq2 • 4.0k views

ADD COMMENT • link 3.6 years ago Joan Miró ▴ 80

score 2 · Answer 1 · 2022-04-21

Michael Love 43k

I think you could apply the VST even with 30 features; that seems reasonable. You could run the following:

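# `dds` is assumed to be an existing DESeqDataSet
dds <- estimateSizeFactors(dds)   # per-sample scaling factors
dds <- estimateDispersions(dds)   # gene-wise estimates, fitted trend, final values
plotDispEsts(dds)                 # dispersion vs. mean, with the fitted trend
vsd <- varianceStabilizingTransformation(dds, blind=FALSE)  # reuses the dispersions estimated above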

You can post the plot to get a sense for the fit.

ADD COMMENT • link 3.6 years ago Michael Love 43k

Thanks Michael Love, here is the plotDispEsts(dds) plot. What I am not sure about is how to interpret it: is that a lot of dispersion or not much? The shrinkage is modest for some metabolites and somewhat larger for others. When should I be concerned about the dispersion and the shrinkage, and how can I address it? Are the blue dots the values I am going to get after varianceStabilizingTransformation()?

[Image: dispersion plot for the metabolites]

ADD REPLY • link 3.6 years ago Joan Miró ▴ 80

This looks fine to me, and I think varianceStabilizingTransformation will be useful here.

What you are seeing in this plot is not shrinkage applied to the data by the VST. The red line is only used to model how the variance depends on the mean, and that fit is used to derive the formula for the VST function. The VST is similar to log2(x), but it avoids inflating the variance of the data as x -> 0.
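As a rough illustration of that last point, here is a minimal base-R sketch. It does not use DESeq2: it simulates Poisson counts and uses the square-root transform as a stand-in variance-stabilizing transform, whereas the DESeq2 VST is derived from the fitted dispersion trend. The two means, the pseudocount of 0.5 and the number of draws are arbitrary assumptions chosen for illustration.

```r
set.seed(1)
n <- 10000
low  <- rpois(n, lambda = 2)     # counts close to zero
high <- rpois(n, lambda = 1000)  # large counts

# Spread on the log2 scale (the pseudocount avoids log2(0))
sd_log_low  <- sd(log2(low + 0.5))
sd_log_high <- sd(log2(high + 0.5))

# Spread after a square-root transform (variance-stabilizing for Poisson data)
sd_sqrt_low  <- sd(sqrt(low))
sd_sqrt_high <- sd(sqrt(high))

# Ratio of low-count spread to high-count spread under each transform
ratio_log  <- sd_log_low / sd_log_high
ratio_sqrt <- sd_sqrt_low / sd_sqrt_high
print(c(log2 = ratio_log, sqrt = ratio_sqrt))
```

On the log2 scale the low-count spread should come out many times larger than the high-count spread, while after the stabilizing transform the two should be of similar size.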

ADD REPLY • link 3.6 years ago Michael Love 43k

Thank you so much! Last question: to get VST-transformed RNA-seq data to plug into MOFA+, do I have to follow the same steps as above (estimateDispersions and varianceStabilizingTransformation), or does vst() already do these steps internally?

ADD REPLY • link 3.6 years ago Joan Miró ▴ 80

To get the data, use the steps above, then assay(vsd).

vst() is a fast version of varianceStabilizingTransformation(), but you don't (can't) apply it here because you only have a few features.

ADD REPLY • link 3.6 years ago Michael Love 43k

Yes, I understand that vst() is not for the few variables in my metabolomics dataset; for those I do what you described in your first reply. But for the RNA-seq data, yes. What you refer to as the steps above is basically what the DESeq() function does (the estimation and fitting), and I then extract the values with assay() after running vst(). Many thanks, Michael Love

ADD REPLY • link 3.6 years ago Joan Miró ▴ 80

Oh sorry, I missed "RNA-seq". For that type of data you can just do this:

vsd <- vst(dds, blind=FALSE)
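
For anyone who wants to try this end to end, here is a minimal sketch on simulated data. It assumes DESeq2 is installed; the simulated object from makeExampleDESeqDataSet() and the name `mat` are stand-ins for illustration, not part of the original analysis.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for an RNA-seq experiment: 2000 genes, 6 samples,
# with a two-level `condition` column and design ~ condition
dds <- makeExampleDESeqDataSet(n = 2000, m = 6)

# vst() fits the dispersion trend on a subset of genes (nsub = 1000 by default)
vsd <- vst(dds, blind = FALSE)

# Genes-by-samples matrix of transformed values
mat <- assay(vsd)
print(dim(mat))
```

The matrix returned by assay() keeps genes in rows and samples in columns, so it may need reshaping depending on what the downstream tool expects.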

ADD REPLY • link 3.6 years ago Michael Love 43k

This is another dataset, which I transformed with the varianceStabilizingTransformation() function. Should I be concerned about the large difference between the gene-est and the final value for one of the metabolites (bottom-left corner)? When I run a PCA on the transformed data, PC1 explains more than 70% of the variance but does not separate the groups according to the biology. When I run the PCA before the transformation, I do not see anything strange: PC1 explains just over 30% (more reasonable) and separates the biological groups. Could that low value be introducing a bias? How can I be sure that the PCA on the transformed data is correct and not biased? Thanks

ADD REPLY • link 3.6 years ago Joan Miró ▴ 80

That bottom point is not an issue for the transformation and doesn’t affect the amount of variance explained by PC1. What you’re seeing is more that the features with high counts are associated with condition.

ADD REPLY • link 3.6 years ago Michael Love 43k

Could it also be because of the scaling in the PCA? I thought that after the vst() transformation the data were already roughly on a common scale (the log range). Or is it that, after vst(), some variables can still differ a lot in scale if the initial scale differences were very large?

ADD REPLY • link 3.6 years ago Joan Miró ▴ 80

The data is scaled (this deals with differences in sequencing depth) and transformed (the vst is approximately log2 for large counts). These two address different aspects of the data.
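
To make the scaling half concrete, here is a small base-R sketch of a median-of-ratios size factor calculation, which is the idea behind estimateSizeFactors(). The toy matrix and the helper name median_ratio_size_factors are made up for illustration, and the sketch is a simplification rather than the DESeq2 implementation.

```r
# Hypothetical helper: median-of-ratios size factors for a features x samples matrix
median_ratio_size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))  # -Inf for rows containing a zero
  keep <- is.finite(log_geo_means)        # use only features without zeros
  apply(counts, 2, function(col) {
    exp(median(log(col[keep]) - log_geo_means[keep]))
  })
}

# Toy data: sample s2 is sample s1 at twice the depth
counts <- cbind(
  s1 = c(10, 20, 30, 40),
  s2 = c(20, 40, 60, 80)
)

sf <- median_ratio_size_factors(counts)
normalized <- sweep(counts, 2, sf, "/")  # divide each column by its size factor
print(sf)
print(normalized)
```

After dividing by the size factors the two columns agree, which is the depth correction. The transformation is a separate step applied on top of these scaled values.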

There isn't a correct answer really -- on the variance stabilized scale PC1 represents something different than PC1 of the counts. PC1 of the counts prioritizes the features with the highest counts. You're essentially finding that the few top features separate the samples by your condition of interest.
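
A small simulation can show how PC1 of untransformed counts leans on the highest-count features. This is a base-R sketch under assumed settings (30 features, 12 samples, negative binomial counts with dispersion 0.05, and a two-fold group effect on the three highest-mean features), and it uses log2(counts + 1) as a simple stand-in for the VST.

```r
set.seed(42)
n_feat <- 30
n_samp <- 12
group  <- rep(c("A", "B"), each = n_samp / 2)

# Feature means spanning four orders of magnitude
mu <- 10^seq(1, 5, length.out = n_feat)
mu_mat <- matrix(mu, nrow = n_feat, ncol = n_samp)

# Assumed effect: the three highest-mean features double in group B
top <- (n_feat - 2):n_feat
mu_mat[top, group == "B"] <- 2 * mu_mat[top, group == "B"]

# Negative binomial counts with dispersion 0.05 (size = 1 / dispersion)
counts <- matrix(rnbinom(n_feat * n_samp, mu = mu_mat, size = 1 / 0.05),
                 nrow = n_feat)

# prcomp() expects samples in rows; features are centered but not rescaled
pca_raw <- prcomp(t(counts))
pca_log <- prcomp(t(log2(counts + 1)))

# Share of the squared PC1 loadings carried by the three top features
share_raw <- sum(pca_raw$rotation[top, 1]^2)
share_log <- sum(pca_log$rotation[top, 1]^2)
print(c(raw = share_raw, log2 = share_log))
```

On the raw counts, PC1 should be carried almost entirely by the three highest-count features. On the log scale the loadings are typically spread across more features, so PC1 can describe a different source of variation.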