Question

Variance stabilising transformation to within given range

0

A ▴ 40

@a-14337

Last seen 8 months ago

United Kingdom

Hi, I cannot find any tags specifically relevant to VST or scaling, so I am adding them manually, but I am also tagging DESeq2 in case there is a way to do this internally.

I need to squish VST counts into a 0 to 1 range. This cannot be done feature-wise (per gene): even genes with low counts across samples, e.g. in a range of 3-3.5, will gain high scaled values based on the range of values in their rows. So even though these genes are lowly expressed, they will contain values such as 0.6, 0.7, etc.
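A minimal sketch of the gene-wise problem described above, using made-up VST-like values (the numbers and the helper name `min_max` are illustrative assumptions, not DESeq2 output or API). It shows that a lowly expressed gene with a narrow range is stretched to the full 0 to 1 interval, just like a highly expressed one.

```python
def min_max(values):
    """Rescale a list of numbers linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant row carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical VST-like values for two genes across four samples.
low_gene = [3.0, 3.2, 3.3, 3.5]       # lowly expressed, narrow range
high_gene = [9.0, 11.0, 13.0, 15.0]   # highly expressed, wide range

low_scaled = min_max(low_gene)
high_scaled = min_max(high_gene)

# After gene-wise scaling both genes span the full [0, 1] interval,
# so the information that low_gene is lowly expressed is lost.
print(low_scaled)
print(high_scaled)
```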

It makes more sense to do the scaling sample-wise across all genes. However, the scale differs between samples (e.g. 3-16, 2-14). A sample with a minimum value of 3 might have real counts of 7 at that minimum, for example, yet be given a scaled value of 0, while another sample with a minimum of 2 may have real counts of 2 and also be given a scaled value of 0. Clearly the two samples are not equivalent.
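The discrepancy described above can be sketched with made-up values (the sample names and numbers are hypothetical). The same VST-like value of 7 maps to different scaled values when each sample is scaled by its own minimum and maximum. A single global minimum and maximum over the whole matrix is shown as one possible alternative; it is an assumption of this sketch, not a DESeq2 feature.

```python
# Hypothetical VST-like values; each list holds the same three genes.
vst = {
    "sample_a": [3.0, 7.0, 16.0],
    "sample_b": [2.0, 7.0, 14.0],
}

def scale(values, lo, hi):
    """Linearly map values so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]

# Sample-wise scaling: each sample uses its own minimum and maximum.
per_sample = {s: scale(v, min(v), max(v)) for s, v in vst.items()}

# Global scaling: one minimum and maximum shared by every sample.
all_values = [v for column in vst.values() for v in column]
g_lo, g_hi = min(all_values), max(all_values)
global_scaled = {s: scale(v, g_lo, g_hi) for s, v in vst.items()}

# The middle gene has the value 7 in both samples.
print(per_sample["sample_a"][1], per_sample["sample_b"][1])        # differ
print(global_scaled["sample_a"][1], global_scaled["sample_b"][1])  # equal
```

With the shared bounds, equal input values stay equal after scaling, at the cost that not every sample reaches exactly 0 or 1.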

It wouldn't make a difference if a VST count of 2 in one sample were equivalent to 3 in another, unless I am missing something about how VST works.

Is there a way of generating VST counts in DESeq2 within a given range across all samples to overcome this problem?

Thank you.

DESeq2 scaling vst • 1.5k views

ADD COMMENT • link updated 3.4 years ago by Wolfgang Huber ★ 13k • written 3.5 years ago by A ▴ 40

score 1 · Answer 1 · 2021-02-05

1

Michael Love 42k

@mikelove

Last seen 8 hours ago

United States

For what purpose? Why do you need log expression to be between 0 and 1?

Then, what should 0 represent and what should 1 represent?

The VST values are already directly comparable across samples: they are corrected for sequencing depth and other technical biases.

ADD COMMENT • link 3.5 years ago Michael Love 42k

0

To feed as input to a neural network. 0 should represent the most lowly expressed gene (and other genes with that same expression value), 1 the most highly expressed, with everything else in between.

I should have prefaced my message by saying that it still works well, but I would like to avoid the discrepancy, even though it is small, if possible.

ADD REPLY • link 3.5 years ago A ▴ 40

0

I honestly don't have any particular feedback on the best way to scale VST as input into a neural network. The values are already comparable across samples, so you can remove the row mean and then scale as needed. You may consider filtering out genes with low variance before you feed into the downstream application, as scaling the rows (genes) to force them all to the same range will just inflate the noise of low-count features.
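The suggestion above (filter low-variance genes, remove the row mean, then scale) could be sketched as follows. The values, gene names, the `MIN_VARIANCE` threshold, and the choice of dividing by the largest absolute value are all assumptions made for illustration; in practice the threshold and the final scaling would depend on the data and the downstream model.

```python
from statistics import mean, pvariance

# Hypothetical VST-like values: one list of four samples per gene.
vst = {
    "gene_low":  [3.0, 3.1, 3.0, 3.1],     # nearly flat, mostly noise
    "gene_mid":  [6.0, 8.0, 7.0, 9.0],
    "gene_high": [12.0, 15.0, 11.0, 14.0],
}

MIN_VARIANCE = 0.5  # hypothetical cutoff, to be tuned for real data

# 1. Drop genes whose variance across samples is below the cutoff.
kept = {g: v for g, v in vst.items() if pvariance(v) >= MIN_VARIANCE}

# 2. Remove the row mean so each gene is centred on zero.
centered = {}
for gene, values in kept.items():
    row_mean = mean(values)
    centered[gene] = [x - row_mean for x in values]

# 3. Scale with ONE shared factor, so differences in spread between
#    genes are preserved instead of being forced to the same range.
largest = max(abs(x) for values in centered.values() for x in values)
scaled = {g: [x / largest for x in v] for g, v in centered.items()}

print(sorted(kept))
print(scaled)
```

Here the output lies in [-1, 1] rather than [0, 1]; a further shift and rescale would be needed if the downstream model strictly requires the unit interval.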

ADD REPLY • link 3.5 years ago Michael Love 42k

0

OK, thank you for your feedback!

I agree that scaling gene values is not wise. I think I will proceed as is and accept the small variation in values.

ADD REPLY • link 3.5 years ago A ▴ 40

0

May I ask a follow-up question? I haven't been able to find a clear answer to this:

Does the variance stabilisation maintain non-linear dependencies between genes within samples? There is no reason for this effect to be removed by the VST, right?

Thank you.

ADD REPLY • link 3.4 years ago A ▴ 40

1

I don't know what you mean by nonlinear dependencies between genes. The VST is very close to log2 for higher count values, so you can just apply your question to the logarithm.

ADD REPLY • link 3.4 years ago Michael Love 42k

0

OK, thank you.

I meant specifically, for example, that the expression of two genes cannot be separated by a straight line. This would not be lost in the VST?

ADD REPLY • link 3.4 years ago A ▴ 40

1

I still don't know what you mean, but I think you can answer the question yourself by just considering the logarithm.

ADD REPLY • link 3.4 years ago Michael Love 42k

1

A straight line (y=ax) in a log-log plot corresponds to a power function (Y=X^a) on the original scale.
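A small numerical check of this statement, using an arbitrary exponent `a = 1.5` and made-up X values (both are assumptions for illustration). The relationship is non-linear on the original scale but becomes a straight line through the origin with slope `a` after taking logarithms of both variables.

```python
import math

a = 1.5  # hypothetical exponent of the power function Y = X^a

X = [2.0, 4.0, 8.0, 16.0]
Y = [x ** a for x in X]

# On the original scale the ratio Y / X changes with X (non-linear).
ratios = [y / x for x, y in zip(X, Y)]

# On the log-log scale, log2(Y) / log2(X) is constant and equals a,
# i.e. the points lie on the straight line y = a * x.
log_x = [math.log2(x) for x in X]
log_y = [math.log2(y) for y in Y]
slopes = [ly / lx for lx, ly in zip(log_x, log_y)]

print(ratios)
print(slopes)
```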

ADD REPLY • link 3.4 years ago Wolfgang Huber ★ 13k