Variance stabilising transformation to within given range
@a-14337

Hi, I cannot find any tags specifically relevant to VST or scaling, so I am adding them manually, and also tagging DESeq2 in case there is a way to do this internally.

I need to squish VST counts into the range 0 to 1. This cannot be done feature-wise (per gene): even genes with low counts across samples, e.g. in the range 3-3.5, will get high scaled values, because the scaling depends only on the range of values within each row. So even though these genes are lowly expressed, they will contain values such as 0.6, 0.7, etc.
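To make the effect concrete, here is a minimal sketch in plain Python (not DESeq2 code). The `minmax` helper and the VST values are made up for illustration; it only shows that per-gene min-max scaling maps a narrow low-expression range and a wide high-expression range onto the same 0 to 1 values.

```python
def minmax(values):
    """Linearly rescale a list of numbers so that min -> 0 and max -> 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant row carries no information; map it to 0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical VST values for two genes across four samples.
low_gene = [3.0, 3.2, 3.3, 3.5]        # lowly expressed, narrow range
high_gene = [10.0, 12.0, 13.0, 15.0]   # highly expressed, wide range

scaled_low = minmax(low_gene)
scaled_high = minmax(high_gene)

# Both rows come out as approximately 0, 0.4, 0.6, 1 (up to rounding),
# so the difference in expression level is lost.
print(scaled_low)
print(scaled_high)
```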

It makes more sense to do the scaling sample-wise across all genes. However, the scale differs between samples (e.g. 3-16 in one, 2-14 in another). A sample with a minimum VST value of 3 might have real counts of 7 at that minimum, for example, yet be given a scaled value of 0; the same happens in another sample with a minimum of 2, where the real counts may be 2. Clearly the two samples are not equivalent.
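The sample-wise discrepancy can be sketched the same way. Again this is plain Python with made-up numbers, and `minmax` is an illustrative helper rather than anything from DESeq2: the same VST value of 3.0 ends up at different scaled values depending on which sample it sits in.

```python
def minmax(values):
    """Linearly rescale a list of numbers so that min -> 0 and max -> 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical VST values for three genes in two samples.
sample_a = [3.0, 7.0, 16.0]   # ranges from 3 to 16
sample_b = [2.0, 3.0, 14.0]   # ranges from 2 to 14

scaled_a = minmax(sample_a)
scaled_b = minmax(sample_b)

# 3.0 is the minimum of sample A, so it becomes 0.0 there,
# but in sample B it becomes (3 - 2) / (14 - 2) = 1/12.
print(scaled_a[0], scaled_b[1])
```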

It wouldn't make a difference if a VST count of 2 in one sample were equivalent to 3 in another, unless I am missing something about how VST works.

Is there a way of generating VST counts in DESeq2 within a given range across all samples to overcome this problem?

thank you.

DESeq2 scaling vst • 1.4k views
@mikelove

For what purpose? Why do you need log expression to be between 0 and 1?

Then, what should 0 represent and what should 1 represent?

The VST values are already directly comparable across samples: they are corrected for sequencing depth and other technical biases.


To feed as input to a neural network. 0 should represent the most lowly expressed gene (and any other genes with that same expression value), 1 the most highly expressed, with everything else in between.

I should have prefaced my message by saying that it still works well, but I would like to avoid the discrepancy, even though it is small, if possible.
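One reading of "0 is the lowest expression anywhere, 1 is the highest" is a single min-max over the whole matrix, rather than per gene or per sample. The sketch below is plain Python under that assumption; `global_minmax` and the values are hypothetical and this is not a DESeq2 feature. Because one minimum and one maximum are shared by all samples, equal VST values always map to equal scaled values.

```python
def global_minmax(matrix):
    """Rescale a genes x samples matrix using one shared min and max."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]

# Hypothetical VST matrix: rows are genes, columns are samples A and B.
vst = [
    [3.0, 2.0],
    [7.0, 3.0],
    [16.0, 14.0],
]

scaled = global_minmax(vst)

# The value 3.0 appears in sample A (gene 1) and sample B (gene 2);
# both are mapped to the same scaled value, (3 - 2) / (16 - 2).
print(scaled[0][0], scaled[1][1])
```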


I honestly don't have any particular feedback on the best way to scale VST values as input to a neural network. The values are already comparable across samples, so you can remove the row mean and then scale as needed. You may consider filtering out genes with low variance before you feed the data into the downstream application, as scaling the rows (genes) to force them all to the same range will just inflate the noise of low-count features.
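A minimal sketch of that suggestion, in plain Python rather than R: drop low-variance genes first, then subtract each remaining gene's mean. The function name, the example values, and the variance cut-off of 0.5 are all hypothetical; a sensible threshold depends on the data.

```python
from statistics import mean, variance

def filter_and_center(matrix, min_variance):
    """Keep rows whose sample variance is >= min_variance, then mean-center them."""
    kept = []
    for row in matrix:
        if variance(row) >= min_variance:
            row_mean = mean(row)
            kept.append([v - row_mean for v in row])
    return kept

# Hypothetical VST matrix: rows are genes, columns are samples.
vst = [
    [3.0, 3.2, 3.3, 3.5],       # low expression, low variance: filtered out
    [10.0, 12.0, 13.0, 15.0],   # higher variance: kept and centered
]

centered = filter_and_center(vst, min_variance=0.5)
print(centered)
```

Filtering before any further scaling means the near-constant gene is never stretched to fill the range.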


OK, thank you for your feedback!

I agree that scaling gene values is not wise. I think I will proceed as is and accept the small variation in values.


May I ask a follow-up question? I haven't been able to find a clear answer to this:

Does the variance stabilisation maintain non-linear dependencies between genes within samples? There is no reason for this effect to be removed by the VST, right?

thank you.


I don't know what you mean by nonlinear dependencies between genes. The VST is very close to log2 for higher count values, so you can just apply your question to the logarithm.


OK, thank you.

I meant specifically, for example, the case where the expression of two genes cannot be separated by a straight line. This would not be lost in the VST?


I still don't know what you mean, but I think you can answer the question yourself by just considering the logarithm.


A straight line (y=ax) in a log-log plot corresponds to a power function (Y=X^a) on the original scale.
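Spelled out, writing $x = \log X$ and $y = \log Y$ for positive $X$ and $Y$ (the intercept $b$ in the second line is a generalisation added here, using the natural logarithm):

```latex
\begin{align*}
y = a x
  &\iff \log Y = a \log X = \log X^{a}
  \iff Y = X^{a}, \\
y = a x + b
  &\iff \log Y = \log\!\left(e^{b} X^{a}\right)
  \iff Y = e^{b} X^{a}.
\end{align*}
```

So a relationship that is linear on the log scale is a power law on the count scale, and anything that is not a straight line on the log scale stays non-linear after the transformation.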
