Hi, I cannot find any tags specifically relevant to VST or scaling, so I am adding them manually and also tagging DESeq2 in case there is a way to do this internally.
I need to squish VST counts into a 0 to 1 range. This cannot be done feature-wise (per gene), because even genes with low counts across samples, e.g. in a range of 3-3.5, will get high scaled values based on the range of values in their row. So even though these genes are lowly expressed, they will contain values such as 0.6, 0.7, etc.
It makes more sense to do the scaling sample-wise across all genes. However, the range differs between samples (e.g. 3-16, 2-14, etc.). A sample with a minimum value of 3 might have real counts of 7 at that minimum, for example, yet be given a scaled value of 0, while another sample with a minimum of 2 might have real counts of 2 there and also be given 0. Clearly the two samples are not equivalent.
It wouldn't make a difference if a VST count of 2 in one sample were equivalent to 3 in another, unless I am missing something about how VST works.
Is there a way of generating VST counts in DESeq2 within a given range across all samples to overcome this problem?
thank you.
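To make the discrepancy described above concrete, here is a minimal sketch in plain Python. It is done outside DESeq2, on a made-up toy matrix, and the function names are hypothetical. It compares sample-wise min-max scaling with a single global min-max over the whole matrix, which is one possible way to keep 0 and 1 meaning the same thing in every sample.

```python
# Toy VST-like matrix: rows are genes, columns are samples (values are made up).
vst = [
    [3.0, 2.0],    # lowly expressed gene
    [3.5, 2.4],
    [9.0, 8.0],
    [16.0, 14.0],  # highly expressed gene
]

def minmax(values, lo, hi):
    """Linearly map values from [lo, hi] onto [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]

def scale_per_sample(matrix):
    """Min-max scale each column (sample) using its own min and max."""
    cols = list(zip(*matrix))
    scaled_cols = [minmax(c, min(c), max(c)) for c in cols]
    return [list(row) for row in zip(*scaled_cols)]

def scale_global(matrix):
    """Min-max scale every value using one min and max for the whole matrix."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    return [minmax(row, lo, hi) for row in matrix]

per_sample = scale_per_sample(vst)   # first gene becomes 0 in both samples
global_scaled = scale_global(vst)    # first gene keeps its between-sample difference
```

With sample-wise scaling the VST values 3 and 2 both map to 0, whereas the global version maps them to different values, so a given scaled value corresponds to the same VST value in every sample.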
This is to feed as input to a neural network: 0 should represent the most lowly expressed gene (and any other genes with that same expression value), 1 the most highly expressed, and everything else should fall in between.
I should have prefaced my message by saying that it still works well, but I would like to avoid the discrepancy if possible, even though it is small.
I honestly don't have any particular feedback on the best way to scale VST data as input to a neural network. The values are already comparable across samples, so you can remove the row mean and then scale as needed. You may consider filtering out genes with low variance before you feed the data into the downstream application, as scaling the rows (genes) to force them all to the same range will just inflate the noise of low-count features.
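The suggestion above could be sketched as follows. This is only an illustration in plain Python: the toy values, the function names, and the variance threshold of 0.5 are all assumptions, and a sensible threshold would depend on the data.

```python
from statistics import mean, pvariance

def filter_low_variance(matrix, min_var):
    """Keep only genes (rows) whose variance across samples is at least min_var."""
    return [row for row in matrix if pvariance(row) >= min_var]

def center_rows(matrix):
    """Subtract each gene's mean across samples from its values."""
    centered = []
    for row in matrix:
        m = mean(row)
        centered.append([v - m for v in row])
    return centered

# Toy VST-like matrix: rows are genes, columns are samples (values are made up).
vst = [
    [3.0, 3.2, 3.5, 3.1],    # low, nearly flat gene: mostly noise
    [8.0, 12.0, 6.0, 10.0],  # clearly variable gene
]

kept = filter_low_variance(vst, min_var=0.5)  # threshold is an arbitrary assumption
centered = center_rows(kept)
```

Here the nearly flat gene is dropped before any scaling, so its small fluctuations are never stretched to fill the full range.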
OK, thank you for your feedback!
I agree that scaling gene values is not wise. I think I will proceed as is and accept the small variation in values.
May I ask a follow-up question? I haven't been able to find a clear answer to this:
Does the variance stabilisation maintain non-linear dependencies between genes within samples? There is no reason for this effect to be removed by the VST, right?
thank you.
I don't know what you mean by nonlinear dependencies between genes. The VST is very close to log2 for higher count values, so you can just apply your question to the logarithm.
OK, thank you.
I meant specifically the case where, for example, the expression of two genes cannot be separated by a straight line. This would not be lost in the VST?
I still don't know what you mean, but I think you can answer the question yourself by just considering the logarithm.
A straight line (y=ax) in a log-log plot corresponds to a power function (Y=X^a) on the original scale.
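As a quick numeric check of that statement, the following plain Python sketch builds a power relationship with a hypothetical exponent `a = 1.5` and made-up `X` values, then compares slopes on the log2 scale and on the original scale. Log2 is used here only because the VST is said above to be close to log2 for higher counts.

```python
import math

a = 1.5  # hypothetical exponent, chosen only for illustration
X = [1.0, 2.0, 4.0, 8.0, 16.0]
Y = [x ** a for x in X]  # power function on the original scale

log_x = [math.log2(x) for x in X]
log_y = [math.log2(y) for y in Y]

# Slope between consecutive points on the log-log scale
log_slopes = [
    (log_y[i + 1] - log_y[i]) / (log_x[i + 1] - log_x[i])
    for i in range(len(X) - 1)
]

# Slope between consecutive points on the original scale
raw_slopes = [
    (Y[i + 1] - Y[i]) / (X[i + 1] - X[i])
    for i in range(len(X) - 1)
]
```

The log-log slopes all equal `a` (up to floating-point error), while the slopes on the original scale keep changing: the relationship is a curve on the original scale and a straight line after taking logarithms.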