Question

DESeq2 - VST data

AP • 0

@3a675fdc

I have been using the DESeq2 package for RNA-seq data analysis and really like the VST data in log2 units, but I am unsure whether VST data is appropriate for certain analyses.

Specifically, can the VST data be used to calculate a gene signature score (the average across all the genes in a given signature), with the aim of comparing signature scores between samples with and without a given condition? I have generated batch-effect-corrected VST data using DESeq2 and limma.

I understand that VST data doesn't take gene length into account, whereas TPM does, so VST data may not be suitable for comparing expression across genes.

Thanks for any feedback!

DESeq2 VST • 2.2k views

score 0 · Answer 1 · 2021-11-18

Michael Love 43k

@mikelove

Yes, VST data can be used for comparing samples, computing distances and centroids, ML, etc.

Why are you concerned about gene length?
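To make the signature score from the question concrete, here is a minimal sketch in plain Python. It is not DESeq2 or limma code: the gene names, values, and the helper functions `signature_scores` and `group_mean` are all hypothetical, and the matrix simply stands in for batch-corrected VST values. The last step also checks one assumption worth stating: if gene length acted as a constant per-gene offset on this log2-like scale, it would cancel when the same signature is compared between samples.

```python
from statistics import mean

# Hypothetical batch-corrected VST values (log2-like scale):
# gene -> one value per sample, in the same sample order as `condition`.
vst = {
    "GENE_A": [8.1, 8.3, 9.4, 9.6],
    "GENE_B": [5.0, 5.2, 6.1, 6.3],
    "GENE_C": [11.0, 10.8, 11.1, 10.9],  # not part of the signature
}
condition = ["control", "control", "treated", "treated"]
signature = ["GENE_A", "GENE_B"]


def signature_scores(matrix, genes):
    """Average the signature genes within each sample (one score per sample)."""
    n_samples = len(next(iter(matrix.values())))
    return [mean(matrix[g][s] for g in genes) for s in range(n_samples)]


def group_mean(scores, labels, level):
    """Mean signature score over the samples carrying a given label."""
    return mean(x for x, lab in zip(scores, labels) if lab == level)


scores = signature_scores(vst, signature)
diff = group_mean(scores, condition, "treated") - group_mean(scores, condition, "control")

# Assumed per-gene offset (for example a gene-length effect on the log scale):
# add the same constant to GENE_A in every sample and recompute.
shifted = dict(vst)
shifted["GENE_A"] = [x + 3.0 for x in vst["GENE_A"]]
shifted_scores = signature_scores(shifted, signature)
shifted_diff = (group_mean(shifted_scores, condition, "treated")
                - group_mean(shifted_scores, condition, "control"))

print("per-sample scores:", scores)
print("treated - control:", diff, "with per-gene offset:", shifted_diff)
```

The per-sample scores change when the offset is added, but the treated-versus-control difference does not, which is the comparison the question is about.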

Thanks for your reply!

My concern about gene length was the following:

Could calculating a gene signature score from expression values that were not normalized for gene length bias the score towards longer genes, which would have more counts (akin to comparing expression between different genes within a sample)?

AP • 3.1 years ago

Your precision in RNA-seq is inherently influenced by the count, which is proportional to both the length of the transcript and the expression level. You can't undo this difference in precision by dividing out the length. The best you can do is stabilize the variance, which ensures that the imprecise features do not contribute excessively to the distance metric, as they would under the wrong transformation.
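The point about precision can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not DESeq2's VST: counts are drawn from a plain Poisson model, the two genes and their lengths are hypothetical, and the square root is used only because it is the classic variance-stabilizing transformation for Poisson data (DESeq2's VST is instead based on a fitted dispersion trend).

```python
import math
import random
from statistics import stdev

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def poisson(lam):
    """Draw one Poisson(lam) count with Knuth's method (fine for moderate lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


# Two hypothetical genes with the same expression level but different lengths:
# the 40x longer gene collects about 40x more reads.
short_len, long_len = 0.5, 20.0  # assumed lengths in kb
low = [poisson(5) for _ in range(2000)]     # short gene, about 5 reads
high = [poisson(200) for _ in range(2000)]  # long gene, about 200 reads

# Plain log2 with a pseudocount: the low-count gene is much noisier.
sd_log_low = stdev([math.log2(x + 1) for x in low])
sd_log_high = stdev([math.log2(x + 1) for x in high])

# Dividing out the length only shifts the log values by a constant,
# so the spread (the imprecision) is unchanged.
sd_scaled_low = stdev([math.log2((x + 1) / short_len) for x in low])
sd_scaled_high = stdev([math.log2((x + 1) / long_len) for x in high])

# A variance-stabilizing transformation for Poisson counts (square root)
# puts both genes on a comparable noise scale.
sd_sqrt_low = stdev([math.sqrt(x) for x in low])
sd_sqrt_high = stdev([math.sqrt(x) for x in high])

print("log2(count + 1):       ", sd_log_low, sd_log_high)
print("log2((count + 1)/len): ", sd_scaled_low, sd_scaled_high)
print("sqrt(count):           ", sd_sqrt_low, sd_sqrt_high)
```

On the log scale the short gene's standard deviation is several times larger than the long gene's, with or without length scaling, so it would dominate a distance computed on those values. After the square root, both standard deviations are close to 0.5.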