Question: Is the VST implemented in DESeq useful on time-series cell differentiation datasets?
11 months ago, jiab wrote:

The VSN method proposed in [1], which generalises a VST method proposed in [2], is notorious for assuming that the libraries in a dataset can be treated as technical replicates, i.e., that most genes are not differentially expressed between conditions. For a cell differentiation time course, this is hardly an acceptable assumption.

With regard to the VST method implemented in DESeq, I am trying to figure out whether it relies on the same assumption. It is well established that undesirable artefacts are introduced when the sequencing depths of the libraries differ wildly, but I was not able to find a literature reference for the type of limitation described above. However, a forum post by Huber appears to suggest that the limitation does indeed exist [3].

Could someone please confirm or deny whether the VST method implemented in DESeq assumes that the samples can be treated as technical replicates?

Thank you.

[1] Huber et al., 2002

[2] Durbin et al., 2002


11 months ago, Michael Love (United States) wrote:


See the section of the DESeq2 vignette on transformations and blind dispersion estimation. I would recommend transforming with blind=FALSE, which means that the dispersions are estimated using the experimental design, and the global trend of dispersion over mean is then used to calculate the VST.
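As a rough sketch of what the blind=FALSE recommendation could look like in practice: the counts below are simulated placeholders from DESeq2's own example generator, and the `time` column with three time points and four replicates each is a hypothetical layout, not the poster's actual design.

```r
library(DESeq2)

set.seed(1)
# Simulated placeholder counts (1000 genes x 12 samples), not real data
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical time-course layout: 3 time points, 4 replicates each
dds$time <- factor(rep(c("t0", "t1", "t2"), each = 4))
design(dds) <- ~ time

# blind = FALSE: dispersions are estimated using the design above,
# so differences between time points are not treated as noise
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)

# Transformed values, genes in rows and samples in columns
head(assay(vsd))
```

With the default blind=TRUE the design would be replaced by ~ 1 for the dispersion estimation, so every sample would be treated as a replicate of every other one.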

(I don't agree with your use of the term "technical replicates" here. When you have multiple samples in the same condition, I would call these "biological replicates". Technical replicates generally refer to the same cDNA library sequenced multiple times.)

11 months ago, Wolfgang Huber (EMBL European Molecular Biology Laboratory) wrote:

In addition to Mike's answer, let me add, as a more general comment, that it is always helpful to distinguish between the route taken and the destination, i.e., between the assumptions made to come up with an algorithm for finding a data transformation, and the usefulness of the transformation that it produces for a dataset and scientific question at hand.

In other words: someone could write a paper saying that the logarithm is the appropriate VST for data with a constant coefficient of variation under the assumption that all samples are replicates of each other. But this does not mean that you are now only allowed to use the logarithm on data that are confirmed to follow these assumptions. You can still use the logarithm for other data, as long as it "makes sense" - a criterion that is of course subjective and requires some experience and expertise.
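For readers who want to see where the logarithm comes from in that example, here is a short delta-method sketch. The symbols (X for a measurement, \mu for its mean, c for the coefficient of variation, h for the transformation) are notation introduced here for illustration only.

```latex
% Constant coefficient of variation c: the variance grows with the squared mean
\operatorname{Var}(X) = c^{2}\mu^{2}

% Delta method: variance of the transformed value h(X)
\operatorname{Var}\bigl(h(X)\bigr) \approx h'(\mu)^{2}\,\operatorname{Var}(X)
  = h'(\mu)^{2}\,c^{2}\mu^{2}

% Requiring this to equal 1 for every \mu gives
h'(\mu) = \frac{1}{c\,\mu}
\quad\Longrightarrow\quad
h(x) = \frac{1}{c}\,\log x + \text{const}
```

The derivation uses only the mean-variance relationship. Whether the resulting transformation is useful for a given dataset is a separate question, which is the distinction between route and destination made above.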

