Question

Ambiguous instruction on how to set the blind value in DESeq2's vst function


qlin • 0

@qlin-21262

Last seen 4.8 years ago

UCONN

Hi everyone, I'm a new user of DESeq2. I'm learning DESeq2 using this webpage.

In this DESeq2 RNAseq workflow, it says "we specified blind = FALSE, which means that differences between cell lines and treatment (the variables in the design) will not contribute to the expected variance-mean trend of the experiment. The experimental design is not used directly in the transformation, only in estimating the global amount of variability in the counts. For a fully unsupervised transformation, one can set blind = TRUE (which is the default).".

But the R help page describes the blind argument as "logical, whether to blind the transformation to the experimental design. blind=TRUE should be used for comparing samples in an manner unbiased by prior information on samples, for example to perform sample QA (quality assurance). blind=FALSE should be used for transforming data for downstream analysis, where the full use of the design information should be made. blind=FALSE will skip re-estimation of the dispersion trend, if this has already been calculated. If many of genes have large differences in counts due to the experimental design, it is important to set blind=FALSE for downstream analysis."

These two descriptions look contradictory to me. For PCA plots of samples with different treatments, should I set blind to TRUE or FALSE?

Many thanks!

deseq2 • 2.4k views

updated 4.8 years ago by Michael Love 41k • written 4.8 years ago by qlin • 0


What do you find contradictory? As PCA is typically the first step in quality control and detection of potential batch effects, one typically sets blind=TRUE to get a blinded/unbiased view of the data.

4.8 years ago ATpoint ★ 4.0k

score 0 · Answer 1 · 2019-07-08

Both paragraphs recommend blind=FALSE basically.

An informal explanation: the VST needs to know the dispersion (think variance) of each gene so that it can fit the dispersion ~ mean trend. Should it use the per-group variance or the variance over all samples? The former requires knowing the sample groups (blind=FALSE). Once the per-gene dispersions are estimated and the global dispersion ~ mean trend is fitted, the sample groups are not used in the transformation that follows.
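
To make the per-group versus all-samples distinction concrete, here is a minimal sketch in R. The toy counts and the simulated dataset (built with DESeq2's `makeExampleDESeqDataSet`) are assumptions for illustration only and are not from the original question; the object names `vsd_blind` and `vsd_design` are likewise made up here.

```r
library(DESeq2)

# Toy illustration in base R: one gene with a strong treatment effect,
# three replicates per group (hypothetical counts).
group  <- factor(rep(c("ctrl", "trt"), each = 3))
counts <- c(98, 102, 100, 390, 410, 400)

var_all    <- var(counts)                       # variance over all samples
var_within <- mean(tapply(counts, group, var))  # average within-group variance
# var_all is inflated by the group difference; var_within is not.

# The same choice in DESeq2, on simulated data with real condition effects.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 5000, m = 12, betaSD = 1)  # design is ~ condition

# blind = TRUE: the design is ignored (treated as ~ 1) when estimating dispersions,
# so condition differences count as extra variability.
vsd_blind <- vst(dds, blind = TRUE)

# blind = FALSE: dispersions are estimated within the groups given by the design;
# the design is only used for the dispersion trend, not in the transformation itself.
vsd_design <- vst(dds, blind = FALSE)

# PCA of the samples under each setting, colored by condition.
plotPCA(vsd_blind,  intgroup = "condition")
plotPCA(vsd_design, intgroup = "condition")
```

With real data, `dds` would be your own `DESeqDataSet`, and `intgroup` would name the relevant columns of its `colData`. Comparing the two PCA plots side by side is one way to see how much the choice matters for a given experiment.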