Question: Better to use rlog or vst transformation for data quality assessment of differential expression data based on size factors?
17 months ago by
agdif0
agdif0 wrote:

Hello,

I finished running DESeq2 on my dataset that includes 6 timepoints and 3 biological replicates per timepoint. I would like to run a PCA analysis for quality assessment but am unsure which count transformation method I should use, vst or rlog.

I calculated the size factors for my dataset (below). Based on these, there does not appear to be a large variation in sequencing depth (a dynamic range of size factors ≳ 4, as mentioned in Love, Huber, and Anders, 2014) across the samples. Note that K013 has a smaller size factor than the other samples, but the range from smallest to largest still does not exceed a factor of 4.

sizeFactors(dds)

X12HPA_J022   1.2875797
X12HPA_J024   1.7052146
X12HPA_J050   0.9460303
X1DPA_K001    1.1828260
X1DPA_K011    1.0955579
X1DPA_K121    0.7791666
X2DPA_K013    0.4708761
X2DPA_K015    1.0936920
X2DPA_K021    1.2141511
X3DPA_K012    0.7602281
X3DPA_K014    1.0525988
X3DPA_K023    0.8606639
X4DPA_K040    0.7807291
X4DPA_K080    1.0124977
X4DPA_K120    1.3053801
X5DPA_K010    0.9436090
X5DPA_K020    1.1898311
X5DPA_K030    1.1898311

In this case would it be better to use rlog and normalize for sequencing depth, or would use of vst be okay? Thanks!

modified 17 months ago by Michael Love26k • written 17 months ago by agdif0

If it helps, I don't think there is usually much difference between the two. I use VST because it is a lot faster... personally.

Answer: Better to use rlog or vst transformation for data quality assessment of differen
17 months ago by
Michael Love26k
United States
Michael Love26k wrote:

We saw a slight improvement with rlog in the DESeq2 paper simulation when the range of size factors was e.g. 10-fold from smallest to largest. And even then VST did pretty well. The range you show is not a problem.
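For anyone who wants to check the range on their own data, here is a minimal base R sketch using the size factors posted in the question. The variable names `sf` and `fold_range` are only illustrative; with a real object you would start from sizeFactors(dds) instead of typing the values in.

```r
# Size factors copied from the question (normally: sf <- sizeFactors(dds))
sf <- c(
  X12HPA_J022 = 1.2875797, X12HPA_J024 = 1.7052146, X12HPA_J050 = 0.9460303,
  X1DPA_K001  = 1.1828260, X1DPA_K011  = 1.0955579, X1DPA_K121  = 0.7791666,
  X2DPA_K013  = 0.4708761, X2DPA_K015  = 1.0936920, X2DPA_K021  = 1.2141511,
  X3DPA_K012  = 0.7602281, X3DPA_K014  = 1.0525988, X3DPA_K023  = 0.8606639,
  X4DPA_K040  = 0.7807291, X4DPA_K080  = 1.0124977, X4DPA_K120  = 1.3053801,
  X5DPA_K010  = 0.9436090, X5DPA_K020  = 1.1898311, X5DPA_K030  = 1.1898311
)

# Fold range from the smallest to the largest size factor
fold_range <- max(sf) / min(sf)
round(fold_range, 2)

# Which samples sit at the two extremes
names(sf)[which.min(sf)]
names(sf)[which.max(sf)]
```

For these values the ratio comes out at roughly 3.6-fold (K013 versus J024), which is well short of the ~10-fold range where rlog showed a slight edge in the simulation.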

I myself exclusively use vst() these days, both for its speed and because it is more robust to outliers when the sample size is very large. rlog() will now give a warning if the sample size is >30.
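A minimal sketch of the vst() route for a QA PCA, in case it is useful. It assumes DESeq2 is installed and uses a simulated dataset from makeExampleDESeqDataSet() as a stand-in for the real `dds`. The simulated design only has a two-level `condition` column, so for the dataset in the question you would replace `intgroup` with whatever column of colData(dds) holds the timepoint.

```r
library(DESeq2)

set.seed(1)
# Stand-in for the real object: 5000 simulated genes, 18 samples
dds <- makeExampleDESeqDataSet(n = 5000, m = 18)

# Variance stabilizing transformation; blind = TRUE ignores the design,
# which is the usual choice for unsupervised quality assessment
vsd <- vst(dds, blind = TRUE)

# PCA coordinates as a data frame (drop returnData to get the plot directly)
pca_df <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)
head(pca_df)

# rlog is the drop-in alternative, only slower:
# rld <- rlog(dds, blind = TRUE)
```

Calling plotPCA(vsd, intgroup = "condition") without returnData draws the plot itself; the data frame form is handy if you want to style it yourself.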