Question

Mean-variance trend and identifying significant DE with large amount of biological change in expression in time-series data

0

Entering edit mode

hmviit • 0

@hmviit-12473

Last seen 2.9 years ago

Finland

Hi all,

I'm new to limma and voom and have been trying to wrap my head around the pipeline with my RNA-Seq data. My question does not concern code as such but more that I have understood correctly how the voom function works. And therefore I would like to ask if my logic below is correct. I have time-dependent data which is individual samples collected at 4 different developmental stages and across 3 different genotypic backgrounds. I have biologcal 3 replicates per each genotype-stage group. Biologically there is alot of expression changes going on the tissue across time for a single gene and I'm interested in the differential expression between stages due to genotype.

When applying the voom on my data (after filtering for low counts and TMM normalisation) I have given my desing matrix so the per gene variance which is plotted in the voom:mean-variance plot represents the residual variance in the gene across my whole data. Then the residual variance is plotted against the gene mean for all the gene in my data. Then to get the weighing values each observation of a gene is transformed to a predicted count value (calculated based on the lib size) which is then used to get the standard deviation for the observation using the mean-variance trend line. So does the observation in this case correspond to the TMM-normalised logCPM value of given gene in my samples?

Similar to the voom plots in Figure 1E in Law et al 2014 (the Drosophila development RNA-Seq data) I have a large cloud and large variance for the small mean log2 but also large clustering of low variance around log2 of 10. Genes of samples which have a high standard deviation and are far from the trend line will have a large weighing factor, correct? Will there be any bias for the estimation of significance due to a genes logCPM being "far" from the trend line?

All the best, Heidi Viitaniemi

My voom plot

> GroupT
 [1] st1AA st1AA st1AA st1AB st1AB st1BB st1BB st2AA st2AA st2AA st2AB st2AB st2AB st2BB st2BB st3AA st3AA st3AA st3AB st3AB st3BB st3BB st3BB st4AA st4AA st4AA
[27] st4AB st4AB st4AB st4BB st4BB st4BB
Levels: st1AA st1AB st1BB st2AA st2AB st2BB st3AA st3AB st3BB st4AA st4AB st4BB

>designT <- model.matrix(~0+GroupT)
>colnames(designT) <- levels(GroupT)
>designT
        st1AA st1AB st1BB st2AA st2AB st2BB st3AA st3AB st3BB st4AA st4AB st4BB
SKO186T     1     0     0     0     0     0     0     0     0     0     0     0
SKO187T     1     0     0     0     0     0     0     0     0     0     0     0
SKO201T     1     0     0     0     0     0     0     0     0     0     0     0
SKO223T     0     1     0     0     0     0     0     0     0     0     0     0
.

>v_data <- voom(dge3,designT,plot=TRUE)

voom • 2.4k views

ADD COMMENT • link 3.0 years ago • updated 2.9 years ago hmviit • 0

score 0 · Answer 1 · 2021-11-11

The voom method is fully described in the Law et al (2014) paper and that paper is still the best reference for understanding the methodology.

Genes of samples which have a high standard deviation and are far from the trend line will have a large weighing factor, correct?

No, voom computes observation weights rather than gene weights and distance from the trend line in the voom plot does not affect the weights.

Will there be any bias for the estimation of significance due to a genes logCPM being "far" from the trend line

No there is no bias. The limma empirical Bayes algorithm adapts to however close or how far the genewise standard deviations are from the trend line. The trend line is not used as the genewise standard deviation.

You are perhaps assuming that voom somehow weights genes, or that the voom trend line determines gene variability, but neither of these things is true. The effect of the precision weights is to give relative weights to observations within a gene and to incorporate the mean-variance relationship. The residual standard deviation of each gene is computed later in the limma pipeline and is not preset by voom.