Question

Differential variability modelling with vst() expression

0

Entering edit mode

jtsy6383 • 0

@58973972

Last seen 1 hour ago

Germany

Hi!

On top of a standard DE test with limma-voom, I want to carry out a differential variability analysis between two conditions (diets 1 and 2), controlling for categorical batches (e.g., sequencing days) and surrogate variables explaining non-condition/batch structure in the data. The goal is to identify differences in variability that may not purely be a consequence of differences in mean (e.g., control of transcription/degradation rather than increased transcription). My dataset has >100 biological replicates per condition so I got good power to detect them if they exist.

My main questions are if the vst()-based Levene's test approach below is correctly coded and/or sensible?

    counts # count matrix
    all_covariates # dataframe with conditions (1 or 2), seqday (1 or 2 or 3) and SVs (continuous)
    conditions<- as.factor(all_covariates$conditions)
    seqday <- as.factor(all_covariates $seqday)
    sv <- as.numeric(all_covariates $sv)

    data <- DESeqDataSetFromMatrix(countData = counts,
                                   colData = all_covariates,
                                   design =  ~conditions) 
    vst.data = vst(data, blind=F) 

    contrasts(seqday) <- contr.sum(levels(seqday))
    batches <- model.matrix(~seqday+sv)
    vst.counts.bc  <- limma::removeBatchEffect(vst.counts, covariates = batches[,-1],  design = ~conditions) 

    pvalues <- sapply(1:nrow(vst.counts.bc), function(i){
      data <- cbind.data.frame(gene = as.numeric(t(vst.counts.bc[i,])), conditions)
      result <- leveneTest(gene~conditions, data, center='median')
      p <- result$`Pr(>F)`[1]
      return(p)
    })
    bh <- p.adjust(pvalues, method = "BH")

For some extra context and correct me if I'm wrong - my reading of the source code is that the vst() corrects for library sizes and a mean-variance relationship that varies from gene to gene per condition, in particular addressing the dispproportionately high variances of genes that are either super-low and super-high in mean expression. Furthermore, the median-centred Levene's test should be robust to deviations from normality. Hence, the above approach feels sound. Yet, I haven't seen a paper go for this. Instead, they use packages to fit negative binomials models on TMM+CPM data that model changes in variability as an overdispersion parameter obeying a quadratic mean-variance relationship (e.g., https://journals.physiology.org/doi/full/10.1152/physiolgenomics.00128.2018?), in spite of some papers reporting higher-order polynomial mean-variance relationships in RNAseq data (https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-020-00842-z).

limma DESeq2 • 114 views

ADD COMMENT • link updated 3 hours ago by Michael Love 43k • written 19 hours ago by jtsy6383 • 0

score 1 · Answer 1 · 2025-01-09

1

Entering edit mode

Michael Love 43k

@mikelove

Last seen 2 hours ago

United States

Looking at differential variance across groups, where you regress out orthogonal nuisance factors from VST data, seems correct and would point you to the types of gene you are looking for.

However, you could also consider fitting dispersion in a NB GLM, and then comparing dispersion. You could test constant dispersion vs per-group with a likelihood ratio test.

ADD COMMENT • link 19 hours ago Michael Love 43k

0

Entering edit mode

Thanks for the quick and reassuring response Mike! Some follow-ups if I may on the second point.

1) Any comment on the appropriate summary statistic to report increases and decreases in variability? I've been using the fold change in MAD estimated on condition-specific subsets of vst() data so far. I take it the NB method would be the same but for the dispersion?

2) I take it I'm not supposed to use the vst() data for the NB GLM but TMM-adjusted CPM?

3) As such, would the dispersions estimated in each method be different? And thus have an impact on the genes called DV and/or DE?

ADD REPLY • link 3 hours ago jtsy6383 • 0

0

Entering edit mode

See Gordon's reply below for an implementation you can use directly.

ADD REPLY • link 3 hours ago Michael Love 43k

score 1 · Answer 2 · 2025-01-09

1

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 17 hours ago

WEHI, Melbourne, Australia

See

Phipson, B., Oshlack, A. DiffVar: a new method for detecting differential variability with application to methylation in cancer and aging. Genome Biol 15, 465 (2014). https://doi.org/10.1186/s13059-014-0465-4

which uses Levene's test.

ADD COMMENT • link 17 hours ago Gordon Smyth 52k