Differential variability modelling with vst() expression
2
0
Entering edit mode
jtsy6383 • 0
@58973972
Last seen 10 hours ago
Germany

Hi!

On top of a standard DE test with limma-voom, I want to carry out a differential variability analysis between two conditions (diets 1 and 2), controlling for categorical batches (e.g., sequencing days) and surrogate variables explaining non-condition/batch structure in the data. The goal is to identify differences in variability that may not purely be a consequence of differences in mean (e.g., control of transcription/degradation rather than increased transcription). My dataset has >100 biological replicates per condition so I got good power to detect them if they exist.

My main questions are if the vst()-based Levene's test approach below is correctly coded and/or sensible?

    counts # count matrix
    all_covariates # dataframe with conditions (1 or 2), seqday (1 or 2 or 3) and SVs (continuous)
    conditions<- as.factor(all_covariates$conditions)
    seqday <- as.factor(all_covariates $seqday)
    sv <- as.numeric(all_covariates $sv)

    data <- DESeqDataSetFromMatrix(countData = counts,
                                   colData = all_covariates,
                                   design =  ~conditions) 
    vst.data = vst(data, blind=F) 

    contrasts(seqday) <- contr.sum(levels(seqday))
    batches <- model.matrix(~seqday+sv)
    vst.counts.bc  <- limma::removeBatchEffect(vst.counts, covariates = batches[,-1],  design = ~conditions) 

    pvalues <- sapply(1:nrow(vst.counts.bc), function(i){
      data <- cbind.data.frame(gene = as.numeric(t(vst.counts.bc[i,])), conditions)
      result <- leveneTest(gene~conditions, data, center='median')
      p <- result$`Pr(>F)`[1]
      return(p)
    })
    bh <- p.adjust(pvalues, method = "BH")

For some extra context and correct me if I'm wrong - my reading of the source code is that the vst() corrects for library sizes and a mean-variance relationship that varies from gene to gene per condition, in particular addressing the dispproportionately high variances of genes that are either super-low and super-high in mean expression. Furthermore, the median-centred Levene's test should be robust to deviations from normality. Hence, the above approach feels sound. Yet, I haven't seen a paper go for this. Instead, they use packages to fit negative binomials models on TMM+CPM data that model changes in variability as an overdispersion parameter obeying a quadratic mean-variance relationship (e.g., https://journals.physiology.org/doi/full/10.1152/physiolgenomics.00128.2018?), in spite of some papers reporting higher-order polynomial mean-variance relationships in RNAseq data (https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-020-00842-z).

limma DESeq2 • 60 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 9 hours ago
United States

Looking at differential variance across groups, where you regress out orthogonal nuisance factors from VST data, seems correct and would point you to the types of gene you are looking for.

However, you could also consider fitting dispersion in a NB GLM, and then comparing dispersion. You could test constant dispersion vs per-group with a likelihood ratio test.

ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 7 hours ago
WEHI, Melbourne, Australia

See

Phipson, B., Oshlack, A. DiffVar: a new method for detecting differential variability with application to methylation in cancer and aging. Genome Biol 15, 465 (2014). https://doi.org/10.1186/s13059-014-0465-4

which uses Levene's test.

ADD COMMENT

Login before adding your answer.

Traffic: 523 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6