Question

Differential Expression Genes as Independent Variables

abadgerw • 0

@5088ef59

United Kingdom

I have a dataset in which I want to relate gene expression in a single body fluid to a histochemical outcome that is a repeated measure. I would prefer not to collapse the repeated measure into a single variable for input into limma, because missing data would bias such a composite variable.

Therefore, I was wondering whether there is a statistical issue with modeling the genes as independent variables rather than as the dependent variable, so that my repeated measure could serve as the outcome. That way I could run a mixed model using dream in the variancePartition package, or use duplicateCorrelation in limma.

limma DifferentialExpression variancePartition dream

ADD COMMENT • link 7 hours ago • updated 41 minutes ago abadgerw • 0

score 0 · Answer 1 · 2024-11-29

Gordon Smyth 52k

@gordon-smyth

WEHI, Melbourne, Australia

Sorry, your question doesn't make sense to me. There are tens of thousands of genes, and you can't run a mixed model analysis with one dependent variable and tens of thousands of independent variables. That obviously makes no statistical sense. Neither limma nor dream is at all applicable to such an analysis.

I am not even clear what you mean by a "repeated measure". You don't seem to be using the term in the same sense as in ANOVA theory. It cannot be true that your outcome variable is repeated but the expression values are not. If you want to relate histochemical outcome to expression, then either both must be repeated or neither.

ADD COMMENT • link 3 hours ago • updated 2 hours ago Gordon Smyth 52k

The repeated data are a quantitative measure of post-mortem pathology across multiple sections. Not all patients have the same number of sections assessed, because of availability and similar constraints. The genes/proteins are measured once in the serum. The goal is to identify serum biomarkers of this pathological hallmark. Because the pathological data contain missing values, I was concerned about generating a composite score to use as an independent variable.

My question was therefore how to address this, and I wondered whether the genes/proteins could be tested one by one, each as an independent variable in a mixed model, with the p-values adjusted by FDR. Any insight into why this would not make statistical sense would help me understand. Other suggestions or options are much appreciated. Apologies for the naivety.

ADD REPLY • link 41 minutes ago abadgerw • 0
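
For the FDR step mentioned in the reply above, here is a minimal sketch of a Benjamini-Hochberg adjustment in plain Python. It assumes that one p-value per gene/protein has already been obtained from some per-feature model. The function name `benjamini_hochberg`, the feature names, and the p-values are all hypothetical and used only for illustration, and Benjamini-Hochberg is assumed here as the FDR procedure because the thread does not name one. The sketch covers only the multiple-testing correction and says nothing about whether the per-feature mixed model itself is appropriate.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, one per feature (placeholders, not real results).
p_by_feature = {"geneA": 0.001, "geneB": 0.02, "geneC": 0.03, "geneD": 0.5}
names = list(p_by_feature)
adj = benjamini_hochberg([p_by_feature[n] for n in names])

for name, q in zip(names, adj):
    print(f"{name}: adjusted p = {q:.4f}")
```

Each raw p-value is scaled by the number of tests divided by its rank, and the backward pass keeps the adjusted values monotone, so a feature with a smaller raw p-value never ends up with a larger adjusted one.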