Use of coef() as a measure of effect size in a multiple regression using DESeq2
1
0
Entering edit mode
@owenchapman1-23503
Last seen 17 months ago

Hello, I'm using DESeq2 to model a multi-factorial experiment with two predictor variables: Genotype (binary) and Passage (integer):

# Make passage variable numeric
coldata_exp <- coldata
coldata_exp$Passage <- (as.numeric(coldata_exp$Passage))
dds_exp <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata_exp,
                              design= ~Genotype*Passage)
keep <- rowSums(counts(dds_exp)) >= 10
dds_exp <- dds_exp[keep,]
# Make the model, perform likelihood ratio test
dds_exp <- DESeq(dds_exp, test="LRT",reduced = ~ Genotype + Passage)

We would like to derive some measure of 'effect size' of the interaction which is that is comparable between genes; or, how much does v1 affect gene expression in a v2-dependent manner? We have arrived at the standardized Beta coefficient (Wikipedia) as an appropriate measure of effect size.

My few questions are: Is it appropriate to compare standardized Beta values across genes? Does the DESeq2 coef() function output raw regression coefficients, or are they standardized Betas?

Thank you! Owen

DESeq2 GLMs • 264 views
ADD COMMENT
0
Entering edit mode

Hi owen.chapman1, could you share your method on how to compute the effect size of multiple groups comparison? Thanks.

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 9 hours ago
United States

They are not standardized with respect to eg gene variability, only fold changes.

By the way, we recommend using lfcShrink() to produce posterior estimates as opposed to the MLE estimates with coef().

ADD COMMENT
1
Entering edit mode

Hi owen.chapman1, could you share your method on how to compute the effect size of multiple groups comparison? Thanks.

ADD REPLY
0
Entering edit mode

Hi Michael, Thanks for your help. Our concern with using fold change is that it's not well-defined for a multiple regression on more than one predictor variable; especially as we'd like our GLM to model a function of all timepoints, not just a pairwise comparison between two timepoints. Thus, we don't think FC addresses our question. Instead, to estimate the interaction between the two predictor variables, we landed on betas; but we'd like to be sure (1) whether coef() produces standardized betas, and (2) whether this is an appropriate metric to compare between genes?

ADD REPLY
0
Entering edit mode

I do think log fold changes are well defined even when there is more than one predictor variable. I'm not sure what you mean by well defined though.

(1) is answered already

(2) yes fold changes can be compared across genes, e.g. the MA plot shows the fold changes on the y-axis (log scale)

ADD REPLY
0
Entering edit mode

Hi Michael,

I do think log fold changes are well defined even when there is more than one predictor variable. I'm not sure what you mean by well defined though.

Thanks for your patience, I think I must be misunderstanding something about how fold change is defined. Fold change only makes sense as a comparison between two treatments, I think. And so in a factorial design, where we have one binary predictor variable [0,1] (genotype) and another integer predictor [0-3] (time), then we would have many possible pairwise comparisons, each with a fold change. We could, as you suggest in your RNA-seq workflow, use the FC for a single pairwise comparison (eg. v1=0,v2=0 vs v1=1,v2=3), but this would ignore information from the other points of the timeseries. This has motivated us to explore using coef() as a measure of effect size for the v1:v2 interaction across all samples.

Following up on (2), do you have insight as to whether standardized betas are also comparable between genes?

ADD REPLY
0
Entering edit mode

Oh if you just mean the words “fold change” shouldn’t be used except for two group comparisons, I’m not sure I agree, but you can refer to estimated coefficients however you like.

results(), coef() and lfcShrink() all are giving you estimates of coefficients, there isn’t a difference in the software among these (except that lfcShrink uses a Bayesian model).

I think we’ve covered all the questions already, and what can be obtained and not obtained from DESeq2. Sorry I don’t have abundant time these days so I’ll have to leave it at that.

ADD REPLY
0
Entering edit mode

Thank you, I appreciate your time and expertise! O

ADD REPLY
0
Entering edit mode

Thank you, I appreciate your time and expertise! O

ADD REPLY

Login before adding your answer.

Traffic: 289 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6