Question

interaction formulas without replicates


rea ▴ 10


I refer to section 3.3.2 on nested interaction formulas in the edgeR user guide.

I wonder if the coefficient TreatDrug.Time.1h provides the logFC and associated FDR corresponding to (drug.1h-drug.0h)-(placebo.1h-placebo.0h). Is this interpretation correct?

Does it make sense to use nested interaction formulas when I have only one replicate for each combination of Time and Treatment levels? In your example, placebo.0h has two samples.

My study has one sample for each combination of the levels of two factors. When I apply a nested interaction formula, every gene is reported with FDR = 1. I wonder if this result is due to the absence of replicates.

Alternatively, could you suggest other ways to test the influence of multiple factors on differential expression when there is only one replicate for each combination of their levels?

cancer • 824 views

updated 7.6 years ago by Aaron Lun ★ 28k • written 7.6 years ago by rea ▴ 10


You need to add the edgeR tag, otherwise the maintainers don't get notified.

7.6 years ago Aaron Lun ★ 28k

score 0 · Answer 1 · 2016-09-28

The answers to your questions are:

1) No, the TreatDrug:Time1h coefficient represents the log-fold change of drug.1h over drug.0h, i.e., the 1h effect within the drug group only. Under that parameterization, the difference of differences you describe would correspond to TreatDrug:Time1h minus TreatPlacebo:Time1h. Have a look at the design matrix if you're uncertain. (I rarely trust the column names provided by model.matrix, because the meaning of those names will depend on the design formula, even for designs that are mathematically equivalent. It's safer to just check the design matrix directly.)

2) If you have only one replicate for each combination, then using a nested interaction design will not give you any residual d.f. for dispersion estimation. Consider using a simpler model in order to free up residual d.f. for estimation. For example, you could use an additive model where you assume that the drug/time effects are independent, or a model with time as a real-valued covariate if you have enough time points.

3) Lack of replicates will reduce power to detect DE between conditions. Obviously, you won't have much data to reject the null hypothesis if you only have one observation for each condition. I also assume you manually input a dispersion value, which may or may not be appropriate; if it's too large, then that will also result in conservativeness.

4) Read the relevant section (2.11) of the edgeR user's guide on what to do without replicates.
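
To make points 1 and 2 concrete, here is a minimal sketch of the underlying arithmetic. It is written in plain Python rather than R and does not call edgeR. It assumes a hypothetical layout of 2 treatments x 3 time points with one sample each, and hand-builds design matrices whose column names only imitate what model.matrix would be expected to produce for ~Treat + Treat:Time and ~Treat + Time; check your own model.matrix output rather than relying on these names. The helper names (nested_design, additive_design, combine, residual_df) are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed layout: one sample per Treat x Time combination (no replicates).
treatments = ["Placebo", "Drug"]   # Placebo is the reference level
times = ["0h", "1h", "2h"]         # 0h is the reference level
samples = list(product(treatments, times))

def nested_design(samples):
    """Hand-built columns imitating ~Treat + Treat:Time."""
    names = ["(Intercept)", "TreatDrug"]
    names += [f"Treat{tr}:Time{t}" for t in times[1:] for tr in treatments]
    rows = []
    for tr, t in samples:
        row = [1, int(tr == "Drug")]
        row += [int(tr == tr2 and t == t2) for t2 in times[1:] for tr2 in treatments]
        rows.append(row)
    return names, rows

def additive_design(samples):
    """Hand-built columns imitating ~Treat + Time (no interaction)."""
    names = ["(Intercept)", "TreatDrug"] + [f"Time{t}" for t in times[1:]]
    rows = [[1, int(tr == "Drug")] + [int(t == t2) for t2 in times[1:]]
            for tr, t in samples]
    return names, rows

def rank(rows):
    """Matrix rank by exact Gaussian elimination on fractions."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][c] != 0:
                f = m[i][c] / m[rk][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def residual_df(rows):
    """Number of samples minus number of estimable coefficients."""
    return len(rows) - rank(rows)

def combine(names, rows, samples, weights):
    """Write a weighted sum of group linear predictors in terms of coefficients."""
    total = [0] * len(names)
    for group, w in weights.items():
        row = rows[samples.index(group)]
        total = [t + w * x for t, x in zip(total, row)]
    return {n: v for n, v in zip(names, total) if v != 0}

nested_names, nested_rows = nested_design(samples)
additive_names, additive_rows = additive_design(samples)

# Point 1: drug.1h - drug.0h is exactly one coefficient.
within_drug = combine(nested_names, nested_rows, samples,
                      {("Drug", "1h"): 1, ("Drug", "0h"): -1})
print(within_drug)   # {'TreatDrug:Time1h': 1}

# The difference of differences needs two coefficients.
interaction = combine(nested_names, nested_rows, samples,
                      {("Drug", "1h"): 1, ("Drug", "0h"): -1,
                       ("Placebo", "1h"): -1, ("Placebo", "0h"): 1})
print(interaction)   # {'TreatPlacebo:Time1h': -1, 'TreatDrug:Time1h': 1}

# Point 2: residual d.f. available for dispersion estimation.
print(residual_df(nested_rows))    # 0
print(residual_df(additive_rows))  # 2
```

In this assumed layout the nested design uses up all six samples on six coefficients, which is why nothing is left for dispersion estimation, whereas the additive design leaves two residual d.f. at the cost of assuming there is no drug-by-time interaction.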