Question

no DGE with multiple groups in edge R

0

Entering edit mode

joelrome88 ▴ 20

@joelrome88-9626

Last seen 7.1 years ago

Good morning,

I've been having an issue trying to look for DGE in edgeR. My main problem is that the FDR of ALL genes is 1.

I have data for 5 timepoints, and two different conditions for each one (Control and treatment). Additionally, there is a starting point sample (time 0, no treatment). This makes a total of 11 groups (ie., Time1.Control, Time1.TreatmentB...Time5.Control, Time5.TreatmentB ). I have ~4 biological replicates for each sample (~45 files in total).

My intention is to do pairwise comparisons across time to look at how treatment and time affect expression of genes (TreatmentvsControl@Time1... TreatmentvsControl@Time3 ...TreatmentvsControl@Time5).

I'm using GLM models to adjust data to a design using model.matrix(~0+Group). I'm following the edgeR manual but I haven't found a way to account for my problem.

When looking at the MDS plots, the samples don't separate really well by any condition (the most striking example is that when I color the sample by treatment (control vs treatment)).

Here is part of the code:

> Yglm <-DGEList(counts=GeneCounts, group=phenoInfo$TT)

## Normalize data and adjust using GLMs

> Yglm <- calcNormFactors(Yglm)

# Estimate dispersions
> Yglm <- estimateGLMCommonDisp(Yglm, design)
> Yglm <- estimateGLMTrendedDisp(Yglm, design)
> Yglm <- estimateGLMTagwiseDisp(Yglm, design)

# Fit:

> Fitglm <- glmFit(Yglm,design)

> my.contrasts <- makeContrasts(
# High vs Sufficient
tm1_TvC=T1_Trt-T1_Ctrl,
tm2_TvC=T2_Trt-T2_Ctrl,
tm3_TvC=T3_Trt-T3_Ctrl,
tm4_TvC=T4_Trt-T4_Ctrl,
tm5_TvC=T5_Trt-T5_Ctrl,
levels = design)

> lrt <- glmLRT(Fitglm, contrast=my.contrasts[,tm1_TvC])
> lrt <- topTags(llrt,n=Inf,sort.by = "none")$table
> print(summary(de <- decideTestsDGE(lrt))
#MA plot
> detags <- rownames(Yglm)[as.logical(de)]
plotSmear(lrt.temp, de.tags=detags,main=eachContrast,cex=0.2)
abline(h=c(-2, 2), col="blue")

Has anyone ever had a similar problem? How can I solve this issue?

Thanks!!

edgeR differential gene expression RNAseq • 1.5k views

ADD COMMENT • link updated 8.3 years ago by Aaron Lun ★ 28k • written 8.3 years ago by joelrome88 ▴ 20

score 2 · Answer 1 · 2016-01-29

Well, there doesn't seem to be anything wrong with your code, so the only other conclusion is that there aren't any DE genes in your data. If you don't see clear differences in the MDS plot, that's usually an indication that your DE will be weak (if there is, indeed, any DE at all). You've got plenty of samples, so power shouldn't be a problem. The only thing I can imagine if there's outlier samples or genes that are inflating the dispersion. Depending on the type of experiment, I'd expect dispersions of below 0.05 and prior degrees of freedom between 5-10.

It's worth noting that having all-unity values for the FDR is not symptomatic of a problem with the code. There's a cap on the adjusted p-value at unity, for obvious reasons; the BH correction also enforces monotonicity, so you can end up with many genes on the same adjusted p-value. This results in "blocks" of identical values that look artificial but are not incorrect. Finally, remember that it is a correction for multiple testing. If there aren't any DE genes, then the adjusted p-value should be unity, because you shouldn't detect anything at all.

If you aren't getting any DE genes in the individual pairwise comparisons, you could try doing an ANOVA-like comparison over all time points (set contrast=my.contrasts in glmLRT). This should use information from multiple time points to detect DE, which might give you a bit more power. By the sounds of it, though, it might not help much when there is so little evidence for DE at each time point.