Question

Limma-voom - Timepoint by timepoint comparison in timecourse experiment

0

Entering edit mode

rina ▴ 30

@rina-16738

Last seen 7 months ago

France

I am analyzing RNA-Seq data from a drug response dataset. There, we have a timecourse experiment with the following experimental design:

3 timepoints: 4h, 24h, 48h

3 drugs (A,B,C) + 1 untreated control

3 replicates per drug-timepoints

I would like to get differentially expressed genes for each timepoint compared to the untreated control. I have followed the limma-voom tutorial, and I get close to 300.000 differentially expressed genes (that go down to 10.000 if I apply an FDR and logFC threshold). I am expecting to find differences compared to the untreated samples, however, I am not sure whether the p-values are inflatted due to fewer changes in the earlier time points, as it has been suggested in another question in BioConductor regarding limma. Is there a difference on how limma performs differential analysis if fewer or more contrasts are specified?

My code can be found below:

# samples: df with sample metadata. Condition column combines both Drug + Timepoint (e.g. A_4h, B_4h, C_4h, ...., untreated_48h)
# gene_counts: df with gene counts

lev <- unique(samples$Condition)
fac <- factor(samples$Condition, levels=lev)
design <- model.matrix(~ 0 + fac)
colnames(design) <- lev

x <- DGEList(counts=count_df)
cpm <- cpm(x)
lcpm <- cpm(x, log=TRUE)

keep.exprs <- rowSums(cpm>1)>=3
x <- x[keep.exprs, keep.lib.sizes=FALSE]
dim(x)

contrast_list <- makeContrasts(
   Diff_A_4h = A_4h - untreated_4h,
   Diff_B_4h = B_4h - untreated_4h,
   Diff_C_4h = C_4h - untreated_4h,
   Diff_A_24h = A_24h - untreated_24h,
   Diff_A_24h = A_24h - untreated_24h,
   Diff_A_24h = A_24h - untreated_24h,
   Diff_A_48h = A_48h - untreated_48h,
   Diff_A_48h = A_48h - untreated_48h,
   Diff_A_48h = A_48h - untreated_48h,
   levels = design)

vfit <- lmFit(v, design)
vfit <- contrasts.fit(vfit, contrasts=contrasts_list)
efit <- eBayes(vfit)

# And to extract the DEGs per contrast

extract_DEGs <- function(x1){
  contr_name <- colnames(efit$coefficients)[[x1]]
  tt <- topTable(efit, x1, number = Inf) %>%
    rownames_to_column("Gene_name") %>%
    mutate(Contrast = contr_name) %>%
    as_tibble() 
  tt
}

Coeffs <- dimnames(efit$coefficients)[["Contrasts"]]

limma • 607 views

ADD COMMENT • link written 16 months ago by rina ▴ 30

score 1 · Answer 1 · 2022-12-01

1

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 38 minutes ago

WEHI, Melbourne, Australia

I am not sure whether the p-values are inflatted due to fewer changes in the earlier time points, as it has been suggested in another question in BioConductor regarding limma.

No, there is no such problem.

Is there a difference on how limma performs differential analysis if fewer or more contrasts are specified?

No, not unless you use non-default options to decideTests().

Regarding your earlier steps

x <- DGEList(counts=count_df)
keep.exprs <- rowSums(cpm>1)>=3
x <- x[keep.exprs, keep.lib.sizes=FALSE]

we recommend

keep.exprs <- filterByExpr(y, design)

and we recommend TMM normalization by

x <- calcNormFactors(x)

ADD COMMENT • link 16 months ago Gordon Smyth 50k

0

Entering edit mode

Thank you so much for the quick reply and recommendations regarding data pre-processing!

ADD REPLY • link 16 months ago rina ▴ 30