Hello everyone, I am a super fan of limma. Recently, I came across some chanllengings when working a complex time course RNAseq project. The metadata includes three groups, including unstained control, high, and low. Each group has 7 time series. However, they only had one biological replicate, which means I only have one sample per time point / group. The data provider wants to know which genes are differentialy regulated in time between grouped High versus Unstained (as well as high_vs_low, low_vs_unstained). According to the guideline in Chapter 9.6, I suppose this is more in line with the process of Many time points (9.6.2) since it can handle the situation without biological replication very well.
What confused me is, the example in guideline only includes control and treatment groups. I'm unsure if this approach can be directly applied to my data. Should I reassemble the data, construct linear models for High_vs_Unstained, high_vs_low, and low_vs_unstained separately, and then run contrasts.fit()
? Because I suppose the adjust p value obatined from my current codes is the trend of all pairwise comparisons among three groups. Here are my codes and data.
library(splines)
D <- readRDS("D.filt.anno.dedup.norm.rds")
targets <- D$targets
targets$group <- factor(targets$group, levels = c("unstained", "low", "high"))
targets$time <- gsub("t","",targets$time) %>% as.numeric()
targets$time
# [1] 1 1 1 2 2 2 3 3 4 4 4 5 5 5 6 6 6 7 7 7
X <- ns(targets$time, df=5)
Group <- factor(targets$group)
design <- model.matrix(~Group * X)
colnames(design) <- gsub(":","_", colnames(design))
colnames(design)
# [1] "(Intercept)" "Grouplow" "Grouphigh" "X1" "X2" "X3"
# [7] "X4" "X5" "Grouplow_X1" "Grouphigh_X1" "Grouplow_X2" "Grouphigh_X2"
# [13] "Grouplow_X3" "Grouphigh_X3" "Grouplow_X4" "Grouphigh_X4" "Grouplow_X5" "Grouphigh_X5"
logCPM <- cpm(D, log = TRUE, prior.count = 3)
fit <- lmFit(logCPM, design)
fit.eb <- eBayes(fit, trend = TRUE)
my.contrasts <- makeContrasts(
# High vs low
HighvsLow_X2 = (Grouphigh_X2 - Grouphigh_X1) - (Grouplow_X2 - Grouplow_X1),
HighvsLow_X3 = (Grouphigh_X3 - Grouphigh_X1) - (Grouplow_X3 - Grouplow_X1),
HighvsLow_X4 = (Grouphigh_X4 - Grouphigh_X1) - (Grouplow_X4 - Grouplow_X1),
HighvsLow_X5 = (Grouphigh_X5 - Grouphigh_X1) - (Grouplow_X5 - Grouplow_X1),
# High vs unstained
HighvsUnstained_X2 = Grouphigh_X2 - Grouphigh_X1,
HighvsUnstained_X3 = Grouphigh_X3 - Grouphigh_X1,
HighvsUnstained_X4 = Grouphigh_X4 - Grouphigh_X1,
HighvsUnstained_X5 = Grouphigh_X5 - Grouphigh_X1,
# Low vs unstained
LowvsUnstained_X2 = Grouplow_X2 - Grouplow_X1,
LowvsUnstained_X3 = Grouplow_X3 - Grouplow_X1,
LowvsUnstained_X4 = Grouplow_X4 - Grouplow_X1,
LowvsUnstained_X5 = Grouplow_X5 - Grouplow_X1,
levels = design)
fit2 <- contrasts.fit(fit, my.contrasts)
fit2.eb <- eBayes(fit2, trend = TRUE, robust= FALSE)
tt = topTable(fit2.eb,n=Inf,sort.by="none")
tt <- tt[,-c(1:12)]
gene_anno <- D$idTOsymbol
tt <- tt[gene_anno$entrezgene_id,]
tt <- cbind(gene_anno,tt)
for (i in 1:ncol(my.contrasts)){
tt.i = topTable(fit2.eb,coef=i,n=Inf,sort.by="none")[,c(1,3:6)]
names(tt.i) = paste0(names(tt.i)," (",colnames(my.contrasts)[i],")")
tt = cbind(tt,tt.i)}
str(tt)
#'data.frame': 20555 obs. of 66 variables:
# $ entrezgene_id : chr "1" "10" "100" "10000" ...
# $ hgnc_symbol : chr "A1BG" "NAT2" "ADA" "AKT3" ...
# $ AveExpr : num -2.85 -2.45 4.22 5.68 2.58 ...
# $ F : num 1.542 0.987 2.46 1.076 0.565 ...
# $ P.Value : num 0.298 0.517 0.134 0.472 0.779 ...
# $ adj.P.Val : num 0.987 0.987 0.987 0.987 0.987 ...
# $ logFC (HighvsLow_X2) : num 0.522 -2.0541 -0.195 -1.3448 -0.0839 ...
# $ t (HighvsLow_X2) : num 0.2136 -0.6023 -0.1544 -1.3685 -0.0527 ...
# $ P.Value (HighvsLow_X2) : num 0.837 0.567 0.882 0.216 0.96 ...
# $ adj.P.Val (HighvsLow_X2) : num 1 1 1 1 1 ...
# $ B (HighvsLow_X2) : num -4.6 -4.6 -4.6 -4.59 -4.6 ...
Secondly, I also tried to directly apply the experimental setting time series instead of 'pseudotime series' in the design. Do you think it could introduce too many false positive results due to inability to estimate within-group variability? Here are my codes.