Question

Identifying differentially expressed genes in individual samples from a large cohort

0

Entering edit mode

jamie.gearing ▴ 60

@jamiegearing-12556

Last seen 5 weeks ago

Australia

Hello,

I have a cohort of samples that are ostensibly of the same type, but some express high levels of unique genes, which make these individuals worthy of further investigation. As such, I would like to compare each single sample to the rest, in order to find the interesting ones, then specifically identify which genes differ most from the population.

I know it is not ideal to have no replicates, but I was thinking of fitting a series of linear models, one per sample, and then globally adjusting all of the resulting p-values, to take into account the extra multiple testing. Does this approach sound vaguely sensible? Any feedback or other suggestions would be gratefully received.

Thanks!

Jamie.

library(limma)

## Cohort
nsamples <- 50
ngenes <- 100

## Expression
set.seed(12)
sd <- 0.3*sqrt(4/rchisq(ngenes, df = 4))
set.seed(12)
y <- matrix(rnorm(ngenes*nsamples, sd = sd), ngenes, nsamples)
rownames(y) <- paste0("Gene", 1:ngenes)
colnames(y) <- paste0("Sample", 1: nsamples)

## Increase expression
# Gene 2 higher in Sample 2
# Gene 3 higher in Sample 3
y[2, 2] <- y[2, 2] + 5
y[3, 3] <- y[3, 3] + 5

## lmFit for each sample versus the rest
fit2.list <- lapply(1:ncol(y), function(i){
  design <- matrix(c(1, 0), ncol = 2, nrow = ncol(y), byrow = TRUE,
                    dimnames = list(colnames(y), c("(Intercept)", colnames(y)[i])))
  design[i, 2] <- 1
  fit <- lmFit(y, design)
  fit2 <- contrasts.fit(fit, coefficients = 2)
  eBayes(fit2)
})

## Adjust p-values globally
global.p <- lapply(fit2.list, "[[", "p.value")
global.p <- do.call(global.p, what = "cbind")
global.p[] <- p.adjust(global.p, method = "BH")

which(global.p < 0.05, arr.ind = TRUE)
#       row col
# Gene2   2   2
# Gene3   3   3

global.p[1:3, 1:3]
#       Sample1  Sample2  Sample3
# Gene1   0.980 9.80e-01 9.50e-01
# Gene2   0.996 2.79e-11 9.96e-01
# Gene3   0.996 9.96e-01 1.68e-23

sessionInfo()
# R version 4.4.0 (2024-04-24)
# Platform: aarch64-apple-darwin20
# Running under: macOS Sonoma 14.4.1

# Matrix products: default
# BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
# LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

# time zone: Europe/Berlin
# tzcode source: internal

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     

# other attached packages:
# [1] limma_3.60.0

# loaded via a namespace (and not attached):
# [1] compiler_4.4.0 statmod_1.5.0

limma DifferentialExpression • 299 views

ADD COMMENT • link 7 weeks ago jamie.gearing ▴ 60

score 2 · Accepted Answer · 2025-11-18

2

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

Well, it makes sense to me in a descriptive sense, as a way to investigate your question. I'm not sure that the p-values can be considered to be fully rigorous under any particular model, but that is perhaps not a serious concern.

ADD COMMENT • link 7 weeks ago Gordon Smyth 53k

0

Entering edit mode

Great. Yes, something descriptive is probably good enough, even if the p-values themselves are not so rigorous. Thanks very much Gordon!

ADD REPLY • link 7 weeks ago jamie.gearing ▴ 60