Identifying differentially expressed genes in individual samples from a large cohort
1
0
Entering edit mode
@jamiegearing-12556
Last seen 9 hours ago
Australia

Hello,

I have a cohort of samples that are ostensibly of the same type, but some express high levels of unique genes, which make these individuals worthy of further investigation. As such, I would like to compare each single sample to the rest, in order to find the interesting ones, then specifically identify which genes differ most from the population.

I know it is not ideal to have no replicates, but I was thinking of fitting a series of linear models, one per sample, and then globally adjusting all of the resulting p-values, to take into account the extra multiple testing. Does this approach sound vaguely sensible? Any feedback or other suggestions would be gratefully received.

Thanks!

Jamie.

library(limma)

## Cohort
nsamples <- 50
ngenes <- 100

## Expression
set.seed(12)
sd <- 0.3*sqrt(4/rchisq(ngenes, df = 4))
set.seed(12)
y <- matrix(rnorm(ngenes*nsamples, sd = sd), ngenes, nsamples)
rownames(y) <- paste0("Gene", 1:ngenes)
colnames(y) <- paste0("Sample", 1: nsamples)

## Increase expression
# Gene 2 higher in Sample 2
# Gene 3 higher in Sample 3
y[2, 2] <- y[2, 2] + 5
y[3, 3] <- y[3, 3] + 5

## lmFit for each sample versus the rest
fit2.list <- lapply(1:ncol(y), function(i){
  design <- matrix(c(1, 0), ncol = 2, nrow = ncol(y), byrow = TRUE,
                    dimnames = list(colnames(y), c("(Intercept)", colnames(y)[i])))
  design[i, 2] <- 1
  fit <- lmFit(y, design)
  fit2 <- contrasts.fit(fit, coefficients = 2)
  eBayes(fit2)
})

## Adjust p-values globally
global.p <- lapply(fit2.list, "[[", "p.value")
global.p <- do.call(global.p, what = "cbind")
global.p[] <- p.adjust(global.p, method = "BH")

which(global.p < 0.05, arr.ind = TRUE)
#       row col
# Gene2   2   2
# Gene3   3   3

global.p[1:3, 1:3]
#       Sample1  Sample2  Sample3
# Gene1   0.980 9.80e-01 9.50e-01
# Gene2   0.996 2.79e-11 9.96e-01
# Gene3   0.996 9.96e-01 1.68e-23

sessionInfo()
# R version 4.4.0 (2024-04-24)
# Platform: aarch64-apple-darwin20
# Running under: macOS Sonoma 14.4.1

# Matrix products: default
# BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
# LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

# time zone: Europe/Berlin
# tzcode source: internal

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     

# other attached packages:
# [1] limma_3.60.0

# loaded via a namespace (and not attached):
# [1] compiler_4.4.0 statmod_1.5.0
limma DifferentialExpression • 38 views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 24 minutes ago
WEHI, Melbourne, Australia

Well, it makes sense to me in a descriptive sense, as a way to investigate your question. I'm not sure that the p-values can be considered to be fully rigorous under any particular model, but that is perhaps not a serious concern.

ADD COMMENT

Login before adding your answer.

Traffic: 706 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6