Question

Monotonic genes in results of differential gene expression analysis

0

Entering edit mode

Shaimaa Gamal • 0

@14ef1b09

Last seen 8 months ago

Egypt

I want to know which methods are commonly used for RNA seq data if the goal is to identify monotonic increasing or decreasing gene expression over time or across stages using the results of differential gene expression analysis?

DifferentialExpression rnaseqGene RNASeqData • 1.6k views

ADD COMMENT • link 2.0 years ago Shaimaa Gamal • 0

0

Entering edit mode

Cross-posted to Biostars https://www.biostars.org/p/9585490/

ADD REPLY • link 2.0 years ago Gordon Smyth 53k

score 3 · Accepted Answer · 2024-01-20

3

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

The easiest way is to fit a time trend, select genes with a significant F-test and then check whether the fitted values are in monotonic order.

ADD COMMENT • link 2.0 years ago Gordon Smyth 53k

0

Entering edit mode

Thank you so much Prof. Gordon

ADD REPLY • link 2.0 years ago Shaimaa Gamal • 0

0

Entering edit mode

I tried to fit a time trend model for each gene but it takes forever. It seems like it is computationally expensive. Can I fit the linear model for all genes simultaneously using a matrix?

Processor: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz 2.80 GHz RAM:16.0 GB

  time_trend_model <- lapply(names(ordered_counts)[1:5429], function(gene) {
  formula <- as.formula(paste(ordered_counts, "~ stage"))
  lm(formula, data = ordered_counts)
  })

ADD REPLY • link 2.0 years ago Shaimaa Gamal • 0

2

Entering edit mode

Fitting genewise time trends take only a few seconds using either limma or edgeR, even for a large dataset and even on a laptop computer. Both packages perform F-tests. There are examples in the limma and edgeR User's guides, but it is just a standard analysis for either of those packages.

I was certainly not recommending that you fit linear models directly to ordered counts. That wouldn't make any sense statistically, even it was computationally efficient.

ADD REPLY • link 2.0 years ago Gordon Smyth 53k

0

Entering edit mode

trend plots

I am not sure why the plots are different from the user guide. I am dealing with defined stages while the user guide deals with hours

design = model.matrix(~0 + dge_stage$samples$Stage)
colnames(design)[1]="Stage1"
colnames(design)[2]="Stage2"
colnames(design)[3]="Stage3"
colnames(design)[4]="Stage4"

stages=as.numeric(c(dge_stage$samples$Stage))

polycurve = poly(stages, degree=3)
design <- model.matrix(~ polycurve)

dge_stage <- estimateGLMCommonDisp(dge_stage, design)

dge_stage <- estimateGLMTagwiseDisp(dge_stage, design)

glmfit=glmQLFit(dge_stage, design=design)
plotQLDisp(glmfit)

Time_FTest=glmQLFTest(glmfit, coef=ncol(glmfit$design))

#The topTags function lists the top set of genes with most 
#significant time effects.

tab <- as.data.frame(topTags(Time_FTest, n=30))

summary(decideTests(Time_FTest))

logCPM.obs <- cpm(dge_stage$counts, log=TRUE, prior.count=Time_FTest$prior.count)
logCPM.fit <- cpm(Time_FTest, log=TRUE)


par(mfrow=c(2,2))
for(i in 1:10) {
GeneID <- row.names(tab)[i]
Symbol <- tab$hgnc_symbol[i]
logCPM.obs.i <- logCPM.obs[GeneID,]
logCPM.fit.i <- logCPM.fit[GeneID,]
plot(stages, logCPM.obs.i, ylab="log-CPM", main=Symbol, pch=16)
lines(stages, logCPM.fit.i, col="green", lwd=2)
}

Also, results of summary(decideTests(Time_FTest))        

polycurve3
Down            0
NotSig       5429
Up                0

ADD REPLY • link 2.0 years ago Shaimaa Gamal • 0