DeSEQ2- What is the difference between Gene Clustering and using LRT for Time Series experiment WITHOUT treatments
Entering edit mode
Last seen 2.2 years ago

Hi, I am trying to analyse a time series experiment of neurons differentiated from human stem cells, to understand the differentiation process. I have sampled my cells at the following time points post-differentiation

Day 0, 6h, 12h, 24h, 36h, Day 2, Day 3, Day 4, Day 14 and Day 21. The reason for the gap between Day 4 and subsequent time points is because fate commitment happens by day 4 and several functional events occur at around Day 14 and Day 21. I have performed RNAseq and ATACseq for each timepoint. Firstly, with RNAseq, I am trying to figure out what is the right way to identify gene clusters across the time timepoints.

1) Should I use the LRT with a reduced design as described by Michael Love et al in In which chase, I did the following:

timepoint <- factor(c(rep("D00H00", 3), rep("D00H06", 3), rep("D00H12", 3), rep("D01H00", 3), rep("D01H12", 3), rep("D02H00", 3), rep("D03H00", 3), rep("D04H00", 3), rep("D14H00", 3), rep("D21H00", 3)))
ddsMat<-DESeqDataSetFromMatrix(countData=RNAseq_genecounts_matrix, colData=coldata, design=~timepoint)
ddsMat <- ddsMat[ rowSums(counts(ddsMat)) > 1, ] 
ddsLRT <- DESeq(ddsMat, test = "LRT", reduced = ~1)
resLRT <- results(ddsLRT)
betas <- coef(ddsLRT)
topGenes <- head(order(resLRT$padj),1000)
mat <- betas[topGenes, -1]
thr <- 3 
mat[mat < -thr] <- -thr
mat[mat > thr] <- thr
pheatmap(mat, breaks=seq(from=-thr, to=thr, length=101),

2) or should I do gene clustering using transformed values as described in the following:

rld<- rlog(ddsMat, blind= FALSE)
topVarGenes <- head(order(rowVars(assay(rld)), decreasing = TRUE), 1000)
mat  <- assay(rld)[ topVarGenes, ]
mat  <- mat - rowMeans(mat)
pdf ("plots/GeneCluster_50.pdf")
pheatmap(mat, clustering_distance_cols=sdist, clustering_distance_rows=sdist) 

Ultimately, it would be great if you could explain the differences between the two approaches. I have looked at several other related posts on this matter, but can't seem to understand this difference. I know this a big ask, so I greatly appreciate any help you can offer!

deseq2 Time Series Gene Clustering LRT ATACseq • 934 views
Entering edit mode
Last seen 17 hours ago
United States

The first one focuses on statistical significance while the second uses the variance stabilized data alone and so includes a different set of genes. There’s not really a “correct” choice.

Entering edit mode

As simple as that! Thanks Michael!


Login before adding your answer.

Traffic: 172 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6