Question

DESeq2: What is the difference between gene clustering and using the LRT for a time series experiment WITHOUT treatments?

kaiser.karim • 0

@kaiserkarim-20107

Hi, I am trying to analyse a time series experiment of neurons differentiated from human stem cells, to understand the differentiation process. I have sampled my cells at the following time points post-differentiation:

Day 0, 6h, 12h, 24h, 36h, Day 2, Day 3, Day 4, Day 14 and Day 21. The gap between Day 4 and the subsequent time points is there because fate commitment happens by Day 4, and several functional events occur at around Day 14 and Day 21. I have performed RNAseq and ATACseq for each timepoint. Firstly, with the RNAseq data, I am trying to figure out the right way to identify gene clusters across the timepoints.

1) Should I use the LRT with a reduced design, as described by Michael Love et al. in https://bioconductor.org/packages/3.7/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#time-course-experiments ? In that case, I did the following:

timepoint <- factor(rep(c("D00H00", "D00H06", "D00H12", "D01H00", "D01H12",
                          "D02H00", "D03H00", "D04H00", "D14H00", "D21H00"), each = 3))
ddsMat <- DESeqDataSetFromMatrix(countData = RNAseq_genecounts_matrix, colData = coldata, design = ~timepoint)
ddsMat <- ddsMat[rowSums(counts(ddsMat)) > 1, ]
ddsLRT <- DESeq(ddsMat, test = "LRT", reduced = ~1)
resLRT <- results(ddsLRT)
betas <- coef(ddsLRT)
topGenes <- head(order(resLRT$padj),1000)
mat <- betas[topGenes, -1]
thr <- 3 
mat[mat < -thr] <- -thr
mat[mat > thr] <- thr
pheatmap(mat, breaks=seq(from=-thr, to=thr, length=101),
         cluster_cols=FALSE)

2) Or should I do gene clustering using transformed values, as described here: http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#gene-clustering

rld <- rlog(ddsMat, blind = FALSE)
topVarGenes <- head(order(rowVars(assay(rld)), decreasing = TRUE), 1000)
mat <- assay(rld)[topVarGenes, ]
mat <- mat - rowMeans(mat)
pdf("plots/GeneCluster_50.pdf")
pheatmap(mat, clustering_distance_cols=sdist, clustering_distance_rows=sdist)
dev.off()  # close the PDF device so the file is actually written

Ultimately, it would be great if you could explain the differences between the two approaches. I have looked at several other related posts on this matter, but can't seem to understand the difference. I know this is a big ask, so I greatly appreciate any help you can offer!

deseq2 Time Series Gene Clustering LRT ATACseq • 2.0k views

updated 5.1 years ago by Michael Love 41k • written 5.1 years ago by kaiser.karim • 0

score 1 · Answer 1 · 2019-03-06

Michael Love 41k

@mikelove

United States

The first one selects genes by statistical significance (the smallest LRT adjusted p-values), while the second uses the variance of the transformed data alone (the most variable genes), and so includes a different set of genes. There's not really a "correct" choice.
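To make the distinction concrete, here is a minimal toy sketch in base R. It is not DESeq2 itself: a per-gene one-way ANOVA F-test stands in for the LRT, and simulated log-scale values stand in for rlog-transformed counts. All sizes, noise levels and gene names are made up for illustration. It ranks the same genes once by significance of the timepoint effect and once by variance, then counts how many genes the two top lists share.

```r
set.seed(1)
n_genes   <- 300
timepoint <- factor(rep(paste0("T", 1:5), each = 3))  # 5 timepoints x 3 replicates

# Toy log-scale expression matrix: low-level noise only to start with.
mat <- matrix(rnorm(n_genes * length(timepoint), sd = 0.05), nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Genes 1-50: modest but very consistent trend over time.
trend <- rep(seq(0, 1, length.out = 5), each = 3)
mat[1:50, ] <- sweep(mat[1:50, ], 2, trend, "+")

# Genes 51-100: large replicate-to-replicate noise, no timepoint effect.
mat[51:100, ] <- matrix(rnorm(50 * length(timepoint), sd = 2), nrow = 50)

# Selection 1: significance of the timepoint effect
# (one-way ANOVA per gene, used here as a stand-in for the LRT).
pvals   <- apply(mat, 1, function(y) anova(lm(y ~ timepoint))[["Pr(>F)"]][1])
padj    <- p.adjust(pvals, method = "BH")
top_sig <- rownames(mat)[head(order(padj), 50)]

# Selection 2: highest variance across all samples, ignoring the design.
gene_var <- apply(mat, 1, var)
top_var  <- rownames(mat)[head(order(gene_var, decreasing = TRUE), 50)]

# How many genes do the two top-50 lists have in common?
overlap <- length(intersect(top_sig, top_var))
print(overlap)
```

In this toy setup, the significance ranking favours genes whose change over time is small but reproducible, while the variance ranking favours genes with large spread whether or not that spread follows the timepoints. With real data the two lists usually overlap much more than in this deliberately extreme example, but they need not agree.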