Question: DESeq2 - What is the difference between gene clustering and using the LRT for a time series experiment WITHOUT treatments
kaiser.karim wrote (6 months ago):

Hi, I am trying to analyse a time series experiment of neurons differentiated from human stem cells, to understand the differentiation process. I have sampled my cells at the following time points post-differentiation:

Day 0, 6h, 12h, 24h, 36h, Day 2, Day 3, Day 4, Day 14 and Day 21. The reason for the gap between Day 4 and the subsequent time points is that fate commitment happens by Day 4 and several functional events occur at around Day 14 and Day 21. I have performed RNAseq and ATACseq for each time point. Firstly, with RNAseq, I am trying to figure out the right way to identify gene clusters across the time points.

1) Should I use the LRT with a reduced design, as described by Michael Love et al. in https://bioconductor.org/packages/3.7/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#time-course-experiments ? In that case, I did the following:

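library(DESeq2)
library(pheatmap)

# Three replicates per time point; alphabetical level order matches chronological order
timepoint <- factor(rep(c("D00H00", "D00H06", "D00H12", "D01H00", "D01H12", "D02H00", "D03H00", "D04H00", "D14H00", "D21H00"), each = 3))
# coldata is assumed to be a data.frame (defined elsewhere) with one row per sample and a "timepoint" column
ddsMat <- DESeqDataSetFromMatrix(countData = RNAseq_genecounts_matrix, colData = coldata, design = ~ timepoint)
# Drop genes with at most one count in total
ddsMat <- ddsMat[rowSums(counts(ddsMat)) > 1, ]
# LRT against an intercept-only model: tests for any change across time points
ddsLRT <- DESeq(ddsMat, test = "LRT", reduced = ~ 1)
resLRT <- results(ddsLRT)
betas <- coef(ddsLRT)
# 1000 genes with the smallest adjusted p-values
topGenes <- head(order(resLRT$padj), 1000)
# Drop the intercept; the remaining columns are log2 fold changes vs. D00H00
mat <- betas[topGenes, -1]
# Cap values at +/- thr so extreme genes do not dominate the colour scale
thr <- 3
mat[mat < -thr] <- -thr
mat[mat > thr] <- thr
pheatmap(mat, breaks = seq(from = -thr, to = thr, length.out = 101),
         cluster_cols = FALSE)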

2) Or should I do gene clustering using transformed values, as described here: http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#gene-clustering

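library(matrixStats)  # provides rowVars
rld <- rlog(ddsMat, blind = FALSE)
# 1000 genes with the highest variance across samples
topVarGenes <- head(order(rowVars(assay(rld)), decreasing = TRUE), 1000)
mat <- assay(rld)[topVarGenes, ]
# Center each gene on its mean
mat <- mat - rowMeans(mat)
pdf("plots/GeneCluster_50.pdf")
# sdist is not defined in this post; pheatmap expects a distance name (e.g. "euclidean") or a dist object
pheatmap(mat, clustering_distance_cols = sdist, clustering_distance_rows = sdist)
dev.off()  # close the PDF device so the file is written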

Ultimately, it would be great if you could explain the differences between the two approaches. I have looked at several other related posts on this matter, but can't seem to understand the difference. I know this is a big ask, so I greatly appreciate any help you can offer!

Answer: DESeq2 - What is the difference between Gene Clustering and using LRT for Time Se
Michael Love (United States) wrote (6 months ago):

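The first approach selects genes by statistical significance (the smallest adjusted p-values from the LRT), while the second selects the most variable genes from the transformed data alone, without any test, and so includes a different set of genes. The heatmaps also show different quantities: in the first, log2 fold changes relative to Day 0; in the second, transformed values centered on each gene's mean. There’s not really a “correct” choice.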
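
To see how far the two gene sets differ in practice, one can compare them directly. The following is a minimal sketch on simulated counts, not part of the original analysis: the toy data, the four time points, the variable names (`top_lrt`, `top_var`, `n_shared`) and the cutoff of 200 genes are all assumptions for illustration, and `varianceStabilizingTransformation` stands in for `rlog` only to keep the example fast.

```r
library(DESeq2)

set.seed(1)
n_genes <- 2000
timepoint <- factor(rep(c("D00H00", "D01H00", "D04H00", "D14H00"), each = 3))

# Hypothetical toy data: per-gene baseline expression, constant over time
baseline <- 2^runif(n_genes, 4, 12)
mu <- matrix(baseline, nrow = n_genes, ncol = length(timepoint))

# Let the first 200 genes change over time (log2 fold changes vs. the first time point)
changing <- seq_len(200)
lfc <- matrix(rnorm(200 * 3, sd = 1.5), nrow = 200)
mu[changing, ] <- mu[changing, ] * 2^cbind(0, lfc)[, as.integer(timepoint)]

# Negative-binomial counts around those means
counts <- matrix(rnbinom(length(mu), mu = mu, size = 10), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("sample", seq_along(timepoint))))
coldata <- data.frame(timepoint = timepoint, row.names = colnames(counts))
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ timepoint)

# Approach 1: rank genes by the LRT adjusted p-value
dds <- DESeq(dds, test = "LRT", reduced = ~ 1, quiet = TRUE)
res <- results(dds)
top_lrt <- rownames(res)[head(order(res$padj), 200)]

# Approach 2: rank genes by the variance of the transformed values
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
gene_var <- apply(assay(vsd), 1, var)
top_var <- rownames(vsd)[head(order(gene_var, decreasing = TRUE), 200)]

# How many genes do the two selections share?
n_shared <- length(intersect(top_lrt, top_var))
cat("shared:", n_shared,
    "| LRT only:", length(setdiff(top_lrt, top_var)),
    "| variance only:", length(setdiff(top_var, top_lrt)), "\n")
```

On real data the same comparison could be run with `resLRT` and `rld` from the question. Genes that appear only in the LRT list tend to have consistent but modest changes, while genes that appear only in the variance list may be noisy rather than time-dependent.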


As simple as that! Thanks Michael!

(reply written 6 months ago by kaiser.karim)