Question: Interpretation of the heatmap produced by DESeq2
0
gravatar for yh362
3 months ago by
yh3620
yh3620 wrote:

I have a time series RNA-seq data for two different genotypes, and I did a Likelihood ratio test in DEseq2 to see whether there are genes showing different expression profiles between the two genotypes across time points (as described in https://f1000research.com/articles/4-1070, the last section: Time Course Experiment). So I did the following:

sampleTable = data.frame(sampleName = files.htseq,fileName =
files.htseq,time = time, strain = genotype)

library("DESeq2") 
ddsHTSeq = DESeqDataSetFromHTSeqCount(sampleTable =sampleTable,
                                        directory = dir.htseq,
                                        design= ~ time + strain + strain:time) 
 dds.LRT = DESeq(ddsHTSeq, test="LRT", reduced = ~ time +strain) 
 res.LRT = results(dds.LRT)

To make it more clear, the sample table looks like the following:

 sampleTable
    sampleName    fileName time   strain
1  A5967.htseq A5967.htseq   T0     nc95
2  A5968.htseq A5968.htseq   T0     nc95
3  A5969.htseq A5969.htseq   T0     nc95
4  A5973.htseq A5973.htseq   T2     nc95
5  A5974.htseq A5974.htseq   T2     nc95
6  A5975.htseq A5975.htseq   T2     nc95
7  A5979.htseq A5979.htseq   T6     nc95
8  A5980.htseq A5980.htseq   T6     nc95
9  A5985.htseq A5985.htseq  T24     nc95
10 A5986.htseq A5986.htseq  T24     nc95
11 A5987.htseq A5987.htseq  T24     nc95
12 A5988.htseq A5988.htseq   T0 nc95_nic
13 A5989.htseq A5989.htseq   T0 nc95_nic
14 A5994.htseq A5994.htseq   T2 nc95_nic
15 A5996.htseq A5996.htseq   T2 nc95_nic
16 A6000.htseq A6000.htseq   T6 nc95_nic
17 A6001.htseq A6001.htseq   T6 nc95_nic
18 A6006.htseq A6006.htseq  T24 nc95_nic
19 A6007.htseq A6007.htseq  T24 nc95_nic

Then, as described in the RNA-seq workflow in the link I provided above, I looked at the heatmap of GLM coefficients, excluding the intercept and strainnc95_nic columns (col1 and col5, respectively)

betas = coef(dds.LRT) 
> library("pheatmap")  
> topGenes = head(order(res.LRT$padj),200)
> mat = betas[topGenes, -c(1,5)] 
> thr = 3
> mat[mat < -thr] <- -thr 
> mat[mat > thr] <- thr pheatmap(mat,
> breaks=seq(from=-thr, to=thr, length=101),cluster_col=FALSE)

Then I get the following heatmap. It is interesting that, genes that are red for the first three columns are blue for the last three columns, and the absolute value (as encoded by the shade of the color) of the first three coefficients positively correlates with the last three coefficients. However, what can I make of these observations? What does it really mean (if any)? (It seems like the image doesn't show up... it can be accessed here https://ibb.co/3Yhx9Ph ) Shown following is the heatmap I got from the above code

deseq2 • 166 views
ADD COMMENTlink modified 3 months ago by Michael Love25k • written 3 months ago by yh3620
Answer: Interpretation of the heatmap produced by DESeq2
0
gravatar for Michael Love
3 months ago by
Michael Love25k
United States
Michael Love25k wrote:

Note that the color of the heatmap is affected by the scaling done by the pheatmap function (read over the ?pheatmap documentation for more details). It's also kind of hard to interpret the shades of color and how those related to the linear model terms.

It may make more sense to look at plotCounts for individual genes to see why they may come up with a small p-value in the LRT.

ADD COMMENTlink written 3 months ago by Michael Love25k

Hi, thanks for reply! Sorry for my late followup. So I looked at the plotCounts, but I was wondering why it puts T6 after T24 on the x-axis(image can be seen here https://ibb.co/hX0TTFL)

The code used:

> library(ggplot2)
> data <- plotCounts(dds.LRT, which.min(res.LRT$padj),
>                    intgroup=c("time","strain"), returnData=TRUE)
> ggplot(data, aes(x=time, y=count, color=strain, group=strain)) +
> geom_point() + stat_smooth(se=FALSE,method="loess") + scale_y_log10()
ADD REPLYlink modified 3 months ago by Martin Morgan ♦♦ 24k • written 3 months ago by yh3620
1

This is a base R point. R orders factors alphabetically unless you specify otherwise. See the "Note on factor levels" section of the DESeq2 vignette.

ADD REPLYlink written 3 months ago by Michael Love25k

It's 'fun' to note that the notion of 'alphabetical' order depends on the locale, for instance in Estonian apparently 'z' sorts before 't' https://www.unicode.org/cldr/charts/30/collation/et.html so in my locale

> levels(factor(c("zero", "three")))
[1] "three" "zero"

whereas a user with an Estonian encoding would have the levels reversed -- it's always best practice to specify levels!

ADD REPLYlink written 3 months ago by Martin Morgan ♦♦ 24k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 476 users visited in the last hour