Question

Interpretation of the heatmap produced by DESeq2

0

Entering edit mode

yh362 • 0

@yh362-21462

Last seen 4.7 years ago

I have a time series RNA-seq data for two different genotypes, and I did a Likelihood ratio test in DEseq2 to see whether there are genes showing different expression profiles between the two genotypes across time points (as described in https://f1000research.com/articles/4-1070, the last section: Time Course Experiment). So I did the following:

sampleTable = data.frame(sampleName = files.htseq,fileName =
files.htseq,time = time, strain = genotype)

library("DESeq2") 
ddsHTSeq = DESeqDataSetFromHTSeqCount(sampleTable =sampleTable,
                                        directory = dir.htseq,
                                        design= ~ time + strain + strain:time) 
 dds.LRT = DESeq(ddsHTSeq, test="LRT", reduced = ~ time +strain) 
 res.LRT = results(dds.LRT)

To make it more clear, the sample table looks like the following:

 sampleTable
    sampleName    fileName time   strain
1  A5967.htseq A5967.htseq   T0     nc95
2  A5968.htseq A5968.htseq   T0     nc95
3  A5969.htseq A5969.htseq   T0     nc95
4  A5973.htseq A5973.htseq   T2     nc95
5  A5974.htseq A5974.htseq   T2     nc95
6  A5975.htseq A5975.htseq   T2     nc95
7  A5979.htseq A5979.htseq   T6     nc95
8  A5980.htseq A5980.htseq   T6     nc95
9  A5985.htseq A5985.htseq  T24     nc95
10 A5986.htseq A5986.htseq  T24     nc95
11 A5987.htseq A5987.htseq  T24     nc95
12 A5988.htseq A5988.htseq   T0 nc95_nic
13 A5989.htseq A5989.htseq   T0 nc95_nic
14 A5994.htseq A5994.htseq   T2 nc95_nic
15 A5996.htseq A5996.htseq   T2 nc95_nic
16 A6000.htseq A6000.htseq   T6 nc95_nic
17 A6001.htseq A6001.htseq   T6 nc95_nic
18 A6006.htseq A6006.htseq  T24 nc95_nic
19 A6007.htseq A6007.htseq  T24 nc95_nic

Then, as described in the RNA-seq workflow in the link I provided above, I looked at the heatmap of GLM coefficients, excluding the intercept and strainnc95_nic columns (col1 and col5, respectively)

betas = coef(dds.LRT) 
> library("pheatmap")  
> topGenes = head(order(res.LRT$padj),200)
> mat = betas[topGenes, -c(1,5)] 
> thr = 3
> mat[mat < -thr] <- -thr 
> mat[mat > thr] <- thr pheatmap(mat,
> breaks=seq(from=-thr, to=thr, length=101),cluster_col=FALSE)

Then I get the following heatmap. It is interesting that, genes that are red for the first three columns are blue for the last three columns, and the absolute value (as encoded by the shade of the color) of the first three coefficients positively correlates with the last three coefficients. However, what can I make of these observations? What does it really mean (if any)? (It seems like the image doesn't show up... it can be accessed here https://ibb.co/3Yhx9Ph ) Shown following is the heatmap I got from the above code

DESeq2 • 809 views

ADD COMMENT • link updated 4.7 years ago by Michael Love 41k • written 4.7 years ago by yh362 • 0

Martin Morgan · Answer 1 · 2019-07-26

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 41 minutes ago

United States

Note that the color of the heatmap is affected by the scaling done by the pheatmap function (read over the ?pheatmap documentation for more details). It's also kind of hard to interpret the shades of color and how those related to the linear model terms.

It may make more sense to look at plotCounts for individual genes to see why they may come up with a small p-value in the LRT.

ADD COMMENT • link 4.7 years ago Michael Love 41k

0

Entering edit mode

Hi, thanks for reply! Sorry for my late followup. So I looked at the plotCounts, but I was wondering why it puts T6 after T24 on the x-axis(image can be seen here https://ibb.co/hX0TTFL)

The code used:

> library(ggplot2)
> data <- plotCounts(dds.LRT, which.min(res.LRT$padj),
>                    intgroup=c("time","strain"), returnData=TRUE)
> ggplot(data, aes(x=time, y=count, color=strain, group=strain)) +
> geom_point() + stat_smooth(se=FALSE,method="loess") + scale_y_log10()

ADD REPLY • link updated 4.7 years ago by Martin Morgan 25k • written 4.7 years ago by yh362 • 0

1

Entering edit mode

This is a base R point. R orders factors alphabetically unless you specify otherwise. See the "Note on factor levels" section of the DESeq2 vignette.

ADD REPLY • link 4.7 years ago Michael Love 41k

0

Entering edit mode

It's 'fun' to note that the notion of 'alphabetical' order depends on the locale, for instance in Estonian apparently 'z' sorts before 't' https://www.unicode.org/cldr/charts/30/collation/et.html so in my locale

> levels(factor(c("zero", "three")))
[1] "three" "zero"

whereas a user with an Estonian encoding would have the levels reversed -- it's always best practice to specify levels!

ADD REPLY • link 4.7 years ago Martin Morgan 25k