I have a time series RNA-seq data for two different genotypes, and I did a Likelihood ratio test in DEseq2 to see whether there are genes showing different expression profiles between the two genotypes across time points (as described in https://f1000research.com/articles/4-1070, the last section: Time Course Experiment). So I did the following:
sampleTable = data.frame(sampleName = files.htseq,fileName =
files.htseq,time = time, strain = genotype)
library("DESeq2")
ddsHTSeq = DESeqDataSetFromHTSeqCount(sampleTable =sampleTable,
directory = dir.htseq,
design= ~ time + strain + strain:time)
dds.LRT = DESeq(ddsHTSeq, test="LRT", reduced = ~ time +strain)
res.LRT = results(dds.LRT)
To make it more clear, the sample table looks like the following:
sampleTable
sampleName fileName time strain
1 A5967.htseq A5967.htseq T0 nc95
2 A5968.htseq A5968.htseq T0 nc95
3 A5969.htseq A5969.htseq T0 nc95
4 A5973.htseq A5973.htseq T2 nc95
5 A5974.htseq A5974.htseq T2 nc95
6 A5975.htseq A5975.htseq T2 nc95
7 A5979.htseq A5979.htseq T6 nc95
8 A5980.htseq A5980.htseq T6 nc95
9 A5985.htseq A5985.htseq T24 nc95
10 A5986.htseq A5986.htseq T24 nc95
11 A5987.htseq A5987.htseq T24 nc95
12 A5988.htseq A5988.htseq T0 nc95_nic
13 A5989.htseq A5989.htseq T0 nc95_nic
14 A5994.htseq A5994.htseq T2 nc95_nic
15 A5996.htseq A5996.htseq T2 nc95_nic
16 A6000.htseq A6000.htseq T6 nc95_nic
17 A6001.htseq A6001.htseq T6 nc95_nic
18 A6006.htseq A6006.htseq T24 nc95_nic
19 A6007.htseq A6007.htseq T24 nc95_nic
Then, as described in the RNA-seq workflow in the link I provided above, I looked at the heatmap of GLM coefficients, excluding the intercept and strainnc95_nic columns (col1 and col5, respectively)
betas = coef(dds.LRT)
> library("pheatmap")
> topGenes = head(order(res.LRT$padj),200)
> mat = betas[topGenes, -c(1,5)]
> thr = 3
> mat[mat < -thr] <- -thr
> mat[mat > thr] <- thr pheatmap(mat,
> breaks=seq(from=-thr, to=thr, length=101),cluster_col=FALSE)
Then I get the following heatmap. It is interesting that, genes that are red for the first three columns are blue for the last three columns, and the absolute value (as encoded by the shade of the color) of the first three coefficients positively correlates with the last three coefficients. However, what can I make of these observations? What does it really mean (if any)? (It seems like the image doesn't show up... it can be accessed here https://ibb.co/3Yhx9Ph )
Hi, thanks for reply! Sorry for my late followup. So I looked at the plotCounts, but I was wondering why it puts T6 after T24 on the x-axis(image can be seen here https://ibb.co/hX0TTFL)
The code used:
This is a base R point. R orders factors alphabetically unless you specify otherwise. See the "Note on factor levels" section of the DESeq2 vignette.
It's 'fun' to note that the notion of 'alphabetical' order depends on the locale, for instance in Estonian apparently 'z' sorts before 't' https://www.unicode.org/cldr/charts/30/collation/et.html so in my locale
whereas a user with an Estonian encoding would have the levels reversed -- it's always best practice to specify levels!