Bioconductor vignette on Time course experiments only describes how to plot the the gene with the smallest p-value. How can I plot specific genes of interest instead?
2
0
Entering edit mode
moldach ▴ 20
@moldach-8829
Last seen 2.5 years ago
Canada/Montreal/Douglas Mental Health I…

The Bioconductor website offers an excellent RNA-Seq workflow vignette on Time-Course experiments however the ggplot2 only selects the gene with the smallest p-value, using the which.min() function:

data <- plotCounts(ddsTC, which.min(resTC$padj), intgroup=c("minute","strain"), returnData=TRUE) What I want to do is to either: (A) make a plot showing multiple genes of interest (e.g. 4 genes from a gene family, in my case circadian genes) (B) Ideally, create multi-panel plots where each plot is a gene of specific interest (e.g. isogroup298, isogroup4598) Any help is greatly appreciated! timecourse time course rna-seq deseq2 ggplot2 • 2.2k views ADD COMMENT 1 Entering edit mode @james-w-macdonald-5106 Last seen 1 hour ago United States Right above the section you are quoting from is this: A quick way to visualize the counts for a particular gene is to use the plotCounts function that takes as arguments the DESeqDataSet, a gene name, and the group over which to plot the counts (Figure below). topGene <- rownames(res)[which.min(res$padj)]
plotCounts(dds, gene=topGene, intgroup=c("dex"))


So if you use plotCounts() with returnData = TRUE, then you can get any gene (or set of genes) that you want. To do multi-panel plots you would just (I'm guessing here) rbind()the results from plotCounts() and then you would probably have to add a column of gene names, and then add a facet_wrap(~gene) to the call to ggplot2().

But ggplot2 is an entirely different beast, and a CRAN package to boot, so you may need to use your mad google skillz and this website to get farther with that.

ADD COMMENT
0
Entering edit mode
moldach ▴ 20
@moldach-8829
Last seen 2.5 years ago
Canada/Montreal/Douglas Mental Health I…

I'm still not understanding something very basic here.

topGene <- rownames(res)[which.min(res$padj)] plotCounts(dds, gene=topGene, intgroup=c("dex") The code above creates a plot with a figure title saying "isogroup3584" - great! But that is only for the lowest p-value. Let's say instead I want to plot isogroup4222 - how would I do that? InterestingGene <- rownames(res)[SOMETHINGHERE?(res$padj)]
ADD COMMENT
2
Entering edit mode

If the gene name is isogroup4222, then you use that! I think you are getting hung up on the way the gene name is being selected. If you type topGene at the R prompt, it should return "isogroup3584". As an example, from the workflow:

> topGene
[1] "ENSG00000152583"

I could do either

plotCounts(dds, topGene, "dex")

plotCounts(dds, "ENSG00000152583", "dex")

and I would get the same exact results. If you use returnData = TRUE, you get

> plotCounts(dds, topGene, intgroup = "dex", returnData = TRUE)
count   dex
SRR1039508   61.06772 untrt
SRR1039509 2276.86214   trt
SRR1039512   84.43486 untrt
SRR1039513 1749.61320   trt
SRR1039516   85.41333 untrt
SRR1039517 1375.73220   trt
SRR1039520   86.29695 untrt
SRR1039521 2264.09778   trt

Now to get entirely explicit about what I was saying before, say we want the top 5 genes.

> topGenes <- rownames(res)[order(res$padj)][1:5] > topGenes [1] "ENSG00000152583" "ENSG00000165995" "ENSG00000101347" "ENSG00000120129" [5] "ENSG00000189221" Don't get hung up on how I got that - it's just a vector of gene names. The rest of this is homework for you, to go through and figure out what I am doing. > z <- lapply(topGenes, function(x) plotCounts(dds, x, c("dex", "cell"),returnData = TRUE)) > z [[1]] count dex cell SRR1039508 61.06772 untrt N61311 SRR1039509 2276.86214 trt N61311 SRR1039512 84.43486 untrt N052611 SRR1039513 1749.61320 trt N052611 SRR1039516 85.41333 untrt N080611 SRR1039517 1375.73220 trt N080611 SRR1039520 86.29695 untrt N061011 SRR1039521 2264.09778 trt N061011 [[2]] count dex cell SRR1039508 78.65189 untrt N61311 SRR1039509 816.19643 trt N61311 SRR1039512 125.13055 untrt N052611 SRR1039513 1225.77469 trt N052611 SRR1039516 58.24107 untrt N080611 SRR1039517 511.56602 trt N080611 SRR1039520 104.75958 untrt N061011 SRR1039521 1044.42302 trt N061011 [[3]] count dex cell SRR1039508 1594.799 untrt N61311 SRR1039509 19110.783 trt N61311 SRR1039512 1779.241 untrt N052611 SRR1039513 29392.168 trt N052611 SRR1039516 1357.415 untrt N080611 SRR1039517 12649.920 trt N080611 SRR1039520 1828.301 untrt N061011 SRR1039521 33918.469 trt N061011 [[4]] count dex cell SRR1039508 650.1376 untrt N61311 SRR1039509 5602.1363 trt N61311 SRR1039512 1028.9140 untrt N052611 SRR1039513 6777.5675 trt N052611 SRR1039516 693.3928 untrt N080611 SRR1039517 4526.4721 trt N080611 SRR1039520 755.2959 untrt N061011 SRR1039521 7242.3189 trt N061011 [[5]] count dex cell SRR1039508 431.3123 untrt N61311 SRR1039509 4137.0071 trt N61311 SRR1039512 488.0004 untrt N052611 SRR1039513 4626.9940 trt N052611 SRR1039516 554.9841 untrt N080611 SRR1039517 4563.6405 trt N080611 SRR1039520 247.0305 untrt N061011 SRR1039521 3689.1692 trt N061011 > for(i in 1:5) z[[i]]$gene <- rep(topGenes[i], 8)
> z <- do.call(rbind, z)
> ggplot(z, aes(dex, count, colour = cell)) + scale_y_log10() + geom_point(position = position_jitter(width = 0.1, height = 0), size = 3) + facet_wrap(~gene)



ADD REPLY
0
Entering edit mode

Thank you very much James for your well written and well described answer. I don't have much computational support in my lab so figuring these things out can often be challenging - even with Google.

I wracked my brain for nearly two weeks on your "homework" but to no avail. I thought I may be able to do something with grep like:

interestingGene <- res[grep("^isogroup3676", rownames(res)),]

But this grabbed me anything starting with isogroup3676 (isogroup367601, isogroup 36769, etc.) Many Google searches later and I was still no closer to sub-setting my data. Then I had the idea that I could look at DESeq2 contrasts results, find out how many DEGs there were then make alterations by upping the following code to 75 (for example):

topGenes <- rownames(res)[order(res$padj)][1:75] The problem is that ggplot can only do a max of 1:25. So next I though maybe I run the analysis multiple times: 1) topGenes <- rownames(res)[order(res$padj)][1:25]

2) topGenes <- rownames(res)[order(res\$padj)][26:50]

3) etc.

This still isn't ideal though.
I figured out how to subset vectors by gene names by doing the following:

interestingGene <- res[row.names(res) == "isogroup3676"]

Unfortunately, I can't figure out how to make it work in the bioconductor example they used for which.min()

data <- plotCounts(ddsTC, interestingGene),
intgroup=c("minute","strain"), returnData=TRUE)
ggplot(data, aes(x=minute, y=count, color=strain, group=strain)) +
geom_point() + stat_smooth(se=FALSE,method="loess") +  scale_y_log10()
ADD REPLY

Login before adding your answer.

Traffic: 293 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6