Question: Bioconductor vignette on Time course experiments only describes how to plot the the gene with the smallest p-value. How can I plot specific genes of interest instead?
0
gravatar for moldach
4.2 years ago by
moldach20
Canada/Montreal/Douglas Mental Health Institute
moldach20 wrote:

The Bioconductor website offers an excellent RNA-Seq workflow vignette on Time-Course experiments however the ggplot2 only selects the gene with the smallest p-value, using the which.min() function:
 

data <- plotCounts(ddsTC, which.min(resTC$padj), 
                   intgroup=c("minute","strain"), returnData=TRUE)


What I want to do is to either:

(A) make a plot showing multiple genes of interest (e.g. 4 genes from a gene family, in my case circadian genes)

(B) Ideally, create multi-panel plots where each plot is a gene of specific interest (e.g. isogroup298, isogroup4598)

Any help is greatly appreciated!
 

ADD COMMENTlink modified 4.2 years ago • written 4.2 years ago by moldach20
Answer: Bioconductor vignette on Time course experiments only describes how to plot the
1
gravatar for James W. MacDonald
4.2 years ago by
United States
James W. MacDonald51k wrote:

Right above the section you are quoting from is this:

A quick way to visualize the counts for a particular gene is to use the plotCounts function that takes as arguments the DESeqDataSet, a gene name, and the group over which to plot the counts (Figure below).

topGene <- rownames(res)[which.min(res$padj)]
plotCounts(dds, gene=topGene, intgroup=c("dex"))

So if you use plotCounts() with returnData = TRUE, then you can get any gene (or set of genes) that you want. To do multi-panel plots you would just (I'm guessing here) rbind()the results from plotCounts() and then you would probably have to add a column of gene names, and then add a facet_wrap(~gene) to the call to ggplot2().

But ggplot2 is an entirely different beast, and a CRAN package to boot, so you may need to use your mad google skillz and this website to get farther with that.

ADD COMMENTlink written 4.2 years ago by James W. MacDonald51k
Answer: Bioconductor vignette on Time course experiments only describes how to plot the
0
gravatar for moldach
4.2 years ago by
moldach20
Canada/Montreal/Douglas Mental Health Institute
moldach20 wrote:

I'm still not understanding something very basic here.
 

topGene <- rownames(res)[which.min(res$padj)]
plotCounts(dds, gene=topGene, intgroup=c("dex")

 

The code above creates a plot with a figure title saying "isogroup3584" - great!
But that is only for the lowest p-value.

Let's say instead I want to plot isogroup4222 - how would I do that?

InterestingGene <- rownames(res)[SOMETHINGHERE?(res$padj)]
ADD COMMENTlink modified 4.2 years ago • written 4.2 years ago by moldach20
2

If the gene name is isogroup4222, then you use that! I think you are getting hung up on the way the gene name is being selected. If you type topGene at the R prompt, it should return "isogroup3584". As an example, from the workflow:

> topGene
[1] "ENSG00000152583"

I could do either

plotCounts(dds, topGene, "dex")

plotCounts(dds, "ENSG00000152583", "dex")

and I would get the same exact results. If you use returnData = TRUE, you get

> plotCounts(dds, topGene, intgroup = "dex", returnData = TRUE)
                count   dex
SRR1039508   61.06772 untrt
SRR1039509 2276.86214   trt
SRR1039512   84.43486 untrt
SRR1039513 1749.61320   trt
SRR1039516   85.41333 untrt
SRR1039517 1375.73220   trt
SRR1039520   86.29695 untrt
SRR1039521 2264.09778   trt

Now to get entirely explicit about what I was saying before, say we want the top 5 genes.

> topGenes <- rownames(res)[order(res$padj)][1:5]
> topGenes
[1] "ENSG00000152583" "ENSG00000165995" "ENSG00000101347" "ENSG00000120129"
[5] "ENSG00000189221"

Don't get hung up on how I got that - it's just a vector of gene names. The rest of this is homework for you, to go through and figure out what I am doing.

> z <- lapply(topGenes, function(x) plotCounts(dds, x, c("dex", "cell"),returnData = TRUE))
> z
[[1]]
                count   dex    cell
SRR1039508   61.06772 untrt  N61311
SRR1039509 2276.86214   trt  N61311
SRR1039512   84.43486 untrt N052611
SRR1039513 1749.61320   trt N052611
SRR1039516   85.41333 untrt N080611
SRR1039517 1375.73220   trt N080611
SRR1039520   86.29695 untrt N061011
SRR1039521 2264.09778   trt N061011

[[2]]
                count   dex    cell
SRR1039508   78.65189 untrt  N61311
SRR1039509  816.19643   trt  N61311
SRR1039512  125.13055 untrt N052611
SRR1039513 1225.77469   trt N052611
SRR1039516   58.24107 untrt N080611
SRR1039517  511.56602   trt N080611
SRR1039520  104.75958 untrt N061011
SRR1039521 1044.42302   trt N061011

[[3]]
               count   dex    cell
SRR1039508  1594.799 untrt  N61311
SRR1039509 19110.783   trt  N61311
SRR1039512  1779.241 untrt N052611
SRR1039513 29392.168   trt N052611
SRR1039516  1357.415 untrt N080611
SRR1039517 12649.920   trt N080611
SRR1039520  1828.301 untrt N061011
SRR1039521 33918.469   trt N061011

[[4]]
               count   dex    cell
SRR1039508  650.1376 untrt  N61311
SRR1039509 5602.1363   trt  N61311
SRR1039512 1028.9140 untrt N052611
SRR1039513 6777.5675   trt N052611
SRR1039516  693.3928 untrt N080611
SRR1039517 4526.4721   trt N080611
SRR1039520  755.2959 untrt N061011
SRR1039521 7242.3189   trt N061011

[[5]]
               count   dex    cell
SRR1039508  431.3123 untrt  N61311
SRR1039509 4137.0071   trt  N61311
SRR1039512  488.0004 untrt N052611
SRR1039513 4626.9940   trt N052611
SRR1039516  554.9841 untrt N080611
SRR1039517 4563.6405   trt N080611
SRR1039520  247.0305 untrt N061011
SRR1039521 3689.1692   trt N061011

> for(i in 1:5) z[[i]]$gene <- rep(topGenes[i], 8)
> z <- do.call(rbind, z)
> ggplot(z, aes(dex, count, colour = cell)) + scale_y_log10() + geom_point(position = position_jitter(width = 0.1, height = 0), size = 3) + facet_wrap(~gene)

 

ADD REPLYlink written 4.2 years ago by James W. MacDonald51k

Thank you very much James for your well written and well described answer. I don't have much computational support in my lab so figuring these things out can often be challenging - even with Google.

I wracked my brain for nearly two weeks on your "homework" but to no avail. I thought I may be able to do something with grep like:

interestingGene <- res[grep("^isogroup3676", rownames(res)),]

But this grabbed me anything starting with isogroup3676 (isogroup367601, isogroup 36769, etc.)

Many Google searches later and I was still no closer to sub-setting my data.

Then I had the idea that I could look at DESeq2 contrasts results, find out how many DEGs there were then make alterations by upping the following code to 75 (for example):

 

topGenes <- rownames(res)[order(res$padj)][1:75]

The problem is that ggplot can only do a max of 1:25. So next I though maybe I run the analysis multiple times:
1)
topGenes <- rownames(res)[order(res$padj)][1:25]

2) topGenes <- rownames(res)[order(res$padj)][26:50]

3) etc.

This still isn't ideal though.
I figured out how to subset vectors by gene names by doing the following:

interestingGene <- res[row.names(res) == "isogroup3676"]

Unfortunately, I can't figure out how to make it work in the bioconductor example they used for which.min()

data <- plotCounts(ddsTC, interestingGene), 
                   intgroup=c("minute","strain"), returnData=TRUE)
ggplot(data, aes(x=minute, y=count, color=strain, group=strain)) + 
  geom_point() + stat_smooth(se=FALSE,method="loess") +  scale_y_log10()
ADD REPLYlink modified 4.1 years ago • written 4.1 years ago by moldach20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 193 users visited in the last hour