Volcano plot labeling troubles
4
0
Entering edit mode
@mccallionc403-6871
Last seen 7.0 years ago
United States

Hello,

I am trying to make a volcano plot, but of Log Fold Change vs FDR adjusted p-values.  So far, I've managed to make a decent plot, but I'm having trouble figuring out how to label the genes that are differentially expressed, wether as the top, say 30, genes, or all of the genes above a certain threshold, and identify them on the graph.  My progress so far goes thusly (Also, if anyone knows how to get limma to plot a graph with these axis instead of log fold change vs log odds, that would be greatly appreciated as well):

source ("http://bioconductor.org/biocLite.R")#f o r
biocLite()#for installing bioconductor packages
biocLite("limma")#linear fits
biocLite("statmod")#
biocLite("affyPLM")#normalizing ExpressionSets
#begin loading libraries
library("limma")
library("GEOquery")
library("statmod")
library("affyPLM")
#libraries loaded
gse44563 <- getGEO("GSE44563", GSEMatrix = TRUE)
#44563 with any attached data
eset44563 = gse44563[[1]]#extract attached eSet (data)
y <- log2(exprs(eset44563))
design <- model.matrix(~source_name_ch1,data=pData(eset44563))
preFit <- lmFit(y, design)
eset <- normalize.ExpressionSet.quantiles(eset44563, transfn="none" )
#quantile normalization of eset44563
#8
y <- log2(exprs(eset))#log2 transformation of eset
design <- model.matrix(~source_name_ch1,data=pData(eset44563))#making
#design matrix
#describes what array looked like
#allows for a linear fit , using matrix algebra
fit <- lmFit(y, design)#linear fit
plotMA(fit)
IsExpressed <- fit$Amean>3#is mean expression #of each gene above 3? efit <- eBayes(fit[IsExpressed,], trend=TRUE, robust=TRUE)#Empi r ical #Bayes adjustment plotMA(efit) write.csv((topTable(efit, adjust = "fdr", n = 54675)), "possible5.csv") #write table of genes, #sorted according to False Discovery Rate adjusted #p-value to file tableTop <- topTable(efit, adjust = "fdr", n = 42471) plot(tableTop$logFC, -log10(tableTop$adj.P.Val), xlim = c(-1, 2), ylim = c(0, 6), xlab = "log2 fold change", ylab = "FDR adjusted p-value") My issue right now is trying to figure out how to label the points that are meaningfully foldchanged, for now, without using limma, since it appears that limma only uses logodds as its measure of statistical significance. Any help, or even just pointers to pre-existing explanations would be immensly appreciated. Thank you Also, here's my SessionInfo(): R version 3.1.1 (2014-07-10) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] splines parallel stats graphics [5] grDevices utils datasets methods [9] base other attached packages: [1] BiocInstaller_1.16.0 affyPLM_1.40.1 [3] preprocessCore_1.26.1 gcrma_2.36.0 [5] statmod_1.4.20 GEOquery_2.30.1 [7] limma_3.20.9 affy_1.42.3 [9] Biobase_2.24.0 BiocGenerics_0.10.0 loaded via a namespace (and not attached): [1] affyio_1.32.0 Biostrings_2.32.1 [3] IRanges_1.22.10 RCurl_1.95-4.3 [5] stats4_3.1.1 tools_3.1.1 [7] XML_3.98-1.1 XVector_0.4.0 [9] zlibbioc_1.10.0 limma volcanoplot • 5.2k views ADD COMMENT 3 Entering edit mode @gordon-smyth Last seen 6 hours ago WEHI, Melbourne, Australia I understand why you might be tempted to plot logFC vs FDR but, in my opinion, this isn't the right way to make a volcano plot. First, if a volcano plot is made with p-values, then it should use raw p-values rather than FDR adjusted values. This is because FDR adjusting loses some information (e.g., distinct p-values may be mapped to the same FDR value). Second, there is a reason why limma plots log-odds instead of log-p-values. The log-odds are exactly equivalent to p-values for your analysis (same ordering) but on a scale that is arguably more meaningful than log-p. The log-odds also has the effect of global adjustment, like FDR, but without losing information. Instead of doing your own coding, you might consider using the plot that limma provides Later edit By popular demand, volcanoplot() now offers either p-values or log-odds as options for the y-axis and the default has been changed to p-values. ADD COMMENT 1 Entering edit mode @james-w-macdonald-5106 Last seen 21 hours ago United States ind <- tableTop$adj.P.Val < 0.05
points(tableTop$logFC[ind], -log10(tableTop$adj.P.Val)[ind], pch = 19, col = "red")
1
Entering edit mode
@gordon-smyth
Last seen 6 hours ago
WEHI, Melbourne, Australia

The quantile normalization step you are doing is not necessary. This data has already been RMA processed, according to the GEO webpage, meaning that it can already been quantile normalized at the probe level. By re-normalizing at the probe-set level, you are replacing the normalization already done with something not quite as good.

0
Entering edit mode
@mccallionc403-6871
Last seen 7.0 years ago
United States

Thank you so much

Traffic: 657 users visited in the last hour
FAQ
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.