Search
Question: Volcano plot labeling troubles
0
gravatar for mccallionc403
3.1 years ago by
United States
mccallionc4030 wrote:

Hello,

I am trying to make a volcano plot, but of Log Fold Change vs FDR adjusted p-values.  So far, I've managed to make a decent plot, but I'm having trouble figuring out how to label the genes that are differentially expressed, wether as the top, say 30, genes, or all of the genes above a certain threshold, and identify them on the graph.  My progress so far goes thusly (Also, if anyone knows how to get limma to plot a graph with these axis instead of log fold change vs log odds, that would be greatly appreciated as well):

source ("http://bioconductor.org/biocLite.R")#f o r
#downloading biocLite
biocLite()#for installing bioconductor packages
biocLite("BiocUpgrade")#use latest version
biocLite("limma")#linear fits
biocLite("GEOquery")#downloading data sets
biocLite("statmod")#
biocLite("affyPLM")#normalizing ExpressionSets
#begin loading libraries
library("limma")
library("GEOquery")
library("statmod")
library("affyPLM")
#libraries loaded
gse44563 <- getGEO("GSE44563", GSEMatrix = TRUE)
#downloading series record
#44563 with any attached data
eset44563 = gse44563[[1]]#extract attached eSet (data)
y <- log2(exprs(eset44563))
design <- model.matrix(~source_name_ch1,data=pData(eset44563))
preFit <- lmFit(y, design)
eset <- normalize.ExpressionSet.quantiles(eset44563, transfn="none" )
#quantile normalization of eset44563
#8
y <- log2(exprs(eset))#log2 transformation of eset
design <- model.matrix(~source_name_ch1,data=pData(eset44563))#making
#design matrix
#describes what array looked like
#allows for a linear fit , using matrix algebra
fit <- lmFit(y, design)#linear fit
plotMA(fit)
IsExpressed <- fit$Amean>3#is mean expression
#of each gene above 3?
efit <- eBayes(fit[IsExpressed,], trend=TRUE, robust=TRUE)#Empi r ical
#Bayes adjustment
plotMA(efit)
write.csv((topTable(efit, adjust = "fdr", n = 54675)), "possible5.csv")
#write table of genes,
#sorted according to False Discovery Rate adjusted
#p-value to file
tableTop <- topTable(efit, adjust = "fdr", n = 42471)
plot(tableTop$logFC, -log10(tableTop$adj.P.Val), xlim = c(-1, 2), ylim = c(0, 6), xlab = "log2 fold change", ylab = "FDR adjusted p-value")

 

My issue right now is trying to figure out how to label the points that are meaningfully foldchanged, for now, without using limma, since it appears that limma only uses logodds as its measure of statistical significance.

Any help, or even just pointers to pre-existing explanations would be immensly appreciated.  Thank you

Also, here's my SessionInfo():

R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] splines   parallel  stats     graphics 
[5] grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] BiocInstaller_1.16.0  affyPLM_1.40.1       
 [3] preprocessCore_1.26.1 gcrma_2.36.0         
 [5] statmod_1.4.20        GEOquery_2.30.1      
 [7] limma_3.20.9          affy_1.42.3          
 [9] Biobase_2.24.0        BiocGenerics_0.10.0  

loaded via a namespace (and not attached):
[1] affyio_1.32.0     Biostrings_2.32.1
[3] IRanges_1.22.10   RCurl_1.95-4.3   
[5] stats4_3.1.1      tools_3.1.1      
[7] XML_3.98-1.1      XVector_0.4.0    
[9] zlibbioc_1.10.0
ADD COMMENTlink modified 3 months ago by Gordon Smyth32k • written 3.1 years ago by mccallionc4030
1
gravatar for James W. MacDonald
3.1 years ago by
United States
James W. MacDonald45k wrote:
ind <- tableTop$adj.P.Val < 0.05
points(tableTop$logFC[ind], -log10(tableTop$adj.P.Val)[ind], pch = 19, col = "red")
ADD COMMENTlink written 3.1 years ago by James W. MacDonald45k
1
gravatar for Gordon Smyth
3.1 years ago by
Gordon Smyth32k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth32k wrote:

I understand why you might be tempted to plot logFC vs FDR but, in my opinion, this isn't the right way to make a volcano plot.

First, if a volcano plot is made with p-values, then it should use raw p-values rather than FDR adjusted values. This is because FDR adjusting loses some information (e.g., distinct p-values may be mapped to the same FDR value).

Second, there is a reason why limma plots log-odds instead of log-p-values. The log-odds are exactly equivalent to p-values for your analysis (same ordering) but on a scale that is arguably more meaningful than log-p. The log-odds also has the effect of global adjustment, like FDR, but without losing information.

Instead of doing your own coding, you might consider using the plot that limma provides

Later edit:

By popular demand, volcanaplot() now offers either p-values or log-odds as options for the y-axis and the default has been changed to p-values.

ADD COMMENTlink modified 3 months ago • written 3.1 years ago by Gordon Smyth32k
1
gravatar for Gordon Smyth
3.1 years ago by
Gordon Smyth32k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth32k wrote:

The quantile normalization step you are doing is not necessary. This data has already been RMA processed, according to the GEO webpage, meaning that it can already been quantile normalized at the probe level. By re-normalizing at the probe-set level, you are replacing the normalization already done with something not quite as good.

ADD COMMENTlink modified 3.1 years ago • written 3.1 years ago by Gordon Smyth32k
0
gravatar for mccallionc403
3.1 years ago by
United States
mccallionc4030 wrote:

Thank you so much

ADD COMMENTlink written 3.1 years ago by mccallionc4030
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 134 users visited in the last hour