EnhancedVolcano results in an empty plot
1
0
Entering edit mode
Patadù ▴ 20
@ff20d7e7
Last seen 5 months ago
The Netherlands

Hi all,

I am trying to use the EnhancedVolcano package to plot the results of my DEGs from DESeq2. I have been reading this guide but when using EnhancedVolcano function, the plot won't show. The following is the code I am using:

myCondition <- read.csv("C:/Users/Luigi/Desktop/WUR/R_Directory/Excel_v2/myCondition/myCondition_CTRL_XS.csv", header = TRUE, sep = ";")
View(myCondition)

HitCount <- read.csv("C:/Users/Luigi/Desktop/WUR/R_Directory/Excel_v2/HitCounts/HitCounts_30mpi_CTRL_XS.csv", header = TRUE, sep = ";")
View(HitCount)

rownames(HitCount) <- HitCount[,1]
HitCount <- HitCount[,-1]
head(HitCount)

all(colnames(HitCount) == myCondition$SampleID)

str(HitCount)

dds <- DESeqDataSetFromMatrix(countData = HitCount,
                              colData = myCondition,
                              design = ~ Condition)

dds <- DESeq(dds)
res <- results(dds)
res.df <- as.data.frame(res)

EnhancedVolcano(res.df, lab = rownames(res.df), 
                x = "log2FoldChange", 
                y = "pvalue")

And this is an extract from my res.df:

               baseMean log2FoldChange      lfcSE          stat        pvalue          padj
VDAG_00001   549.046166  -0.0092977503 0.12107151  -0.076795523  9.387862e-01  9.545491e-01
VDAG_00002  1625.449627   0.1763897241 0.08555426   2.061729317  3.923351e-02  6.284944e-02
VDAG_00003  1188.308547   1.8559310919 0.09512573  19.510295447  8.976185e-85  2.053759e-83

As you can see, in the first column is missing the header Gene_ID. I have tried to retrive the gene symbol using this code, but apparently VDAGs aren't either "ENTREZ" or "ENSEMBL" IDs. I did use AnnotationDd and made my own database, org.Vdahliae.eg.db.

Do you have any suggestions on how to modify the code and plot the Volcano plot?

Thanks

EnhancedVolcano • 794 views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 10 hours ago
United States

Works for me. Here is a reproducible example.

> library(DESeq2)
> z <- makeExampleDESeqDataSet()
> dds <- DESeq(z)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> res <- results(dds)
> res.df <- as.data.frame(res)
> EnhancedVolcano(res.df, lab = rownames(res.df), x = "log2FoldChange", y = "pvalue")
## OR, using the results object directly works as well
> EnhancedVolcano(res, x = "log2FoldChange", y = "pvalue", lab = "rownames")
0
Entering edit mode

Hi James,

Thanks for your reply. Actually, I have tried again to run the Volcano Plot. This time I first saved the results from the dds then run alone the code you have posted and I get a weird result (it is my first time running EnhancedVolcano).

Volcano Plot

This is the code I have used:

setwd("C:/Users/Luigi/Desktop/WUR/R_Directory") #set your path to your folder here
getwd()
dir() #shows you the files in your folder
rm(list = ls()) #removes previous imported data from your environment to not confuse parameters

if (!requireNamespace('BiocManager', quietly = TRUE))
  install.packages('BiocManager')

BiocManager::install('EnhancedVolcano')

library(EnhancedVolcano)


vol <- read.csv("C:/Users/Luigi/Desktop/WUR/R_Directory/Excel_v2/DESeq_Results/VDAGs_CTRL_SXM_30mpi_padj.csv", header = TRUE, sep = ";")
View(vol)

vol$log2FoldChange <- as.numeric(vol$log2FoldChange)
vol$padj <- as.numeric(vol$padj)


vol.df <- as.data.frame(vol)
EnhancedVolcano(vol.df, lab = rownames(vol.df), x = "log2FoldChange", y = "padj")

Did I made an error in the code or is it just that no DEGs are in my dataset?

Moreover, in the previous code I have posted I think I might have a problem with:

rownames(HitCount) <- HitCount[,1]
HitCount <- HitCount[,-1]
head(HitCount)

Which removed the header from the first column and thus having problems in plotting the Volcano Plot.

ADD REPLY
0
Entering edit mode

I can't diagnose much, because I don't have your data. But there are things you can do!

summary(vol.df$log2FoldChange)
## I wouldn't use FDR values!
summary(vol.df$padj)
## but instead would use raw p-values
summary(vol.df$pvalue)

When you get weird plotting results it's often most useful to check the data yourself, as nobody else can do that for you.

ADD REPLY
0
Entering edit mode

Thanks again. I have run:

summary(vol.df$pvalue)

and I get this as result:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   0.000   0.003   0.007   0.011   0.032    4790

To be honest, I do not know how to interpret this. The only thing that pop-ups to my eye is NA's 4790. That's very weird because before saving my results I have run na.omit(). Also, if I just sort my file, the there are no NA's but just a hundred of values equal to 0, or with a p-value very low. And, when I run the Volcano plot, I get this warning message:

One or more p-values is 0. Converting to 10^-1 * current lowest non-zero p-value... 
ADD REPLY
0
Entering edit mode

It's not uncommon for DESeq2 to produce NA p-values. Also there is almost no reason to ever 'save your results' as part of an analysis, if by that you mean that you run a bunch of code and then output the results in a CSV file or whatever that you plan on reading back in at a later date.

There are reasons to save intermediate R objects if they are computationally expensive to produce, but that is almost never the case with something small like a bulk RNA-Seq analysis. If I have some long-running process that will make re-running some code painfully slow, I will save the results of that step to speed things up. But in general I don't save anything but the code, which can be re-run at any time (and which will be re-run many times in the course of doing the analysis).

And if you are saving intermediate data, you should be using either Rdata or RDS files rather than converting to some non-R format, where you might end up with unexpected conversion issues (like converting a bunch of really small p-values to zeros, as it appears happened here).

You should also not have a maximum p-value of 0.032, but instead it should be 1 or very close to it. That looks to me like you are filtering the results, which is not what you want to be doing. Getting the volcano plot should be a very simple thing:

dds <- DESeq(<args go here>)
res <- results(dds)
EnhancedVolcano(res, x = "logFoldChange", y = "pvalue", lab = "rownames")

Three lines of code and you are there. Unless you have the expertise, you shouldn't (and don't need to) do anything more than this.

ADD REPLY
0
Entering edit mode

I have tried to make the Volcano plot also after performing the dds analysis but the plot is empty. I do believe it is because I had to use this piece of code:

rownames(HitCount) <- HitCount[,1]
HitCount <- HitCount[,-1]
head(HitCount) 

If I do not use it, the dds gives me an error and the analysis stops. This is because instead of reading the value in the second column, it read the value (Gene_IDs) in the first column. I do not know if there is a way of fixing it without modifying the basic dds.

Regarding the filtering you mentioned, in the first post I wrote the whole code I used. I did not use filtering in my code, as I just followed the DESeq2 vignette and some videos on internet.

ADD REPLY
0
Entering edit mode

Hi James, I do not know if it might be of interest for you, but in the end I could plot the Volcano plot. I did have to use the lfcShrink(). I guess in this way the fold change differences fitted on the x axis.

Also, in doing so, my summary(vol.df$pvalue) have a max of 0.992

ADD REPLY

Login before adding your answer.

Traffic: 619 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6