Plotting FPKM
Gime ▴ 10

Hello, I've used the fpkm() function of the DESeq2 package to get a matrix of FPKM values. The rows are the genes and the columns are the samples. There are 10 columns (5 Control replicates and 5 RNAi replicates). I want to make a scatter plot of the log10 FPKM, with RNAi on the x-axis and Control on the y-axis. Does anyone have any recommendations on how to make this plot? Also, do I need to average all the Control replicates together and all the RNAi replicates together before making the scatter plot?

I'm new to plotting data with ggplot and using R/RStudio in general, but here is the code I have so far. I tried to do a ggplot, but it just gives me one dot at zero value. Therefore, I'm guessing I'm making a mistake somewhere in my code.

mcols(dds)$basepairs <- sortfeatureCounts[, "Length"]
fpkm_table <- fpkm(dds)

log10_fpkm = log10(fpkm_table+1)
log10_fpkmData = data.frame(log10_fpkm)

ggplot(log10_fpkmData,aes(x="RNAi",y="Control")) + geom_point()
ggplots2 DESeq2 fpkm() • 124 views


It seems that your question is more about ggplot2 than about DESeq2. ggplot2 is not, unfortunately, a Bioconductor package.

It may be good to decide first what you want to plot against what. For example, do you want MYC expression between your groups, the expression of all genes between your groups, or the average of all genes? These are decisions for you to make, I think, based on what information you want to get from the plot.

Learning the input data formats for ggplot2 can take a while, but it becomes easier over time.
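
As a small illustration of the input format, here is a minimal sketch of how aes() treats quoted versus unquoted names. The data frame `df` and its `Control` and `RNAi` columns are hypothetical; a data frame made directly from an FPKM matrix would have one column per sample instead, so columns like these would have to be created first.

```r
library(ggplot2)

# Hypothetical toy data: one row per gene, one column per condition
df <- data.frame(Control = c(0.2, 1.5, 2.3), RNAi = c(0.1, 1.7, 2.0))

# Quoted names are treated as constants, so every row is drawn at the
# same single position, labelled "RNAi" on x and "Control" on y
p_const <- ggplot(df, aes(x = "RNAi", y = "Control")) + geom_point()

# Unquoted names are looked up as columns of df: one point per gene
p_cols <- ggplot(df, aes(x = RNAi, y = Control)) + geom_point()
```

Printing `p_const` should show what looks like a single dot, while `p_cols` shows one point per row.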

If you wanted to plot the expression of every gene in the control samples against its expression in the disease samples, then we could do it this way (100% reproducible from the code provided - please adapt to your own situation):

generate random data for 20 controls and 20 disease samples

# 20000 random values in 40 columns gives 500 genes x 40 samples
fpkm <- matrix(rexp(20000, rate=.1), ncol=40)
colnames(fpkm) <- paste0('sample', 1:ncol(fpkm))
rownames(fpkm) <- paste0('gene', 1:nrow(fpkm))
# First 20 samples are controls, last 20 are disease
group <- c(rep('control', 20), rep('disease', 20))

fpkm[1:5, 1:5]
        sample1   sample2    sample3     sample4    sample5
gene1  3.400731  5.245857  6.0428353 3.933216059 43.5360782
gene2 14.499884 11.922893 15.7661271 8.216867894  0.4902539
gene3  1.465857 28.600315 12.5097092 0.005078154  2.9992393
gene4  2.968206 12.723561  0.9665945 4.619777547 12.7282447
gene5  1.821001 19.766981  9.1385842 3.951479285  9.5526014

group
 [1] "control" "control" "control" "control" "control" "control" "control"
 [8] "control" "control" "control" "control" "control" "control" "control"
[15] "control" "control" "control" "control" "control" "control" "disease"
[22] "disease" "disease" "disease" "disease" "disease" "disease" "disease"
[29] "disease" "disease" "disease" "disease" "disease" "disease" "disease"
[36] "disease" "disease" "disease" "disease" "disease"

'melt' data to long format

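# Split the matrix by group, then reshape each half to long format (gene, sample, value)
fpkm.control <- fpkm[, group == 'control']
fpkm.disease <- fpkm[, group == 'disease']
# cbind() pairs rows by position, so both groups need the same number of samples
fpkm_melt <- cbind(control = reshape2::melt(fpkm.control), disease = reshape2::melt(fpkm.disease))
head(fpkm_melt)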
  control.Var1 control.Var2 control.value disease.Var1 disease.Var2 disease.value
1        gene1      sample1      3.400731        gene1     sample21    2.67586111
2        gene2      sample1     14.499884        gene2     sample21    0.03849441
3        gene3      sample1      1.465857        gene3     sample21    4.32136384
4        gene4      sample1      2.968206        gene4     sample21    6.33721978
5        gene5      sample1      1.821001        gene5     sample21   16.20695587
6        gene6      sample1     11.960041        gene6     sample21   12.63470070
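
To see how the rows end up paired, here is a small sketch on a toy matrix. The matrix `m` and its gene and sample names are made up for illustration. reshape2::melt() walks a matrix column by column, so after cbind() the first control sample sits next to the first disease sample, the second next to the second, and so on. That pairing is purely positional; unless your samples are genuinely matched, it is arbitrary.

```r
# Hypothetical toy matrix: 2 genes x 4 samples
# (s1 and s2 play the role of controls, s3 and s4 of disease samples)
m <- matrix(1:8, nrow = 2,
            dimnames = list(c("geneA", "geneB"), paste0("s", 1:4)))

# Melt each half to long format: one row per (gene, sample) combination
long_ctrl <- reshape2::melt(m[, 1:2])
long_dis  <- reshape2::melt(m[, 3:4])

# Bind side by side: row i of the control table meets row i of the disease table
paired <- cbind(control = long_ctrl, disease = long_dis)

# Each gene is paired with itself; s1 is paired with s3, and s2 with s4
paired[, c("control.Var1", "control.Var2", "disease.Var2")]
```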

generate plot

library(ggplot2)

ggplot(data = fpkm_melt, aes(x = log10(control.value + 1), y = log10(disease.value + 1))) +
  geom_point(size = 0.1) +

  # Black-and-white theme; base_size sets the base font size
  theme_bw(base_size = 24) +

  # Modify various aspects of the plot text and legend
  theme(
    legend.position = 'none',
    legend.background = element_rect(),
    plot.title = element_text(angle = 0, size = 14, face = 'bold', vjust = 1),
    axis.text.x = element_text(angle = 45, size = 14, face = 'bold', hjust = 1.10),
    axis.text.y = element_text(angle = 0, size = 14, face = 'bold', vjust = 0.5),
    axis.title = element_text(size = 14, face = 'bold'),
    # Legend
    legend.key = element_blank(),  # removes the border
    legend.key.size = unit(1, 'cm'),  # Sets overall area/size of the legend
    legend.text = element_text(size = 12),  # Text size
    title = element_text(size = 12)) +  # Title text size

  # Set x- and y-axes labels
  xlab(bquote(log[10]~"(Control + 1)")) +
  ylab(bquote(log[10]~"(Disease + 1)")) +

  # Set the title, subtitle, and caption
  labs(
    title = 'My title',
    subtitle = 'Hello again',
    caption = '...and again')
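
If you would rather have a single point per gene, one option is to average the replicates within each group first. The sketch below does this on simulated data shaped like yours (5 Control and 5 RNAi replicates). The names `fpkm_sim`, `condition` and `avg` are made up for illustration, and taking the mean of the log10(FPKM + 1) values (rather than the log of the mean FPKM) is an assumption on my part; either can be reasonable depending on what you want to show.

```r
library(ggplot2)

set.seed(1)

# Simulated FPKM matrix (hypothetical): 500 genes, 5 Control + 5 RNAi replicates
fpkm_sim <- matrix(rexp(5000, rate = 0.1), ncol = 10,
                   dimnames = list(paste0("gene", 1:500), paste0("sample", 1:10)))
condition <- rep(c("Control", "RNAi"), each = 5)

# Log-transform first, then average the replicates of each condition per gene
log_fpkm <- log10(fpkm_sim + 1)
avg <- data.frame(
  gene    = rownames(log_fpkm),
  Control = rowMeans(log_fpkm[, condition == "Control"]),
  RNAi    = rowMeans(log_fpkm[, condition == "RNAi"])
)

# One point per gene; the dashed diagonal marks equal expression in both groups
p_avg <- ggplot(avg, aes(x = RNAi, y = Control)) +
  geom_point(size = 0.5) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  xlab("RNAi, mean log10(FPKM + 1)") +
  ylab("Control, mean log10(FPKM + 1)")

p_avg
```

With your own data, `fpkm_sim` would be replaced by the matrix returned by fpkm(dds), and `condition` by the group labels in the same order as its columns.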


For follow-up questions about tweaking the plot, it may be better to use a search engine. For other general bioinformatics queries, you could try Biostars.



Thank you Kevin for your detailed response! It was really helpful. I was able to plot my data using your instructions. I do want to plot the expression of all the genes between my groups, but I think I want the average of the control replicates together and the average of the RNAi replicates and then plot the data using those averages. Thank you again for your help.

