Hey,
It seems that your question is more about ggplot2 than DESeq2. Unfortunately, ggplot2 is not a Bioconductor package.
It would help to decide what you want to plot against what: for example, MYC expression between your groups, the expression of all genes between your groups, or the average of all genes. These are decisions for you to make, I think, based on what information you want to get from the plot.
Learning the input data formats for ggplot2 can take a while, but it becomes easier over time.
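To give a rough idea of the 'long' layout that ggplot2 works with (one row per observation, one column per variable), here is a small sketch for a single gene. The object names (expr, myc_long), the gene label 'MYC' and the values are invented for illustration only, and the plot is drawn only if ggplot2 is installed:

```r
# Hypothetical toy data: 1 gene x 6 samples (values are invented)
set.seed(1)
expr <- matrix(rexp(6, rate = 0.1), nrow = 1,
               dimnames = list("MYC", paste0("sample", 1:6)))
group <- rep(c("control", "disease"), each = 3)

# Long format: one row per sample, one column per variable
myc_long <- data.frame(
  sample     = colnames(expr),
  group      = factor(group),
  expression = as.numeric(expr["MYC", ])
)

# Draw the plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(myc_long, aes(x = group, y = log10(expression + 1))) +
    geom_boxplot() +
    geom_jitter(width = 0.1)
  print(p)
}
```

Each aesthetic in aes() maps to one column of the data frame, which is why the wide matrix has to be reshaped first.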
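If you wanted to plot the expression of every gene in the control samples against its expression in the disease samples, then we could do it this way (reproducible from the code provided, although the exact values will differ between runs because the data are random; please adapt to your own situation):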
generate random data for 20 controls and 20 disease samples
fpkm <- matrix(rexp(20000, rate=.1), ncol=40)
colnames(fpkm) <- paste0('sample', 1:ncol(fpkm))
rownames(fpkm) <- paste0('gene', 1:nrow(fpkm))
group <- c(rep('control', 20), rep('disease', 20))
fpkm[1:5,1:5]
sample1 sample2 sample3 sample4 sample5
gene1 3.400731 5.245857 6.0428353 3.933216059 43.5360782
gene2 14.499884 11.922893 15.7661271 8.216867894 0.4902539
gene3 1.465857 28.600315 12.5097092 0.005078154 2.9992393
gene4 2.968206 12.723561 0.9665945 4.619777547 12.7282447
gene5 1.821001 19.766981 9.1385842 3.951479285 9.5526014
group
[1] "control" "control" "control" "control" "control" "control" "control"
[8] "control" "control" "control" "control" "control" "control" "control"
[15] "control" "control" "control" "control" "control" "control" "disease"
[22] "disease" "disease" "disease" "disease" "disease" "disease" "disease"
[29] "disease" "disease" "disease" "disease" "disease" "disease" "disease"
[36] "disease" "disease" "disease" "disease" "disease"
'melt' data to long format
fpkm.control <- fpkm[,which(group == 'control')]
fpkm.disease <- fpkm[,which(group == 'disease')]
fpkm_melt <- cbind(control = reshape2::melt(fpkm.control), disease = reshape2::melt(fpkm.disease))
head(fpkm_melt)
control.Var1 control.Var2 control.value disease.Var1 disease.Var2
1 gene1 sample1 3.400731 gene1 sample21
2 gene2 sample1 14.499884 gene2 sample21
3 gene3 sample1 1.465857 gene3 sample21
4 gene4 sample1 2.968206 gene4 sample21
5 gene5 sample1 1.821001 gene5 sample21
6 gene6 sample1 11.960041 gene6 sample21
disease.value
1 2.67586111
2 0.03849441
3 4.32136384
4 6.33721978
5 16.20695587
6 12.63470070
generate plot
library(ggplot2)
ggplot(data = fpkm_melt, aes(x = log10(control.value + 1), y = log10(disease.value + 1))) +
geom_point(size = 0.1) +
# Set the base font size of the theme
theme_bw(base_size=24) +
# Modify various aspects of the plot text and legend
theme(
legend.position = 'none',
legend.background = element_rect(),
plot.title = element_text(angle = 0, size = 14, face = 'bold', vjust = 1),
axis.text.x = element_text(angle = 45, size =14, face="bold", hjust = 1.10),
axis.text.y = element_text(angle = 0, size = 14, face = 'bold', vjust = 0.5),
axis.title = element_text(size = 14, face = 'bold'),
# Legend
legend.key = element_blank(), # removes the border
legend.key.size = unit(1, 'cm'), # Sets overall area/size of the legend
legend.text = element_text(size = 12), # Text size
title = element_text(size=12)) + # Title text size
# Set x- and y-axes labels
xlab(bquote(log[10]~Control~" + 1")) +
ylab(bquote(log[10]~Disease~" + 1")) +
labs(
title = 'Mi título',
subtitle = 'Hola de nuevo',
caption = '...y otra vez')
For follow-up questions about tweaking the plot, a web search is probably the quickest route. For other general bioinformatics queries, you could try Biostars.
Kevin
Thank you Kevin for your detailed response! It was really helpful. I was able to plot my data using your instructions. I do want to plot the expression of all the genes between my groups, but I think I want the average of the control replicates together and the average of the RNAi replicates and then plot the data using those averages. Thank you again for your help.
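For the group averages described here, a minimal sketch could look like the following. It reuses the simulated fpkm matrix and group vector from the answer above (with a fixed seed added as an assumption, so the toy data are repeatable); the name fpkm_mean is made up for illustration, and with real data 'disease' would be swapped for the RNAi group label:

```r
# Simulated data as in the answer above; set.seed() is an added assumption
set.seed(1)
fpkm <- matrix(rexp(20000, rate = 0.1), ncol = 40)
colnames(fpkm) <- paste0("sample", 1:ncol(fpkm))
rownames(fpkm) <- paste0("gene", 1:nrow(fpkm))
group <- c(rep("control", 20), rep("disease", 20))

# One row per gene: mean expression across the replicates of each group
fpkm_mean <- data.frame(
  gene    = rownames(fpkm),
  control = rowMeans(fpkm[, group == "control"]),
  disease = rowMeans(fpkm[, group == "disease"])
)

# Scatter plot of the two group means (only if ggplot2 is installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(fpkm_mean, aes(x = log10(control + 1), y = log10(disease + 1))) +
    geom_point(size = 0.5) +
    # dashed identity line: genes on it have equal means in both groups
    geom_abline(slope = 1, intercept = 0, linetype = "dashed")
  print(p)
}
```

Whether to average before or after the log transformation is a separate choice; this sketch averages the raw values first.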