Hello,
I'm trying to plot three different bed files (one with SNP data, another one with deletions and a third one with duplication data), but I cannot manage the legend to contain the values of the three layers unless I put the data altogether in the same file. The problem about combining the three files into a single one is that I'm not able to set ylims for each of the levels of the variable.
This is an example of one of my input files (the one containing the SNP info):
chr10 47000019 47000019 rs150696937 2 + chr11 1017064 1017064 NA 2 + chr11 1017280 1017280 rs199539548 2 + chr11 1017294 1017294 NA 2 + chr11 1017756 1017756 NA 2 + chr13 31898038 31898038 rs200460848 2 + chr13 40298639 40298639 NA 2 + chr13 48996928 48996928 rs530812916 2 + chr13 50204777 50204777 rs117251022 2 + chr14 20216005 20216005 rs566685404 2 + chr14 20404076 20404076 rs114526346 2 + chr21 10944668 10944668 rs138088406 2 +
I'm using the score column to specify the kind of variant type that I want to plot in the following way: "1" = Deletion; "2" = SNP and "3" = Duplication.
The code that I'm using to generate the plot is the following one:
## Load libraries and required databases library(ggbio) data(hg19IdeogramCyto, package = "biovizBase") library(GenomicRanges) hg19 <- keepSeqlevels(hg19IdeogramCyto, paste0("chr", c(1:22, "X", "Y"))) biovizBase::isIdeogram(hg19) data("hg19IdeogramCyto", package = "biovizBase") data("hg19Ideogram", package = "biovizBase")
I use the Bed2GRanges function available at this website: http://davetang.org/muse/2015/02/04/bed-granges/ to convert my bed file into a GRanges object.
# Required Bed2GRanges function # BED to GRanges # # This function loads a BED-like file and stores it as a GRanges object. # The tab-delimited file must be ordered as 'chr', 'start', 'end', 'id', 'score', 'strand'. # The minimal BED file must have the 'chr', 'start', 'end' columns. # Any columns after the strand column are ignored. # # @param file Location of your file # @keywords BED GRanges # @export # @examples # bed_to_granges('my_bed_file.bed') bed_to_granges <- function(file){ df <- read.table(file, header=F, stringsAsFactors=F) if(length(df) > 6){ df <- df[,-c(7:length(df))] } if(length(df)<3){ stop("File has less than 3 columns") } header <- c('chr','start','end','id','score','strand') names(df) <- header[1:length(names(df))] if('strand' %in% colnames(df)){ df$strand <- gsub(pattern="[^+-]+", replacement = '*', x = df$strand) } library("GenomicRanges") if(length(df)==3){ gr <- with(df, GRanges(chr, IRanges(start, end))) } else if (length(df)==4){ gr <- with(df, GRanges(chr, IRanges(start, end), id=id)) } else if (length(df)==5){ gr <- with(df, GRanges(chr, IRanges(start, end), id=id, score=as.character(score))) } else if (length(df)==6){ gr <- with(df, GRanges(chr, IRanges(start, end), id=id, score=as.character(score), strand=strand)) } return(gr) }
I import my bedfile:
## Import bed files as GRanges file SNP <- bed_to_granges("SNPs.bed") seqlengths(SNP) <- seqlengths(hg19Ideogram)[names(seqlengths(SNP))] SNP_dn <- keepSeqlevels(SNP, paste0("chr", c(1:22, "X", "Y")))
I plot my data:
#Plotting SNP_dn according to score column
test <- autoplot(SNP_dn, aes(color = score)) +
scale_color_manual("Variant type",
values = score <- c("black", "red", "blue"),
breaks = c("2","1","3"),
drop = FALSE,
labels = c("SNP", "Deletion", "Duplication")) +
theme(legend.position = "right")
test
Even I specify the option drop = FALSE
, I always miss the levels "Deletion" and "Duplication" in the legend.
Here you have an example of the output plot:
I would like to have a legend that includes the three levels that I have specified with the scale_color_manual () function (i.e. SNP, Deletion, Duplication), even if there are none of them in the bed file.
Thank you very much,
sessionInfo()
R version 3.3.1 (2016-06-21) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] biovizBase_1.20.0 ggbio_1.20.2 GenomicRanges_1.24.3 GenomeInfoDb_1.8.7 IRanges_2.6.1 [6] S4Vectors_0.10.3 ggplot2_2.1.0 BiocGenerics_0.18.0 loaded via a namespace (and not attached): [1] Rcpp_0.12.7 lattice_0.20-34 Rsamtools_1.24.0 [4] Biostrings_2.40.2 digest_0.6.10 mime_0.5 [7] R6_2.1.3 plyr_1.8.4 chron_2.3-47 [10] acepack_1.3-3.3 RSQLite_1.0.0 BiocInstaller_1.22.3 [13] httr_1.2.1 zlibbioc_1.18.0 GenomicFeatures_1.24.5 [16] data.table_1.9.6 rpart_4.1-10 Matrix_1.2-7.1 [19] labeling_0.3 splines_3.3.1 BiocParallel_1.6.6 [22] AnnotationHub_2.4.2 stringr_1.1.0 foreign_0.8-67 [25] RCurl_1.95-4.8 biomaRt_2.28.0 munsell_0.4.3 [28] shiny_0.14 httpuv_1.3.3 rtracklayer_1.32.2 [31] htmltools_0.3.5 nnet_7.3-12 SummarizedExperiment_1.2.3 [34] gridExtra_2.2.1 interactiveDisplayBase_1.10.3 Hmisc_3.17-4 [37] XML_3.98-1.4 reshape_0.8.5 GenomicAlignments_1.8.4 [40] bitops_1.0-6 RBGL_1.48.1 grid_3.3.1 [43] xtable_1.8-2 GGally_1.2.0 gtable_0.2.0 [46] DBI_0.5-1 magrittr_1.5 scales_0.4.0 [49] graph_1.50.0 stringi_1.1.1 XVector_0.12.1 [52] reshape2_1.4.1 latticeExtra_0.6-28 Formula_1.2-1 [55] RColorBrewer_1.1-2 ensembldb_1.4.7 tools_3.3.1 [58] dichromat_2.0-0 OrganismDbi_1.14.1 BSgenome_1.40.1 [61] Biobase_2.32.0 survival_2.39-5 AnnotationDbi_1.34.4 [64] colorspace_1.2-6 cluster_2.0.4 VariantAnnotation_1.18.7
This question was answered by aosmith in the following link: http://stackoverflow.com/questions/39535088/missing-levels-in-legend-using-autoplot-scale-color-manual-functions.