Question: Combining LD matrix plot with SNP and gene data with ggplot / ggbio tracks() function
gravatar for oswaldo.lorenzo
15 months ago by
oswaldo.lorenzo10 wrote:


Basically, what I'm trying to do is to plot the linkage disequilibrium (LD) in a certain genomic region in a similar way as this website does: The problem is that the figure from this website has a low quality and I need the figure for publication purposes. All the sample files I have used are at the end of the post so that you can reproduce my code.

This is what I have done so far:

data(hg19IdeogramCyto, package = "biovizBase")
data(hg19Ideogram, package = "biovizBase")
data(genesymbol, package = "biovizBase")
hg19 <- keepSeqlevels(hg19IdeogramCyto, paste0("chr", c(1:22, "X", "Y")))

I downloaded the matrix txt file from the LDlink website to use it for my LD plot. I melted the dataframe to obtain the grading colors:

# FIRST PLOT - LD plot for CEU population

#Load r-squared values in Table format
CEU_plot<- read.table("r2_SOD1_matrix.txt", header = TRUE)

#Load chromosomal positions for each rs ID
RS_location <- read.table("SOD1_SNPs.bed", header = FALSE)

#Melt table to be able to make a gradient
CEU_plot_m <- melt(CEU_plot)

# x-axis values - chromosomal position
CEU_plot_m$RS_number <- factor(RS_location[,2], levels = RS_location[,2])

# y-axis - variable
CEU_plot_m$variable <- factor(CEU_plot_m$variable, levels = sort(CEU_plot_m$variable, decreasing = TRUE))

CEU = ggplot(CEU_plot_m, aes(x = RS_number, y = variable)) +
        labs (title = "r2 plot for SOD1 gene in CEU population ",
              x = element_blank (),
              y = "rs IDs") +
        theme(text = element_text(size=8),
              axis.text.x = element_blank(),
              axis.text.y = element_text(hjust=0)) +
        geom_tile(aes(fill = value), colour = "white") +
        scale_fill_gradient(low = "white" ,high = "red")

Then I use ggbio package to generate the third and fourth track of my final plot

wh <- genesymbol[c("SOD1")]
chr_name <- as.vector(slot(wh@seqnames,"values"))
chr_start <- slot(wh@ranges, "start") + 0 - 2000
chr_end <- slot (wh@ranges, "start") + slot (wh@ranges, "width") - 1 + 2000
gene_name <- slot(wh@ranges, "NAMES")

SNP <- bed_to_granges("SOD1_SNPs.bed")
SNP_chr <- slot(SNP@seqnames,"values")
if (chr_name %in% SNP_chr) {
        seqlengths(SNP) <- seqlengths(hg19Ideogram)[names(seqlengths(SNP))]
        SNP_dn <- keepSeqlevels(unique(SNP), chr_name)

SNPs_plot <-  autoplot(SNP_dn,  xlab = chr_name) +
        guides (colour = TRUE) +
        theme(panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              panel.border = element_blank(),
              panel.background = element_blank()) +
        theme(legend.position="none") +
        scale_color_manual(values = score <- c("black")) +
        scale_fill_manual (breaks = score <- c("2"),
                           values= score <- c("black"),
                           name = "Variant type",
                           labels = expression(bold(SNP))) +
        xlim(chr_start,chr_end) +
fixed(SNP_IDs) <- TRUE

GENES_plot <-   autoplot(Homo.sapiens, which = wh, layout = "linear",  xlab = chr_name) +
        guides (colour = FALSE) +
        xlim(chr_start,chr_end) +
fixed(GENES_plot) <- TRUE

# Main tracks plot title
plot_title <- as.expression(bquote('Variation in'~italic(.(gene_name))~'gene'))

# X axis scale according to gene start and gene end
scale_combined <- GRanges(chr_name, IRanges(start = chr_start, end = chr_end))

# Combination of SNPs plot with Genes plot

Gene_SNPs <-     tracks(SNPs = SNP_IDs,
                    heights = c(0.5,2.0),
                    xlim = scale_combined,
                    xlab = paste("\n",chr_name,"\n"),
                    title = plot_title,
           = "grey60") +


The problem comes when I want to combine the CEU plot (first track) with the SNPs (third track) and the Gene (fourth track) plots. I need to combine a ggplot list (CEU) with two GGBio objects (SNPs_plot and GENES_plot). On top of that I would like to generate an additional track between the LD plot and the SNPs_plot that links the discrete values of the x axis from the CEU plot with the SNPs located in a continuous scale from the SNPs_plot.

In summary, this is what I have:

LD Plot:

Combined SNPs data with Gene track:

An this is the final combined plot that I want:

Thank you very much for your help in advance,


These are the original files I'm using:

'r2_SOD1_matrix.txt' input file

'SOD1_SNPs.bed' input file

'bed2granges.R' function

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] scales_0.4.1                            reshape_0.8.6                          
 [3] VariantAnnotation_1.20.2                Rsamtools_1.26.1                       
 [5] SummarizedExperiment_1.4.0              BSgenome.Hsapiens.UCSC.hg19_1.4.0      
 [7] BSgenome_1.42.0                         Biostrings_2.42.1                      
 [9] XVector_0.14.0                          rtracklayer_1.34.1                     
[11] Homo.sapiens_1.3.1                      TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[13]                      GO.db_3.4.0                            
[15] OrganismDbi_1.16.0                      GenomicFeatures_1.26.2                 
[17] AnnotationDbi_1.36.1                    Biobase_2.34.0                         
[19] GenomicRanges_1.26.2                    GenomeInfoDb_1.10.2                    
[21] IRanges_2.8.1                           S4Vectors_0.12.1                       
[23] ggbio_1.22.3                            BiocGenerics_0.20.0                    
[25] ggplot2_2.2.1                          

loaded via a namespace (and not attached):
 [1] httr_1.2.1                    AnnotationHub_2.6.4          
 [3] splines_3.3.2                 Formula_1.2-1                
 [5] shiny_1.0.0                   assertthat_0.1               
 [7] interactiveDisplayBase_1.12.0 latticeExtra_0.6-28          
 [9] RBGL_1.50.0                   yaml_2.1.14                  
[11] RSQLite_1.1-2                 backports_1.0.5              
[13] lattice_0.20-34               biovizBase_1.22.0            
[15] digest_0.6.11                 RColorBrewer_1.1-2           
[17] checkmate_1.8.2               colorspace_1.3-2             
[19] htmltools_0.3.5               httpuv_1.3.3                 
[21] Matrix_1.2-8                  plyr_1.8.4                   
[23] XML_3.98-1.5                  biomaRt_2.30.0               
[25] zlibbioc_1.20.0               xtable_1.8-2                 
[27] BiocParallel_1.8.1            htmlTable_1.8                
[29] tibble_1.2                    nnet_7.3-12                  
[31] lazyeval_0.2.0                survival_2.40-1              
[33] magrittr_1.5                  mime_0.5                     
[35] memoise_1.0.0                 GGally_1.3.0                 
[37] foreign_0.8-67                graph_1.52.0                 
[39] BiocInstaller_1.24.0          tools_3.3.2                  
[41] data.table_1.10.0             stringr_1.1.0                
[43] munsell_0.4.3                 cluster_2.0.5                
[45] ensembldb_1.6.2               grid_3.3.2                   
[47] RCurl_1.95-4.8                dichromat_2.0-0              
[49] labeling_0.3                  bitops_1.0-6                 
[51] base64enc_0.1-3               gtable_0.2.0                 
[53] DBI_0.5-1                     reshape2_1.4.2               
[55] R6_2.2.0                      GenomicAlignments_1.10.0     
[57] gridExtra_2.2.1               knitr_1.15.1                 
[59] Hmisc_4.0-2                   stringi_1.1.2                
[61] Rcpp_0.12.9                   rpart_4.1-10                 
[63] acepack_1.4.1  
ADD COMMENTlink modified 15 months ago by Michael Lawrence10.0k • written 15 months ago by oswaldo.lorenzo10
gravatar for Michael Lawrence
15 months ago by
Michael Lawrence10.0k
United States
Michael Lawrence10.0k wrote:

`plotRangesLinkedToData()` does this up to how the data are plotted (and you'll have to add the genes track in a separate call). Right now it makes a parallel coordinate plot, while you're looking for the equivalent heatmap. It actually wouldn't be that hard to hack the code for `plotRangesLinkedToData()` to produce a heatmap instead. I don't think I have the time for it though. A more modular approach would make this a lot easier.

ADD COMMENTlink written 15 months ago by Michael Lawrence10.0k

Dear Michael,

I'll have a look at the `plotRangesLinkedToData()` to see if I can get what I'm trying to do.

Anyway, what do you mean by using a more modular approach? Can I import the data in a different way that would make the plot easier to generate?

Thank you very much for your help

ADD REPLYlink written 15 months ago by oswaldo.lorenzo10

I just meant a more modular approach to the design of ggbio, which pretty close to abandonware. It would be good not to write your own BED parser though.

ADD REPLYlink written 15 months ago by Michael Lawrence10.0k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 239 users visited in the last hour