Combining LD matrix plot with SNP and gene data with ggplot / ggbio tracks() function
1
1
Entering edit mode
@oswaldolorenzo-11449
Last seen 7.2 years ago

Hello,

Basically, what I'm trying to do is to plot the linkage disequilibrium (LD) in a certain genomic region in a similar way as this website does: https://analysistools.nci.nih.gov/LDlink/?tab=ldmatrix. The problem is that the figure from this website has a low quality and I need the figure for publication purposes. All the sample files I have used are at the end of the post so that you can reproduce my code.

This is what I have done so far:

library(ggplot2)
library(reshape)
library(scales)
library(ggbio)
library(GenomicRanges)
library(Homo.sapiens)
library(rtracklayer)
library(GenomicRanges)
library(BSgenome.Hsapiens.UCSC.hg19)
library(VariantAnnotation)
source("bed2granges.R")
data(hg19IdeogramCyto, package = "biovizBase")
data(hg19Ideogram, package = "biovizBase")
data(genesymbol, package = "biovizBase")
hg19 <- keepSeqlevels(hg19IdeogramCyto, paste0("chr", c(1:22, "X", "Y")))

I downloaded the matrix txt file from the LDlink website to use it for my LD plot. I melted the dataframe to obtain the grading colors:

# FIRST PLOT - LD plot for CEU population

#Load r-squared values in Table format
CEU_plot<- read.table("r2_SOD1_matrix.txt", header = TRUE)

#Load chromosomal positions for each rs ID
RS_location <- read.table("SOD1_SNPs.bed", header = FALSE)

#Melt table to be able to make a gradient
CEU_plot_m <- melt(CEU_plot)

# x-axis values - chromosomal position
CEU_plot_m$RS_number <- factor(RS_location[,2], levels = RS_location[,2])

# y-axis - variable
CEU_plot_m$variable <- factor(CEU_plot_m$variable, levels = sort(CEU_plot_m$variable, decreasing = TRUE))

CEU = ggplot(CEU_plot_m, aes(x = RS_number, y = variable)) +
        labs (title = "r2 plot for SOD1 gene in CEU population ",
              x = element_blank (),
              y = "rs IDs") +
        theme(text = element_text(size=8),
              axis.text.x = element_blank(),
              axis.text.y = element_text(hjust=0)) +
        geom_tile(aes(fill = value), colour = "white") +
        scale_fill_gradient(low = "white" ,high = "red")
CEU

Then I use ggbio package to generate the third and fourth track of my final plot

# DATA FOR REST OF PLOTS
wh <- genesymbol[c("SOD1")]
chr_name <- as.vector(slot(wh@seqnames,"values"))
chr_start <- slot(wh@ranges, "start") + 0 - 2000
chr_end <- slot (wh@ranges, "start") + slot (wh@ranges, "width") - 1 + 2000
gene_name <- slot(wh@ranges, "NAMES")

#THIRD PLOT - SOD1 SNPs
SNP <- bed_to_granges("SOD1_SNPs.bed")
SNP_chr <- slot(SNP@seqnames,"values")
if (chr_name %in% SNP_chr) {
        seqlengths(SNP) <- seqlengths(hg19Ideogram)[names(seqlengths(SNP))]
        SNP_dn <- keepSeqlevels(unique(SNP), chr_name)
}

SNPs_plot <-  autoplot(SNP_dn,  xlab = chr_name) +
        guides (colour = TRUE) +
        theme_bw()+
        theme(panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              panel.border = element_blank(),
              panel.background = element_blank()) +
        theme(legend.position="none") +
        scale_color_manual(values = score <- c("black")) +
        scale_fill_manual (breaks = score <- c("2"),
                           values= score <- c("black"),
                           name = "Variant type",
                           labels = expression(bold(SNP))) +
        xlim(chr_start,chr_end) +
        scale_x_sequnit("Mb")
fixed(SNP_IDs) <- TRUE
SNPs_plot

##FOURTH PLOT - SOD1 gene
GENES_plot <-   autoplot(Homo.sapiens, which = wh, layout = "linear",  xlab = chr_name) +
        guides (colour = FALSE) +
        xlim(chr_start,chr_end) +
        scale_x_sequnit("Mb")
fixed(GENES_plot) <- TRUE
GENES_plot

# Main tracks plot title
plot_title <- as.expression(bquote('Variation in'~italic(.(gene_name))~'gene'))

# X axis scale according to gene start and gene end
scale_combined <- GRanges(chr_name, IRanges(start = chr_start, end = chr_end))

# Combination of SNPs plot with Genes plot

Gene_SNPs <-     tracks(SNPs = SNP_IDs,
                    Genes=GENES_plot,
                    heights = c(0.5,2.0),
                    xlim = scale_combined,
                    xlab = paste("\n",chr_name,"\n"),
                    title = plot_title,
                    label.bg.fill = "grey60") +
                    scale_x_sequnit("Mb")
Gene_SNPs

 

The problem comes when I want to combine the CEU plot (first track) with the SNPs (third track) and the Gene (fourth track) plots. I need to combine a ggplot list (CEU) with two GGBio objects (SNPs_plot and GENES_plot). On top of that I would like to generate an additional track between the LD plot and the SNPs_plot that links the discrete values of the x axis from the CEU plot with the SNPs located in a continuous scale from the SNPs_plot.

In summary, this is what I have:

LD Plot:

Combined SNPs data with Gene track:

An this is the final combined plot that I want:

Thank you very much for your help in advance,

 


These are the original files I'm using:

'r2_SOD1_matrix.txt' input file

https://drive.google.com/open?id=0B34ok3wh5PjTMklhNzBrT3JoM0k


'SOD1_SNPs.bed' input file

https://drive.google.com/open?id=0B34ok3wh5PjTaGx6QWdZd3pmbUE


'bed2granges.R' function

https://drive.google.com/open?id=0B34ok3wh5PjTMHNFa1pod0liYzA


R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] scales_0.4.1                            reshape_0.8.6                          
 [3] VariantAnnotation_1.20.2                Rsamtools_1.26.1                       
 [5] SummarizedExperiment_1.4.0              BSgenome.Hsapiens.UCSC.hg19_1.4.0      
 [7] BSgenome_1.42.0                         Biostrings_2.42.1                      
 [9] XVector_0.14.0                          rtracklayer_1.34.1                     
[11] Homo.sapiens_1.3.1                      TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[13] org.Hs.eg.db_3.4.0                      GO.db_3.4.0                            
[15] OrganismDbi_1.16.0                      GenomicFeatures_1.26.2                 
[17] AnnotationDbi_1.36.1                    Biobase_2.34.0                         
[19] GenomicRanges_1.26.2                    GenomeInfoDb_1.10.2                    
[21] IRanges_2.8.1                           S4Vectors_0.12.1                       
[23] ggbio_1.22.3                            BiocGenerics_0.20.0                    
[25] ggplot2_2.2.1                          

loaded via a namespace (and not attached):
 [1] httr_1.2.1                    AnnotationHub_2.6.4          
 [3] splines_3.3.2                 Formula_1.2-1                
 [5] shiny_1.0.0                   assertthat_0.1               
 [7] interactiveDisplayBase_1.12.0 latticeExtra_0.6-28          
 [9] RBGL_1.50.0                   yaml_2.1.14                  
[11] RSQLite_1.1-2                 backports_1.0.5              
[13] lattice_0.20-34               biovizBase_1.22.0            
[15] digest_0.6.11                 RColorBrewer_1.1-2           
[17] checkmate_1.8.2               colorspace_1.3-2             
[19] htmltools_0.3.5               httpuv_1.3.3                 
[21] Matrix_1.2-8                  plyr_1.8.4                   
[23] XML_3.98-1.5                  biomaRt_2.30.0               
[25] zlibbioc_1.20.0               xtable_1.8-2                 
[27] BiocParallel_1.8.1            htmlTable_1.8                
[29] tibble_1.2                    nnet_7.3-12                  
[31] lazyeval_0.2.0                survival_2.40-1              
[33] magrittr_1.5                  mime_0.5                     
[35] memoise_1.0.0                 GGally_1.3.0                 
[37] foreign_0.8-67                graph_1.52.0                 
[39] BiocInstaller_1.24.0          tools_3.3.2                  
[41] data.table_1.10.0             stringr_1.1.0                
[43] munsell_0.4.3                 cluster_2.0.5                
[45] ensembldb_1.6.2               grid_3.3.2                   
[47] RCurl_1.95-4.8                dichromat_2.0-0              
[49] labeling_0.3                  bitops_1.0-6                 
[51] base64enc_0.1-3               gtable_0.2.0                 
[53] DBI_0.5-1                     reshape2_1.4.2               
[55] R6_2.2.0                      GenomicAlignments_1.10.0     
[57] gridExtra_2.2.1               knitr_1.15.1                 
[59] Hmisc_4.0-2                   stringi_1.1.2                
[61] Rcpp_0.12.9                   rpart_4.1-10                 
[63] acepack_1.4.1  
ggplot2 ggbio plot tracks bioconductor • 3.2k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.1 years ago
United States

`plotRangesLinkedToData()` does this up to how the data are plotted (and you'll have to add the genes track in a separate call). Right now it makes a parallel coordinate plot, while you're looking for the equivalent heatmap. It actually wouldn't be that hard to hack the code for `plotRangesLinkedToData()` to produce a heatmap instead. I don't think I have the time for it though. A more modular approach would make this a lot easier.

ADD COMMENT
0
Entering edit mode

Dear Michael,

I'll have a look at the `plotRangesLinkedToData()` to see if I can get what I'm trying to do.

Anyway, what do you mean by using a more modular approach? Can I import the data in a different way that would make the plot easier to generate?

Thank you very much for your help

ADD REPLY
0
Entering edit mode

I just meant a more modular approach to the design of ggbio, which pretty close to abandonware. It would be good not to write your own BED parser though.

ADD REPLY

Login before adding your answer.

Traffic: 572 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6