Hello everyone,
I have been using the trackViewer package in R to make plots of RNA-seq, ChIP-seq and ATAC-seq data. We now have some methylation data, and I was trying to make some lolliplots to show differential methylation levels in genes/promoters.
I have been following the vignettes, but I am really confused by the section on lolliplots for methylation data, and I have been unable to make it work. My knowledge of bioinformatics is basic, and this is the first time we have had methylation data.
What type of input data do I need for these plots? We would basically like to show differentially methylated regions (DMRs) between two conditions (cell types). I would appreciate some help.
Thanks,
Patricia
What kind of file do you have? Any file that can be imported into a GRanges object will work for lolliplot in trackViewer. If you want to show the differences between two conditions, you can plot them one at a time or show them together in the caterpillar layout. Because you probably want to keep the methylation positions fixed, try setting jitter="label".
The pseudo code could be:
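library(trackViewer); library(GenomicRanges)  # also load the TxDb package for your genome build
dat <- read.delim("/path/to/data/with/seqname/start/end/sample1/sample2")
# lolliplot needs width-1 ranges, so start and end should be the same position
methy <- with(dat, GRanges(seqnames=seqname, ranges=IRanges(start, end), score=sample1))  # add strand=strand if your file has a strand column
gene <- geneTrack(geneEntrezID, TxDb.object)[[1]]$dat  # geneEntrezID: the gene's Entrez ID; TxDb.object: a TxDb for the same genome build
lolliplot(methy, gene, jitter="label") ## plot for sample1; repeat with score=sample2 to plot sample2. Or use different colors or shapes to distinguish the samples.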
Let me know if you have trouble understanding my code.
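A minimal sketch of the "different colors" and caterpillar ideas, assuming two groups measured at the same CpG positions. As we read the trackViewer vignette, a SNPsideID metadata column set to "top" or "bottom" gives the caterpillar layout and a color column sets the node color; the helper stack_groups, the colors and the toy values are hypothetical. The plotting part only runs in an interactive session with the Bioconductor packages installed.

```r
# Hypothetical helper: stack two groups into one table, tagging each row with
# the side it should be drawn on (caterpillar layout) and a per-group color.
stack_groups <- function(g1, g2, colors = c("#D55E00", "#0072B2")) {
  g1$SNPsideID <- "top"
  g1$color <- colors[1]
  g2$SNPsideID <- "bottom"
  g2$color <- colors[2]
  rbind(g1, g2)
}

# Hypothetical toy values: the same two CpG positions measured in each group
g1 <- data.frame(seqnames = "chr1", Start = c(7047220, 7047300), score = c(0.7, 0.6))
g2 <- data.frame(seqnames = "chr1", Start = c(7047220, 7047300), score = c(0.1, 0.05))
both <- stack_groups(g1, g2)

# Draw only when working interactively and the packages are available
pkgs <- c("trackViewer", "GenomicRanges", "IRanges")
if (interactive() && all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  methy <- GenomicRanges::GRanges(both$seqnames,
                                  IRanges::IRanges(both$Start, width = 1),
                                  score = both$score, color = both$color,
                                  SNPsideID = both$SNPsideID)
  features <- GenomicRanges::GRanges("chr1", IRanges::IRanges(7047200, 7047560))
  trackViewer::lolliplot(methy, features, jitter = "label")
}
```

Keeping both groups in one GRanges puts them on a single axis, so the same CpG can be compared above and below the gene model.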
Jianhong.
Hi Jianhong,
Thanks for your reply. Before, I was completely unable to make a lolliplot with my data, mainly because I couldn't read my files in. But using your code, I managed to plot some! However, I am confused about which type of data I should use. I will try to explain what data I have (we did not perform the analysis ourselves; we got the results from the sequencing facility, which also did the basic bioinformatics analysis).
On the one hand, I have one type of file that contains information for each methylated C (single-nucleotide level), one file per sample. For example, sample 1:
seqnames | Start | Strand | Pattern | Seq | Score | Copynumber | Methyreads | Nonmethyreads | Biom_expect
chr1 | 129773068 | - | CHH | TTGTT | 1 | 0 | 14 | 2 | 0.00840124401116582
I tried plotting this data for just this one sample and one gene (IL10, chr1:131019845..13102497), and it looks crazy because there are pins everywhere along the gene.
On the other hand, I have another file from the differential analysis between conditions (group 1 vs group 2, with 3 samples in each group), containing the differentially methylated regions (DMRs). This single table holds information for both groups (g1 vs g2): chromosome, start, end, q-value, mean difference between groups, number of CpGs, mean g1 and mean g2:
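One way to thin out the pins is sketched below, assuming the per-sample file mixes sequence contexts (the example row is CHH) and that Methyreads and Nonmethyreads are read counts. The sketch keeps only CpG sites (assuming the Pattern column uses "CG" for them) with a minimum coverage inside one window, and computes the methylation level as Methyreads / (Methyreads + Nonmethyreads). The function name, the min_depth threshold of 5 and the toy rows are hypothetical.

```r
# Hypothetical helper: keep CpG-context calls with enough reads inside a
# window and compute the fraction of methylated reads for each cytosine.
prepare_cpg_sites <- function(calls, chrom, from, to, min_depth = 5) {
  depth <- calls$Methyreads + calls$Nonmethyreads
  keep <- calls$seqnames == chrom &
    calls$Start >= from & calls$Start <= to &
    calls$Pattern == "CG" &            # assumption: CpG context is labeled "CG"
    depth >= min_depth                 # arbitrary coverage filter
  out <- calls[keep, c("seqnames", "Start", "Strand")]
  out$score <- calls$Methyreads[keep] / depth[keep]  # methylation level, 0 to 1
  rownames(out) <- NULL
  out
}

# Hypothetical toy rows using the column names of the per-sample file
calls <- data.frame(
  seqnames      = "chr1",
  Start         = c(100, 150, 180, 220, 400),
  Strand        = c("+", "-", "+", "-", "+"),
  Pattern       = c("CG", "CHH", "CG", "CG", "CG"),
  Methyreads    = c(9, 14, 3, 1, 8),
  Nonmethyreads = c(1, 2, 3, 1, 2)
)
sites <- prepare_cpg_sites(calls, chrom = "chr1", from = 50, to = 300)
sites
```

Each remaining row is a single cytosine, so it can go straight into GRanges with IRanges(Start, width = 1) and score as the methylation level.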
Chr | Start | End | q-value | mean difference (mean g1 - mean g2) | #CpGs | mean g1 | mean g2
chr1 | 7047215 | 7047553 | 0.0078483 | 0.589576 | 10 | 0.65566 | 0.066084
I tried using this file, but I got an error saying that the ranges should have width = 1. When I changed the width of each range to 1, using the Start position as the location, it worked, but obviously I am losing information, because each region contains several CpGs, not just the one nucleotide at Start. So I am not sure which data I should use, or whether I can work with either of these files.
Finally, how can I plot two samples in the same plot? Following the vignette, I think I should create a list of A = methy g1, B = methy g2. Does this make sense?
Sorry for all the questions; as you can see, I am a bit lost.
Thank you again!
Patricia.
Hi Patricia,
If there are pins everywhere, you may want to try a dandelion plot (dandelion.plot in trackViewer).
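A minimal sketch of trying dandelion.plot on a crowded region, assuming it accepts the same GRanges inputs as lolliplot (our reading of the trackViewer documentation is that it groups nearby events instead of drawing one pin each). The toy sites below are random, hypothetical values, and the plot is only drawn in an interactive session with the packages installed.

```r
# Hypothetical toy input: 60 CpG sites packed into a ~600 bp window, the kind
# of density that makes a plain lolliplot look crowded
set.seed(1)
toy <- data.frame(seqnames = "chr1",
                  Start = sort(sample(1000:1600, 60)),
                  score = round(runif(60), 2))

pkgs <- c("trackViewer", "GenomicRanges", "IRanges")
if (interactive() && all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  methy <- GenomicRanges::GRanges(toy$seqnames,
                                  IRanges::IRanges(toy$Start, width = 1),
                                  score = toy$score)
  features <- GenomicRanges::GRanges("chr1", IRanges::IRanges(1000, 1600))
  # Same inputs as lolliplot, but nearby sites are grouped together
  trackViewer::dandelion.plot(methy, features)
}
```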
If there are multiple CpGs in one region, split the region into one width-1 range per CpG. Currently, lolliplot only supports ranges with a width of 1. To plot multiple samples, please follow the guide in the vignette; basically, your idea should work.
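A minimal sketch of splitting each DMR into its CpGs. The DMR table itself has no per-CpG positions, so this assumes they come from elsewhere, for example the per-sample files filtered to CpG context. It also assumes the DMR columns were renamed to Chr, Start, End, mean_g1 and mean_g2 (read.delim would otherwise produce names like mean.g1). Each CpG inside a DMR gets one row carrying that DMR's group means. The helper name, the second DMR and the CpG positions are hypothetical; the first DMR is the example row from the question.

```r
# Hypothetical helper: one output row per CpG that falls inside a DMR,
# carrying the DMR-level group means so each row is a width-1 site.
split_dmrs_by_cpg <- function(dmrs, cpgs) {
  pieces <- lapply(seq_len(nrow(dmrs)), function(i) {
    hit <- cpgs$seqnames == dmrs$Chr[i] &
      cpgs$Start >= dmrs$Start[i] & cpgs$Start <= dmrs$End[i]
    if (!any(hit)) return(NULL)        # DMR with no known CpG positions
    data.frame(seqnames = dmrs$Chr[i], Start = cpgs$Start[hit], dmr = i,
               mean_g1 = dmrs$mean_g1[i], mean_g2 = dmrs$mean_g2[i])
  })
  do.call(rbind, pieces)               # rbind skips the NULL entries
}

dmrs <- data.frame(Chr = "chr1",
                   Start = c(7047215, 8000000), End = c(7047553, 8000100),
                   mean_g1 = c(0.65566, 0.5), mean_g2 = c(0.066084, 0.4))
# Hypothetical CpG positions; 7047600 lies outside both DMRs
cpgs <- data.frame(seqnames = "chr1", Start = c(7047220, 7047300, 7047600, 8000050))
per_cpg <- split_dmrs_by_cpg(dmrs, cpgs)
per_cpg
```

From here, one GRanges with score = mean_g1 and another with score = mean_g2 could feed the list(A = ..., B = ...) approach from the vignette.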
Jianhong.