Question: Graph of methylation data around TSS - Repitools or other package?
0
6.0 years ago by
Tim Smith1.1k
Tim Smith1.1k wrote:
Hi, I have a small RRBS methylation dataset comprising of about 10000 CpG positions (rows) and 10 subjects (columns). I would like make a graph that shows the mean methylation of the CpG around the TSS, i.e. mean methylation on the Y-axis, and distance from TSS on the X-axis. I also have a file, downloaded from UCSC, that shows the TSS start sites for each gene. I am trying to use the Repitools & GenomiRanges packages to help me do this. The user manual indicates that I need to Some sample code: ##---------------------------- probes <- data.frame(position=seq(1000, 3000, by = 200), chr = "chrX", strand = '-') genes <- data.frame(chr = "chrX", start=c(2100, 1000), end = c(3000, 2200),                     strand=c("+","-")) rownames(genes) <- paste("gene", 1:2, sep = '') r5 <- annotationLookup(probes, genes, 500, 500) tmat <- matrix(rnorm(253),11,23) ids <- paste(probes[,'chr'],probes[,'position'],sep='_') rownames(tmat) <- ids colnames(tmat) <- paste('s',1:ncol(tmat),sep='') ##------------------------- The user manual for Repitools indicates that I should be using the significancePlots function - but there is no method of that name listed (when I do "> ls("package:Repitools"). How should I go about using the sample data above to make my graph? Is Repitools the best (in my case, the simplest) package that I can use? Any help would be highly appreciated! thanks! [[alternative HTML version deleted]]
go repitools • 967 views
modified 6.0 years ago by Dario Strbenac1.5k • written 6.0 years ago by Tim Smith1.1k
Answer: Graph of methylation data around TSS - Repitools or other package?
0
6.0 years ago by
Dario Strbenac1.5k
Australia
Dario Strbenac1.5k wrote:
Hello Tim, Ensure you are using a current version of Repitools. I checked the manual, and there is no mention of significancePlots. That function was renamed to profilePlots a while ago. Since you want to see all of the sites and are not testing a subset of them, I would use binPlots, and just give all the scores the same ID so they are in the same bin. The code you use from the example is specific to microarrays. Since you know the coordinates of the CpG islands already, you can provide them as the p.anno argument to featureScores. The first argument is your matrix of values. The anno argument is your annotation of genes from UCSC. Also provide up and down values for how far away from the TSS to plot. Once you have a ScoresList object made by featureScores with this information, you can give that to binPlots. -------------------------------------- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia
Hi Dario, Thanks for the reply and the guidance. I think I'm almost there! So, here is my sample data. I make the featureScores object as you suggested, but get an error when I use it with binPlot: ################# ## test data ## CpG probe info probes <- data.frame(position=seq(1000, 3000, by = 200), chr = "chrX", strand = '-') ## Gene info genes <- data.frame(chr = "chrX", start=c(2100, 1000,15000), end = c(3000, 2200,15900),                     strand=c("+","-","+")) symbol <- name <- paste("gene", 1:3, sep = '') genes <- data.frame(name,genes,symbol) ## Methylation matrix tmat <- matrix(rnorm(253),11,23) #  Methylation values for 11 probes across 23 samples ids <- paste(probes[,'chr'],probes[,'position'],sep='_') rownames(tmat) <- ids colnames(tmat) <- paste('s',1:ncol(tmat),sep='') print(head(genes,2)) print(head(probes,2)) print(tmat[1:2,1:3]) fsx <- featureScores(tmat,anno=genes,p.anno=probes,up=500,down=200,freq = 100) names(fsx) <- "test" class(fsx) ### binplot omat <- data.frame(matrix(1:nrow(tmat))) binPlots(fsx,order = omat,ord.label="",summarize="median",plot.type="line",lwd=2,n.bins=1) ############## This gives an error: > binPlots(fsx,order = omat,ord.label="",summarize="median",plot.type="line",lwd=2,n.bins=1) Calculating intervals. Splitting scores into bins. Error in .Method(..., FUN = FUN, MoreArgs = MoreArgs, SIMPLIFY = SIMPLIFY,  :   zero-length inputs cannot be mixed with those of non-zero length I also notice that the values of my fsx object have a different structure from the ones that are given in the example code for the binPlots documentation (also reproduced below). I feel that I need to 'tweak' something, but am not sure what! thanks for the help! ###### Example from documentation of binPlots # data(chr21genes) # data(samplesList)  # Loads 'samples.list.subset'. # data(expr)  # Loads 'expr.subset'. # # fs <- featureScores(samples.list.subset, chr21genes, up = 5000, down = 1000, dist = "base", freq = 1000, #                     s.width = 500) # fs@scores <- list(tables(fs)[[2]] - tables(fs)[[4]]) # names(fs) <- "PC-Norm" # # binPlots(fs, ordering = expr.subset, ord.label = "expression", plot.type = "line", n.bins = 4) # binPlots(fs, ordering = expr.subset, ord.label = "expression", plot.type = "heatmap", n.bins = 8) On Wednesday, November 20, 2013 9:00 PM, Dario Strbenac <dstr7320@uni.sydney.edu.au> wrote: Hello Tim, Ensure you are using a current version of Repitools. I checked the manual, and there is no mention of significancePlots. That function was renamed to profilePlots a while ago. Since you want to see all of the sites and are not testing a subset of them, I would use binPlots, and just give all the scores the same ID so they are in the same bin. The code you use from the example is specific to microarrays. Since you know the coordinates of the CpG islands already, you can provide them as the p.anno argument to featureScores. The first argument is your matrix of values. The anno argument is your annotation of genes from UCSC. Also provide up and down values for how far away from the TSS to plot. Once you have a ScoresList object made by featureScores with this information, you can give that to binPlots. -------------------------------------- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia [[alternative HTML version deleted]]
It turns out that the problems were that the some of the variables were in the wrong format. Firstly, p.anno isn't being handled by the code correctly if it doesn't have an index column. The documentation says index is optional. Do probes$index <- 1:nrow(probes) before giving probes to featureScores. You'll also want to set log2.adj = FALSE for your type of data. Otherwise, zero methylation becomes negative infinity. Secondly, the ordering data frame expects numeric or factor type columns. You provided integer. Thirdly, the length of your ordering was the same as the number of probes, rather than the number of genes. The correct code for the ordering parameter is data.frame(factor(rep(1, nrow(genes)))). I will add more error checking code for the user inputs to the next version and ensure featureScores works when there isn't an index column for p.anno. [[alternative HTML version deleted]] ADD REPLYlink written 6.0 years ago by Dario Strbenac1.5k Hi Dario, Thanks for those corrections. I now have the graph made. However, I'd like to modify the main title for the plot, the axis labels and the legend. I tried using xlab,ylab etc. to set the labels for the axes but get an error. Is there a way that I can do this? Pankaj On Saturday, November 23, 2013 3:00 AM, Dario Strbenac <dstr7320@uni.sydney.edu.au> wrote: It turns out that the problems were that the some of the variables were in the wrong format. Firstly, p.anno isn't being handled by the code correctly if it doesn't have an index column. The documentation says index is optional. Do probes$index <- 1:nrow(probes) before giving probes to featureScores. You'll also want to set log2.adj = FALSE for your type of data. Otherwise, zero methylation becomes negative infinity. Secondly, the ordering data frame expects numeric or factor type columns. You provided integer. Thirdly, the length of your ordering was the same as the number of probes, rather than the number of genes. The  correct code for the ordering parameter is data.frame(factor(rep(1, nrow(genes)))). I will add more error checking code for the user inputs to the next version and ensure featureScores works when there isn't an index column for p.anno. [[alternative HTML version deleted]]