Finding the coordinates from a BCV plot in edgeR
1
0
Entering edit mode
pertille • 0
@pertille-9104
Last seen 5.5 years ago
Sweden

According to edgeR package, I can estimate the dispersion of my dataset using:

y <- estimateDisp(y, design, robust=TRUE)

y$common.dispersion The square root of the common dispersion gives the coefficient of biological variation. And, the dispersion estimates can be viewed in a BCV plot: plotBCV(y) How do I know the function that generate this graph? I need to know the coordinates (x,y) of the vertex (max and min value of "trend" in the Y-axis to each vertex) of the "trend dispersion" and the values of X where the "trend" (blue) intercepts the commun dispersion line (red) Don´t worry about the spread of the data, the high variability is expected in my case. Moderator edit: Here's a link to the BCV plot: https://www.dropbox.com/s/h9p7356rpwkwwh1/GraphDispersion.png?dl=0 r edgeR estimatedispersions coordinate • 3.3k views ADD COMMENT 0 Entering edit mode I'm not sure what you mean - are you referring to a bimodal distribution for the dispersions? The concept of a "binomial dispersion" doesn't make any sense, as the binomial distribution is underdispersed relative to the Poisson. Your question would be a lot clearer if you posted an image of the actual BCV plot. ADD REPLY 0 Entering edit mode Yes, bimodal, sorry. But my question is simple. I just need to know the "X1,x2,x3,x4,x5" (where the "trend" crosses the "common dispersion average line") and the "Yv" that is the maximum and minimum value to the Y(trend's valley) ADD REPLY 1 Entering edit mode The trend curve is not a parabola, and it never crosses the x-axis, so again, we're not really sure what you're asking for. Are you looking for where the trend intersects the red line representing the common dispersion estimate across all genes? If so, I'm not aware of any interesting or useful information that can be gleaned from determining these points. Any continuous curve is going to equal its average at some point. You should take Gordon's advice and figure out why your dispersions are so high. Chances are that if you can solve that problem, the funny-looking trend will go away too. ADD REPLY 0 Entering edit mode Yes, I just need to know the function (x) that generate this dispersion plot to be able to determine the maximum and minimum values for the Y axis "trend" (valleys). My data is not RNA seq, this is kind of expected dispersion for my data. Thank you so much ADD REPLY 1 Entering edit mode The name of your image suggests that it's MeDIP data. I don't work with MeDIP, but I have done a lot of ChIP-seq stuff which (obviously) also has an IP step. In ChIP-seq, the signal is moderately variable due to differences in IP efficiency across samples, and this can result in some larger BCVs. Even so, the common/trended values that I see are rarely above 0.5, and never above 1. Your values indicate that there is a strong batch effect between your replicates, which is not unsurprising due to the nature of the IP step. You may need to consider blocking on the batch effect, if you can; or doing some appropriate normalization to get rid of the differences between samples. ADD REPLY 0 Entering edit mode only an additional information (according to the manual): "The BCV is the square root of the "negative binomial dispersion". This function displays the common, trended and tagwise BCV estimates". ADD REPLY 2 Entering edit mode @gordon-smyth Last seen 3 hours ago WEHI, Melbourne, Australia I think you are worrying about the wobbles in the trend curve, but this really isn't of much importance. What the plot is showing you is that the dispersion of your data is HUGE. The BCV values are mostly above 100%, and some as high as 500%. This shows that there is a serious, serious problem with your data. ADD COMMENT 0 Entering edit mode Thank you Gordon. Well, I'm actually not worried about the distribution. After all, you are right, I redid the graph and it was different from what I am presenting on this chart (I'm updating on my post) Fact is that I need the coordinates, as each "tagwise" represents a CpG site and my idea is to define "states" for this CpGs depending on the variability between samples (y-axis) and the cover (x-axis). To set these states, I need to know exactly where the "trend" intersects the X-axis and what is the maximum and minimum(Vmax, Vmin) of the "trend". ADD REPLY 1 Entering edit mode The trend never intersects the x-axis, as the dispersion estimates are always positive (in theory, it's possible to obtain estimates of zero for Poisson-distributed data, but this is practically impossible as estimateDisp has a lower bound of approximately 1e-4 on the dispersion estimates). Like Ryan, I suspect you are referring to the intersection between the trend and the red line. This red line represents an estimate of the common dispersion, not the x-axis. There is not much useful information that can be obtained by looking at the intersection between the blue and red lines. I would not consider them to classify your CpG sites in any meaningful manner. Why would you expect a site that is just on one side of an intersection point to be different from a site that is just on the opposite side? For that matter, the spread of the dispersions across sites is so large that small wobbles in the trend are negligible. Even if you did partition sites according to the intersection points, the difference within the resulting partitions would be a lot greater than that between partitions. ADD REPLY 0 Entering edit mode I tried to fix that is not where intersects the x-axis, but the dispersion curve (red), but I don´t know what it is going wrong. Thank you so much for your explanation. But I keep my doubt as how to know the coordinates of this chart. Do you know how to help me on that?; Now, I fixed the plot too. Thank you again. ADD REPLY 1 Entering edit mode You seem to have overlooked the comments from Gordon, Ryan and I about the futility of this task. In my opinion, trying to interpret the intersections is a waste of time. Nonetheless, if you insist: afun <- approxfun(y$AveLogCPM, y$trended.dispersion) optim(par=10, function(x) { (afun(x) - y$common.dispersion)^2 })

... and tinker around with the value of par to find all the intersection points. You can then run which.min or which.max on all genes lying between two intersection points to find the local min/max.

0
Entering edit mode

You are right about the conventional methodology.
"I am not overlooked the comments from Gordon, Ryan and you about the futility of this task. I am appreciate the comments and I'll check carefully everything again", but we need this information to address the "biological question" that we have.

The data we're working on are fruits of a not yet published methodology to answer a question unanswered yet.
We want to divide our CpGs in states defined by the depth cover and the variability between samples (our interest is not to analyze differential enrichment between conditions). I can´t see other better graphics to define these two states if this is our goal.

Thank you so much to answer my question.

1
Entering edit mode

I you inspect the code of the plotBCV function, you'll see that it access the tagwise, trended, and common dispersions, and the abundance values, for each gene as y$tagwise.dispersion, y$trended.dispersion, y$common.dispersion, and y$AveLogCPM, respectively. Using these values, you can easily find the extrema of the trend and the points where the trend crosses the common dispersion. But again, the coordinates of these points are unlikely to provide any useful information about your data.