Dear Jianhong,
I'm working on a project to plot the distance from Differentially Methylated Cytosines (DMCs) to CpG islands. The DMCs are just 1 bp wide, while the CpG islands have a width. I want to get the shortest distance between each DMC and a CpG island.
That is, if the relative location is
DMC------distance--------CpG.start-------width-------CpG.end
I want to get 'distance'.
If the relative location is
CpG.start-----width-------CpG.end-------distance2-----DMC
I want to get 'distance2'.
If the relative location is
CpG.start------DMC-----CpG.end
I want to get distance of zero.
If there are two CpG islands near a DMC:
CpG1.start----CpG1.end------d1--------DMC---------d2------CpG2.start------CpG2.end
I want to get the shorter of {d1, d2}.
I tried the following:
binOverFeature(CpG.gr, annotationData=mes.gr,
               select = "nearest",
               # PeakLocForDistance="middle",
               # featureSite="FeatureStart",
               PeakLocForDistance = "all",
               radius=5000, nbins=100, FUN=length,
               errFun=sd,
               ylab="count",
               main="Distribution of CpG around DMC")
And I got the following output: https://www.dropbox.com/s/re1lltzosvqw2oq/hist.pdf?dl=0
R workspace including gr objects: https://www.dropbox.com/s/hamw29p67kfgobr/workspace.RData?dl=0
Questions:
- Will the above code achieve my analysis goal? If not, how can I achieve that goal?
- Is it possible to get the output data (distance, count information) in the output of the function, so that users can plot with more custom style?
- Can we use 'density' rather than 'count' in the y-axis?
- Should I use 'errFun' for my purpose? What does it do?
- What does 'featureSite="bothEnd"' do? I got an error when I used that option.
Sorry about the long list of questions. Thanks!
Ray
You may want to try :
In this case, featureSite="bothEnd" will only consider positions outside of the feature; "bothEnd" is used to calculate the distance from peaks. The output can be used to plot in a custom style. However, location 0 will be ignored.
I think the best way to answer your question is to split it into 2 steps: 1. use annotatePeakInBatch to annotate each peak with its nearest feature; 2. use the distance function to calculate the distance from each peak to its feature, and then apply a sign to the distance.
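To make the distance definition concrete, here is a minimal base-R sketch of the "gap plus sign" idea on a single chromosome. It does not use annotatePeakInBatch or any ChIPpeakAnno call; the helper name signed_gap_to_nearest and the toy coordinates are invented for illustration, and the sign convention (negative when the DMC lies before the island start) is an assumption. The gap here is a plain coordinate difference, so it may differ by one from tools that count only the bases in between.

```r
# Hypothetical helper (not part of ChIPpeakAnno): signed gap from 1-bp
# positions to the nearest interval, all on one chromosome.
# Assumed convention: negative = before the island start,
# positive = after the island end, 0 = inside the island.
signed_gap_to_nearest <- function(pos, starts, ends) {
  stopifnot(length(starts) == length(ends), all(starts <= ends))
  vapply(pos, function(p) {
    # Gap to every island: 0 if p is inside, else distance to the nearer edge.
    gap <- pmax(starts - p, p - ends, 0)
    i <- which.min(gap)  # nearest island; ties keep the first one
    if (p < starts[i]) -gap[i] else gap[i]
  }, numeric(1))
}

# Toy coordinates, invented for illustration only.
cpg_start <- c(100, 500)
cpg_end   <- c(200, 600)
dmc_pos   <- c(50, 150, 260, 450, 700)

d <- signed_gap_to_nearest(dmc_pos, cpg_start, cpg_end)
print(d)
```

Taking abs(d) gives the unsigned shortest distance described in the question, with 0 for any DMC inside an island.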
Hope this will help.
Jianhong.
Ray,
You could also use annotatePeakInBatch with output="both" and maxgap = 0 to generate the nearest/overlapping features. Then select the features with the shortestDistance for each DMC after setting shortestDistance = 0 for features with fromOverlappingOrNearest = "Overlapping".
Best,
Julie
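The post-processing step described above can be sketched in base R on a mock table. The data frame below only imitates an annotation result: the values are invented, the column names follow the ones mentioned in this thread, and the "NearestLocation" label is an assumption about how non-overlapping rows are marked. The last lines also show one way to obtain counts and densities as data rather than as a finished plot.

```r
# Mock annotation table (invented values); a real one would come from
# converting the annotatePeakInBatch result to a data frame.
ann <- data.frame(
  peak = c("dmc1", "dmc1", "dmc2", "dmc3", "dmc3"),
  feature = c("cpgA", "cpgB", "cpgB", "cpgC", "cpgD"),
  shortestDistance = c(120, 40, 15, 300, 75),
  fromOverlappingOrNearest = c("NearestLocation", "NearestLocation",
                               "Overlapping", "NearestLocation",
                               "NearestLocation"),
  stringsAsFactors = FALSE
)

# Step 1: a DMC inside an island gets distance 0.
ann$shortestDistance[ann$fromOverlappingOrNearest == "Overlapping"] <- 0

# Step 2: keep the smallest distance per DMC.
best <- tapply(ann$shortestDistance, ann$peak, min)
print(best)

# Step 3: bin the distances without plotting; h$counts and h$density
# can then be drawn in any custom style.
h <- hist(as.numeric(best), breaks = seq(0, 100, by = 25), plot = FALSE)
print(data.frame(mid = h$mids, count = h$counts, density = h$density))
```

Plotting h$density instead of h$counts on the y-axis gives a density histogram whose bars integrate to 1.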