Question: How to get 'shortest' distance with ChIPpeakAnno::binOverFeature

Dear Jianhong,

I'm working on a project to plot the distance between differentially methylated cytosines (DMCs) and CpG islands. The DMCs are just 1 bp wide, while the CpG islands have a width. I want to get the shortest distance between each DMC and the CpG islands.

That is, if the relative location is DMC------distance--------CpG.start-------width-------CpG.end I want to get 'distance'.

If the relative location is CpG.start-----width-------CpG.end-------distance2-----DMC I want to get 'distance2'.

If the relative location is CpG.start------DMC-----CpG.end, I want to get a distance of zero.

If there are two CpG islands near the DMC: CpG1.start----CpG1.end------d1--------DMC---------d2------CpG2.start------CpG2.end, I want to get the shorter of {d1, d2}.
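All four cases reduce to one rule: the distance is 0 when the DMC lies inside an island, otherwise it is the gap to the nearer island edge, and the answer is the minimum over all islands. A minimal base-R sketch of that rule, assuming all coordinates are on one chromosome and that "distance" means the plain coordinate difference to the island edge. `shortest_distance` is a hypothetical helper, not a ChIPpeakAnno function, and the positions are toy values:

```r
# Hypothetical helper (not part of ChIPpeakAnno): shortest distance from each
# 1-bp DMC position to any CpG island on the same chromosome.
# Assumption: distance = coordinate difference to the nearest island edge,
# and a DMC inside an island (start <= pos <= end) gets 0.
shortest_distance <- function(pos, cpg_start, cpg_end) {
  vapply(pos, function(p) {
    # per island: start - p if p is before it, p - end if after it, else 0
    d <- pmax(cpg_start - p, p - cpg_end, 0)
    min(d)  # keep the closer island, i.e. the shorter of {d1, d2}
  }, numeric(1))
}

# Toy example: two islands, [100, 200] and [500, 600]
cpg_start <- c(100, 500)
cpg_end   <- c(200, 600)
dmc       <- c(50, 150, 300, 450, 700)
shortest_distance(dmc, cpg_start, cpg_end)
# returns c(50, 0, 100, 50, 100)
```

This compares every DMC with every island, so it only makes the definition concrete. For genome-wide data, `GenomicRanges::distanceToNearest()` does the nearest-range search efficiently and reports 0 for overlapping ranges, although its distance convention for non-overlapping ranges may differ by one from the coordinate difference assumed here.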

I tried the following:

library(ChIPpeakAnno)
binOverFeature(CpG.gr, annotationData = mes.gr,
  select = "nearest",
  # PeakLocForDistance="middle",
  # featureSite="FeatureStart",
  PeakLocForDistance = "all",
  errFun = sd,
  ylab = "count",
  main = "Distribution of CpG around DMC")


And I got the following output: https://www.dropbox.com/s/re1lltzosvqw2oq/hist.pdf?dl=0

R workspace including the GRanges objects: https://www.dropbox.com/s/hamw29p67kfgobr/workspace.RData?dl=0

Questions:

1. Will the above code achieve my analysis goal? If not, how can I achieve that goal?
2. Is it possible to get the output data (distance, count information) in the output of the function, so that users can plot with more custom style?
3. Can we use 'density' rather than 'count' on the y-axis?
4. Should I use 'errFun' for my purpose? What does it do?
5. What does 'featureSite="bothEnd"' do? I got an error when I used that option.
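For question 3, once the DMC-to-CpG distances are available as a plain numeric vector, base R's `hist()` puts density instead of count on the y-axis when `freq = FALSE`, and `density()` gives a smoothed alternative. A minimal sketch; the distances below are made-up toy values, and the bin width of 100 bp is an arbitrary choice:

```r
# Toy distances in bp; replace with the real DMC-to-CpG distances.
d <- c(0, 0, 35, 120, 250, 250, 480, 900)

# freq = FALSE plots densities (bar areas sum to 1) instead of counts
h <- hist(d, breaks = seq(0, 1000, by = 100), freq = FALSE,
          xlab = "distance to nearest CpG island (bp)",
          main = "Distribution of DMC-to-CpG distances")

# Smoothed alternative: kernel density estimate of the same distances
plot(density(d), main = "Kernel density of DMC-to-CpG distances")
```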

Sorry about the long list of questions. Thanks!

Ray

modified 4 months ago • written 4 months ago by liruiradiant0

You may want to try:

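library(ChIPpeakAnno)
# keep the return value so it can be re-plotted in a custom style
out <- binOverFeature(CpG.gr, annotationData = mes.gr,
  select = "nearest",
  PeakLocForDistance = "middle",
  featureSite = "bothEnd",
  errFun = sd,
  ylab = "count",
  main = "Distribution of CpG around DMC")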


In this case, featureSite="bothEnd" only considers peaks outside the feature, and the distance is calculated from the peaks to both ends of the feature. The returned value (out) can be used to make a plot in a custom style. However, location 0 will be ignored.

To answer your question, I think the best way is to split it into two steps:

1. Use annotatePeakInBatch to annotate the nearest features.
2. Use the distance function to calculate the distance from each peak to its features, and then apply a sign to the distance.
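Step 2 can be sketched in base R. This assumes step 1 has already paired each DMC with the start and end of its nearest CpG island (same chromosome, strand ignored). `signed_distance` is a hypothetical helper, and the sign convention (negative when the DMC lies before the island start, positive after the island end, 0 inside) is an assumption chosen for illustration:

```r
# Hypothetical helper (not part of ChIPpeakAnno): signed distance from each
# DMC to the CpG island it was already matched with in step 1.
# Assumed convention: negative = DMC before the island start,
# positive = DMC after the island end, 0 = DMC inside the island.
signed_distance <- function(pos, cpg_start, cpg_end) {
  ifelse(pos < cpg_start, pos - cpg_start,
         ifelse(pos > cpg_end, pos - cpg_end, 0))
}

# Toy example: one matched island per DMC
dmc       <- c(50, 150, 700)
cpg_start <- c(100, 100, 500)
cpg_end   <- c(200, 200, 600)
signed_distance(dmc, cpg_start, cpg_end)
# returns c(-50, 0, 100)
```

Keeping the sign lets a histogram show on which side of the island each DMC falls; `abs()` of the result recovers the unsigned distance.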

Hope this will help.

Jianhong.

Ray,

You could also use annotatePeakInBatch with output="both" and maxgap = 0 to get the nearest and overlapping features. Then set shortestDistance = 0 for features with fromOverlappingOrNearest = "Overlapping", and for each DMC select the feature with the smallest shortestDistance.

Best,
Julie
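The selection step of this recipe can be sketched in base R. The data frame below is a hand-made stand-in for `as.data.frame()` of the annotatePeakInBatch result: its values are made up, and it assumes the output carries `peak`, `feature`, `shortestDistance` and `fromOverlappingOrNearest` columns, with non-overlapping rows labeled "NearestLocation":

```r
# Toy stand-in for as.data.frame(annotatePeakInBatch(..., output = "both", maxgap = 0)).
# Values are made up; the column names are assumed to match the real output.
anno <- data.frame(
  peak = c("dmc1", "dmc1", "dmc2", "dmc2"),
  feature = c("cpgA", "cpgB", "cpgC", "cpgD"),
  shortestDistance = c(120, 40, 15, 300),
  fromOverlappingOrNearest = c("NearestLocation", "NearestLocation",
                               "Overlapping", "NearestLocation"),
  stringsAsFactors = FALSE
)

# Overlapping features count as distance 0
anno$shortestDistance[anno$fromOverlappingOrNearest == "Overlapping"] <- 0

# Sort by DMC, then by distance, and keep the first (closest) feature per DMC
anno <- anno[order(anno$peak, anno$shortestDistance), ]
best <- anno[!duplicated(anno$peak), ]
best[, c("peak", "feature", "shortestDistance")]
# dmc1 -> cpgB (40), dmc2 -> cpgC (0)
```

If two features tie for the shortest distance, this keeps only one of them per DMC.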
