Some particulars about rtracklayer::liftOver
1
0
Entering edit mode
@steve-lianoglou-2771
Last seen 22 months ago
United States
Hi all, I'm really not too well versed on the ins and outs of using `liftOver` from via the kent tools, so I'm comparing results from using rtracklayer::liftOver vs. a default run of liftOver from the command line. As a disclaimer, I'm lifting over coordinates between species. I know this isn't the best idea -- the party line from the UCSC folks seem to suggest that this is a bad idea because it is a "coarse" mapping, and you might really want to use pslMap in this case (which I will play with, too). For those who are interested, here's a few links for you: https://lists.soe.ucsc.edu/pipermail/genome/2007-November/015074.html https://lists.soe.ucsc.edu/pipermail/genome/2008-March/015908.html (summarizes next link) https://lists.soe.ucsc.edu/pipermail/genome/2007-November/015032.html http://article.gmane.org/gmane.science.biology.ucscgenome.general/1130 0 The correct command line switches to invoke pslMap using a liftover chain: https://lists.soe.ucsc.edu/pipermail/genome/2008-March/015908.html >From the first link, pslMap is apparently preferred because: """ pslMap shows more detail -- it translates a region using each block of the chain, introducing gaps where the chain has gaps. """ >From some results of rtracklayer::liftOver, it seems that this call implements this type of functionality from pslMap -- is that true? I say that because some of the ranges I liftOver are split into > 1 range into the target genome. Even though they are two ranges, sometimes they are contiguous, or only separated by a small gap. >From the few cases I've checked -- some examples below -- these multiple ranges returned from rtracklayer::liftOver correspond to the results from doing a pslMap For instance: The following mm9 range: chr1 [4836085, 4836142] + Gets lifted to hg19 here (notice small gap of width 5) with rtracklayer: chr8 [54959674, 54959715] - chr8 [54959654, 54959669] - but from the command line, it's lifted to: chr8 54959654 54959716 ... - At first I thought the rtracklayer fenceposts and the ucsc/cmd-line::liftOver results fence-posted the same interval, but the command line result is supposed to be ucsc-half-open (so start from 0), right? The GRanges were exported to a bed via rtracklayer, so I can confirm that coordinates were xlated to half-open correctly. But the result of the liftOver from the command line is off-by-one at its start position when compared to rtracklayer, no? Further -- these two target ranges are the same that come out of a pslMap, so -- that's nice. I started writing this post to inquire rtracklayer's insertion of gaps and to confirm that my by eye matching of this w/ pslMap output is intentional (I guess that's weird of me to ask, but just wanted to be sure). Then this off-by-one thing popped up as I was writing. There are other regions that are lifted over whose coords from the command line seem to have "the correct" -1 shift at the start ("correct" in as far as I'm mentally doing the half-open coord shift), and equal at the end, eg. the mm9 range: chr1 [9617177, 9617221] + usring rtracklayer::liftOver goes to hg19: chr8 [67430719, 67430763] + and from command line: chr8 67430718 67430763 ... + which seems to me like the same interval as rtracklayer. So, in a nut: (1) rtracklayer's gapped-liftOver magic seems to return the same info I would get from a call to pslMap. So, (a) thanks for that, and (b) I'm correct, right? :-) (2) This off by one thing -- am I just seeing things, or? Thanks for making it this far, -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
Cancer rtracklayer Cancer rtracklayer • 1.7k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.1 years ago
United States
On Mon, Apr 23, 2012 at 11:43 PM, Steve Lianoglou < mailinglist.honeypot@gmail.com> wrote: > Hi all, > > I'm really not too well versed on the ins and outs of using `liftOver` > from via the kent tools, so I'm comparing results from using > rtracklayer::liftOver vs. a default run of liftOver from the command > line. > > As a disclaimer, I'm lifting over coordinates between species. I know > this isn't the best idea -- the party line from the UCSC folks seem to > suggest that this is a bad idea because it is a "coarse" mapping, and > you might really want to use pslMap in this case (which I will play > with, too). For those who are interested, here's a few links for you: > > https://lists.soe.ucsc.edu/pipermail/genome/2007-November/015074.html > https://lists.soe.ucsc.edu/pipermail/genome/2008-March/015908.html > (summarizes next link) > https://lists.soe.ucsc.edu/pipermail/genome/2007-November/015032.html > http://article.gmane.org/gmane.science.biology.ucscgenome.general/11 300 > > The correct command line switches to invoke pslMap using a liftover chain: > https://lists.soe.ucsc.edu/pipermail/genome/2008-March/015908.html > > >From the first link, pslMap is apparently preferred because: > > """ > pslMap shows more detail -- it translates a region using each block > of the chain, introducing gaps where the chain has gaps. > """ > > >From some results of rtracklayer::liftOver, it seems that this call > implements this type of functionality from pslMap -- is that true? > > I say that because some of the ranges I liftOver are split into > 1 > range into the target genome. Even though they are two ranges, > sometimes they are contiguous, or only separated by a small gap. > > >From the few cases I've checked -- some examples below -- these > multiple ranges returned from rtracklayer::liftOver correspond to the > results from doing a pslMap > > For instance: > > The following mm9 range: > > chr1 [4836085, 4836142] + > > Gets lifted to hg19 here (notice small gap of width 5) with rtracklayer: > > chr8 [54959674, 54959715] - > chr8 [54959654, 54959669] - > > but from the command line, it's lifted to: > > chr8 54959654 54959716 ... - > > At first I thought the rtracklayer fenceposts and the > ucsc/cmd-line::liftOver results fence-posted the same interval, but > the command line result is supposed to be ucsc-half-open (so start > from 0), right? The GRanges were exported to a bed via rtracklayer, so > I can confirm that coordinates were xlated to half-open correctly. But > the result of the liftOver from the command line is off-by-one at its > start position when compared to rtracklayer, no? > > Further -- these two target ranges are the same that come out of a > pslMap, so -- that's nice. > > I started writing this post to inquire rtracklayer's insertion of gaps > and to confirm that my by eye matching of this w/ pslMap output is > intentional (I guess that's weird of me to ask, but just wanted to be > sure). Then this off-by-one thing popped up as I was writing. There > are other regions that are lifted over whose coords from the command > line seem to have "the correct" -1 shift at the start ("correct" in as > far as I'm mentally doing the half-open coord shift), and equal at the > end, eg. the mm9 range: > > chr1 [9617177, 9617221] + > > usring rtracklayer::liftOver goes to hg19: > > chr8 [67430719, 67430763] + > > and from command line: > > chr8 67430718 67430763 ... + > > which seems to me like the same interval as rtracklayer. > > So, in a nut: > > (1) rtracklayer's gapped-liftOver magic seems to return the same info > I would get from a call to pslMap. So, (a) thanks for that, and (b) > I'm correct, right? :-) > > Yes, the rtracklayer algorithm seems to be like pslMap. Since the behavior of liftOver is not so well specified, I just implemented it the way that made sense to me. Actually, I was thinking of having liftOver() become a method on the "map" generic for Chain objects. So the name might become more representative soon. > (2) This off by one thing -- am I just seeing things, or? > > This is probably a bug. Are the correct ones always on the positive strand, and the wrong ones always on the minus strand? Thanks for testing my code, Michael Thanks for making it this far, > > -steve > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > | Memorial Sloan-Kettering Cancer Center > | Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT

Login before adding your answer.

Traffic: 356 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6