Bug in rtracklayer liftOver function
1
0
Entering edit mode
@nickshrine-23772
Last seen 3.0 years ago
United Kingdom

https://www.ncbi.nlm.nih.gov/snp/rs1344426307

doesn't exist in hg19 but rtracklayer liftOver function instead seems to just map it to a random position chr10 101174514


R version 4.0.0 (2020-04-24) -- "Arbor Day"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(GenomicRanges)
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

> library(rtracklayer)
Warning message:
multiple methods tables found for ‘which’
> test <- data.frame(chr="chr20", pos=30886232)
> test.GRanges <- makeGRangesFromDataFrame(test, seqnames.field="chr", start.field="pos", end.field="pos")
> chain <- import.chain("../../hg38ToHg19.over.chain")
Warning messages:
1: In findGeneric(f, envir) :
  'import' is a formal generic function; S3 methods will not likely be found
2: In findGeneric(f, envir) :
  'import' is a formal generic function; S3 methods will not likely be found
> liftOver(test.GRanges, chain)
GRangesList object of length 1:
[[1]]
GRanges object with 1 range and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]    chr10 101174514      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
rtracklayer • 1.1k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 6 hours ago
United States

You are assuming that the rtracklayer implementation is doing something different than what the UCSC version does. And also assuming that liftOver somehow magically knows that the position you are lifting over is a SNP and that it then looks at dbSNP to see if that variant exists in the genome to which you are lifting. Neither of which is true (liftOver doesn't care what you are doing - it just maps genomic coordinates between versions).

> gr <- GRanges("chr20:30886232")
 > chain <- import.chain("hg38ToHg19.over.chain")
 > liftOver(gr, chain)
 GRangesList object of length 1:
 [[1]]
 GRanges object with 1 range and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
   [1]    chr10 101174514      *
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths


## Now using UCSC liftOver
 > cat("chr20","30886232","30886233", sep = "\t", file = "tmp")
 > system("liftOver tmp hg38ToHg19.over.chain mapped unmapped")
 Reading liftover chains
 Mapping coordinates

> scan("mapped", "c")
 Read 3 items
 [1] "chr10"     "101174512" "101174513"

And yes, it's off by one. That's probably because it's mapping to the other strand:

 > gr <- GRanges(rep("chr20", 5), IRanges(30886230:30886234, width = 1))
!> liftOver(gr, chain)
 GRangesList object of length 5:
 [[1]]
 GRanges object with 1 range and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
   [1]    chr10 101174516      *
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths

 [[2]]
 GRanges object with 1 range and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
   [1]    chr10 101174515      *
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths

 [[3]]
 GRanges object with 1 range and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
   [1]    chr10 101174514      *
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths

 [[4]]
 GRanges object with 1 range and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
   [1]    chr10 101174513      *
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths

 [[5]]
 GRanges object with 1 range and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
   [1]    chr10 101174512      *
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths
ADD COMMENT

Login before adding your answer.

Traffic: 531 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6