Question: locateVariants function does not find an annotation for the locus
gravatar for Aleksandra
16 months ago by
Aleksandra0 wrote:


I have a list of loci that I would like to annotate with locateVariants function. However, the function does not annotate some variants I am interested in. Below I put an example for the locus chr5:1599742 (genome hg19). I know from UCSC that there is a gene "LOC728613" and a few transcripts. But the locateVariants function does not find it.

I run the code from the fresh R session:

txdb_hg19 <- TxDb.Hsapiens.UCSC.hg19.knownGene
chr1 = "chr5"
loc1 = 1599742
gr1 = GRanges(seqnames = chr1, ranges = IRanges(start=loc1, loc1))
genome(gr1) = "hg19"
gr1_annot <- locateVariants(gr1, txdb_hg19, AllVariants())

I get the empty output:

> gr1_annot
GRanges object with 0 ranges and 9 metadata columns:
   seqnames    ranges strand | LOCATION  LOCSTART    LOCEND   QUERYID      TXID         CDSID      GENEID
      <Rle> <IRanges>  <Rle> | <factor> <integer> <integer> <integer> <integer> <IntegerList> <character>
         PRECEDEID        FOLLOWID
   <CharacterList> <CharacterList>
  seqinfo: no sequences


But when I run the following code, it shows that there is a gene in that locus:

​genes <- genes(txdb_hg19)
olaps <- findOverlaps(gr1, genes)  

The result:

> olaps
Hits object with 1 hit and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1       18748
  queryLength: 1 / subjectLength: 23056

Do you have any advice, what am I doing wrong? What should i do to obtain a proper annotation of that locus by locateVariants?

Best regards,



> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.28.4                 
 [3] AnnotationDbi_1.38.2                    VariantAnnotation_1.22.3               
 [5] Rsamtools_1.28.0                        Biostrings_2.44.2                      
 [7] XVector_0.16.0                          SummarizedExperiment_1.6.3             
 [9] DelayedArray_0.2.7                      matrixStats_0.52.2                     
[11] Biobase_2.36.2                          GenomicRanges_1.28.4                   
[13] GenomeInfoDb_1.12.2                     IRanges_2.10.2                         
[15] S4Vectors_0.14.3                        BiocGenerics_0.22.0                    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12             compiler_3.4.1           bitops_1.0-6             tools_3.4.1             
 [5] zlibbioc_1.22.0          biomaRt_2.32.1           digest_0.6.12            bit_1.1-12              
 [9] RSQLite_2.0              memoise_1.1.0            tibble_1.3.3             lattice_0.20-35         
[13] BSgenome_1.44.0          pkgconfig_2.0.1          rlang_0.1.2              Matrix_1.2-10           
[17] DBI_0.7                  GenomeInfoDbData_0.99.0  rtracklayer_1.36.4       bit64_0.9-7             
[21] grid_3.4.1               XML_3.98-1.9             BiocParallel_1.10.1      blob_1.1.0              
[25] GenomicAlignments_1.12.1 RCurl_1.95-4.8
ADD COMMENTlink modified 16 months ago • written 16 months ago by Aleksandra0
gravatar for Valerie Obenchain
16 months ago by
Valerie Obenchain ♦♦ 6.7k
United States
Valerie Obenchain ♦♦ 6.7k wrote:


locateVariants() identifies hits in regions more specific than the at level of gene or transcript. Regions are documented on the ?AllVariants man page and are either parts of a gene or regions surrounding a gene.


To find hits at the more coarse level of gene or transcript you would use findOverlaps() with the genes() or transcripts() extractors as you have done. Though it's an empty set, your result does tell you that position 1599742 on chromosome 5 does not fall in a coding, intron, UTR or splice site of the gene. Intergenic and promoter catch ranges that fall outside a defined gene so you would expect those results to be empty.

As a side note, I'm not sure why you set chr1 = "chr5" and use chr1 when creating the GRanges. This makes the code less clear. You could instead use a character in the constructor:

gr1 = GRanges(seqnames = "chr5", ranges = IRanges(start=loc1, loc1))

> gr1
GRanges object with 1 range and 0 metadata columns:
      seqnames             ranges strand
         <Rle>          <IRanges>  <Rle>
  [1]     chr5 [1599742, 1599742]      *


ADD COMMENTlink written 16 months ago by Valerie Obenchain ♦♦ 6.7k
gravatar for Aleksandra
16 months ago by
Aleksandra0 wrote:

Thank you very much for your explanation. I thought that locateVariants function would describe a variant regardless of the genomic region type. Now I know that it does not cover exons in non-coding genes. It would be helpful if the ExonVariants() function also exists :)


ADD COMMENTlink written 16 months ago by Aleksandra0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 329 users visited in the last hour