locateVariants function does not find an annotation for the locus
2
0
Entering edit mode
Aleksandra • 0
@aleksandra-13756
Last seen 7.4 years ago

Hi,

I have a list of loci that I would like to annotate with locateVariants function. However, the function does not annotate some variants I am interested in. Below I put an example for the locus chr5:1599742 (genome hg19). I know from UCSC that there is a gene "LOC728613" and a few transcripts. But the locateVariants function does not find it.

I run the code from the fresh R session:

library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb_hg19 <- TxDb.Hsapiens.UCSC.hg19.knownGene
chr1 = "chr5"
loc1 = 1599742
gr1 = GRanges(seqnames = chr1, ranges = IRanges(start=loc1, loc1))
genome(gr1) = "hg19"
gr1_annot <- locateVariants(gr1, txdb_hg19, AllVariants())

I get the empty output:

> gr1_annot
GRanges object with 0 ranges and 9 metadata columns:
   seqnames    ranges strand | LOCATION  LOCSTART    LOCEND   QUERYID      TXID         CDSID      GENEID
      <Rle> <IRanges>  <Rle> | <factor> <integer> <integer> <integer> <integer> <IntegerList> <character>
         PRECEDEID        FOLLOWID
   <CharacterList> <CharacterList>
  -------
  seqinfo: no sequences

 

But when I run the following code, it shows that there is a gene in that locus:

​genes <- genes(txdb_hg19)
olaps <- findOverlaps(gr1, genes)  

The result:

> olaps
Hits object with 1 hit and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1       18748
  -------
  queryLength: 1 / subjectLength: 23056

Do you have any advice, what am I doing wrong? What should i do to obtain a proper annotation of that locus by locateVariants?

Best regards,

Aleksandra

 

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.28.4                 
 [3] AnnotationDbi_1.38.2                    VariantAnnotation_1.22.3               
 [5] Rsamtools_1.28.0                        Biostrings_2.44.2                      
 [7] XVector_0.16.0                          SummarizedExperiment_1.6.3             
 [9] DelayedArray_0.2.7                      matrixStats_0.52.2                     
[11] Biobase_2.36.2                          GenomicRanges_1.28.4                   
[13] GenomeInfoDb_1.12.2                     IRanges_2.10.2                         
[15] S4Vectors_0.14.3                        BiocGenerics_0.22.0                    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12             compiler_3.4.1           bitops_1.0-6             tools_3.4.1             
 [5] zlibbioc_1.22.0          biomaRt_2.32.1           digest_0.6.12            bit_1.1-12              
 [9] RSQLite_2.0              memoise_1.1.0            tibble_1.3.3             lattice_0.20-35         
[13] BSgenome_1.44.0          pkgconfig_2.0.1          rlang_0.1.2              Matrix_1.2-10           
[17] DBI_0.7                  GenomeInfoDbData_0.99.0  rtracklayer_1.36.4       bit64_0.9-7             
[21] grid_3.4.1               XML_3.98-1.9             BiocParallel_1.10.1      blob_1.1.0              
[25] GenomicAlignments_1.12.1 RCurl_1.95-4.8
locatevariants txdb.hsapiens.ucsc.hg19.knowngene variantannotation • 1.8k views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 3.0 years ago
United States

Hi,

locateVariants() identifies hits in regions more specific than the at level of gene or transcript. Regions are documented on the ?AllVariants man page and are either parts of a gene or regions surrounding a gene.

CodingVariants()
IntronVariants()
FiveUTRVariants()
ThreeUTRVariants()
SpliceSiteVariants()
IntergenicVariants()
PromoterVariants()
AllVariants()

To find hits at the more coarse level of gene or transcript you would use findOverlaps() with the genes() or transcripts() extractors as you have done. Though it's an empty set, your result does tell you that position 1599742 on chromosome 5 does not fall in a coding, intron, UTR or splice site of the gene. Intergenic and promoter catch ranges that fall outside a defined gene so you would expect those results to be empty.

As a side note, I'm not sure why you set chr1 = "chr5" and use chr1 when creating the GRanges. This makes the code less clear. You could instead use a character in the constructor:

gr1 = GRanges(seqnames = "chr5", ranges = IRanges(start=loc1, loc1))

> gr1
GRanges object with 1 range and 0 metadata columns:
      seqnames             ranges strand
         <Rle>          <IRanges>  <Rle>
  [1]     chr5 [1599742, 1599742]      *

Valerie

ADD COMMENT
0
Entering edit mode
Aleksandra • 0
@aleksandra-13756
Last seen 7.4 years ago

Thank you very much for your explanation. I thought that locateVariants function would describe a variant regardless of the genomic region type. Now I know that it does not cover exons in non-coding genes. It would be helpful if the ExonVariants() function also exists :)

Aleksandra

ADD COMMENT

Login before adding your answer.

Traffic: 438 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6