Search
Question: locateVariants function does not find an annotation for the locus
0
gravatar for Aleksandra
3 months ago by
Aleksandra0
Aleksandra0 wrote:

Hi,

I have a list of loci that I would like to annotate with locateVariants function. However, the function does not annotate some variants I am interested in. Below I put an example for the locus chr5:1599742 (genome hg19). I know from UCSC that there is a gene "LOC728613" and a few transcripts. But the locateVariants function does not find it.

I run the code from the fresh R session:

library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb_hg19 <- TxDb.Hsapiens.UCSC.hg19.knownGene
chr1 = "chr5"
loc1 = 1599742
gr1 = GRanges(seqnames = chr1, ranges = IRanges(start=loc1, loc1))
genome(gr1) = "hg19"
gr1_annot <- locateVariants(gr1, txdb_hg19, AllVariants())

I get the empty output:

> gr1_annot
GRanges object with 0 ranges and 9 metadata columns:
   seqnames    ranges strand | LOCATION  LOCSTART    LOCEND   QUERYID      TXID         CDSID      GENEID
      <Rle> <IRanges>  <Rle> | <factor> <integer> <integer> <integer> <integer> <IntegerList> <character>
         PRECEDEID        FOLLOWID
   <CharacterList> <CharacterList>
  -------
  seqinfo: no sequences

 

But when I run the following code, it shows that there is a gene in that locus:

​genes <- genes(txdb_hg19)
olaps <- findOverlaps(gr1, genes)  

The result:

> olaps
Hits object with 1 hit and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1       18748
  -------
  queryLength: 1 / subjectLength: 23056

Do you have any advice, what am I doing wrong? What should i do to obtain a proper annotation of that locus by locateVariants?

Best regards,

Aleksandra

 

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.28.4                 
 [3] AnnotationDbi_1.38.2                    VariantAnnotation_1.22.3               
 [5] Rsamtools_1.28.0                        Biostrings_2.44.2                      
 [7] XVector_0.16.0                          SummarizedExperiment_1.6.3             
 [9] DelayedArray_0.2.7                      matrixStats_0.52.2                     
[11] Biobase_2.36.2                          GenomicRanges_1.28.4                   
[13] GenomeInfoDb_1.12.2                     IRanges_2.10.2                         
[15] S4Vectors_0.14.3                        BiocGenerics_0.22.0                    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12             compiler_3.4.1           bitops_1.0-6             tools_3.4.1             
 [5] zlibbioc_1.22.0          biomaRt_2.32.1           digest_0.6.12            bit_1.1-12              
 [9] RSQLite_2.0              memoise_1.1.0            tibble_1.3.3             lattice_0.20-35         
[13] BSgenome_1.44.0          pkgconfig_2.0.1          rlang_0.1.2              Matrix_1.2-10           
[17] DBI_0.7                  GenomeInfoDbData_0.99.0  rtracklayer_1.36.4       bit64_0.9-7             
[21] grid_3.4.1               XML_3.98-1.9             BiocParallel_1.10.1      blob_1.1.0              
[25] GenomicAlignments_1.12.1 RCurl_1.95-4.8
ADD COMMENTlink modified 3 months ago • written 3 months ago by Aleksandra0
0
gravatar for Valerie Obenchain
3 months ago by
Valerie Obenchain ♦♦ 6.4k
United States
Valerie Obenchain ♦♦ 6.4k wrote:

Hi,

locateVariants() identifies hits in regions more specific than the at level of gene or transcript. Regions are documented on the ?AllVariants man page and are either parts of a gene or regions surrounding a gene.

CodingVariants()
IntronVariants()
FiveUTRVariants()
ThreeUTRVariants()
SpliceSiteVariants()
IntergenicVariants()
PromoterVariants()
AllVariants()

To find hits at the more coarse level of gene or transcript you would use findOverlaps() with the genes() or transcripts() extractors as you have done. Though it's an empty set, your result does tell you that position 1599742 on chromosome 5 does not fall in a coding, intron, UTR or splice site of the gene. Intergenic and promoter catch ranges that fall outside a defined gene so you would expect those results to be empty.

As a side note, I'm not sure why you set chr1 = "chr5" and use chr1 when creating the GRanges. This makes the code less clear. You could instead use a character in the constructor:

gr1 = GRanges(seqnames = "chr5", ranges = IRanges(start=loc1, loc1))

> gr1
GRanges object with 1 range and 0 metadata columns:
      seqnames             ranges strand
         <Rle>          <IRanges>  <Rle>
  [1]     chr5 [1599742, 1599742]      *

Valerie

ADD COMMENTlink written 3 months ago by Valerie Obenchain ♦♦ 6.4k
0
gravatar for Aleksandra
3 months ago by
Aleksandra0
Aleksandra0 wrote:

Thank you very much for your explanation. I thought that locateVariants function would describe a variant regardless of the genomic region type. Now I know that it does not cover exons in non-coding genes. It would be helpful if the ExonVariants() function also exists :)

Aleksandra

ADD COMMENTlink written 3 months ago by Aleksandra0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 238 users visited in the last hour