Question: SNP location annotation problem in "VariantAnnotation" package
gravatar for yduan004
13 months ago by
yduan00420 wrote:

I couldn't get location annotation for some SNPs by using "VariantAnnotation" package.

For some SNP ids, eventhough their GRanges data are normal as others, they couldn't be annotated by using "locateVariants" function. Here is the code I ran. The link for "abnorm_snpid.txt" is, the link for "rd_abnorm.rds" is

abnorm_snpid <- readLines("abnorm_snpid.txt")

rd_abnorm <- readRDS("rd_abnorm.rds")


fa <- BSgenome.Hsapiens.UCSC.hg38

download.file("", "Homo_sapiens.GRCh38.90.gtf.gz")



library(GenomicFeatures); library(VariantAnnotation)

txdb <- makeTxDbFromGFF(file="Homo_sapiens.GRCh38.90.gtf", format="gtf", dataSource="Ensembl", organism="Homo sapiens")

allvar <- locateVariants(rd_abnorm, txdb, AllVariants()) # allvar is empty, no location annotation

# GRanges object with 0 ranges and 9 metadata columns:
#   seqnames    ranges strand | LOCATION  LOCSTART    LOCEND   QUERYID      TXID         CDSID      GENEID
# <Rle> <IRanges>  <Rle> | <factor> <integer> <integer> <integer> <integer> <IntegerList> <character>
# <CharacterList> <CharacterList>
#   -------
#   seqinfo: no sequences

## NOTE: the SNP ids are in dbSNP build 150, which uses hg38 (GRCh38) as reference genome.


# R version 3.4.1 (2017-06-30)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 16.04.1 LTS
# Matrix products: default
# BLAS: /usr/lib/atlas-base/atlas/
# LAPACK: /usr/lib/atlas-base/atlas/
# locale:
#   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
# [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
# attached base packages:
#   [1] parallel  stats4    stats     graphics  utils     datasets  grDevices methods   base     
# other attached packages:
#   [1] VariantAnnotation_1.20.3          Rsamtools_1.26.2                  SummarizedExperiment_1.4.0       
# [4] GenomicFeatures_1.26.4            AnnotationDbi_1.36.2              GEOquery_2.40.0                  
# [7] Biobase_2.34.0                    BiocInstaller_1.24.0              BSgenome.Hsapiens.UCSC.hg38_1.4.1
# [10] BSgenome_1.42.0                   rtracklayer_1.34.2                Biostrings_2.42.1                
# [13] XVector_0.14.1                    GenomicRanges_1.26.4              GenomeInfoDb_1.10.3              
# [16] IRanges_2.8.2                     S4Vectors_0.12.2                  BiocGenerics_0.20.0              
# [19] setwidth_1.0-4                    colorout_1.1-2                   
# loaded via a namespace (and not attached):
#   [1] Rcpp_0.12.12             compiler_3.4.1           bitops_1.0-6             tools_3.4.1             
# [5] zlibbioc_1.20.0          biomaRt_2.30.0           digest_0.6.12            bit_1.1-12              
# [9] RSQLite_2.0              memoise_1.1.0            tibble_1.3.4             lattice_0.20-35         
# [13] pkgconfig_2.0.1          rlang_0.1.2              Matrix_1.2-11            DBI_0.7                 
# [17] yaml_2.1.14              httr_1.3.1               bit64_0.9-7              grid_3.4.1              
# [21] R6_2.2.2                 XML_3.98-1.9             BiocParallel_1.8.2       blob_1.1.0              
# [25] GenomicAlignments_1.10.1 RCurl_1.95-4.8

Could any one please check and explain to me where the problem is? Thanks!

ADD COMMENTlink modified 13 months ago by Valerie Obenchain ♦♦ 6.6k • written 13 months ago by yduan00420

You need to make your Google Drive links publicly accessible.

ADD REPLYlink written 13 months ago by Pariksheet Nanda110

Hi Nanda,

Sorry about that. I have now figured out how to make the links publicly accessible. Could you please check it again? Thanks!


ADD REPLYlink written 13 months ago by yduan00420
gravatar for Valerie Obenchain
13 months ago by
Valerie Obenchain ♦♦ 6.6k
United States
Valerie Obenchain ♦♦ 6.6k wrote:

The IntergenicVariants class has upstream and downstream values that define "how far" from the gene to search. Defaults are shown on the ?IntergenicVariants man page:

IntergenicVariants(upstream = 1e+06L, downstream = 1e+06L, idType=c("gene", "tx"))

If your SNP falls outside of these ranges it won't be returned by locateVariants().



ADD COMMENTlink written 13 months ago by Valerie Obenchain ♦♦ 6.6k

Ok, I see. Thanks for your answering!


ADD REPLYlink written 13 months ago by yduan00420
gravatar for Pariksheet Nanda
13 months ago by
University of Connecticut
Pariksheet Nanda110 wrote:

None of those snips fall within transcripts:

 > intersect(transcripts(txdb), rd_abnorm)
 GRanges object with 0 ranges and 0 metadata columns:
    seqnames    ranges strand
       <Rle> <IRanges>  <Rle>
   seqinfo: 47 sequences (1 circular) from 2 genomes (hg38, NA); no seqlengths
ADD COMMENTlink written 13 months ago by Pariksheet Nanda110

I think if the snps don't fall within transcripts, then the location annotation should be intergenic, intron or somewhere else. But in this case, all of the locations are NA... What do you think about it? Thanks!

ADD REPLYlink written 13 months ago by yduan00420
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 319 users visited in the last hour