Question: SNP location annotation problem in "VariantAnnotation" package
gravatar for yduan004
11 weeks ago by
yduan00420 wrote:

I couldn't get location annotation for some SNPs by using "VariantAnnotation" package.

For some SNP ids, eventhough their GRanges data are normal as others, they couldn't be annotated by using "locateVariants" function. Here is the code I ran. The link for "abnorm_snpid.txt" is, the link for "rd_abnorm.rds" is

abnorm_snpid <- readLines("abnorm_snpid.txt")

rd_abnorm <- readRDS("rd_abnorm.rds")


fa <- BSgenome.Hsapiens.UCSC.hg38

download.file("", "Homo_sapiens.GRCh38.90.gtf.gz")



library(GenomicFeatures); library(VariantAnnotation)

txdb <- makeTxDbFromGFF(file="Homo_sapiens.GRCh38.90.gtf", format="gtf", dataSource="Ensembl", organism="Homo sapiens")

allvar <- locateVariants(rd_abnorm, txdb, AllVariants()) # allvar is empty, no location annotation

# GRanges object with 0 ranges and 9 metadata columns:
#   seqnames    ranges strand | LOCATION  LOCSTART    LOCEND   QUERYID      TXID         CDSID      GENEID
# <Rle> <IRanges>  <Rle> | <factor> <integer> <integer> <integer> <integer> <IntegerList> <character>
# <CharacterList> <CharacterList>
#   -------
#   seqinfo: no sequences

## NOTE: the SNP ids are in dbSNP build 150, which uses hg38 (GRCh38) as reference genome.


# R version 3.4.1 (2017-06-30)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 16.04.1 LTS
# Matrix products: default
# BLAS: /usr/lib/atlas-base/atlas/
# LAPACK: /usr/lib/atlas-base/atlas/
# locale:
#   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
# [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
# attached base packages:
#   [1] parallel  stats4    stats     graphics  utils     datasets  grDevices methods   base     
# other attached packages:
#   [1] VariantAnnotation_1.20.3          Rsamtools_1.26.2                  SummarizedExperiment_1.4.0       
# [4] GenomicFeatures_1.26.4            AnnotationDbi_1.36.2              GEOquery_2.40.0                  
# [7] Biobase_2.34.0                    BiocInstaller_1.24.0              BSgenome.Hsapiens.UCSC.hg38_1.4.1
# [10] BSgenome_1.42.0                   rtracklayer_1.34.2                Biostrings_2.42.1                
# [13] XVector_0.14.1                    GenomicRanges_1.26.4              GenomeInfoDb_1.10.3              
# [16] IRanges_2.8.2                     S4Vectors_0.12.2                  BiocGenerics_0.20.0              
# [19] setwidth_1.0-4                    colorout_1.1-2                   
# loaded via a namespace (and not attached):
#   [1] Rcpp_0.12.12             compiler_3.4.1           bitops_1.0-6             tools_3.4.1             
# [5] zlibbioc_1.20.0          biomaRt_2.30.0           digest_0.6.12            bit_1.1-12              
# [9] RSQLite_2.0              memoise_1.1.0            tibble_1.3.4             lattice_0.20-35         
# [13] pkgconfig_2.0.1          rlang_0.1.2              Matrix_1.2-11            DBI_0.7                 
# [17] yaml_2.1.14              httr_1.3.1               bit64_0.9-7              grid_3.4.1              
# [21] R6_2.2.2                 XML_3.98-1.9             BiocParallel_1.8.2       blob_1.1.0              
# [25] GenomicAlignments_1.10.1 RCurl_1.95-4.8

Could any one please check and explain to me where the problem is? Thanks!

ADD COMMENTlink modified 10 weeks ago by Valerie Obenchain ♦♦ 6.4k • written 11 weeks ago by yduan00420

You need to make your Google Drive links publicly accessible.

ADD REPLYlink written 11 weeks ago by Pariksheet Nanda70

Hi Nanda,

Sorry about that. I have now figured out how to make the links publicly accessible. Could you please check it again? Thanks!


ADD REPLYlink written 10 weeks ago by yduan00420
gravatar for Valerie Obenchain
10 weeks ago by
Valerie Obenchain ♦♦ 6.4k
United States
Valerie Obenchain ♦♦ 6.4k wrote:

The IntergenicVariants class has upstream and downstream values that define "how far" from the gene to search. Defaults are shown on the ?IntergenicVariants man page:

IntergenicVariants(upstream = 1e+06L, downstream = 1e+06L, idType=c("gene", "tx"))

If your SNP falls outside of these ranges it won't be returned by locateVariants().



ADD COMMENTlink written 10 weeks ago by Valerie Obenchain ♦♦ 6.4k

Ok, I see. Thanks for your answering!


ADD REPLYlink written 9 weeks ago by yduan00420
gravatar for Pariksheet Nanda
10 weeks ago by
University of Connecticut
Pariksheet Nanda70 wrote:

None of those snips fall within transcripts:

 > intersect(transcripts(txdb), rd_abnorm)
 GRanges object with 0 ranges and 0 metadata columns:
    seqnames    ranges strand
       <Rle> <IRanges>  <Rle>
   seqinfo: 47 sequences (1 circular) from 2 genomes (hg38, NA); no seqlengths
ADD COMMENTlink written 10 weeks ago by Pariksheet Nanda70

I think if the snps don't fall within transcripts, then the location annotation should be intergenic, intron or somewhere else. But in this case, all of the locations are NA... What do you think about it? Thanks!

ADD REPLYlink written 10 weeks ago by yduan00420
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 113 users visited in the last hour