SNP location annotation problem in "VariantAnnotation" package
2
1
Entering edit mode
yduan004 ▴ 20
@yduan004-13872
Last seen 7.3 years ago
Riverside

I couldn't get location annotation for some SNPs by using "VariantAnnotation" package.

For some SNP ids, eventhough their GRanges data are normal as others, they couldn't be annotated by using "locateVariants" function. Here is the code I ran. The link for "abnorm_snpid.txt" is https://drive.google.com/open?id=0BwtKdDxl2iyQX2lQZVZsbmJQWWs, the link for "rd_abnorm.rds" is https://drive.google.com/open?id=0BwtKdDxl2iyQSklvNGlla3F6UGs

abnorm_snpid <- readLines("abnorm_snpid.txt")

rd_abnorm <- readRDS("rd_abnorm.rds")

library(BSgenome.Hsapiens.UCSC.hg38)

fa <- BSgenome.Hsapiens.UCSC.hg38

download.file("ftp://ftp.ensembl.org/pub/release-90/gtf/homo_sapiens/Homo_sapiens.GRCh38.90.gtf.gz", "Homo_sapiens.GRCh38.90.gtf.gz")

library(GEOquery)

gunzip("Homo_sapiens.GRCh38.90.gtf.gz")

library(GenomicFeatures); library(VariantAnnotation)

txdb <- makeTxDbFromGFF(file="Homo_sapiens.GRCh38.90.gtf", format="gtf", dataSource="Ensembl", organism="Homo sapiens")

allvar <- locateVariants(rd_abnorm, txdb, AllVariants()) # allvar is empty, no location annotation

# GRanges object with 0 ranges and 9 metadata columns:
#   seqnames    ranges strand | LOCATION  LOCSTART    LOCEND   QUERYID      TXID         CDSID      GENEID
# <Rle> <IRanges>  <Rle> | <factor> <integer> <integer> <integer> <integer> <IntegerList> <character>
#   PRECEDEID        FOLLOWID
# <CharacterList> <CharacterList>
#   -------
#   seqinfo: no sequences

## NOTE: the SNP ids are in dbSNP build 150, which uses hg38 (GRCh38) as reference genome.

sessionInfo()

# R version 3.4.1 (2017-06-30)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 16.04.1 LTS
#
# Matrix products: default
# BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
# LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
#
# locale:
#   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
# [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
# [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#
# attached base packages:
#   [1] parallel  stats4    stats     graphics  utils     datasets  grDevices methods   base     
#
# other attached packages:
#   [1] VariantAnnotation_1.20.3          Rsamtools_1.26.2                  SummarizedExperiment_1.4.0       
# [4] GenomicFeatures_1.26.4            AnnotationDbi_1.36.2              GEOquery_2.40.0                  
# [7] Biobase_2.34.0                    BiocInstaller_1.24.0              BSgenome.Hsapiens.UCSC.hg38_1.4.1
# [10] BSgenome_1.42.0                   rtracklayer_1.34.2                Biostrings_2.42.1                
# [13] XVector_0.14.1                    GenomicRanges_1.26.4              GenomeInfoDb_1.10.3              
# [16] IRanges_2.8.2                     S4Vectors_0.12.2                  BiocGenerics_0.20.0              
# [19] setwidth_1.0-4                    colorout_1.1-2                   
#
# loaded via a namespace (and not attached):
#   [1] Rcpp_0.12.12             compiler_3.4.1           bitops_1.0-6             tools_3.4.1             
# [5] zlibbioc_1.20.0          biomaRt_2.30.0           digest_0.6.12            bit_1.1-12              
# [9] RSQLite_2.0              memoise_1.1.0            tibble_1.3.4             lattice_0.20-35         
# [13] pkgconfig_2.0.1          rlang_0.1.2              Matrix_1.2-11            DBI_0.7                 
# [17] yaml_2.1.14              httr_1.3.1               bit64_0.9-7              grid_3.4.1              
# [21] R6_2.2.2                 XML_3.98-1.9             BiocParallel_1.8.2       blob_1.1.0              
# [25] GenomicAlignments_1.10.1 RCurl_1.95-4.8

Could any one please check and explain to me where the problem is? Thanks!

annotation variantannotation • 2.0k views
ADD COMMENT
0
Entering edit mode

You need to make your Google Drive links publicly accessible.

ADD REPLY
0
Entering edit mode

Hi Nanda,

Sorry about that. I have now figured out how to make the links publicly accessible. Could you please check it again? Thanks!

Yuzhu

ADD REPLY
3
Entering edit mode
@valerie-obenchain-4275
Last seen 3.0 years ago
United States

The IntergenicVariants class has upstream and downstream values that define "how far" from the gene to search. Defaults are shown on the ?IntergenicVariants man page:

IntergenicVariants(upstream = 1e+06L, downstream = 1e+06L, idType=c("gene", "tx"))

If your SNP falls outside of these ranges it won't be returned by locateVariants().

Valerie

 

ADD COMMENT
0
Entering edit mode

Ok, I see. Thanks for your answering!

Yuzhu

ADD REPLY
3
Entering edit mode
@pariksheet-nanda-10892
Last seen 2.6 years ago
University of Connecticut

None of those snips fall within transcripts:

 > intersect(transcripts(txdb), rd_abnorm)
 GRanges object with 0 ranges and 0 metadata columns:
    seqnames    ranges strand
       <Rle> <IRanges>  <Rle>
   -------
   seqinfo: 47 sequences (1 circular) from 2 genomes (hg38, NA); no seqlengths
 >
ADD COMMENT
0
Entering edit mode

I think if the snps don't fall within transcripts, then the location annotation should be intergenic, intron or somewhere else. But in this case, all of the locations are NA... What do you think about it? Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 398 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6