Search
Question: SNP location annotation problem in "VariantAnnotation" package
1
gravatar for yduan004
11 weeks ago by
yduan00420
Riverside
yduan00420 wrote:

I couldn't get location annotation for some SNPs by using "VariantAnnotation" package.

For some SNP ids, eventhough their GRanges data are normal as others, they couldn't be annotated by using "locateVariants" function. Here is the code I ran. The link for "abnorm_snpid.txt" is https://drive.google.com/open?id=0BwtKdDxl2iyQX2lQZVZsbmJQWWs, the link for "rd_abnorm.rds" is https://drive.google.com/open?id=0BwtKdDxl2iyQSklvNGlla3F6UGs

abnorm_snpid <- readLines("abnorm_snpid.txt")

rd_abnorm <- readRDS("rd_abnorm.rds")

library(BSgenome.Hsapiens.UCSC.hg38)

fa <- BSgenome.Hsapiens.UCSC.hg38

download.file("ftp://ftp.ensembl.org/pub/release-90/gtf/homo_sapiens/Homo_sapiens.GRCh38.90.gtf.gz", "Homo_sapiens.GRCh38.90.gtf.gz")

library(GEOquery)

gunzip("Homo_sapiens.GRCh38.90.gtf.gz")

library(GenomicFeatures); library(VariantAnnotation)

txdb <- makeTxDbFromGFF(file="Homo_sapiens.GRCh38.90.gtf", format="gtf", dataSource="Ensembl", organism="Homo sapiens")

allvar <- locateVariants(rd_abnorm, txdb, AllVariants()) # allvar is empty, no location annotation

# GRanges object with 0 ranges and 9 metadata columns:
#   seqnames    ranges strand | LOCATION  LOCSTART    LOCEND   QUERYID      TXID         CDSID      GENEID
# <Rle> <IRanges>  <Rle> | <factor> <integer> <integer> <integer> <integer> <IntegerList> <character>
#   PRECEDEID        FOLLOWID
# <CharacterList> <CharacterList>
#   -------
#   seqinfo: no sequences

## NOTE: the SNP ids are in dbSNP build 150, which uses hg38 (GRCh38) as reference genome.

sessionInfo()

# R version 3.4.1 (2017-06-30)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 16.04.1 LTS
#
# Matrix products: default
# BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
# LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
#
# locale:
#   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
# [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
# [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#
# attached base packages:
#   [1] parallel  stats4    stats     graphics  utils     datasets  grDevices methods   base     
#
# other attached packages:
#   [1] VariantAnnotation_1.20.3          Rsamtools_1.26.2                  SummarizedExperiment_1.4.0       
# [4] GenomicFeatures_1.26.4            AnnotationDbi_1.36.2              GEOquery_2.40.0                  
# [7] Biobase_2.34.0                    BiocInstaller_1.24.0              BSgenome.Hsapiens.UCSC.hg38_1.4.1
# [10] BSgenome_1.42.0                   rtracklayer_1.34.2                Biostrings_2.42.1                
# [13] XVector_0.14.1                    GenomicRanges_1.26.4              GenomeInfoDb_1.10.3              
# [16] IRanges_2.8.2                     S4Vectors_0.12.2                  BiocGenerics_0.20.0              
# [19] setwidth_1.0-4                    colorout_1.1-2                   
#
# loaded via a namespace (and not attached):
#   [1] Rcpp_0.12.12             compiler_3.4.1           bitops_1.0-6             tools_3.4.1             
# [5] zlibbioc_1.20.0          biomaRt_2.30.0           digest_0.6.12            bit_1.1-12              
# [9] RSQLite_2.0              memoise_1.1.0            tibble_1.3.4             lattice_0.20-35         
# [13] pkgconfig_2.0.1          rlang_0.1.2              Matrix_1.2-11            DBI_0.7                 
# [17] yaml_2.1.14              httr_1.3.1               bit64_0.9-7              grid_3.4.1              
# [21] R6_2.2.2                 XML_3.98-1.9             BiocParallel_1.8.2       blob_1.1.0              
# [25] GenomicAlignments_1.10.1 RCurl_1.95-4.8

Could any one please check and explain to me where the problem is? Thanks!

ADD COMMENTlink modified 10 weeks ago by Valerie Obenchain ♦♦ 6.4k • written 11 weeks ago by yduan00420

You need to make your Google Drive links publicly accessible.

ADD REPLYlink written 11 weeks ago by Pariksheet Nanda70

Hi Nanda,

Sorry about that. I have now figured out how to make the links publicly accessible. Could you please check it again? Thanks!

Yuzhu

ADD REPLYlink written 10 weeks ago by yduan00420
2
gravatar for Valerie Obenchain
10 weeks ago by
Valerie Obenchain ♦♦ 6.4k
United States
Valerie Obenchain ♦♦ 6.4k wrote:

The IntergenicVariants class has upstream and downstream values that define "how far" from the gene to search. Defaults are shown on the ?IntergenicVariants man page:

IntergenicVariants(upstream = 1e+06L, downstream = 1e+06L, idType=c("gene", "tx"))

If your SNP falls outside of these ranges it won't be returned by locateVariants().

Valerie

 

ADD COMMENTlink written 10 weeks ago by Valerie Obenchain ♦♦ 6.4k

Ok, I see. Thanks for your answering!

Yuzhu

ADD REPLYlink written 9 weeks ago by yduan00420
2
gravatar for Pariksheet Nanda
10 weeks ago by
University of Connecticut
Pariksheet Nanda70 wrote:

None of those snips fall within transcripts:

 > intersect(transcripts(txdb), rd_abnorm)
 GRanges object with 0 ranges and 0 metadata columns:
    seqnames    ranges strand
       <Rle> <IRanges>  <Rle>
   -------
   seqinfo: 47 sequences (1 circular) from 2 genomes (hg38, NA); no seqlengths
 >
ADD COMMENTlink written 10 weeks ago by Pariksheet Nanda70

I think if the snps don't fall within transcripts, then the location annotation should be intergenic, intron or somewhere else. But in this case, all of the locations are NA... What do you think about it? Thanks!

ADD REPLYlink written 10 weeks ago by yduan00420
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 113 users visited in the last hour