gwascat Liftover Fails Silently
0
0
Entering edit mode
anailis • 0
@user-24555
Last seen 3.2 years ago

I would like to obtain GWAS Catalog data on hg19 co-ordinates so I use makeCurrentGwascat(genome = "GRCh37"). Everything appears to go smoothly and the object says that it is on hg19 co-ordinates. But if you compare the co-ordinates to what they should be, you will quickly realise that they are still on hg38 co-ordinates despite the liftover appearing to complete successfully. For example, rs56106601 hg38's co-ordinates are chr9:128008205 and its hg19 co-ordinates are chr9:130770484 (https://www.ncbi.nlm.nih.gov/snp/rs56106601), but attempt@elementMetadata[attempt@elementMetadata$SNPS=="rs56106601","CHR_POS"] yields 128008205 - hg38 co-ordinates.

> library(gwascat)
gwascat loaded.  Use makeCurrentGwascat() to extract current image.
 from EBI.  The data folder of this package has some legacy extracts.
> library(liftOver)
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind,
    colnames, dirname, do.call, duplicated, eval, evalq, Filter,
    Find, get, grep, grepl, intersect, is.unsorted, lapply, Map,
    mapply, match, mget, order, paste, pmax, pmax.int, pmin,
    pmin.int, Position, rank, rbind, Reduce, rownames, sapply,
    setdiff, sort, table, tapply, union, unique, unsplit,
    which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges

Attaching package: ‘IRanges’

The following object is masked from ‘package:grDevices’:

    windows

Loading required package: GenomeInfoDb
Loading required package: rtracklayer
Loading required package: Homo.sapiens
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages
    'citation("pkgname")'.

Loading required package: OrganismDbi
Loading required package: GenomicFeatures
Loading required package: GO.db

Loading required package: org.Hs.eg.db

Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
> attempt <- makeCurrentGwascat(genome = "GRCh37", withOnt = F)
trying URL 'http://www.ebi.ac.uk/gwas/api/search/downloads/full'
Content type 'text/tsv' length unknown
ownloaded 125.9 MB

|============================================================| 100% 126 MB
Warning: 5303 parsing failures.
row     col               expected            actual                                                                      file
669 CHR_POS no trailing characters 27556782;27543283 'C:\Users\XXX\AppData\Local\Temp\RtmpO0PFpO\file3a1c3aa8659b'
670 CHR_POS no trailing characters 27556782;27543283 'C:\Users\XXX\AppData\Local\Temp\RtmpO0PFpO\file3a1c3aa8659b'
851 CHR_POS no trailing characters 34150806;34141638 'C:\Users\XXX\AppData\Local\Temp\RtmpO0PFpO\file3a1c3aa8659b'
852 CHR_POS no trailing characters 34150806;34141638 'C:\Users\XXX\AppData\Local\Temp\RtmpO0PFpO\file3a1c3aa8659b'
853 CHR_POS no trailing characters 34150806;34141638 'C:\Users\XXX\AppData\Local\Temp\RtmpO0PFpO\file3a1c3aa8659b'
... ....... ...................... ................. .........................................................................
See problems(...) for more details.

formatting gwaswloc instance...
NOTE: input data had non-ASCII characters replaced by '*'.
Loading required namespace: AnnotationHub
starting liftover from GRCh38 to GRCh37
snapshotDate(): 2020-10-27
loading from cache
liftover complete.
done.
> attempt
gwasloc instance with 235055 records and 34 attributes per record.
Extracted:  Wed Feb 03 16:30:36 2021 
metadata()$badpos includes records for which no unique locus was given.
Genome:  GRCh37 
Excerpt:
GRanges object with 5 ranges and 3 metadata columns:
      seqnames    ranges strand |   DISEASE/TRAIT        SNPS   P-VALUE
         <Rle> <IRanges>  <Rle> |     <character> <character> <numeric>
  [1]        8  19844222      * | HDL cholesterol  rs12678919     2e-34
  [2]       18  47167214      * | HDL cholesterol   rs4939883     7e-15
  [3]       11 116648917      * | HDL cholesterol    rs964184     1e-12
  [4]        9 107664301      * | HDL cholesterol   rs1883025     1e-09
  [5]        1 230295691      * | HDL cholesterol   rs4846914     4e-08
  -------
  seqinfo: 24 sequences from GRCh37 genome
> attempt@elementMetadata[attempt@elementMetadata$SNPS=="rs56106601","CHR_POS"]
[1] 128008205

Session info etc:

> BiocManager::valid()

* sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 
[2] LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] liftOver_1.14.0                        
 [2] Homo.sapiens_1.3.1                     
 [3] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [4] org.Hs.eg.db_3.12.0                    
 [5] GO.db_3.12.1                           
 [6] OrganismDbi_1.32.0                     
 [7] GenomicFeatures_1.42.1                 
 [8] AnnotationDbi_1.52.0                   
 [9] Biobase_2.50.0                         
[10] rtracklayer_1.49.5                     
[11] GenomicRanges_1.42.0                   
[12] GenomeInfoDb_1.26.2                    
[13] IRanges_2.24.1                         
[14] S4Vectors_0.28.1                       
[15] BiocGenerics_0.36.0                    
[16] gwascat_2.22.0                         

loaded via a namespace (and not attached):
 [1] bitops_1.0-6                  matrixStats_0.58.0           
 [3] bit64_4.0.5                   progress_1.2.2               
 [5] httr_1.4.2                    tools_4.0.3                  
 [7] R6_2.5.0                      DBI_1.1.1                    
 [9] withr_2.4.1                   tidyselect_1.1.0             
[11] prettyunits_1.1.1             bit_4.0.4                    
[13] curl_4.3                      compiler_4.0.3               
[15] graph_1.68.0                  pacman_0.5.1                 
[17] xml2_1.3.2                    DelayedArray_0.16.1          
[19] readr_1.4.0                   RBGL_1.66.0                  
[21] askpass_1.1                   rappdirs_0.3.3               
[23] stringr_1.4.0                 digest_0.6.27                
[25] Rsamtools_2.6.0               XVector_0.30.0               
[27] pkgconfig_2.0.3               htmltools_0.5.1.1            
[29] MatrixGenerics_1.2.1          dbplyr_2.0.0                 
[31] fastmap_1.1.0                 BSgenome_1.58.0              
[33] rlang_0.4.10                  rstudioapi_0.13              
[35] RSQLite_2.2.3                 shiny_1.6.0                  
[37] generics_0.1.0                BiocParallel_1.24.1          
[39] dplyr_1.0.4                   VariantAnnotation_1.36.0     
[41] RCurl_1.98-1.2                magrittr_2.0.1               
[43] GenomeInfoDbData_1.2.4        Matrix_1.3-2                 
[45] Rcpp_1.0.6                    lifecycle_0.2.0              
[47] stringi_1.5.3                 yaml_2.2.1                   
[49] SummarizedExperiment_1.20.0   zlibbioc_1.36.0              
[51] BiocFileCache_1.14.0          AnnotationHub_2.22.0         
[53] grid_4.0.3                    blob_1.2.1                   
[55] promises_1.1.1                snpStats_1.40.0              
[57] crayon_1.4.0                  lattice_0.20-41              
[59] Biostrings_2.58.0             splines_4.0.3                
[61] hms_1.0.0                     pillar_1.4.7                 
[63] biomaRt_2.46.2                XML_3.99-0.5                 
[65] glue_1.4.2                    BiocVersion_3.12.0           
[67] BiocManager_1.30.10           vctrs_0.3.6                  
[69] httpuv_1.5.5                  openssl_1.4.3                
[71] purrr_0.3.4                   assertthat_0.2.1             
[73] cachem_1.0.1                  mime_0.9                     
[75] xtable_1.8-4                  later_1.1.0.1                
[77] survival_3.2-7                tibble_3.0.6                 
[79] GenomicAlignments_1.26.0      memoise_2.0.0                
[81] ellipsis_0.3.1                interactiveDisplayBase_1.28.0

Bioconductor version '3.12'

  * 1 packages out-of-date
  * 0 packages too new

create a valid installation with

  BiocManager::install("rtracklayer", update = TRUE, ask = FALSE)

more details: BiocManager::valid()$too_new, BiocManager::valid()$out_of_date

Warning message:
1 packages out-of-date; 0 packages too new

Note that rtracklayer is still outdated because I'm a Windows user (see https://support.bioconductor.org/p/p133891/).

gwascat • 522 views
ADD COMMENT

Login before adding your answer.

Traffic: 732 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6