[gwascat] Updated GWAS catalog from file
enricoferrero ▴ 660
@enricoferrero-6037
Last seen 3.0 years ago
Switzerland

Hello,

Is there any way to get an up-to-date version of the GWAS catalog starting from an existing file? Currently the version accessible with data(ebicat38) is outdated (January 2016) and the function makeCurrentGwascat() only accepts a URL as argument, not an existing file.

I'm preparing some training material where students won't be able to rely on an internet connection, so I need a way to create a gwasloc object from a file on the hard drive. The file in question is exactly the same one that gets downloaded and parsed by makeCurrentGwascat(), i.e.: https://www.ebi.ac.uk/gwas/api/search/downloads/alternative

Thank you!

Code illustrating the problem:

> library(gwascat)
Loading required package: Homo.sapiens
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, cbind, colMeans, colnames, colSums, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted,
    lapply, lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames, rowSums, sapply, setdiff,
    sort, table, tapply, union, unique, unsplit, which, which.max, which.min

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: OrganismDbi
Loading required package: GenomicFeatures
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: GO.db

Loading required package: org.Hs.eg.db

Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
gwascat loaded.  Use data(ebicat38) for hg38 coordinates;
 data(ebicat37) for hg19 coordinates.

> packageVersion("gwascat")
[1] '2.10.0'

# this version is out of date, see date in 'Extracted'
> data(ebicat38)
> ebicat38
gwasloc instance with 22714 records and 36 attributes per record.
Extracted:  2016-01-18
Genome:  GRCh38
Excerpt:
GRanges object with 5 ranges and 3 metadata columns:
      seqnames               ranges strand |                  DISEASE/TRAIT        SNPS   P-VALUE
         <Rle>            <IRanges>  <Rle> |                    <character> <character> <numeric>
  [1]       11 [41798900, 41798900]      * | Post-traumatic stress disorder  rs10768747     5e-06
  [2]       15 [34768262, 34768262]      * | Post-traumatic stress disorder  rs12232346     2e-06
  [3]        8 [96500749, 96500749]      * | Post-traumatic stress disorder   rs2437772     6e-06
  [4]        9 [98221544, 98221544]      * | Post-traumatic stress disorder   rs7866350     1e-06
  [5]       15 [54423444, 54423444]      * | Post-traumatic stress disorder  rs73419609     6e-06
  -------
  seqinfo: 23 sequences from GRCh38 genome

# this fails because I'm not connected
> makeCurrentGwascat()
running read.delim on http://www.ebi.ac.uk/gwas/api/search/downloads/alternative...
Error in open.connection(file, "rt") : cannot open the connection

# this also fails because the function expects a URL, not a file
> makeCurrentGwascat("gwas.catalog.txt")
running read.delim on gwas.catalog.txt...
Error in url(table.url) : URL scheme unsupported by this method

 

enricoferrero ▴ 660
@enricoferrero-6037
Last seen 3.0 years ago
Switzerland

Based on Robert Castelo's answer below, I ended up doing this:

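# Run once while online to save a local copy of the catalog
download.file("http://www.ebi.ac.uk/gwas/api/search/downloads/alternative", destfile = "gwas_catalog_v1.0.1-associations_e90_r2017-12-04.tsv")
# Read the local copy, keeping the original column names (e.g. "DISEASE/TRAIT")
snps <- read.delim("gwas_catalog_v1.0.1-associations_e90_r2017-12-04.tsv", check.names = FALSE, stringsAsFactors = FALSE)
# Convert the data frame with the package's internal helper, then record the genome build
snps <- gwascat:::gwdf2GRanges(snps, extractDate = "2017-12-04")
genome(snps) <- "GRCh38"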

This returns an object similar to what you would get with makeCurrentGwascat().

Robert Castelo ★ 3.4k
@rcastelo
Last seen 2 days ago
Barcelona/Universitat Pompeu Fabra

Assuming the downloaded file is called 'gwas_catalog_v1.0.1-associations_e90_r2017-09-12.tsv', what about

gwascat <- read.delim("gwas_catalog_v1.0.1-associations_e90_r2017-09-12.tsv", sep="\t", header=TRUE, stringsAsFactors=FALSE)

?

Just look at the source code of 'makeCurrentGwascat()' and you'll find the few instructions needed to build the object from this file.
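For anyone unsure how to get at that source code, here is a minimal sketch. It relies only on standard R behaviour: typing a function name without parentheses prints its definition, and `:::` reaches unexported functions. That `gwdf2GRanges` is the relevant internal helper is taken from the other answer in this thread (gwascat 2.10.0); internals may differ in other versions.

```r
library(gwascat)

# Print the definition of the exported function
# (no parentheses, so it is displayed rather than called).
makeCurrentGwascat

# Inspect the unexported helper mentioned elsewhere in this thread.
# Internal functions are not a stable API and may change between releases.
gwascat:::gwdf2GRanges
```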

cheers,

robert.


Thanks. Yes, this would probably work but it's not very elegant - especially considering this is training material for students.

I wonder whether it would be easier, and more straightforward to follow, to simply create a GRanges object at that point.
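A minimal sketch of that plain GRanges route, assuming the data frame was read with check.names = FALSE so that the catalog's column names (CHR_ID, CHR_POS, SNPS, P-VALUE) are intact. The function name `gwas_df_to_granges` is hypothetical, and dropping rows without a single numeric position is an assumption about how to handle unmapped or multi-SNP entries, not what gwascat itself does.

```r
library(GenomicRanges)

# Hypothetical helper: build a plain GRanges from a GWAS catalog data frame.
gwas_df_to_granges <- function(df) {
  chr <- as.character(df[["CHR_ID"]])
  # Positions that are empty or not a single number become NA.
  pos <- suppressWarnings(as.integer(df[["CHR_POS"]]))
  # Assumption: keep only rows with one usable chromosome and position.
  keep <- !is.na(pos) & !is.na(chr) & nzchar(chr)
  GRanges(
    seqnames = chr[keep],
    ranges   = IRanges(start = pos[keep], width = 1),
    SNPS     = df[["SNPS"]][keep],
    P_VALUE  = df[["P-VALUE"]][keep]
  )
}
```

Calling `gwas_df_to_granges(snps)` on the data frame returned by read.delim() would give a GRanges with one range per retained SNP; the genome build would still need to be set separately with `genome()`.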


I see three options: write a wrapper function for your students, parse the gwascat file yourself and provide your students directly with the 'GRanges' object, or contact the package maintainer to request further functionality.
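The first option could look roughly like the sketch below. The function name `read_gwascat_file` and its arguments are hypothetical. It simply wraps the steps from the answer elsewhere in this thread, which uses the unexported `gwascat:::gwdf2GRanges` as found in gwascat 2.10.0, so it may need adjusting for other package versions.

```r
library(gwascat)

# Hypothetical wrapper (not part of gwascat): build the catalog object
# from a local copy of the downloaded file instead of a URL.
read_gwascat_file <- function(path, extract_date, genome_build = "GRCh38") {
  stopifnot(file.exists(path))
  # Keep original column names such as "DISEASE/TRAIT".
  df <- read.delim(path, check.names = FALSE, stringsAsFactors = FALSE)
  # Internal helper; not a stable API, so this is an assumption about
  # the installed version matching the one used in this thread.
  gwas <- gwascat:::gwdf2GRanges(df, extractDate = extract_date)
  genome(gwas) <- genome_build
  gwas
}
```

Students would then only need a single call, for example `read_gwascat_file("gwas_catalog_v1.0.1-associations_e90_r2017-12-04.tsv", extract_date = "2017-12-04")`.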
