[gwascat] Updated GWAS catalog from file
enricoferrero ▴ 630


Is there any way to get an up-to-date version of the GWAS catalog starting from an existing file? Currently the version accessible with data(ebicat38) is outdated (January 2016) and the function makeCurrentGwascat() only accepts a URL as argument, not an existing file.

I'm preparing some training material where students won't be able to rely on an internet connection, so I need a way to create a gwasloc object from a file on the hard drive. The file in question is exactly the same one that gets downloaded and parsed by makeCurrentGwascat(), i.e.: https://www.ebi.ac.uk/gwas/api/search/downloads/alternative

Thank you!

Code illustrating the problem:

> library(gwascat)
Loading required package: Homo.sapiens
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, cbind, colMeans, colnames, colSums, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted,
    lapply, lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames, rowSums, sapply, setdiff,
    sort, table, tapply, union, unique, unsplit, which, which.max, which.min

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:


Loading required package: OrganismDbi
Loading required package: GenomicFeatures
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: GO.db

Loading required package: org.Hs.eg.db

Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
gwascat loaded.  Use data(ebicat38) for hg38 coordinates;
 data(ebicat37) for hg19 coordinates.

> packageVersion("gwascat")
[1] '2.10.0'

# this version is out of date, see date in 'Extracted'
> data(ebicat38)
> ebicat38
gwasloc instance with 22714 records and 36 attributes per record.
Extracted:  2016-01-18
Genome:  GRCh38
GRanges object with 5 ranges and 3 metadata columns:
      seqnames               ranges strand |                  DISEASE/TRAIT        SNPS   P-VALUE
         <Rle>            <IRanges>  <Rle> |                    <character> <character> <numeric>
  [1]       11 [41798900, 41798900]      * | Post-traumatic stress disorder  rs10768747     5e-06
  [2]       15 [34768262, 34768262]      * | Post-traumatic stress disorder  rs12232346     2e-06
  [3]        8 [96500749, 96500749]      * | Post-traumatic stress disorder   rs2437772     6e-06
  [4]        9 [98221544, 98221544]      * | Post-traumatic stress disorder   rs7866350     1e-06
  [5]       15 [54423444, 54423444]      * | Post-traumatic stress disorder  rs73419609     6e-06
  seqinfo: 23 sequences from GRCh38 genome

# this fails because I'm not connected
> makeCurrentGwascat()
running read.delim on http://www.ebi.ac.uk/gwas/api/search/downloads/alternative...
Error in open.connection(file, "rt") : cannot open the connection

# this also fails because the function expects a URL, not a file
> makeCurrentGwascat("gwas.catalog.txt")
running read.delim on gwas.catalog.txt...
Error in url(table.url) : URL scheme unsupported by this method


enricoferrero ▴ 630

Based on Robert Castelo's answer below, I ended up doing this:

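# Download the catalog once, while online, and keep the file for offline use
download.file("http://www.ebi.ac.uk/gwas/api/search/downloads/alternative", destfile = "gwas_catalog_v1.0.1-associations_e90_r2017-12-04.tsv")
# Read the local copy; check.names = FALSE keeps column names such as "DISEASE/TRAIT" unchanged
snps <- read.delim("gwas_catalog_v1.0.1-associations_e90_r2017-12-04.tsv", check.names = FALSE, stringsAsFactors = FALSE)
# gwdf2GRanges() is not exported by gwascat, hence the ':::'
snps <- gwascat:::gwdf2GRanges(snps, extractDate = "2017-12-04")
genome(snps) <- "GRCh38"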

This returns an object similar to the one you would get with makeCurrentGwascat().
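For training material, the same steps could be wrapped in a small helper so students only call one function. This is a rough sketch, not part of gwascat: the name read_local_gwascat and its arguments are made up for illustration, and gwdf2GRanges() is an internal function that may change between package versions.

```r
# Hypothetical helper (not part of gwascat): build a gwasloc object from a
# catalog file that is already on disk.
read_local_gwascat <- function(path, extract_date, build = "GRCh38") {
  # Fail early with a clear message if the file is not there
  stopifnot(file.exists(path))
  # Attach gwascat and its dependencies, as in the session shown in the question
  suppressPackageStartupMessages(library(gwascat))
  # Keep the original column names (e.g. "DISEASE/TRAIT")
  df <- read.delim(path, check.names = FALSE, stringsAsFactors = FALSE)
  # Internal gwascat function, same call as in the snippet above
  gr <- gwascat:::gwdf2GRanges(df, extractDate = extract_date)
  genome(gr) <- build
  gr
}
```

Students would then call something like read_local_gwascat("gwas_catalog_v1.0.1-associations_e90_r2017-12-04.tsv", extract_date = "2017-12-04").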

Robert Castelo ★ 3.0k
Barcelona/Universitat Pompeu Fabra

Assuming the downloaded file is called 'gwas_catalog_v1.0.1-associations_e90_r2017-09-12.tsv', what about

gwascat <- read.delim("gwas_catalog_v1.0.1-associations_e90_r2017-09-12.tsv", sep="\t", header=TRUE, stringsAsFactors=FALSE)


Then just look at the source code of 'makeCurrentGwascat()'; it contains the few instructions needed to build the object from this file.
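One detail worth checking when reading the file this way: by default read.delim() rewrites column names into syntactically valid R names, so headers like the "DISEASE/TRAIT" and "P-VALUE" columns shown in the question change unless check.names = FALSE is given. A small base R illustration, using a two-line temporary file as a stand-in for the real catalog (whether the downstream gwascat code depends on the original names is an assumption not confirmed in this thread):

```r
# Tiny stand-in for the catalog file: a header line plus one record taken
# from the ebicat38 output shown in the question
tsv <- tempfile(fileext = ".tsv")
writeLines(c("DISEASE/TRAIT\tSNPS\tP-VALUE",
             "Post-traumatic stress disorder\trs10768747\t5e-06"), tsv)

# Default behaviour: names are passed through make.names()
default_names <- names(read.delim(tsv, stringsAsFactors = FALSE))

# With check.names = FALSE the header is kept exactly as in the file
kept_names <- names(read.delim(tsv, check.names = FALSE, stringsAsFactors = FALSE))

print(default_names)  # "DISEASE.TRAIT" "SNPS" "P.VALUE"
print(kept_names)     # "DISEASE/TRAIT" "SNPS" "P-VALUE"
```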




Thanks. Yes, this would probably work but it's not very elegant - especially considering this is training material for students.

I wonder whether it would be easier, and more straightforward for students to follow, to simply create a GRanges object at that point.


I see three options: write a wrapper function for your students, parse the gwascat file yourself and provide your students directly with the 'GRanges' object, or contact the package maintainer to request further functionality.
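For the second option, one possible approach is to build the object once and hand it out as an .rds file using base R's saveRDS() and readRDS(). In the sketch below the file name is hypothetical and a small data frame stands in for the real object so that the snippet runs without gwascat; to work with a real gwasloc or GRanges object, students would still need the corresponding packages installed.

```r
# Placeholder for the object built once by the instructor; in practice this
# would be the result of the parsing steps discussed in this thread.
snps <- data.frame(SNPS = c("rs10768747", "rs12232346"),
                   stringsAsFactors = FALSE)

# Instructor side: serialize the object to a single file (hypothetical name)
rds_file <- file.path(tempdir(), "gwascat_snapshot.rds")
saveRDS(snps, rds_file)

# Student side: restore the object, no internet connection needed
restored <- readRDS(rds_file)
```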

