Question

GenomicScores cannot find phastCons30way.UCSC.hg38 though package is loaded


Paul Shannon (United States)

This works fine:

gscores(phastCons100way.UCSC.hg38, GRanges(tbl.fimo[, c("chrom", "start", "end")]))
 GRanges object with 72 ranges and 1 metadata column:

as does this:

gscores(phastCons7way.UCSC.hg38, GRanges(tbl.fimo[, c("chrom", "start", "end")]))
 GRanges object with 72 ranges and 1 metadata column:

but this fails:

gscores(phastCons30way.UCSC.hg38, GRanges(tbl.fimo[, c("chrom", "start", "end")]))
 Error in gscores(phastCons30way.UCSC.hg38, GRanges(tbl.fimo[, c("chrom",  :
   object 'phastCons30way.UCSC.hg38' not found

I explicitly load all three phastCons packages (using the 30way package kindly provided by Robert Castelo a couple of months ago, which is now also available in the AnnotationHub).
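One way to check which of the three installed packages actually provides a scores object is to look for an object named after the package inside its namespace. This is a minimal sketch that assumes, as the working 100way and 7way packages suggest, that a GScores data package defines an object with the same name as the package. The helper `has_gscores_object()` is hypothetical and not part of GenomicScores.

```r
# Hypothetical helper (not part of GenomicScores): report the installed
# version of a GScores annotation package and whether its namespace
# contains an object named after the package, as the data packages
# are assumed to do.
has_gscores_object <- function(pkg) {
  # requireNamespace() returns FALSE instead of failing when pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE))
    return(FALSE)
  message(pkg, " version: ", format(utils::packageVersion(pkg)))
  exists(pkg, envir = asNamespace(pkg), inherits = FALSE)
}

for (pkg in c("phastCons100way.UCSC.hg38", "phastCons7way.UCSC.hg38",
              "phastCons30way.UCSC.hg38"))
  cat(pkg, ":", has_gscores_object(pkg), "\n")
```

A package that loads but returns FALSE here would match the "object not found" error above.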

sessionInfo() # with lots of information deleted, but the relevant information preserved

R Under development (unstable) (2020-01-29 r77745)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
...
[13] TrenaProjectErythropoiesis_1.0.4   TrenaProjectHG38_1.2.7
[15] TrenaMultiScore_1.0.19             MotifDb_1.29.6
[17] Biostrings_2.55.6                  XVector_0.27.1
[19] GenomicRanges_1.39.2               GenomeInfoDb_1.23.13
[21] IRanges_2.21.5                     S4Vectors_0.25.13
[23] BiocGenerics_0.33.2                TrenaProject_1.2.5

loaded via a namespace (and not attached):
  AnnotationHub_2.19.7
  phastCons100way.UCSC.hg38_3.7.1
  phastCons7way.UCSC.hg38_3.7.1
  phastCons30way.UCSC.hg38_3.11.1
  GenomicFeatures_1.39.7
  GenomicScores_1.11.5


Answer by Robert Castelo · 2020-03-24

Hi Paul,

I think I know what's happening. One quick solution is to download the scores as AnnotationHub resources from your current R-devel BioC installation. This means doing:

library(GenomicScores)

phast <- getGScores("phastCons30way.UCSC.hg38")
snapshotDate(): 2020-02-28
download 333 resources? [y/n] y
  |======================================================================| 100%
[...]
  |======================================================================| 100%
phast
GScores object 
# organism: Homo sapiens (UCSC, hg38)
# provider: UCSC
# provider version: 03Nov2017
# download date: Jan 10, 2020
# loaded sequences: default
# maximum abs. error: 0.05
# use 'citation()' to cite these data in publications
gscores(phast, GRanges("chr7:117592326-117592330"))
GRanges object with 1 range and 1 metadata column:
      seqnames              ranges strand |   default
         <Rle>           <IRanges>  <Rle> | <numeric>
  [1]     chr7 117592326-117592330      * |      0.58
  -------
  seqinfo: 1 sequence from Genome Reference Consortium GRCh38 genome; no seqlengths

Now, the problem arises when you generate a standard annotation package with the function makeGScoresPackage(), install it, and then update your BioC packages. As of March 2020, the generated package gets version 3.11.0 by default. Note that your installed phastCons30way.UCSC.hg38 package has version 3.11.1, not 3.11.0. This is because the BioC core team has decided that every AnnotationHub resource must have a corresponding annotation package that holds only metadata, not the data: it just contains a manual page describing how the data was contributed and by whom. You can see that this package already exists in BioC devel here http://www.bioconductor.org/packages/devel/data/annotation/html/phastCons30way.UCSC.hg38.html. Therefore, if your self-created annotation package has version 3.11.0, it will be replaced by this metadata package when BioC is updated. Since the metadata package does not contain the scores, this is most likely why the object phastCons30way.UCSC.hg38 is not found in your session.
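The replacement follows from how R decides that an installed package is out of date: `old.packages()`, which `update.packages()` uses, flags a package only when the repository offers a strictly higher version. A minimal sketch of that comparison with base R's `package_version()`; the helper name `is_replaced()` is hypothetical.

```r
# Hypothetical helper: TRUE when the repository version would replace the
# installed one during an update, i.e. when it is strictly higher.
is_replaced <- function(installed, repository) {
  package_version(repository) > package_version(installed)
}

is_replaced("3.11.0", "3.11.1")  # self-built 3.11.0 vs metadata-only 3.11.1: TRUE
is_replaced("3.11.2", "3.11.1")  # self-built package bumped to 3.11.2: FALSE
```

The comparison is numeric per component, not lexicographic, so 3.11.10 counts as newer than 3.11.9.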

There are situations in which one may want to have genomic scores stored in a standard annotation package, for instance, when using them on a cluster behind a firewall that blocks online access to the AnnotationHub. The metadata package gets somewhat in the way of that purpose because it has the identical name. So, for now, if you want to use the annotation package, the only workaround I see is to increase the version of your annotation package to 3.11.2, then rebuild and reinstall it; it will then not be replaced when you update BioC packages.
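The version bump can be scripted in base R by editing the `Version:` field of the package's DESCRIPTION file before rebuilding. A minimal sketch, assuming the package source generated by makeGScoresPackage() sits in a local directory; the helper `bump_description_version()` and the directory layout are hypothetical. The demo works on a throwaway DESCRIPTION in a temporary directory, and the commented line shows the reinstall step for a real package directory.

```r
# Hypothetical helper: set the Version field in a package's DESCRIPTION.
bump_description_version <- function(pkg_dir, new_version) {
  desc_file <- file.path(pkg_dir, "DESCRIPTION")
  desc <- read.dcf(desc_file)          # one-row character matrix of fields
  old <- desc[1, "Version"]
  # refuse to lower the version, otherwise an update could replace it again
  stopifnot(package_version(new_version) > package_version(old))
  desc[1, "Version"] <- new_version
  write.dcf(desc, desc_file)
  invisible(old)
}

# Demo on a throwaway DESCRIPTION in a temporary directory.
pkg_dir <- file.path(tempdir(), "phastCons30way.UCSC.hg38")
dir.create(pkg_dir, showWarnings = FALSE)
writeLines(c("Package: phastCons30way.UCSC.hg38", "Version: 3.11.0"),
           file.path(pkg_dir, "DESCRIPTION"))
bump_description_version(pkg_dir, "3.11.2")
read.dcf(file.path(pkg_dir, "DESCRIPTION"), fields = "Version")

# For the real package source directory, reinstall afterwards:
# install.packages(pkg_dir, repos = NULL, type = "source")
```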

cheers,

robert.