Question

How to generate a phastCons GScores annotation object from the 7 GB hg38.phastCons30way.bw file?


Paul Shannon ▴ 470


The vignette "An introduction to the GenomicScores package" explains how to use an already-built annotation package, for instance phastCons100way.UCSC.hg19, and how to retrieve one from the AnnotationHub.

I have been unable to find instructions for creating a (lossy) annotation package from a bigWig file, such as the 7 GB 30-way hg38.phastCons30way.bw file. rtracklayer makes it easy to read the bigWig file, but for consistency with other code I'd like to use GenomicScores and the gscores method.

Any suggestions?

Paul

annotation • 744 views

updated 4.9 years ago by Robert Castelo ★ 3.4k • written 4.9 years ago by Paul Shannon ▴ 470

score 1 · Answer 1 · 2020-01-10

hi Paul,

the way in which annotation packages can be created is not explained and i realize now i should write something about that in the vignette, although we already wrote in the conclusions of the GenomicScores paper that "Additional score sets can be added on request at the Bioconductor support site".

the process in fact requires some manual intervention and for that reason what i do is to build the resources myself once i get such a request. in this case, i've already processed the scores for phastCons30way in hg38 and submitted the corresponding records to be added to the AnnotationHub. this means that in a few days, you'll be able to do:

gsco <- getGScores("phastCons30way.UCSC.hg38")

however, as of January 2020, this will only work on the development version of R and Bioconductor (BioC 3.11 and GenomicScores version 1.11.3). these AnnotationHub resources will become available in the release version of R and Bioconductor only after the next BioC release in April 2020.
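as a minimal sketch of how the retrieval and a first query might look once the resource is available: the version check simply restates the requirement above, and the chr22 positions are hypothetical coordinates chosen only for illustration, not a region discussed in this thread.

```r
library(GenomicScores)
library(GenomicRanges)

# the requirement stated above: GenomicScores 1.11.3 (BioC 3.11)
stopifnot(packageVersion("GenomicScores") >= "1.11.3")

# download (or load from the local cache) the phastCons30way hg38 scores
gsco <- getGScores("phastCons30way.UCSC.hg38")

# hypothetical query: six consecutive single-nucleotide positions on chr22
gr <- GRanges(seqnames="chr22", IRanges(start=50528591:50528596, width=1))

# returns 'gr' with the scores stored in an additional metadata column
gscores(gsco, gr)
```

the same gscores() call also accepts wider ranges, so existing code written against other GScores objects should need no changes beyond the object name.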

if you or anybody else in this forum wants to find out how to produce these scores, you can find the scripts in the scripts directory of the package:

head(list.files(system.file("scripts", package="GenomicScores")))
[1] "make-data_CADD.v1.3.hg19.R"               
[2] "make-data_fitCons.UCSC.hg19.R"            
[3] "make-data_linsight.UCSC.hg19.R"           
[4] "make-data_MafDb.1Kgenomes.phase1.GRCh38.R"
[5] "make-data_MafDb.1Kgenomes.phase1.hs37d5.R"
[6] "make-data_MafDb.1Kgenomes.phase3.GRCh38.R"

the process consists of taking one of those scripts that handles scores similar to the ones you want to process and adapting it. because this may be non-trivial for most users, my recommendation is to just ask here on the forum, as i mentioned before, or open an issue in the GitHub repo.
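as a starting point for adapting a script, something like the following sketch could locate a candidate template. it assumes that the scripts directory contains at least one file with "phastCons" in its name, which the truncated listing above does not confirm; if none is found, it just prints all available script names.

```r
# full paths to the data-building scripts shipped with GenomicScores
scripts <- list.files(system.file("scripts", package="GenomicScores"),
                      full.names=TRUE)

# assumption: a phastCons script exists and is the closest template
phast <- grep("phastCons", basename(scripts), value=TRUE)

if (length(phast) > 0) {
  # show the beginning of the first matching script as a template to adapt
  template <- scripts[basename(scripts) == phast[1]]
  writeLines(head(readLines(template), 20))
} else {
  # no match: list every script so that a similar one can be picked by hand
  print(basename(scripts))
}
```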

let me know if you encounter any problem using this new set of scores, as i have not tested them.

cheers,

robert.