Generate a phastCons GScores annotation object from the 7 GB hg38.phastCons30way.bw file?
Paul Shannon ▴ 470
@paul-shannon-5944
United States

An introduction to the GenomicScores package explains how to use an already-built annotation package, for instance,

phastCons100way.UCSC.hg19

and how to retrieve one from the AnnotationHub.

I have been unable to find instructions for creating a (lossy) annotation package from a bigwig file, such as the 7G 30-way hg38.phastCons30way.bw file. rtracklayer makes it easy to read the bigwig file, but for consistency with other code I'd like to use GenomicScores and the gscores method.

Any suggestions?

- Paul
Robert Castelo ★ 3.3k
@rcastelo
Barcelona/Universitat Pompeu Fabra

hi Paul,

the way in which annotation packages can be created is not explained and i realize now i should write something about that in the vignette, although we already wrote in the conclusions of the GenomicScores paper that "Additional score sets can be added on request at the Bioconductor support site".

the process in fact requires some manual intervention, and for that reason i build the resources myself once i get such a request. in this case, i've already processed the scores for phastCons30way in hg38 and submitted the corresponding records to be added to the AnnotationHub. this means that in a few days, you'll be able to do:

gsco <- getGScores("phastCons30way.UCSC.hg38")

however, as of January 2020, this will only work on the development version of R and Bioconductor (BioC 3.11 and GenomicScores version 1.11.3). these AH resources will become available in the release version of R and Bioconductor only after the next BioC release in April 2020.
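once the resource has landed in the AnnotationHub, retrieving and querying it could look roughly like the sketch below. it assumes a working internet connection and a Bioconductor version that carries the resource; the genomic coordinates are arbitrary and only there for illustration.

```r
library(GenomicScores)
library(GenomicRanges)

# Download (or load from the local cache) the GScores object.
# Assumes the phastCons30way.UCSC.hg38 records are already in AnnotationHub.
gsco <- getGScores("phastCons30way.UCSC.hg38")

# An arbitrary hg38 interval, chosen only as an illustration.
query <- GRanges("chr7:117559590-117559600")

# gscores() returns the query ranges with the scores attached as metadata;
# for ranges wider than one nucleotide the scores are summarized per range.
res <- gscores(gsco, query)
print(res)
```

this is the same gscores method used with the already-built annotation packages, so existing code written against those packages should not need to change.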

if you or anybody else in this forum wants to find out how to produce these scores, you can find the scripts in the scripts directory of the package:

head(list.files(system.file("scripts", package="GenomicScores")))
[1] "make-data_CADD.v1.3.hg19.R"               
[2] "make-data_fitCons.UCSC.hg19.R"            
[3] "make-data_linsight.UCSC.hg19.R"           
[4] "make-data_MafDb.1Kgenomes.phase1.GRCh38.R"
[5] "make-data_MafDb.1Kgenomes.phase1.hs37d5.R"
[6] "make-data_MafDb.1Kgenomes.phase3.GRCh38.R"

the process consists of taking one of those scripts that processes scores similar to the ones you want and adapting it. because this may be non-trivial for most users, my recommendation is to just ask here at the forum, as i mentioned before, or open an issue in the GitHub repo.
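to find a starting point among those scripts, one could filter the file names by score type. the sketch below assumes that a script for phastCons scores, if the package ships one, has "phastCons" in its file name; the listing above only shows the first six files, so the result may well be empty.

```r
# Directory with the data-building scripts shipped with GenomicScores
# (system.file() returns "" if the package is not installed).
script_dir <- system.file("scripts", package = "GenomicScores")
scripts <- list.files(script_dir, full.names = TRUE)

# Keep the scripts whose file name mentions phastCons (an assumption
# about the naming scheme, based on the file names listed above).
phastcons_scripts <- grep("phastCons", basename(scripts), value = TRUE)
print(phastcons_scripts)

# To read one of them:
# file.show(file.path(script_dir, phastcons_scripts[1]))
```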

let me know if you encounter any problem using this new set of scores, as i have not tested them.

cheers,

robert.


Many thanks, Robert! I am downloading the annotation package now.
