Question

mm10 phastCons data for use with ATACseqQC

1

Entering edit mode

crysis405 ▴ 10

@crysis405-10875

Last seen 5.9 years ago

Just started testing the new ATACseqQC package. Its pretty neat until you want to do nucleosome positioning in something other than human.

One of the things you need is the library(phastCons100way.UCSC.hg19)

No one has made an equivalent library for mouse yet. So I was going to go get it from UCSC: http://hgdownload.cse.ucsc.edu/goldenPath/mm10/phastCons60way/

But now I'm a bit stuck how to convert the bigwig file into GScores. I tried rtracklayer: import("my.bw", as="Rle") which ate up all my RAM.

Any ideas? Thank you

ATACseqQC bsgenome phastcons rtracklayer • 4.3k views

ADD COMMENT • link updated 7.5 years ago by Ou, Jianhong ★ 1.3k • written 7.5 years ago by crysis405 ▴ 10

0

Entering edit mode

Just a comment on the import, it would probably be best to use as="NumericList" for that type of data. It will probably still be pretty big though. You could loop over the chromosomes and use the which argument.

ADD REPLY • link 7.5 years ago Michael Lawrence ★ 11k

score 1 · Answer 1 · 2017-05-23

1

Entering edit mode

Robert Castelo ★ 3.4k

@rcastelo

Last seen 3 hours ago

Barcelona/Universitat Pompeu Fabra

hi, i'm the maintainer of phastCons100way.UCSC.hg19, this package reduces the memory requirements by rounding the phastCons scores to 1-decimal digit, i'm in the process of generating other types of GScores objects, including for instance phyloP scores with a different quantization adapted to this kind of scores, that will become available through the GenomicScores package and the AnnotationHub, if you think this 1-decimal digit rounding is acceptable for your purposes, i could also generate the GScores object for phastCons60way.UCSC.mm10, what do you think?

cheers,

robert.

=====================EDIT==============

hi, a GScores object for phastCons60way.UCSC.mm10 is now available through the devel version of the GenomicScores package as follows:

library(GenomicScores)

gsco <- getGScores("phastCons60way.UCSC.mm10")
gsco
GScores object
# organism: Mus musculus (UCSC)
# provider: UCSC
# provider version: 17Apr2014
# download date: May 24, 2017
# loaded sequences: chr1, chr12
# maximum abs. error: 0.05

please consult http://www.bioconductor.org/developers/how-to/useDevel to see how to install the devel version of Bioconductor including the GenomicScores package version >= 1.1.3.

cheers,

robert.

ADD COMMENT • link 7.5 years ago Robert Castelo ★ 3.4k

0

Entering edit mode

Hi Robert, the authors of ATACseqQC use the phastCons100way.UCSC.hg19 library so I think the rounding should not be a problem. Would be great if you could generate phastCons60way.UCSC.mm10. Thank you.

ADD REPLY • link 7.5 years ago crysis405 ▴ 10

0

Entering edit mode

hi, thanks for your patience, i've edited my answer above, a 'GScores' object has now become available for phastCons60way.UCSC.mm10.

cheers,

robert.

ADD REPLY • link 7.5 years ago Robert Castelo ★ 3.4k

0

Entering edit mode

Hi Robert,

It seems like getGScores is unable to download resources for "phastCons60way.UCSC.mm10". Have the steps changed recently?

> phast <- getGScores("phastCons60way.UCSC.mm10")
snapshotDate(): 2019-12-17
download 20 resources? [y/n] y
  |===========================================================================================================| 100%

  |===============================================================                                            |  59%Error: 19 resources failed to download
In addition: There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: download failed
  web resource path: ‘https://annotationhub.bioconductor.org/fetch/61419’
  local file path: ‘/Users/vats/Library/Caches/AnnotationHub/3f2349922936_61419’
  reason: transfer closed with 5860152 bytes remaining to read
2: bfcadd() failed; resource removed
  rid: BFC62
  fpath: ‘https://annotationhub.bioconductor.org/fetch/61419’
  reason: download failed
3: download failed
  hub path: ‘https://annotationhub.bioconductor.org/fetch/61419’
  cache resource: ‘AH54681 : 61419’ 
reason: bfcadd() failed; see warnings()
4: download failed
  web resource path: ‘https://annotationhub.bioconductor.org/fetch/61420’
  local file path: ‘/Users/vats/Library/Caches/AnnotationHub/390c421e6eb_61420’
  reason: transfer closed with 7947816 bytes remaining to read
5: bfcadd() failed; resource removed
  rid: BFC4
  fpath: ‘https://annotationhub.bioconductor.org/fetch/61420’
  reason: download failed
6: download failed
  hub path: ‘https://annotationhub.bioconductor.org/fetch/61420’
  cache resource: ‘AH54682 : 61420’
  reason: bfcadd() failed; see warnings()
7: download failed
  web resource path: ‘https://annotationhub.bioconductor.org/fetch/61421’
  local file path: ‘/Users/vats/Library/Caches/AnnotationHub/390c4dbf506b_61421’
  reason: transfer closed with 8983667 bytes remaining to read
8: bfcadd() failed; resource removed
  rid: BFC5
  fpath: ‘https://annotationhub.bioconductor.org/fetch/61421’
  reason: download failed

Thanks!

ADD REPLY • link 4.9 years ago zerotimer • 0

1

Entering edit mode

hi,

thanks for reporting this problem. We're transitioning the server that hosts genomic scores available through the AnnotationHub from a http:// protocol to a SSL-compliant https:// protocol and it will take a few days until this transition is complete. in the meantime, we've just autogenerated a certificate that seems to make it work at least in the release version of Bioconductor.

so, please try the above with the latest Bioconductor release version of GenomicScores (1.10.0), it seems to work in my computer:

library(GenomicScores)

phast <- getGScores("phastCons60way.UCSC.mm10")
snapshotDate(): 2019-10-29
download 59 resources? [y/n] y
  |======================================================================| 100%

[...]

  |======================================================================| 100%

loading from cache
phast
GScores object 
# organism: Mus musculus (UCSC, mm10)
# provider: UCSC
# provider version: 17Apr2014
# download date: May 24, 2017
# loaded sequences: default
# maximum abs. error: 0.05
# use 'citation()' to cite these data in publications

let me clarify that in my 2+ years old answer above, i asked to use the devel version of Bioconductor because, back in that moment, that was the only way to get immediate access to the just added resources for phastCons60way.UCSC.mm10. However, these are currently available in release and there's no need for you to use the Bioconductor devel version to download these genomic scores.

cheers,

robert.

ADD REPLY • link 4.9 years ago Robert Castelo ★ 3.4k

0

Entering edit mode

Oh thanks Robert! I initially thought my error is due to not using devel version so I upgraded R (4.0) and then bioconductor but of course that didn't work :)

Thanks for autogenerating a certificate. I tried downloading with same R and bioconductor version (3.11) and it still giving me the same error. I guess I should downgrade now and then try the download? I am on 1.11.2 for GenomicScores

ADD REPLY • link 4.9 years ago zerotimer • 0

1

Entering edit mode

hi,

to use the release version of Bioconductor you should follow the instructions at http://bioconductor.org/install, which implies installing a release version of R (3.6.x) and then following the rest of the instructions.

to check whether you've successfully installed the current release version of Bioconductor (as of December 2019), when you call the function version() from the BiocManager package, you should get 3.10:

BiocManager::version()
[1] ‘3.10’

then you may proceed to install the current release version of GenomicScores, which is 1.10.0 by simply doing:

BiocManager::install("GenomicScores")

using this version of GenomicScores i'm able, in my linux box, to successfully download the phastCons60way.UCSC.mm10 score set, as shown above. due to the current problem with the SSL certificate, the development version does not work and that's why my advice for you is to use the release version of GenomicScores. i don't know why, for this particular situation with the temporary autogenerated certificate, the release version works and the development doesn't.

cheers,

robert.

ADD REPLY • link 4.9 years ago Robert Castelo ★ 3.4k

score 1 · Answer 2 · 2017-05-23

Here is my old code to generate phastCons60w.UCSC.mm10. It may not work because it is not in GScores format. But maybe you can have a try.

system("wget http://hgdownload.cse.ucsc.edu/goldenPath/mm10/phastCons60way/mm10.60way.phastCons.bw")

library(BSgenome.Mmusculus.UCSC.mm10)
library(GenomeInfoDb)
library(rtracklayer)
seqlen <- seqlengths(Mmusculus)
gr <- GRanges(names(seqlen), IRanges(1, seqlen))
for(i in 1:length(gr)){
 g <- gr[i]
 d <- import("mm10.60way.phastCons.bw", selection=BigWigSelection(ranges=g), as="RleList")
 saveRDS(d[[as.character(seqnames(g))]], paste0("phastCons60way.UCSC.mm10.", as.character(seqnames(g)), ".rds"))
}

refgenomeGD <- GenomeDescription(organism=organism(Mmusculus),
                                 common_name=commonName(Mmusculus),
                                 provider=provider(Mmusculus),
                                 provider_version=providerVersion(Mmusculus),
                                 release_date=releaseDate(Mmusculus),
                                 release_name=releaseName(Mmusculus),
                                 seqinfo=Seqinfo(seqnames=seqnames(Mmusculus),
                                                 seqlengths=seqlengths(Mmusculus),
                                                 isCircular=isCircular(Mmusculus),
                                                 genome=releaseName(Mmusculus)))

saveRDS(refgenomeGD, "refgenomeGD.rds")
unlink("mm10.60way.phastCons.bw")