Search
Question: mm10 phastCons data for use with ATACseqQC
1
13 months ago by
crysis40510
crysis40510 wrote:

Just started testing the new ATACseqQC package. Its pretty neat until you want to do nucleosome positioning in something other than human.

One of the things you need is the library(phastCons100way.UCSC.hg19)

No one has made an equivalent library for mouse yet. So I was going to go get it from UCSC: http://hgdownload.cse.ucsc.edu/goldenPath/mm10/phastCons60way/

But now I'm a bit stuck how to convert the bigwig file into GScores. I tried rtracklayer: import("my.bw", as="Rle") which ate up all my RAM.

Any ideas? Thank you

modified 13 months ago by Ou, Jianhong1.1k • written 13 months ago by crysis40510

Just a comment on the import, it would probably be best to use as="NumericList" for that type of data. It will probably still be pretty big though. You could loop over the chromosomes and use the which argument.

1
13 months ago by
Robert Castelo2.2k
Spain/Barcelona/Universitat Pompeu Fabra
Robert Castelo2.2k wrote:

hi, i'm the maintainer of phastCons100way.UCSC.hg19, this package reduces the memory requirements by rounding the phastCons scores to 1-decimal digit, i'm in the process of generating other types of GScores objects, including for instance phyloP scores with a different quantization adapted to this kind of scores, that will become available through the GenomicScores package and the AnnotationHub, if you think this 1-decimal digit rounding is acceptable for your purposes, i could also generate the GScores object for phastCons60way.UCSC.mm10, what do you think?

cheers,

robert.

=====================EDIT==============

hi, a GScores object for phastCons60way.UCSC.mm10 is now available through the devel version of the GenomicScores package as follows:

library(GenomicScores)

gsco <- getGScores("phastCons60way.UCSC.mm10")
gsco
GScores object
# organism: Mus musculus (UCSC)
# provider: UCSC
# provider version: 17Apr2014
# maximum abs. error: 0.05

please consult http://www.bioconductor.org/developers/how-to/useDevel to see how to install the devel version of Bioconductor including the GenomicScores package version >= 1.1.3.

cheers,

robert.

Hi Robert, the authors of ATACseqQC use the phastCons100way.UCSC.hg19 library so I think the rounding should not be a problem. Would be great if you could generate phastCons60way.UCSC.mm10. Thank you.

hi, thanks for your patience, i've edited my answer above, a 'GScores' object has now become available for phastCons60way.UCSC.mm10.

cheers,

robert.

1
13 months ago by
Ou, Jianhong1.1k
United States
Ou, Jianhong1.1k wrote:

Here is my old code to generate phastCons60w.UCSC.mm10. It may not work because it is not in GScores format. But maybe you can have a try.

system("wget http://hgdownload.cse.ucsc.edu/goldenPath/mm10/phastCons60way/mm10.60way.phastCons.bw")

library(BSgenome.Mmusculus.UCSC.mm10)
library(GenomeInfoDb)
library(rtracklayer)
seqlen <- seqlengths(Mmusculus)
gr <- GRanges(names(seqlen), IRanges(1, seqlen))
for(i in 1:length(gr)){
g <- gr[i]
d <- import("mm10.60way.phastCons.bw", selection=BigWigSelection(ranges=g), as="RleList")
saveRDS(d[[as.character(seqnames(g))]], paste0("phastCons60way.UCSC.mm10.", as.character(seqnames(g)), ".rds"))
}

refgenomeGD <- GenomeDescription(organism=organism(Mmusculus),
common_name=commonName(Mmusculus),
provider=provider(Mmusculus),
provider_version=providerVersion(Mmusculus),
release_date=releaseDate(Mmusculus),
release_name=releaseName(Mmusculus),
seqinfo=Seqinfo(seqnames=seqnames(Mmusculus),
seqlengths=seqlengths(Mmusculus),
isCircular=isCircular(Mmusculus),
genome=releaseName(Mmusculus)))

saveRDS(refgenomeGD, "refgenomeGD.rds")
unlink("mm10.60way.phastCons.bw")