Search
Question: mm10 phastCons data for use with ATACseqQC
1
gravatar for crysis405
13 months ago by
crysis40510
crysis40510 wrote:

Just started testing the new ATACseqQC package. Its pretty neat until you want to do nucleosome positioning in something other than human.

One of the things you need is the library(phastCons100way.UCSC.hg19)

No one has made an equivalent library for mouse yet. So I was going to go get it from UCSC: http://hgdownload.cse.ucsc.edu/goldenPath/mm10/phastCons60way/

But now I'm a bit stuck how to convert the bigwig file into GScores. I tried rtracklayer: import("my.bw", as="Rle") which ate up all my RAM.

Any ideas? Thank you

ADD COMMENTlink modified 13 months ago by Ou, Jianhong1.1k • written 13 months ago by crysis40510

Just a comment on the import, it would probably be best to use as="NumericList" for that type of data. It will probably still be pretty big though. You could loop over the chromosomes and use the which argument.

ADD REPLYlink written 13 months ago by Michael Lawrence10.0k
1
gravatar for Robert Castelo
13 months ago by
Robert Castelo2.2k
Spain/Barcelona/Universitat Pompeu Fabra
Robert Castelo2.2k wrote:

hi, i'm the maintainer of phastCons100way.UCSC.hg19, this package reduces the memory requirements by rounding the phastCons scores to 1-decimal digit, i'm in the process of generating other types of GScores objects, including for instance phyloP scores with a different quantization adapted to this kind of scores, that will become available through the GenomicScores package and the AnnotationHub, if you think this 1-decimal digit rounding is acceptable for your purposes, i could also generate the GScores object for phastCons60way.UCSC.mm10, what do you think?

cheers,

robert.

=====================EDIT==============

hi, a GScores object for phastCons60way.UCSC.mm10 is now available through the devel version of the GenomicScores package as follows:

library(GenomicScores)

gsco <- getGScores("phastCons60way.UCSC.mm10")
gsco
GScores object
# organism: Mus musculus (UCSC)
# provider: UCSC
# provider version: 17Apr2014
# download date: May 24, 2017
# loaded sequences: chr1, chr12
# maximum abs. error: 0.05

please consult http://www.bioconductor.org/developers/how-to/useDevel to see how to install the devel version of Bioconductor including the GenomicScores package version >= 1.1.3.

cheers,

robert.

ADD COMMENTlink modified 12 months ago • written 13 months ago by Robert Castelo2.2k

Hi Robert, the authors of ATACseqQC use the phastCons100way.UCSC.hg19 library so I think the rounding should not be a problem. Would be great if you could generate phastCons60way.UCSC.mm10. Thank you.

ADD REPLYlink written 13 months ago by crysis40510

hi, thanks for your patience, i've edited my answer above, a 'GScores' object has now become available for phastCons60way.UCSC.mm10.

cheers,

robert.

ADD REPLYlink written 12 months ago by Robert Castelo2.2k
1
gravatar for Ou, Jianhong
13 months ago by
Ou, Jianhong1.1k
United States
Ou, Jianhong1.1k wrote:

Here is my old code to generate phastCons60w.UCSC.mm10. It may not work because it is not in GScores format. But maybe you can have a try.

 

system("wget http://hgdownload.cse.ucsc.edu/goldenPath/mm10/phastCons60way/mm10.60way.phastCons.bw")

library(BSgenome.Mmusculus.UCSC.mm10)
library(GenomeInfoDb)
library(rtracklayer)
seqlen <- seqlengths(Mmusculus)
gr <- GRanges(names(seqlen), IRanges(1, seqlen))
for(i in 1:length(gr)){
 g <- gr[i]
 d <- import("mm10.60way.phastCons.bw", selection=BigWigSelection(ranges=g), as="RleList")
 saveRDS(d[[as.character(seqnames(g))]], paste0("phastCons60way.UCSC.mm10.", as.character(seqnames(g)), ".rds"))
}

refgenomeGD <- GenomeDescription(organism=organism(Mmusculus),
                                 common_name=commonName(Mmusculus),
                                 provider=provider(Mmusculus),
                                 provider_version=providerVersion(Mmusculus),
                                 release_date=releaseDate(Mmusculus),
                                 release_name=releaseName(Mmusculus),
                                 seqinfo=Seqinfo(seqnames=seqnames(Mmusculus),
                                                 seqlengths=seqlengths(Mmusculus),
                                                 isCircular=isCircular(Mmusculus),
                                                 genome=releaseName(Mmusculus)))

saveRDS(refgenomeGD, "refgenomeGD.rds")
unlink("mm10.60way.phastCons.bw")

 

 

ADD COMMENTlink written 13 months ago by Ou, Jianhong1.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 205 users visited in the last hour