ChIPpeakAnno as.data.frame error

0

Entering edit mode

Mark Cowley ▴ 910

@mark-cowley-2951

Last seen 11.4 years ago

Dear list, i've had an odd error for a few days now, which i'm really struggling to pinpoint. Essentially, i'm trying to annotate ChIP-Seq peaks using ChIPpeakAnno. after doing annotatePeakInBatch, I convert to data.frame & I get this error: > peaks.bed <- read.delim(bed.file, header=F) > peaks.RangedData <- BED2RangedData(peaks.bed, FALSE) > peaks.annot <- annotatePeakInBatch(peaks.RangedData, AnnotationData = annoData, output="both", multiple=TRUE, maxgap=0) > peaks.annot <- as.data.frame(peaks.annot) Error in as.data.frame.default(peaks.annot) : cannot coerce class 'structure("RangedData", package = "IRanges")' into a data.frame A combination of debug, getMethod & traceback (see below) prove that my peaks.annot object is of class RangedData, the S4 method "as.data.frame" for "RangedData" exists, but is not actually called; instead as.data.frame.default is called, causing the error. Oddly enough, the error only occurs when run inside my function, which i think implies that my package NAMESPACE is incorrectly setup. To this end, I put this code in its own package, deliberately do not import any code from my other [potentially dodgy] packages & I still get this error. My code & the debug trace/getMethod/traceback is below & any advice would be greatly recommended. cheers, Mark ###################################################################### ########## The function: #' annotate MACS peaks using ChIPpeakAnno #' #' This compares the BED regions from running MACS to the nearest ENSEMBL #' Gene, and to CpG islands. This uses \code{\link[ChIPpeakAnno]{annotatePeakInBatch}} #' from ChIPpeakAnno, which has a very loose definition of close, ie #' within 0.5Mb. #' #' This uses pre-built CpG island definitions downloaded from UCSC. #' #' @param dir the path to a MACS result directory #' @param genome one of \sQuote{hg18}, or \sQuote{hg19} #' @return nothing. it writes a <prefix>_peaks_annot.xls file. To interpret #' the new columns, see \code{\link[ChIPpeakAnno]{annotatePeakInBatch}} #' @author Mark Cowley, 2012-03-01 #' @importFrom ChIPpeakAnno BED2RangedData annotatePeakInBatch #' @importFrom plyr join #' @export #' @examples #' \dontrun{ #' dir <- "./MACS/TAMR.H3k4Me3.vs.MCF7.H3k4Me3/" #' macs.ChIPpeakAnno(dir, "hg19") #' } annotate.macs.output <- function(dir, genome=c("hg19", "hg18", "mm9", "rn4"), tss=TRUE, cpg=TRUE) { length(dir) == 1 && is.dir(dir) || stop("dir must be the path to a macs result directory.") SUPPORTED.GENOMES <- c("hg19", "hg18") genome <- genome[1] genome %in% SUPPORTED.GENOMES || stop("genome is unsupported.") ############################################################## ################## saf <- getOption("stringsAsFactors") on.exit(options(stringsAsFactors=saf)) options(stringsAsFactors=FALSE) ############################################################## ################## ############################################################## ################## # which genome? annoData <- switch(genome, hg19={data(TSS.human.GRCh37); TSS.human.GRCh37}, hg18={data(TSS.human.NCBI36); TSS.human.NCBI36}, mm9={data(TSS.mouse.NCBIM37); TSS.mouse.NCBIM37}, rn4={data(TSS.rat.RGSC3.4); TSS.rat.RGSC3.4}, stop("unsupported genome") ) cpgData <- switch(genome, hg19={data(CpG.human.GRCh37); CpG.human.GRCh37}, hg18={data(CpG.human.NCBI36); CpG.human.NCBI36}, mm9={data(CpG.mouse.NCBIM37); CpG.mouse.NCBIM37}, rn4={data(CpG.rat.RGSC3.4); CpG.rat.RGSC3.4}, stop("unsupported genome") ) ############################################################## ################## ############################################################## ################## # setup the input/output file paths # bed.file <- "TAMR.vs.MCF7.MBD2_peaks.bed" bed.file <- dir(dir, pattern="_peaks.bed", full=TRUE) file.exists(bed.file) || stop("bed.file must exist") # macs.peaks.file <- "TAMR.vs.MCF7.MBD2_peaks.xls" macs.peaks.file <- rev(dir(dir, pattern="_peaks.xls", full=TRUE))[1] file.exists(macs.peaks.file) || stop("macs.peaks.file must exist") result.file <- sub("_peaks.xls", "_peaks_annot.xls", macs.peaks.file) ############################################################## ################## ############################################################## ################## # import peaks peaks.bed <- read.delim(bed.file, header=F) # head(peaks.bed) peaks.RangedData <- BED2RangedData(peaks.bed, FALSE) # peaks.RangedData ############################################################## ################## # # import peaks & join to closest gene and CpG island # peaks.stats <- import.macs.peaks.file(macs.peaks.file) # head(peaks.stats) res <- peaks.stats ############################################################## ################## message("Annotating peaks to nearest ENSG TSS...") if( tss ) { peaks.annot <- annotatePeakInBatch(peaks.RangedData, AnnotationData = annoData, output="both", multiple=TRUE, maxgap=0) message("done") peaks.annot <- as.data.frame(peaks.annot) peaks.annot <- peaks.annot[,-match(c("space", "start", "end", "width", "names"), colnames(peaks.annot))] library(org.Hs.eg.db) peaks.annot$GeneSymbol <- mget.chain(as.character(peaks.annot$feature), org.Hs.egENSEMBL2EG, org.Hs.egSYMBOL) peaks.annot <- rename.column(peaks.annot, "feature", "Ensembl ID") peaks.annot <- rename.column(peaks.annot, "peak", "name") peaks.annot <- move.column(peaks.annot, "strand", "insideFeature") peaks.annot <- move.column(peaks.annot, "GeneSymbol", "start_position") res <- plyr::join(peaks.stats, peaks.annot, by="name", type="full") } else { message("skipping") } ############################################################## ################## ############################################################## ################## message("Annotating peaks to nearest CpG...") if( cpg ) { peaks.annot.cpg <- annotatePeakInBatch(peaks.RangedData, AnnotationData = cpgData, output="both", multiple=T, maxgap=0) message("done") peaks.annot.cpg <- as.data.frame(peaks.annot.cpg) peaks.annot.cpg <- peaks.annot.cpg[,-match(c("space", "start", "end", "width", "names"), colnames(peaks.annot.cpg))] library(org.Hs.eg.db) peaks.annot.cpg <- rename.column(peaks.annot.cpg, "feature", "CpG island") peaks.annot.cpg <- rename.column(peaks.annot.cpg, "peak", "name") peaks.annot.cpg <- move.column(peaks.annot.cpg, "strand", "insideFeature") res <- plyr::join(res, peaks.annot.cpg, by="name", type="full") } else { message("skipping") } ############################################################## ################## write.xls(res, result.file) } ###################################################################### ########## ###################################################################### ########## Selection from Debug showing that the method is at least visible: ... Browse[2]> class(peaks.annot) [1] "RangedData" attr(,"package") [1] "IRanges" Browse[2]> getMethod("as.data.frame", "RangedData") Method Definition: function (x, row.names = NULL, optional = FALSE, ...) { if (!(is.null(row.names) || is.character(row.names))) stop("'row.names' must be NULL or a character vector") if (!missing(optional) || length(list(...))) warning("'optional' and arguments in '...' ignored") data.frame(as.data.frame(ranges(x)), as.data.frame(values(x))[-1L], row.names = row.names, stringsAsFactors = FALSE) } <environment: namespace:iranges=""> Signatures: x target "RangedData" defined "RangedData" Browse[2]> Error in as.data.frame.default(peaks.annot) : cannot coerce class 'structure("RangedData", package = "IRanges")' into a data.frame traceback() 4: stop(gettextf("cannot coerce class '%s' into a data.frame", deparse(class(x))), domain = NA) 3: as.data.frame.default(peaks.annot) 2: as.data.frame(peaks.annot) 1: annotate.macs.output("MACS/TAMR.MBD2.vs.MCF7.MBD2") ###################################################################### ########## sessionInfo() R version 2.13.1 (2011-07-08) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] gdata_2.8.1 macsR_1.0 [3] ChIPpeakAnno_1.9.0 limma_3.8.3 [5] org.Hs.eg.db_2.5.0 GO.db_2.5.0 [7] RSQLite_0.9-4 DBI_0.2-5 [9] AnnotationDbi_1.14.1 BSgenome.Ecoli.NCBI.20080805_1.3.17 [11] BSgenome_1.20.0 GenomicRanges_1.4.6 [13] Biostrings_2.20.2 IRanges_1.10.5 [15] multtest_2.8.0 Biobase_2.12.2 [17] biomaRt_2.8.1 plyr_1.6 loaded via a namespace (and not attached): [1] gtools_2.6.2 MASS_7.3-14 RCurl_1.6-7 splines_2.13.1 [5] survival_2.36-9 tools_2.13.1 XML_3.4-2

GO AnnotationData annotate convert ChIPpeakAnno GO AnnotationData annotate convert • 2.7k views

ADD COMMENT • link updated 13.9 years ago by Julie Zhu ★ 4.3k • written 13.9 years ago by Mark Cowley ▴ 910

0

Entering edit mode

Julie Zhu ★ 4.3k

@julie-zhu-3596

Last seen 2.3 years ago

United States

Dear Mark, I am wondering what happens if you add library(RangedData) after all the packages have been imported. Best regards, Julie On 3/5/12 10:31 PM, "Mark Cowley" <m.cowley at="" garvan.org.au=""> wrote: > Dear list, > i've had an odd error for a few days now, which i'm really struggling to > pinpoint. > Essentially, i'm trying to annotate ChIP-Seq peaks using ChIPpeakAnno. after > doing annotatePeakInBatch, I convert to data.frame & I get this error: > >> peaks.bed <- read.delim(bed.file, header=F) >> peaks.RangedData <- BED2RangedData(peaks.bed, FALSE) >> peaks.annot <- annotatePeakInBatch(peaks.RangedData, AnnotationData = >> annoData, output="both", multiple=TRUE, maxgap=0) >> peaks.annot <- as.data.frame(peaks.annot) > Error in as.data.frame.default(peaks.annot) : > cannot coerce class 'structure("RangedData", package = "IRanges")' into a > data.frame > > A combination of debug, getMethod & traceback (see below) prove that my > peaks.annot object is of class RangedData, the S4 method "as.data.frame" for > "RangedData" exists, but is not actually called; instead as.data.frame.default > is called, causing the error. > Oddly enough, the error only occurs when run inside my function, which i think > implies that my package NAMESPACE is incorrectly setup. To this end, I put > this code in its own package, deliberately do not import any code from my > other [potentially dodgy] packages & I still get this error. > > My code & the debug trace/getMethod/traceback is below & any advice would be > greatly recommended. > cheers, > Mark > > > #################################################################### ########## > ## > The function: > #' annotate MACS peaks using ChIPpeakAnno > #' > #' This compares the BED regions from running MACS to the nearest ENSEMBL > #' Gene, and to CpG islands. This uses > \code{\link[ChIPpeakAnno]{annotatePeakInBatch}} > #' from ChIPpeakAnno, which has a very loose definition of close, ie > #' within 0.5Mb. > #' > #' This uses pre-built CpG island definitions downloaded from UCSC. > #' > #' @param dir the path to a MACS result directory > #' @param genome one of \sQuote{hg18}, or \sQuote{hg19} > #' @return nothing. it writes a <prefix>_peaks_annot.xls file. To interpret > #' the new columns, see \code{\link[ChIPpeakAnno]{annotatePeakInBatch}} > #' @author Mark Cowley, 2012-03-01 > #' @importFrom ChIPpeakAnno BED2RangedData annotatePeakInBatch > #' @importFrom plyr join > #' @export > #' @examples > #' \dontrun{ > #' dir <- "./MACS/TAMR.H3k4Me3.vs.MCF7.H3k4Me3/" > #' macs.ChIPpeakAnno(dir, "hg19") > #' } > annotate.macs.output <- function(dir, genome=c("hg19", "hg18", "mm9", "rn4"), > tss=TRUE, cpg=TRUE) { > length(dir) == 1 && is.dir(dir) || stop("dir must be the path to a macs result > directory.") > SUPPORTED.GENOMES <- c("hg19", "hg18") > genome <- genome[1] > genome %in% SUPPORTED.GENOMES || stop("genome is unsupported.") > #################################################################### ########## > ## > saf <- getOption("stringsAsFactors") > on.exit(options(stringsAsFactors=saf)) > options(stringsAsFactors=FALSE) > #################################################################### ########## > ## > > #################################################################### ########## > ## > # which genome? > annoData <- switch(genome, > hg19={data(TSS.human.GRCh37); TSS.human.GRCh37}, > hg18={data(TSS.human.NCBI36); TSS.human.NCBI36}, > mm9={data(TSS.mouse.NCBIM37); TSS.mouse.NCBIM37}, > rn4={data(TSS.rat.RGSC3.4); TSS.rat.RGSC3.4}, > stop("unsupported genome") > ) > cpgData <- switch(genome, > hg19={data(CpG.human.GRCh37); CpG.human.GRCh37}, > hg18={data(CpG.human.NCBI36); CpG.human.NCBI36}, > mm9={data(CpG.mouse.NCBIM37); CpG.mouse.NCBIM37}, > rn4={data(CpG.rat.RGSC3.4); CpG.rat.RGSC3.4}, > stop("unsupported genome") > ) > #################################################################### ########## > ## > > #################################################################### ########## > ## > # setup the input/output file paths > # bed.file <- "TAMR.vs.MCF7.MBD2_peaks.bed" > bed.file <- dir(dir, pattern="_peaks.bed", full=TRUE) > file.exists(bed.file) || stop("bed.file must exist") > > # macs.peaks.file <- "TAMR.vs.MCF7.MBD2_peaks.xls" > macs.peaks.file <- rev(dir(dir, pattern="_peaks.xls", full=TRUE))[1] > file.exists(macs.peaks.file) || stop("macs.peaks.file must exist") > > result.file <- sub("_peaks.xls", "_peaks_annot.xls", macs.peaks.file) > #################################################################### ########## > ## > > #################################################################### ########## > ## > # import peaks > peaks.bed <- read.delim(bed.file, header=F) > # head(peaks.bed) > > peaks.RangedData <- BED2RangedData(peaks.bed, FALSE) > # peaks.RangedData > #################################################################### ########## > ## > > # > # import peaks & join to closest gene and CpG island > # > peaks.stats <- import.macs.peaks.file(macs.peaks.file) > # head(peaks.stats) > > res <- peaks.stats > > #################################################################### ########## > ## > message("Annotating peaks to nearest ENSG TSS...") > if( tss ) { > peaks.annot <- annotatePeakInBatch(peaks.RangedData, AnnotationData = > annoData, output="both", multiple=TRUE, maxgap=0) > message("done") > > peaks.annot <- as.data.frame(peaks.annot) > peaks.annot <- peaks.annot[,-match(c("space", "start", "end", "width", > "names"), colnames(peaks.annot))] > library(org.Hs.eg.db) > peaks.annot$GeneSymbol <- mget.chain(as.character(peaks.annot$feature), > org.Hs.egENSEMBL2EG, org.Hs.egSYMBOL) > peaks.annot <- rename.column(peaks.annot, "feature", "Ensembl ID") > peaks.annot <- rename.column(peaks.annot, "peak", "name") > peaks.annot <- move.column(peaks.annot, "strand", "insideFeature") > peaks.annot <- move.column(peaks.annot, "GeneSymbol", "start_position") > > res <- plyr::join(peaks.stats, peaks.annot, by="name", type="full") > } > else { > message("skipping") > } > #################################################################### ########## > ## > > #################################################################### ########## > ## > message("Annotating peaks to nearest CpG...") > if( cpg ) { > peaks.annot.cpg <- annotatePeakInBatch(peaks.RangedData, AnnotationData = > cpgData, output="both", multiple=T, maxgap=0) > message("done") > > peaks.annot.cpg <- as.data.frame(peaks.annot.cpg) > peaks.annot.cpg <- peaks.annot.cpg[,-match(c("space", "start", "end", "width", > "names"), colnames(peaks.annot.cpg))] > library(org.Hs.eg.db) > peaks.annot.cpg <- rename.column(peaks.annot.cpg, "feature", "CpG island") > peaks.annot.cpg <- rename.column(peaks.annot.cpg, "peak", "name") > peaks.annot.cpg <- move.column(peaks.annot.cpg, "strand", "insideFeature") > > res <- plyr::join(res, peaks.annot.cpg, by="name", type="full") > } > else { > message("skipping") > } > #################################################################### ########## > ## > > write.xls(res, result.file) > } > #################################################################### ########## > ## > > > #################################################################### ########## > ## > Selection from Debug showing that the method is at least visible: > ... > Browse[2]> class(peaks.annot) > [1] "RangedData" > attr(,"package") > [1] "IRanges" > Browse[2]> getMethod("as.data.frame", "RangedData") > Method Definition: > > function (x, row.names = NULL, optional = FALSE, ...) > { > if (!(is.null(row.names) || is.character(row.names))) > stop("'row.names' must be NULL or a character vector") > if (!missing(optional) || length(list(...))) > warning("'optional' and arguments in '...' ignored") > data.frame(as.data.frame(ranges(x)), as.data.frame(values(x))[-1L], > row.names = row.names, stringsAsFactors = FALSE) > } > <environment: namespace:iranges=""> > > Signatures: > x > target "RangedData" > defined "RangedData" > Browse[2]> > Error in as.data.frame.default(peaks.annot) : > cannot coerce class 'structure("RangedData", package = "IRanges")' into a > data.frame > traceback() > 4: stop(gettextf("cannot coerce class '%s' into a data.frame", > deparse(class(x))), > domain = NA) > 3: as.data.frame.default(peaks.annot) > 2: as.data.frame(peaks.annot) > 1: annotate.macs.output("MACS/TAMR.MBD2.vs.MCF7.MBD2") > #################################################################### ########## > ## > > sessionInfo() > R version 2.13.1 (2011-07-08) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] gdata_2.8.1 macsR_1.0 > [3] ChIPpeakAnno_1.9.0 limma_3.8.3 > [5] org.Hs.eg.db_2.5.0 GO.db_2.5.0 > [7] RSQLite_0.9-4 DBI_0.2-5 > [9] AnnotationDbi_1.14.1 BSgenome.Ecoli.NCBI.20080805_1.3.17 > [11] BSgenome_1.20.0 GenomicRanges_1.4.6 > [13] Biostrings_2.20.2 IRanges_1.10.5 > [15] multtest_2.8.0 Biobase_2.12.2 > [17] biomaRt_2.8.1 plyr_1.6 > > loaded via a namespace (and not attached): > [1] gtools_2.6.2 MASS_7.3-14 RCurl_1.6-7 splines_2.13.1 > [5] survival_2.36-9 tools_2.13.1 XML_3.4-2 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 13.9 years ago Julie Zhu ★ 4.3k

0

Entering edit mode

Dear Mark, Sorry that I meant adding library(ChIPpeakAnno) after importing all other packages. Also, I noticed that you are using an old version of ChIPpeakAnno. I am wondering what happens if you update your R and ChIPpeakAnno. Best regards, Julie On 3/6/12 9:49 AM, "Julie Zhu" <julie.zhu at="" umassmed.edu=""> wrote: > Dear Mark, > > I am wondering what happens if you add library(RangedData) after all the > packages have been imported. > > Best regards, > > Julie > > > On 3/5/12 10:31 PM, "Mark Cowley" <m.cowley at="" garvan.org.au=""> wrote: > >> Dear list, >> i've had an odd error for a few days now, which i'm really struggling to >> pinpoint. >> Essentially, i'm trying to annotate ChIP-Seq peaks using ChIPpeakAnno. after >> doing annotatePeakInBatch, I convert to data.frame & I get this error: >> >>> peaks.bed <- read.delim(bed.file, header=F) >>> peaks.RangedData <- BED2RangedData(peaks.bed, FALSE) >>> peaks.annot <- annotatePeakInBatch(peaks.RangedData, AnnotationData = >>> annoData, output="both", multiple=TRUE, maxgap=0) >>> peaks.annot <- as.data.frame(peaks.annot) >> Error in as.data.frame.default(peaks.annot) : >> cannot coerce class 'structure("RangedData", package = "IRanges")' into a >> data.frame >> >> A combination of debug, getMethod & traceback (see below) prove that my >> peaks.annot object is of class RangedData, the S4 method "as.data.frame" for >> "RangedData" exists, but is not actually called; instead >> as.data.frame.default >> is called, causing the error. >> Oddly enough, the error only occurs when run inside my function, which i >> think >> implies that my package NAMESPACE is incorrectly setup. To this end, I put >> this code in its own package, deliberately do not import any code from my >> other [potentially dodgy] packages & I still get this error. >> >> My code & the debug trace/getMethod/traceback is below & any advice would be >> greatly recommended. >> cheers, >> Mark >> >> >> ###################################################################### #######>> # >> ## >> The function: >> #' annotate MACS peaks using ChIPpeakAnno >> #' >> #' This compares the BED regions from running MACS to the nearest ENSEMBL >> #' Gene, and to CpG islands. This uses >> \code{\link[ChIPpeakAnno]{annotatePeakInBatch}} >> #' from ChIPpeakAnno, which has a very loose definition of close, ie >> #' within 0.5Mb. >> #' >> #' This uses pre-built CpG island definitions downloaded from UCSC. >> #' >> #' @param dir the path to a MACS result directory >> #' @param genome one of \sQuote{hg18}, or \sQuote{hg19} >> #' @return nothing. it writes a <prefix>_peaks_annot.xls file. To interpret >> #' the new columns, see \code{\link[ChIPpeakAnno]{annotatePeakInBatch}} >> #' @author Mark Cowley, 2012-03-01 >> #' @importFrom ChIPpeakAnno BED2RangedData annotatePeakInBatch >> #' @importFrom plyr join >> #' @export >> #' @examples >> #' \dontrun{ >> #' dir <- "./MACS/TAMR.H3k4Me3.vs.MCF7.H3k4Me3/" >> #' macs.ChIPpeakAnno(dir, "hg19") >> #' } >> annotate.macs.output <- function(dir, genome=c("hg19", "hg18", "mm9", "rn4"), >> tss=TRUE, cpg=TRUE) { >> length(dir) == 1 && is.dir(dir) || stop("dir must be the path to a macs >> result >> directory.") >> SUPPORTED.GENOMES <- c("hg19", "hg18") >> genome <- genome[1] >> genome %in% SUPPORTED.GENOMES || stop("genome is unsupported.") >> ###################################################################### #######>> # >> ## >> saf <- getOption("stringsAsFactors") >> on.exit(options(stringsAsFactors=saf)) >> options(stringsAsFactors=FALSE) >> ###################################################################### #######>> # >> ## >> >> ###################################################################### #######>> # >> ## >> # which genome? >> annoData <- switch(genome, >> hg19={data(TSS.human.GRCh37); TSS.human.GRCh37}, >> hg18={data(TSS.human.NCBI36); TSS.human.NCBI36}, >> mm9={data(TSS.mouse.NCBIM37); TSS.mouse.NCBIM37}, >> rn4={data(TSS.rat.RGSC3.4); TSS.rat.RGSC3.4}, >> stop("unsupported genome") >> ) >> cpgData <- switch(genome, >> hg19={data(CpG.human.GRCh37); CpG.human.GRCh37}, >> hg18={data(CpG.human.NCBI36); CpG.human.NCBI36}, >> mm9={data(CpG.mouse.NCBIM37); CpG.mouse.NCBIM37}, >> rn4={data(CpG.rat.RGSC3.4); CpG.rat.RGSC3.4}, >> stop("unsupported genome") >> ) >> ###################################################################### #######>> # >> ## >> >> ###################################################################### #######>> # >> ## >> # setup the input/output file paths >> # bed.file <- "TAMR.vs.MCF7.MBD2_peaks.bed" >> bed.file <- dir(dir, pattern="_peaks.bed", full=TRUE) >> file.exists(bed.file) || stop("bed.file must exist") >> >> # macs.peaks.file <- "TAMR.vs.MCF7.MBD2_peaks.xls" >> macs.peaks.file <- rev(dir(dir, pattern="_peaks.xls", full=TRUE))[1] >> file.exists(macs.peaks.file) || stop("macs.peaks.file must exist") >> >> result.file <- sub("_peaks.xls", "_peaks_annot.xls", macs.peaks.file) >> ###################################################################### #######>> # >> ## >> >> ###################################################################### #######>> # >> ## >> # import peaks >> peaks.bed <- read.delim(bed.file, header=F) >> # head(peaks.bed) >> >> peaks.RangedData <- BED2RangedData(peaks.bed, FALSE) >> # peaks.RangedData >> ###################################################################### #######>> # >> ## >> >> # >> # import peaks & join to closest gene and CpG island >> # >> peaks.stats <- import.macs.peaks.file(macs.peaks.file) >> # head(peaks.stats) >> >> res <- peaks.stats >> >> ###################################################################### #######>> # >> ## >> message("Annotating peaks to nearest ENSG TSS...") >> if( tss ) { >> peaks.annot <- annotatePeakInBatch(peaks.RangedData, AnnotationData = >> annoData, output="both", multiple=TRUE, maxgap=0) >> message("done") >> >> peaks.annot <- as.data.frame(peaks.annot) >> peaks.annot <- peaks.annot[,-match(c("space", "start", "end", "width", >> "names"), colnames(peaks.annot))] >> library(org.Hs.eg.db) >> peaks.annot$GeneSymbol <- mget.chain(as.character(peaks.annot$feature), >> org.Hs.egENSEMBL2EG, org.Hs.egSYMBOL) >> peaks.annot <- rename.column(peaks.annot, "feature", "Ensembl ID") >> peaks.annot <- rename.column(peaks.annot, "peak", "name") >> peaks.annot <- move.column(peaks.annot, "strand", "insideFeature") >> peaks.annot <- move.column(peaks.annot, "GeneSymbol", "start_position") >> >> res <- plyr::join(peaks.stats, peaks.annot, by="name", type="full") >> } >> else { >> message("skipping") >> } >> ###################################################################### #######>> # >> ## >> >> ###################################################################### #######>> # >> ## >> message("Annotating peaks to nearest CpG...") >> if( cpg ) { >> peaks.annot.cpg <- annotatePeakInBatch(peaks.RangedData, AnnotationData = >> cpgData, output="both", multiple=T, maxgap=0) >> message("done") >> >> peaks.annot.cpg <- as.data.frame(peaks.annot.cpg) >> peaks.annot.cpg <- peaks.annot.cpg[,-match(c("space", "start", "end", >> "width", >> "names"), colnames(peaks.annot.cpg))] >> library(org.Hs.eg.db) >> peaks.annot.cpg <- rename.column(peaks.annot.cpg, "feature", "CpG island") >> peaks.annot.cpg <- rename.column(peaks.annot.cpg, "peak", "name") >> peaks.annot.cpg <- move.column(peaks.annot.cpg, "strand", "insideFeature") >> >> res <- plyr::join(res, peaks.annot.cpg, by="name", type="full") >> } >> else { >> message("skipping") >> } >> ###################################################################### #######>> # >> ## >> >> write.xls(res, result.file) >> } >> ###################################################################### #######>> # >> ## >> >> >> ###################################################################### #######>> # >> ## >> Selection from Debug showing that the method is at least visible: >> ... >> Browse[2]> class(peaks.annot) >> [1] "RangedData" >> attr(,"package") >> [1] "IRanges" >> Browse[2]> getMethod("as.data.frame", "RangedData") >> Method Definition: >> >> function (x, row.names = NULL, optional = FALSE, ...) >> { >> if (!(is.null(row.names) || is.character(row.names))) >> stop("'row.names' must be NULL or a character vector") >> if (!missing(optional) || length(list(...))) >> warning("'optional' and arguments in '...' ignored") >> data.frame(as.data.frame(ranges(x)), as.data.frame(values(x))[-1L], >> row.names = row.names, stringsAsFactors = FALSE) >> } >> <environment: namespace:iranges=""> >> >> Signatures: >> x >> target "RangedData" >> defined "RangedData" >> Browse[2]> >> Error in as.data.frame.default(peaks.annot) : >> cannot coerce class 'structure("RangedData", package = "IRanges")' into a >> data.frame >> traceback() >> 4: stop(gettextf("cannot coerce class '%s' into a data.frame", >> deparse(class(x))), >> domain = NA) >> 3: as.data.frame.default(peaks.annot) >> 2: as.data.frame(peaks.annot) >> 1: annotate.macs.output("MACS/TAMR.MBD2.vs.MCF7.MBD2") >> ###################################################################### #######>> # >> ## >> >> sessionInfo() >> R version 2.13.1 (2011-07-08) >> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >> >> locale: >> [1] en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] gdata_2.8.1 macsR_1.0 >> [3] ChIPpeakAnno_1.9.0 limma_3.8.3 >> [5] org.Hs.eg.db_2.5.0 GO.db_2.5.0 >> [7] RSQLite_0.9-4 DBI_0.2-5 >> [9] AnnotationDbi_1.14.1 BSgenome.Ecoli.NCBI.20080805_1.3.17 >> [11] BSgenome_1.20.0 GenomicRanges_1.4.6 >> [13] Biostrings_2.20.2 IRanges_1.10.5 >> [15] multtest_2.8.0 Biobase_2.12.2 >> [17] biomaRt_2.8.1 plyr_1.6 >> >> loaded via a namespace (and not attached): >> [1] gtools_2.6.2 MASS_7.3-14 RCurl_1.6-7 splines_2.13.1 >> [5] survival_2.36-9 tools_2.13.1 XML_3.4-2 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 13.9 years ago Julie Zhu ★ 4.3k

0

Entering edit mode

Dear Julie, I already had ChIPpeakAnno in the Depends section of DESCRIPTION. here's my search() output after loading my 'macsR' package & thus the state of the search() when my function gets run: > search() [1] ".GlobalEnv" [2] "package:macsR" [3] "package:ChIPpeakAnno" [4] "package:limma" [5] "package:org.Hs.eg.db" [6] "package:GO.db" [7] "package:RSQLite" [8] "package:DBI" [9] "package:AnnotationDbi" [10] "package:BSgenome.Ecoli.NCBI.20080805" [11] "package:BSgenome" [12] "package:GenomicRanges" [13] "package:Biostrings" [14] "package:IRanges" [15] "package:multtest" [16] "package:Biobase" [17] "package:biomaRt" [18] "package:plyr" [19] "package:stats" [20] "package:graphics" [21] "package:grDevices" [22] "package:utils" [23] "package:datasets" [24] "package:methods" [25] "Autoloads" [26] "package:base" I did add another library(ChIPpeakAnno) command in the relevant function, but since it was already loaded, there's no effect. The error is due to not finding the S4 method defined in IRanges. I tried moving IRanges up the search() list, but it's a dependency of too many packaged to be moved. I'll upgrade R (I've been putting this off for too long) and will report back cheers, Mark On 07/03/2012, at 2:37 AM, Zhu, Lihua (Julie) wrote: > Dear Mark, > > Sorry that I meant adding library(ChIPpeakAnno) after importing all other > packages. Also, I noticed that you are using an old version of ChIPpeakAnno. > I am wondering what happens if you update your R and ChIPpeakAnno. > > Best regards, > > Julie > > On 3/6/12 9:49 AM, "Julie Zhu" <julie.zhu at="" umassmed.edu=""> wrote: > >> Dear Mark, >> >> I am wondering what happens if you add library(RangedData) after all the >> packages have been imported. >> >> Best regards, >> >> Julie >> >> >> On 3/5/12 10:31 PM, "Mark Cowley" <m.cowley at="" garvan.org.au=""> wrote: >> >>> Dear list, >>> i've had an odd error for a few days now, which i'm really struggling to >>> pinpoint. >>> Essentially, i'm trying to annotate ChIP-Seq peaks using ChIPpeakAnno. after >>> doing annotatePeakInBatch, I convert to data.frame & I get this error: >>> >>>> peaks.bed <- read.delim(bed.file, header=F) >>>> peaks.RangedData <- BED2RangedData(peaks.bed, FALSE) >>>> peaks.annot <- annotatePeakInBatch(peaks.RangedData, AnnotationData = >>>> annoData, output="both", multiple=TRUE, maxgap=0) >>>> peaks.annot <- as.data.frame(peaks.annot) >>> Error in as.data.frame.default(peaks.annot) : >>> cannot coerce class 'structure("RangedData", package = "IRanges")' into a >>> data.frame >>> >>> A combination of debug, getMethod & traceback (see below) prove that my >>> peaks.annot object is of class RangedData, the S4 method "as.data.frame" for >>> "RangedData" exists, but is not actually called; instead >>> as.data.frame.default >>> is called, causing the error. >>> Oddly enough, the error only occurs when run inside my function, which i >>> think >>> implies that my package NAMESPACE is incorrectly setup. To this end, I put >>> this code in its own package, deliberately do not import any code from my >>> other [potentially dodgy] packages & I still get this error. >>> >>> My code & the debug trace/getMethod/traceback is below & any advice would be >>> greatly recommended. >>> cheers, >>> Mark >>> >>> >>> > #################################################################### #########>> > # >>> ## >>> The function: >>> #' annotate MACS peaks using ChIPpeakAnno >>> #' >>> #' This compares the BED regions from running MACS to the nearest ENSEMBL >>> #' Gene, and to CpG islands. This uses >>> \code{\link[ChIPpeakAnno]{annotatePeakInBatch}} >>> #' from ChIPpeakAnno, which has a very loose definition of close, ie >>> #' within 0.5Mb. >>> #' >>> #' This uses pre-built CpG island definitions downloaded from UCSC. >>> #' >>> #' @param dir the path to a MACS result directory >>> #' @param genome one of \sQuote{hg18}, or \sQuote{hg19} >>> #' @return nothing. it writes a <prefix>_peaks_annot.xls file. To interpret >>> #' the new columns, see \code{\link[ChIPpeakAnno]{annotatePeakInBatch}} >>> #' @author Mark Cowley, 2012-03-01 >>> #' @importFrom ChIPpeakAnno BED2RangedData annotatePeakInBatch >>> #' @importFrom plyr join >>> #' @export >>> #' @examples >>> #' \dontrun{ >>> #' dir <- "./MACS/TAMR.H3k4Me3.vs.MCF7.H3k4Me3/" >>> #' macs.ChIPpeakAnno(dir, "hg19") >>> #' } >>> annotate.macs.output <- function(dir, genome=c("hg19", "hg18", "mm9", "rn4"), >>> tss=TRUE, cpg=TRUE) { >>> length(dir) == 1 && is.dir(dir) || stop("dir must be the path to a macs >>> result >>> directory.") >>> SUPPORTED.GENOMES <- c("hg19", "hg18") >>> genome <- genome[1] >>> genome %in% SUPPORTED.GENOMES || stop("genome is unsupported.") >>> > #################################################################### #########>> > # >>> ## >>> saf <- getOption("stringsAsFactors") >>> on.exit(options(stringsAsFactors=saf)) >>> options(stringsAsFactors=FALSE) >>> > #################################################################### #########>> > # >>> ## >>> >>> > #################################################################### #########>> > # >>> ## >>> # which genome? >>> annoData <- switch(genome, >>> hg19={data(TSS.human.GRCh37); TSS.human.GRCh37}, >>> hg18={data(TSS.human.NCBI36); TSS.human.NCBI36}, >>> mm9={data(TSS.mouse.NCBIM37); TSS.mouse.NCBIM37}, >>> rn4={data(TSS.rat.RGSC3.4); TSS.rat.RGSC3.4}, >>> stop("unsupported genome") >>> ) >>> cpgData <- switch(genome, >>> hg19={data(CpG.human.GRCh37); CpG.human.GRCh37}, >>> hg18={data(CpG.human.NCBI36); CpG.human.NCBI36}, >>> mm9={data(CpG.mouse.NCBIM37); CpG.mouse.NCBIM37}, >>> rn4={data(CpG.rat.RGSC3.4); CpG.rat.RGSC3.4}, >>> stop("unsupported genome") >>> ) >>> > #################################################################### #########>> > # >>> ## >>> >>> > #################################################################### #########>> > # >>> ## >>> # setup the input/output file paths >>> # bed.file <- "TAMR.vs.MCF7.MBD2_peaks.bed" >>> bed.file <- dir(dir, pattern="_peaks.bed", full=TRUE) >>> file.exists(bed.file) || stop("bed.file must exist") >>> >>> # macs.peaks.file <- "TAMR.vs.MCF7.MBD2_peaks.xls" >>> macs.peaks.file <- rev(dir(dir, pattern="_peaks.xls", full=TRUE))[1] >>> file.exists(macs.peaks.file) || stop("macs.peaks.file must exist") >>> >>> result.file <- sub("_peaks.xls", "_peaks_annot.xls", macs.peaks.file) >>> > #################################################################### #########>> > # >>> ## >>> >>> > #################################################################### #########>> > # >>> ## >>> # import peaks >>> peaks.bed <- read.delim(bed.file, header=F) >>> # head(peaks.bed) >>> >>> peaks.RangedData <- BED2RangedData(peaks.bed, FALSE) >>> # peaks.RangedData >>> > #################################################################### #########>> > # >>> ## >>> >>> # >>> # import peaks & join to closest gene and CpG island >>> # >>> peaks.stats <- import.macs.peaks.file(macs.peaks.file) >>> # head(peaks.stats) >>> >>> res <- peaks.stats >>> >>> > #################################################################### #########>> > # >>> ## >>> message("Annotating peaks to nearest ENSG TSS...") >>> if( tss ) { >>> peaks.annot <- annotatePeakInBatch(peaks.RangedData, AnnotationData = >>> annoData, output="both", multiple=TRUE, maxgap=0) >>> message("done") >>> >>> peaks.annot <- as.data.frame(peaks.annot) >>> peaks.annot <- peaks.annot[,-match(c("space", "start", "end", "width", >>> "names"), colnames(peaks.annot))] >>> library(org.Hs.eg.db) >>> peaks.annot$GeneSymbol <- mget.chain(as.character(peaks.annot$feature), >>> org.Hs.egENSEMBL2EG, org.Hs.egSYMBOL) >>> peaks.annot <- rename.column(peaks.annot, "feature", "Ensembl ID") >>> peaks.annot <- rename.column(peaks.annot, "peak", "name") >>> peaks.annot <- move.column(peaks.annot, "strand", "insideFeature") >>> peaks.annot <- move.column(peaks.annot, "GeneSymbol", "start_position") >>> >>> res <- plyr::join(peaks.stats, peaks.annot, by="name", type="full") >>> } >>> else { >>> message("skipping") >>> } >>> > #################################################################### #########>> > # >>> ## >>> >>> > #################################################################### #########>> > # >>> ## >>> message("Annotating peaks to nearest CpG...") >>> if( cpg ) { >>> peaks.annot.cpg <- annotatePeakInBatch(peaks.RangedData, AnnotationData = >>> cpgData, output="both", multiple=T, maxgap=0) >>> message("done") >>> >>> peaks.annot.cpg <- as.data.frame(peaks.annot.cpg) >>> peaks.annot.cpg <- peaks.annot.cpg[,-match(c("space", "start", "end", >>> "width", >>> "names"), colnames(peaks.annot.cpg))] >>> library(org.Hs.eg.db) >>> peaks.annot.cpg <- rename.column(peaks.annot.cpg, "feature", "CpG island") >>> peaks.annot.cpg <- rename.column(peaks.annot.cpg, "peak", "name") >>> peaks.annot.cpg <- move.column(peaks.annot.cpg, "strand", "insideFeature") >>> >>> res <- plyr::join(res, peaks.annot.cpg, by="name", type="full") >>> } >>> else { >>> message("skipping") >>> } >>> > #################################################################### #########>> > # >>> ## >>> >>> write.xls(res, result.file) >>> } >>> > #################################################################### #########>> > # >>> ## >>> >>> >>> > #################################################################### #########>> > # >>> ## >>> Selection from Debug showing that the method is at least visible: >>> ... >>> Browse[2]> class(peaks.annot) >>> [1] "RangedData" >>> attr(,"package") >>> [1] "IRanges" >>> Browse[2]> getMethod("as.data.frame", "RangedData") >>> Method Definition: >>> >>> function (x, row.names = NULL, optional = FALSE, ...) >>> { >>> if (!(is.null(row.names) || is.character(row.names))) >>> stop("'row.names' must be NULL or a character vector") >>> if (!missing(optional) || length(list(...))) >>> warning("'optional' and arguments in '...' ignored") >>> data.frame(as.data.frame(ranges(x)), as.data.frame(values(x))[-1L], >>> row.names = row.names, stringsAsFactors = FALSE) >>> } >>> <environment: namespace:iranges=""> >>> >>> Signatures: >>> x >>> target "RangedData" >>> defined "RangedData" >>> Browse[2]> >>> Error in as.data.frame.default(peaks.annot) : >>> cannot coerce class 'structure("RangedData", package = "IRanges")' into a >>> data.frame >>> traceback() >>> 4: stop(gettextf("cannot coerce class '%s' into a data.frame", >>> deparse(class(x))), >>> domain = NA) >>> 3: as.data.frame.default(peaks.annot) >>> 2: as.data.frame(peaks.annot) >>> 1: annotate.macs.output("MACS/TAMR.MBD2.vs.MCF7.MBD2") >>> > #################################################################### #########>> > # >>> ## >>> >>> sessionInfo() >>> R version 2.13.1 (2011-07-08) >>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >>> >>> locale: >>> [1] en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8 >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] gdata_2.8.1 macsR_1.0 >>> [3] ChIPpeakAnno_1.9.0 limma_3.8.3 >>> [5] org.Hs.eg.db_2.5.0 GO.db_2.5.0 >>> [7] RSQLite_0.9-4 DBI_0.2-5 >>> [9] AnnotationDbi_1.14.1 BSgenome.Ecoli.NCBI.20080805_1.3.17 >>> [11] BSgenome_1.20.0 GenomicRanges_1.4.6 >>> [13] Biostrings_2.20.2 IRanges_1.10.5 >>> [15] multtest_2.8.0 Biobase_2.12.2 >>> [17] biomaRt_2.8.1 plyr_1.6 >>> >>> loaded via a namespace (and not attached): >>> [1] gtools_2.6.2 MASS_7.3-14 RCurl_1.6-7 splines_2.13.1 >>> [5] survival_2.36-9 tools_2.13.1 XML_3.4-2 >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >

ADD REPLY • link 13.9 years ago Mark Cowley ▴ 910

0

Entering edit mode

Dear list, as usual this was a PEBKAC error. My package was missing the importMethodsFrom(IRanges, as.data.frame) in NAMESPACE, and Imports: IRanges in DESCRIPTION. What threw me off was by taking working code from an example within ChIPpeakAnno which worked because that package had imports(IRanges) in its NAMESPACE... Mark On 07/03/2012, at 9:57 AM, Mark Cowley wrote: > Dear Julie, > I already had ChIPpeakAnno in the Depends section of DESCRIPTION. here's my search() output after loading my 'macsR' package & thus the state of the search() when my function gets run: >> search() > [1] ".GlobalEnv" > [2] "package:macsR" > [3] "package:ChIPpeakAnno" > [4] "package:limma" > [5] "package:org.Hs.eg.db" > [6] "package:GO.db" > [7] "package:RSQLite" > [8] "package:DBI" > [9] "package:AnnotationDbi" > [10] "package:BSgenome.Ecoli.NCBI.20080805" > [11] "package:BSgenome" > [12] "package:GenomicRanges" > [13] "package:Biostrings" > [14] "package:IRanges" > [15] "package:multtest" > [16] "package:Biobase" > [17] "package:biomaRt" > [18] "package:plyr" > [19] "package:stats" > [20] "package:graphics" > [21] "package:grDevices" > [22] "package:utils" > [23] "package:datasets" > [24] "package:methods" > [25] "Autoloads" > [26] "package:base" > > I did add another library(ChIPpeakAnno) command in the relevant function, but since it was already loaded, there's no effect. > The error is due to not finding the S4 method defined in IRanges. I tried moving IRanges up the search() list, but it's a dependency of too many packaged to be moved. > > I'll upgrade R (I've been putting this off for too long) and will report back > > cheers, > Mark > > On 07/03/2012, at 2:37 AM, Zhu, Lihua (Julie) wrote: > >> Dear Mark, >> >> Sorry that I meant adding library(ChIPpeakAnno) after importing all other >> packages. Also, I noticed that you are using an old version of ChIPpeakAnno. >> I am wondering what happens if you update your R and ChIPpeakAnno. >> >> Best regards, >> >> Julie >> >> On 3/6/12 9:49 AM, "Julie Zhu" <julie.zhu at="" umassmed.edu=""> wrote: >> >>> Dear Mark, >>> >>> I am wondering what happens if you add library(RangedData) after all the >>> packages have been imported. >>> >>> Best regards, >>> >>> Julie >>> >>> >>> On 3/5/12 10:31 PM, "Mark Cowley" <m.cowley at="" garvan.org.au=""> wrote: >>> >>>> Dear list, >>>> i've had an odd error for a few days now, which i'm really struggling to >>>> pinpoint. >>>> Essentially, i'm trying to annotate ChIP-Seq peaks using ChIPpeakAnno. after >>>> doing annotatePeakInBatch, I convert to data.frame & I get this error: >>>> >>>>> peaks.bed <- read.delim(bed.file, header=F) >>>>> peaks.RangedData <- BED2RangedData(peaks.bed, FALSE) >>>>> peaks.annot <- annotatePeakInBatch(peaks.RangedData, AnnotationData = >>>>> annoData, output="both", multiple=TRUE, maxgap=0) >>>>> peaks.annot <- as.data.frame(peaks.annot) >>>> Error in as.data.frame.default(peaks.annot) : >>>> cannot coerce class 'structure("RangedData", package = "IRanges")' into a >>>> data.frame >>>> >>>> A combination of debug, getMethod & traceback (see below) prove that my >>>> peaks.annot object is of class RangedData, the S4 method "as.data.frame" for >>>> "RangedData" exists, but is not actually called; instead >>>> as.data.frame.default >>>> is called, causing the error. >>>> Oddly enough, the error only occurs when run inside my function, which i >>>> think >>>> implies that my package NAMESPACE is incorrectly setup. To this end, I put >>>> this code in its own package, deliberately do not import any code from my >>>> other [potentially dodgy] packages & I still get this error. >>>> >>>> My code & the debug trace/getMethod/traceback is below & any advice would be >>>> greatly recommended. >>>> cheers, >>>> Mark >>>> >>>> >>>> >> ################################################################### ##########>> >> # >>>> ## >>>> The function: >>>> #' annotate MACS peaks using ChIPpeakAnno >>>> #' >>>> #' This compares the BED regions from running MACS to the nearest ENSEMBL >>>> #' Gene, and to CpG islands. This uses >>>> \code{\link[ChIPpeakAnno]{annotatePeakInBatch}} >>>> #' from ChIPpeakAnno, which has a very loose definition of close, ie >>>> #' within 0.5Mb. >>>> #' >>>> #' This uses pre-built CpG island definitions downloaded from UCSC. >>>> #' >>>> #' @param dir the path to a MACS result directory >>>> #' @param genome one of \sQuote{hg18}, or \sQuote{hg19} >>>> #' @return nothing. it writes a <prefix>_peaks_annot.xls file. To interpret >>>> #' the new columns, see \code{\link[ChIPpeakAnno]{annotatePeakInBatch}} >>>> #' @author Mark Cowley, 2012-03-01 >>>> #' @importFrom ChIPpeakAnno BED2RangedData annotatePeakInBatch >>>> #' @importFrom plyr join >>>> #' @export >>>> #' @examples >>>> #' \dontrun{ >>>> #' dir <- "./MACS/TAMR.H3k4Me3.vs.MCF7.H3k4Me3/" >>>> #' macs.ChIPpeakAnno(dir, "hg19") >>>> #' } >>>> annotate.macs.output <- function(dir, genome=c("hg19", "hg18", "mm9", "rn4"), >>>> tss=TRUE, cpg=TRUE) { >>>> length(dir) == 1 && is.dir(dir) || stop("dir must be the path to a macs >>>> result >>>> directory.") >>>> SUPPORTED.GENOMES <- c("hg19", "hg18") >>>> genome <- genome[1] >>>> genome %in% SUPPORTED.GENOMES || stop("genome is unsupported.") >>>> >> ################################################################### ##########>> >> # >>>> ## >>>> saf <- getOption("stringsAsFactors") >>>> on.exit(options(stringsAsFactors=saf)) >>>> options(stringsAsFactors=FALSE) >>>> >> ################################################################### ##########>> >> # >>>> ## >>>> >>>> >> ################################################################### ##########>> >> # >>>> ## >>>> # which genome? >>>> annoData <- switch(genome, >>>> hg19={data(TSS.human.GRCh37); TSS.human.GRCh37}, >>>> hg18={data(TSS.human.NCBI36); TSS.human.NCBI36}, >>>> mm9={data(TSS.mouse.NCBIM37); TSS.mouse.NCBIM37}, >>>> rn4={data(TSS.rat.RGSC3.4); TSS.rat.RGSC3.4}, >>>> stop("unsupported genome") >>>> ) >>>> cpgData <- switch(genome, >>>> hg19={data(CpG.human.GRCh37); CpG.human.GRCh37}, >>>> hg18={data(CpG.human.NCBI36); CpG.human.NCBI36}, >>>> mm9={data(CpG.mouse.NCBIM37); CpG.mouse.NCBIM37}, >>>> rn4={data(CpG.rat.RGSC3.4); CpG.rat.RGSC3.4}, >>>> stop("unsupported genome") >>>> ) >>>> >> ################################################################### ##########>> >> # >>>> ## >>>> >>>> >> ################################################################### ##########>> >> # >>>> ## >>>> # setup the input/output file paths >>>> # bed.file <- "TAMR.vs.MCF7.MBD2_peaks.bed" >>>> bed.file <- dir(dir, pattern="_peaks.bed", full=TRUE) >>>> file.exists(bed.file) || stop("bed.file must exist") >>>> >>>> # macs.peaks.file <- "TAMR.vs.MCF7.MBD2_peaks.xls" >>>> macs.peaks.file <- rev(dir(dir, pattern="_peaks.xls", full=TRUE))[1] >>>> file.exists(macs.peaks.file) || stop("macs.peaks.file must exist") >>>> >>>> result.file <- sub("_peaks.xls", "_peaks_annot.xls", macs.peaks.file) >>>> >> ################################################################### ##########>> >> # >>>> ## >>>> >>>> >> ################################################################### ##########>> >> # >>>> ## >>>> # import peaks >>>> peaks.bed <- read.delim(bed.file, header=F) >>>> # head(peaks.bed) >>>> >>>> peaks.RangedData <- BED2RangedData(peaks.bed, FALSE) >>>> # peaks.RangedData >>>> >> ################################################################### ##########>> >> # >>>> ## >>>> >>>> # >>>> # import peaks & join to closest gene and CpG island >>>> # >>>> peaks.stats <- import.macs.peaks.file(macs.peaks.file) >>>> # head(peaks.stats) >>>> >>>> res <- peaks.stats >>>> >>>> >> ################################################################### ##########>> >> # >>>> ## >>>> message("Annotating peaks to nearest ENSG TSS...") >>>> if( tss ) { >>>> peaks.annot <- annotatePeakInBatch(peaks.RangedData, AnnotationData = >>>> annoData, output="both", multiple=TRUE, maxgap=0) >>>> message("done") >>>> >>>> peaks.annot <- as.data.frame(peaks.annot) >>>> peaks.annot <- peaks.annot[,-match(c("space", "start", "end", "width", >>>> "names"), colnames(peaks.annot))] >>>> library(org.Hs.eg.db) >>>> peaks.annot$GeneSymbol <- mget.chain(as.character(peaks.annot$feature), >>>> org.Hs.egENSEMBL2EG, org.Hs.egSYMBOL) >>>> peaks.annot <- rename.column(peaks.annot, "feature", "Ensembl ID") >>>> peaks.annot <- rename.column(peaks.annot, "peak", "name") >>>> peaks.annot <- move.column(peaks.annot, "strand", "insideFeature") >>>> peaks.annot <- move.column(peaks.annot, "GeneSymbol", "start_position") >>>> >>>> res <- plyr::join(peaks.stats, peaks.annot, by="name", type="full") >>>> } >>>> else { >>>> message("skipping") >>>> } >>>> >> ################################################################### ##########>> >> # >>>> ## >>>> >>>> >> ################################################################### ##########>> >> # >>>> ## >>>> message("Annotating peaks to nearest CpG...") >>>> if( cpg ) { >>>> peaks.annot.cpg <- annotatePeakInBatch(peaks.RangedData, AnnotationData = >>>> cpgData, output="both", multiple=T, maxgap=0) >>>> message("done") >>>> >>>> peaks.annot.cpg <- as.data.frame(peaks.annot.cpg) >>>> peaks.annot.cpg <- peaks.annot.cpg[,-match(c("space", "start", "end", >>>> "width", >>>> "names"), colnames(peaks.annot.cpg))] >>>> library(org.Hs.eg.db) >>>> peaks.annot.cpg <- rename.column(peaks.annot.cpg, "feature", "CpG island") >>>> peaks.annot.cpg <- rename.column(peaks.annot.cpg, "peak", "name") >>>> peaks.annot.cpg <- move.column(peaks.annot.cpg, "strand", "insideFeature") >>>> >>>> res <- plyr::join(res, peaks.annot.cpg, by="name", type="full") >>>> } >>>> else { >>>> message("skipping") >>>> } >>>> >> ################################################################### ##########>> >> # >>>> ## >>>> >>>> write.xls(res, result.file) >>>> } >>>> >> ################################################################### ##########>> >> # >>>> ## >>>> >>>> >>>> >> ################################################################### ##########>> >> # >>>> ## >>>> Selection from Debug showing that the method is at least visible: >>>> ... >>>> Browse[2]> class(peaks.annot) >>>> [1] "RangedData" >>>> attr(,"package") >>>> [1] "IRanges" >>>> Browse[2]> getMethod("as.data.frame", "RangedData") >>>> Method Definition: >>>> >>>> function (x, row.names = NULL, optional = FALSE, ...) >>>> { >>>> if (!(is.null(row.names) || is.character(row.names))) >>>> stop("'row.names' must be NULL or a character vector") >>>> if (!missing(optional) || length(list(...))) >>>> warning("'optional' and arguments in '...' ignored") >>>> data.frame(as.data.frame(ranges(x)), as.data.frame(values(x))[-1L], >>>> row.names = row.names, stringsAsFactors = FALSE) >>>> } >>>> <environment: namespace:iranges=""> >>>> >>>> Signatures: >>>> x >>>> target "RangedData" >>>> defined "RangedData" >>>> Browse[2]> >>>> Error in as.data.frame.default(peaks.annot) : >>>> cannot coerce class 'structure("RangedData", package = "IRanges")' into a >>>> data.frame >>>> traceback() >>>> 4: stop(gettextf("cannot coerce class '%s' into a data.frame", >>>> deparse(class(x))), >>>> domain = NA) >>>> 3: as.data.frame.default(peaks.annot) >>>> 2: as.data.frame(peaks.annot) >>>> 1: annotate.macs.output("MACS/TAMR.MBD2.vs.MCF7.MBD2") >>>> >> ################################################################### ##########>> >> # >>>> ## >>>> >>>> sessionInfo() >>>> R version 2.13.1 (2011-07-08) >>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >>>> >>>> locale: >>>> [1] en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8 >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] gdata_2.8.1 macsR_1.0 >>>> [3] ChIPpeakAnno_1.9.0 limma_3.8.3 >>>> [5] org.Hs.eg.db_2.5.0 GO.db_2.5.0 >>>> [7] RSQLite_0.9-4 DBI_0.2-5 >>>> [9] AnnotationDbi_1.14.1 BSgenome.Ecoli.NCBI.20080805_1.3.17 >>>> [11] BSgenome_1.20.0 GenomicRanges_1.4.6 >>>> [13] Biostrings_2.20.2 IRanges_1.10.5 >>>> [15] multtest_2.8.0 Biobase_2.12.2 >>>> [17] biomaRt_2.8.1 plyr_1.6 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] gtools_2.6.2 MASS_7.3-14 RCurl_1.6-7 splines_2.13.1 >>>> [5] survival_2.36-9 tools_2.13.1 XML_3.4-2 >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 13.9 years ago Mark Cowley ▴ 910

Login before adding your answer.