ChIPpeakAnno annotatePeakInBatch error message

Dario Strbenac ★ 1.6k

@dario-strbenac-5916

Last seen 4 weeks ago

Australia

Hello,

I made another small example of using annotatePeakInBatch to demonstrate to a friend, but it has crashed. It's similar to the other example but with different data. I'm not sure why it is happening.

Here is my small example:

    peaksT <- data.frame(chr = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2"),
                         start = c(2000010, 19000000, 30000000, 300, 5500, 100000),
                         end = c(2000310, 19000300, 30000300, 600, 5800, 100300))
    featuresT <- data.frame(name = c("gene1", "gene2", "gene3", "gene4", "gene5", "gene6"),
                            chr = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2"),
                            start = c(1000000, 10000000, 15000000, 1000, 6000, 10000),
                            end = c(2000000, 20000000, 22000000, 5000, 7000, 15000),
                            strand = c('+', '-', '+', '+', '-', '+'))

    require(ChIPpeakAnno)

    peaksRangedData <- RangedData(space = peaksT$chr,
                                  ranges = IRanges(start = peaksT$start, end = peaksT$end))
    featuresRangedData <- RangedData(name = featuresT$name, space = featuresT$chr,
                                     strand = featuresT$strand,
                                     ranges = IRanges(start = featuresT$start, end = featuresT$end))
    featureLoc <- "TSS"

    annotatePeakInBatch(peaksRangedData, AnnotationData = featuresRangedData,
                        PeakLocForDistance = "middle")

    Error in if (as.character(r.n$strand[i]) == "1" || as.character(r.n$strand[i]) ==  :
      missing value where TRUE/FALSE needed

My sessionInfo is:

    R version 2.11.0 (2010-04-22)
    x86_64-unknown-linux-gnu

    locale:
     [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=en_AU.UTF-8
     [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] ChIPpeakAnno_1.5.2                  limma_3.4.0
     [3] org.Hs.eg.db_2.4.1                  GO.db_2.4.1
     [5] RSQLite_0.9-0                       DBI_0.2-5
     [7] AnnotationDbi_1.10.0                BSgenome.Ecoli.NCBI.20080805_1.3.16
     [9] BSgenome_1.16.1                     GenomicRanges_1.0.1
    [11] Biostrings_2.16.0                   IRanges_1.6.2
    [13] multtest_2.4.0                      Biobase_2.8.0
    [15] biomaRt_2.4.0

    loaded via a namespace (and not attached):
    [1] MASS_7.3-6      RCurl_1.4-2     splines_2.11.0  survival_2.35-8
    [5] XML_3.1-0

Thanks,
Dario.

--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia

ADD COMMENT • link 15.7 years ago Dario Strbenac ★ 1.6k

Julie Zhu ★ 4.3k

@julie-zhu-3596

Last seen 2.2 years ago

United States

Hi Dario,

For some reason, the merge function did not work when merging big numbers (I will email the dev list to gain insight). I found a way to fix it by only using IDs and will add the fix to the dev version tonight.

Best regards,
Julie

On 5/24/10 8:10 AM, "Dario Strbenac" <d.strbenac@garvan.org.au> wrote:

> Hello,
>
> I made another small example of using annotatePeakInBatch to demonstrate to a friend, but it has crashed. It's similar to the other example but with different data. I'm not sure why it is happening.

ADD COMMENT • link 15.7 years ago Julie Zhu ★ 4.3k

Julie Zhu ★ 4.3k

@julie-zhu-3596

Last seen 2.2 years ago

United States

Hi Dario,

Please download dev 1.5.3 version of ChIPpeakAnno and let me know if you encounter any problem. Thanks!

Best regards,
Julie

    annotatePeakInBatch(peaksRangedData, AnnotationData = featuresRangedData,
                        PeakLocForDistance = "middle")
    RangedData with 6 rows and 9 value columns across 2 spaces
              space               ranges |        peak      strand     feature start_position end_position insideFeature distancetoFeature
        <character>            <IRanges> | <character> <character> <character>      <numeric>    <numeric>   <character>         <numeric>
    1 1        chr1 [ 2000010,  2000310] |           1           +           1          1e+06      2.0e+06    downstream           1000160
    2 2        chr1 [19000000, 19000300] |           2           -           2          1e+07      2.0e+07        inside            999850
    3 2        chr1 [30000000, 30000300] |           3           -           2          1e+07      2.0e+07      upstream         -10000150
    4 4        chr2 [     300,      600] |           4           +           4          1e+03      5.0e+03      upstream              -550
    6 6        chr2 [  100000,   100300] |           6           +           6          1e+04      1.5e+04    downstream             90150
    5 5        chr2 [    5500,     5800] |           5           -           5          6e+03      7.0e+03    downstream              1350
        shortestDistance fromOverlappingOrNearest
               <numeric>              <character>
    1 1               10             NearestStart
    2 2           999700             NearestStart
    3 2         10000000             NearestStart
    4 4              400             NearestStart
    6 6            85000             NearestStart
    5 5              200             NearestStart

    > sessionInfo()
    R version 2.11.0 (2010-04-22)
    i386-apple-darwin9.8.0

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] ChIPpeakAnno_1.5.3                  limma_3.4.0                         org.Hs.eg.db_2.4.1
     [4] GO.db_2.4.1                         RSQLite_0.9-0                       DBI_0.2-5
     [7] AnnotationDbi_1.10.1                BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.16.1
    [10] GenomicRanges_1.0.1                 Biostrings_2.16.0                   IRanges_1.6.1
    [13] multtest_2.4.0                      Biobase_2.8.0                       biomaRt_2.4.0

On 5/24/10 5:10 AM, "Dario Strbenac" <d.strbenac@garvan.org.au> wrote:

> Hello,
>
> I made another small example of using annotatePeakInBatch to demonstrate to a friend, but it has crashed. It's similar to the other example but with different data. I'm not sure why it is happening.

ADD COMMENT • link 15.7 years ago Julie Zhu ★ 4.3k

Hello,

Yes, I encountered the same problem again. This time I tried the code on my full table of data. This is my script. All the files it refers to are web accessible, so that you can replicate it too. I am definitely using version 1.5.3 of the package.

    CpGIslandsTable <- read.table("http://129.94.136.7/file_dump/dario/hg18_CpG_Islands.bed",
                                  sep = '\t', stringsAsFactors = FALSE)
    genesTable <- read.csv("http://129.94.136.7/file_dump/dario/humanGenomeAnnotation.csv",
                           stringsAsFactors = FALSE)
    colnames(CpGIslandsTable) <- c("chr", "start", "end", "name")

    peaksRangedData <- RangedData(space = CpGIslandsTable$chr,
                                  ranges = IRanges(start = CpGIslandsTable$start,
                                                   end = CpGIslandsTable$end))
    featuresRangedData <- RangedData(name = genesTable$name, space = genesTable$chr,
                                     strand = genesTable$strand,
                                     ranges = IRanges(start = genesTable$start,
                                                      end = genesTable$end))
    featureLoc <- "TSS"

    annotatePeakInBatch(peaksRangedData, AnnotationData = featuresRangedData,
                        PeakLocForDistance = "middle")

    > sessionInfo()
    R version 2.11.0 (2010-04-22)
    x86_64-pc-mingw32

    locale:
    [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252
    [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
    [5] LC_TIME=English_Australia.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] ChIPpeakAnno_1.5.3                  limma_3.4.0
     [3] org.Hs.eg.db_2.4.1                  GO.db_2.4.1
     [5] RSQLite_0.9-0                       DBI_0.2-5
     [7] AnnotationDbi_1.10.1                BSgenome.Ecoli.NCBI.20080805_1.3.16
     [9] BSgenome_1.16.0                     GenomicRanges_1.0.1
    [11] Biostrings_2.16.0                   IRanges_1.6.0
    [13] multtest_2.4.0                      Biobase_2.8.0
    [15] biomaRt_2.4.0

    loaded via a namespace (and not attached):
    [1] MASS_7.3-5      RCurl_1.3-1     splines_2.11.0  survival_2.35-8 XML_2.8-1

---- Original message ----
>Date: Mon, 24 May 2010 22:57:47 -0400
>From: "Zhu, Julie" <julie.zhu at umassmed.edu>
>Subject: Re: [BioC] ChIPpeakAnno annotatePeakInBatch error message
>To: "D.Strbenac at garvan.org.au" <d.strbenac at garvan.org.au>, "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>
> Hi Dario,
>
> Please download dev 1.5.3 version of ChIPpeakAnno and let me know if you encounter any problem. Thanks!
>
> Best regards,
> Julie

ADD REPLY • link 15.7 years ago Dario Strbenac ★ 1.6k

Hi Dario,

Thanks for the vigorous test of the new feature!

The peak dataset contains chrX_random, which is not in the feature dataset. I added an is.na check on the strand, which should fix the problem. I also attached the annotated dataset. Please let me know if you encounter any problem.

Best regards,
Julie

On 5/26/10 11:00 PM, "Dario Strbenac" <d.strbenac at garvan.org.au> wrote:

> Hello,
>
> Yes, I encountered the same problem again. This time I tried the code on my full table of data. This is my script. All the files it refers to are web accessible, so that you can replicate it too. I am definitely using version 1.5.3 of the package.
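The error reported in this thread is the one R raises whenever the condition of an `if` evaluates to NA, which is what an is.na check on the strand guards against. The following is only a minimal base-R sketch of that pattern: the helper `strand_sign` and its return values are hypothetical and are not taken from the ChIPpeakAnno source.

```r
# Hypothetical helper, NOT ChIPpeakAnno code: map a strand label to +1 / -1.
strand_sign <- function(strand) {
  s <- as.character(strand)
  # Guard first: for a missing strand, `s == "1"` is NA, and `if (NA)`
  # stops with "missing value where TRUE/FALSE needed".
  if (is.na(s)) {
    return(NA_integer_)
  }
  if (s == "1" || s == "+") 1L else -1L
}

# A peak on a chromosome that has no features ends up with a missing strand.
strands <- c("+", "-", NA)
signs <- vapply(strands, strand_sign, integer(1), USE.NAMES = FALSE)
print(signs)

# The same comparison without the guard fails; capture the message instead
# of letting the script stop.
unguarded <- tryCatch(
  if (NA_character_ == "1" || NA_character_ == "+") 1L else -1L,
  error = function(e) conditionMessage(e)
)
print(unguarded)
```

Returning NA for a missing strand keeps the unmatched peak in the result rather than dropping it silently, so the caller can still see which peaks had no feature.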

ADD REPLY • link 15.7 years ago Julie Zhu ★ 4.3k

Hi Dario,

Thanks for the vigorous test of the new feature!

The peak dataset contains chrX_random, which is not in the feature dataset. I added an is.na check on the strand, which should fix the problem (dev version 1.5.4). Please let me know if you encounter any problem.

Best regards,
Julie

On 5/26/10 11:00 PM, "Dario Strbenac" <d.strbenac@garvan.org.au> wrote:

> Hello,
>
> Yes, I encountered the same problem again. This time I tried the code on my full table of data. This is my script. All the files it refers to are web accessible, so that you can replicate it too. I am definitely using version 1.5.3 of the package.
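Before annotating, it can help to check whether the peak table uses chromosome names that never occur in the feature table, as chrX_random did here. This is a base-R sketch on made-up toy tables; the data frames `peaks` and `features` and their values are assumptions for illustration, not the files from the thread.

```r
# Toy stand-ins for the peak and feature tables; all values are made up.
peaks <- data.frame(
  chr   = c("chr1", "chr2", "chrX_random", "chr2"),
  start = c(100, 300, 50, 5500),
  end   = c(400, 600, 350, 5800),
  stringsAsFactors = FALSE
)
features <- data.frame(
  name   = c("gene1", "gene2"),
  chr    = c("chr1", "chr2"),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)

# Chromosome names that occur in the peaks but have no features at all.
unmatched <- setdiff(unique(peaks$chr), unique(features$chr))
print(unmatched)

# Keep only the peaks that have at least one candidate feature.
peaks_kept <- peaks[peaks$chr %in% features$chr, ]
print(nrow(peaks_kept))
```

Filtering on the feature table's own chromosome names, rather than on a pattern such as "_random", also catches any other name that appears on one side only.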

ADD REPLY • link 15.7 years ago Julie Zhu ★ 4.3k

Dario Strbenac ★ 1.6k

@dario-strbenac-5916

Last seen 4 weeks ago

Australia

Oh, thanks for this fix. I forgot to remove the chr*_random rows when I loaded the CpG Island BED file into R.

Just one more point though. I just found that after chromosome 1, the annotated peaks and features were on different chromosomes in the spreadsheet you sent to me. I suppose this is because the CpG islands file is ordered chr1, chr2, chr3, ..., whereas the genes file is ASCII ordered (i.e. chr1, chr10, chr11, ...), and you merge the overlaps by list position. It would be important to make this requirement clear in the documentation (annotatePeakInBatch.Rd), or alternatively to make it not depend on these two tables having the same chromosome ordering.

- Dario.

---- Original message ----
> Date: Thu, 27 May 2010 14:26:12 -0400
> From: "Zhu, Julie" <julie.zhu at umassmed.edu>
> Subject: Re: [BioC] ChIPpeakAnno annotatePeakInBatch error message
> To: "D.Strbenac at garvan.org.au" <d.strbenac at garvan.org.au>, "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>
> Hi Dario,
>
> Thanks for the vigorous test of the new feature!
>
> The peak dataset contains chrX_random that is not in the feature dataset. I added is.na check on the strand which should fix the problem. I also attached the annotated Dataset. Please let me know if you encounter any problem.
>
> Best regards,
>
> Julie
>
> On 5/26/10 11:00 PM, "Dario Strbenac" <d.strbenac at garvan.org.au> wrote:
>
> > Hello,
> >
> > Yes, I encountered the same problem again. This time I tried the code on my full table of data. This is my script. All the files it refers to are web accessible, so that you can replicate it too. I am definitely using version 1.5.3 of the package.
> >
> > CpGIslandsTable <- read.table("http://129.94.136.7/file_dump/dario/hg18_CpG_Islands.bed", sep = '\t', stringsAsFactors = FALSE)
> > genesTable <- read.csv("http://129.94.136.7/file_dump/dario/humanGenomeAnnotation.csv", stringsAsFactors = FALSE)
> > colnames(CpGIslandsTable) <- c("chr", "start", "end", "name")
> >
> > peaksRangedData <- RangedData(space = CpGIslandsTable$chr, ranges = IRanges(start = CpGIslandsTable$start, end = CpGIslandsTable$end))
> > featuresRangedData <- RangedData(name = genesTable$name, space = genesTable$chr, strand = genesTable$strand, ranges = IRanges(start = genesTable$start, end = genesTable$end))
> > featureLoc <- "TSS"
> >
> > annotatePeakInBatch(peaksRangedData, AnnotationData = featuresRangedData, PeakLocForDistance = "middle")
> >
> > > sessionInfo()
> > R version 2.11.0 (2010-04-22)
> > x86_64-pc-mingw32
> >
> > locale:
> > [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252 LC_NUMERIC=C LC_TIME=English_Australia.1252
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> >
> > other attached packages:
> > [1] ChIPpeakAnno_1.5.3 limma_3.4.0 org.Hs.eg.db_2.4.1 GO.db_2.4.1 RSQLite_0.9-0
> > [6] DBI_0.2-5 AnnotationDbi_1.10.1 BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.16.0 GenomicRanges_1.0.1
> > [11] Biostrings_2.16.0 IRanges_1.6.0 multtest_2.4.0 Biobase_2.8.0 biomaRt_2.4.0
> >
> > loaded via a namespace (and not attached):
> > [1] MASS_7.3-5 RCurl_1.3-1 splines_2.11.0 survival_2.35-8 XML_2.8-1
>
> ForDarioStrbenac.xls (4489k bytes)

--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia
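The two points raised in this exchange (peaks on a chromosome with no features, and pairing peaks with features by list position) can be illustrated with a minimal base R sketch. This is not the ChIPpeakAnno implementation; the toy tables and the helper `nearest_feature` are hypothetical and exist only to show a lookup keyed on chromosome name with an explicit NA guard.

```r
# Toy tables; names and values are illustrative, not taken from the thread's files.
peaks <- data.frame(
  chr   = c("chr1", "chr2", "chr10", "chrX_random"),
  start = c(100, 300, 500, 700),
  stringsAsFactors = FALSE
)
features <- data.frame(
  name   = c("geneA", "geneB", "geneC"),
  chr    = c("chr1", "chr10", "chr2"),
  start  = c(150, 450, 250),
  strand = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# Group features by chromosome name so lookups do not depend on row or list order.
features_by_chr <- split(features, features$chr)

# Hypothetical helper: name of the feature whose start is nearest on the same
# chromosome, or NA when that chromosome has no features at all.
nearest_feature <- function(chr, pos) {
  candidates <- features_by_chr[[chr]]
  if (is.null(candidates)) {
    return(NA_character_)
  }
  candidates$name[which.min(abs(candidates$start - pos))]
}

peaks$feature <- mapply(nearest_feature, peaks$chr, peaks$start, USE.NAMES = FALSE)

# Look up the strand by feature name; peaks without a feature get NA.
peaks$strand <- features$strand[match(peaks$feature, features$name)]

# Guard the comparison: a bare `if (NA == "+")` stops with
# "missing value where TRUE/FALSE needed".
peaks$on_plus <- !is.na(peaks$strand) & peaks$strand == "+"
```

Because every lookup goes through the chromosome name, it does not matter whether one table is ordered chr1, chr2, chr3 and the other chr1, chr10, chr11, and the chrX_random peak simply ends up with an NA feature instead of triggering an error.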

15.7 years ago Dario Strbenac ★ 1.6k


Hi Dario,

Thanks for verifying the fix!

The order of space is auto-generated with RangedData. If you type the following command with your input dataset (peak and feature), you will see that the space is ordered the same way for the input datasets.

unique(space(featuresRangedData))
 [1] "chr1" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18" "chr19" "chr2" "chr20" "chr21" "chr22" "chr3" "chr4"
[18] "chr5" "chr6" "chr7" "chr8" "chr9" "chrX" "chrY"

unique(space(peaksRangedData))
 [1] "chr1" "chr1_random" "chr10" "chr10_random" "chr11" "chr11_random" "chr12" "chr13"
 [9] "chr13_random" "chr14" "chr15" "chr15_random" "chr16" "chr16_random" "chr17" "chr17_random"
[17] "chr18" "chr19" "chr2" "chr2_random" "chr20" "chr21" "chr21_random" "chr22"
[25] "chr22_random" "chr3" "chr3_random" "chr4" "chr4_random" "chr5" "chr5_h2_hap1" "chr5_random"
[33] "chr6" "chr6_cox_hap1" "chr6_qbl_hap2" "chr6_random" "chr7" "chr7_random" "chr8" "chr8_random"
[41] "chr9" "chr9_random" "chrX" "chrX_random" "chrY"

unique(space(annP))
 [1] "chr1" "chr1_random" "chr10" "chr10_random" "chr11" "chr11_random" "chr12" "chr13"
 [9] "chr13_random" "chr14" "chr15" "chr15_random" "chr16" "chr16_random" "chr17" "chr17_random"
[17] "chr18" "chr19" "chr2" "chr2_random" "chr20" "chr21" "chr21_random" "chr22"
[25] "chr22_random" "chr3" "chr3_random" "chr4" "chr4_random" "chr5" "chr5_h2_hap1" "chr5_random"
[33] "chr6" "chr6_cox_hap1" "chr6_qbl_hap2" "chr6_random" "chr7" "chr7_random" "chr8" "chr8_random"
[41] "chr9" "chr9_random" "chrX" "chrX_random" "chrY"

If you want the output to be written in a certain sorted way, you can save it as a data.frame then sort the data frame before writing to the Excel file.

Hope it makes sense.

Best regards,

Julie
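The suggestion to sort a data.frame before writing it out can be sketched in base R as follows. The `annotated` table stands in for the annotation result after it has been converted to a data.frame, and `chr_rank` is a hypothetical helper that handles only the plain chr1 to chr22, chrX and chrY names; anything else (such as the _random contigs) would need its own rule.

```r
# Toy annotated output; values are illustrative only.
annotated <- data.frame(
  chr   = c("chr10", "chr2", "chrX", "chr1", "chr2"),
  start = c(500, 900, 40, 100, 300),
  stringsAsFactors = FALSE
)

# Hypothetical helper: numeric rank for chr1..chr22, with X and Y placed after them.
chr_rank <- function(chr) {
  label <- sub("^chr", "", chr)
  rank <- suppressWarnings(as.integer(label))  # "X" and "Y" become NA here
  rank[label == "X"] <- 23L
  rank[label == "Y"] <- 24L
  rank
}

# Sort by chromosome in natural order, then by start position.
sorted <- annotated[order(chr_rank(annotated$chr), annotated$start), ]

# Tab-separated text can be opened in Excel; a temporary file is used here.
out_file <- tempfile(fileext = ".txt")
write.table(sorted, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Sorting on a numeric rank rather than on the chromosome string itself avoids the ASCII ordering (chr1, chr10, chr11, ...) discussed earlier in the thread.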

15.7 years ago Julie Zhu ★ 4.3k
