Question RE: getEnrichedGo in ChIPpeakAnno package

Noah Dowell ▴ 410

@noah-dowell-3791

Last seen 9.6 years ago

Hello All,

I couldn't find a solution to my question in the archives and my attempts have been unsuccessful so hopefully someone has some advice.

I have analyzed my yeast ChIP-chip tiling array data using Starr and converted my list of chip-enriched regions to RangedData to make use of the peakOverlap and GOenrichment functions in ChIPpeakAnno. The annotatePeakInBatch function has worked nicely but I am stuck with the getEnrichedGO function. I think the problem may be due to differences between the org.Hs.eg.db and org.Sc.sgd.db. The org.Hs.eg.db has a mapping of ENSEMBL gene accession numbers to Entrez Gene identifiers, but the org.Sc.sgd.db completely lacks this and uses a mapping to SGD Gene Identifiers. As far as I can tell the getEnrichedGO function calls for a mapping to Entrez Gene ids thus the error I am showing below.

Does anyone know of a work around for this?

Thank you for your help.

Noah

> library(org.Sc.sgd.db)
> goTest <- getEnrichedGO(annoPeakChr1data, orgAnn = "org.Sc.sgd.db", maxP = 0.01, multiAdj = TRUE, minGOterm = 10, multiAdjMethod = "BH")
Error in get(paste(GOgenome, "ENSEMBL2EG", sep = "")) :
  object 'org.Sc.sgdENSEMBL2EG' not found

## also tried:
> goTest <- getEnrichedGO(annoPeakChr1data, orgAnn = "org.Sc.sgd.db", feature_id_type = "ensembl_gene_id", maxP = 0.01, multiAdj = TRUE, minGOterm = 10, multiAdjMethod = "BH")
Error in get(paste(GOgenome, "ENSEMBL2EG", sep = "")) :
  object 'org.Sc.sgdENSEMBL2EG' not found

#### here is what my annotatePeak Object looks like:
> head(annoPeakChr1data)
RangedData with 6 rows and 9 value columns across 1 space
                   space         ranges |        peak      strand     feature start_position end_position
             <character>      <IRanges> | <character> <character> <character>      <numeric>    <numeric>
01 YAL069W             I [   16,   254] |          01           1     YAL069W            335          649
02 YAL067W-A           I [ 2731,  2924] |          02           1   YAL067W-A           2480         2707
06 YAL062W             I [29935, 29959] |          06           1     YAL062W          31568        32941
07 YAL062W             I [30011, 30039] |          07           1     YAL062W          31568        32941
08 YAL062W             I [31661, 31678] |          08           1     YAL062W          31568        32941
09 YAL062W             I [31702, 31710] |          09           1     YAL062W          31568        32941

> sessionInfo()
R version 2.11.0 (2010-04-22)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid stats graphics grDevices utils datasets methods base

other attached packages:
 [1] org.Sc.sgd.db_2.4.1  rtracklayer_1.8.1     RCurl_1.3-1                          bitops_1.0-4.1
 [5] Starr_1.4.0          affxparser_1.20.0     affy_1.26.0                          Ringo_1.12.0
 [9] Matrix_0.999375-38   lattice_0.18-5        RColorBrewer_1.0-2                   ChIPpeakAnno_1.4.0
[13] limma_3.4.0          org.Hs.eg.db_2.4.1    GO.db_2.4.1                          RSQLite_0.8-4
[17] DBI_0.2-5            AnnotationDbi_1.10.0  BSgenome.Ecoli.NCBI.20080805_1.3.16  BSgenome_1.16.0
[21] GenomicRanges_1.0.1  Biostrings_2.16.0     IRanges_1.6.0                        multtest_2.4.0
[25] Biobase_2.8.0        biomaRt_2.4.0

loaded via a namespace (and not attached):
[1] affyio_1.16.0 annotate_1.26.0 genefilter_1.30.0 MASS_7.3-5 preprocessCore_1.10.0 pspline_1.0-14
[7] splines_2.11.0 survival_2.35-8 tools_2.11.0 XML_2.8-1 xtable_1.5-6
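The error text hints at why the lookup fails: the function seems to build the name of a mapping object by pasting "ENSEMBL2EG" onto a prefix and then calling get() on that name. The base R sketch below reproduces only that name construction. It assumes the prefix (GOgenome) is simply orgAnn with the trailing ".db" removed, which is inferred from the error message and not taken from the ChIPpeakAnno source; mapping_name is a hypothetical helper.

```r
# Hypothetical helper. Assumption: GOgenome is orgAnn without the ".db" suffix,
# as suggested by the object name in the error message.
mapping_name <- function(orgAnn, suffix = "ENSEMBL2EG") {
  GOgenome <- sub("\\.db$", "", orgAnn)
  paste(GOgenome, suffix, sep = "")
}

yeast_name <- mapping_name("org.Sc.sgd.db")
human_name <- mapping_name("org.Hs.eg.db")
print(yeast_name)
print(human_name)

# get() stops with "object ... not found" when nothing of that name is on the
# search path; exists() lets you check before calling get().
print(exists(yeast_name))
```

The human name matches a mapping that org.Hs.eg.db provides, while the yeast name has no counterpart in org.Sc.sgd.db, which is consistent with the error above.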

Yeast ChIPpeakAnno Starr • 1.6k views

ADD COMMENT • link updated 13.9 years ago by Julie Zhu ★ 4.3k • written 13.9 years ago by Noah Dowell ▴ 410

Julie Zhu ★ 4.3k

@julie-zhu-3596

Last seen 5 months ago

United States

Hi Noah,

Yes, you are right that this is due to the differences between org.Hs.eg.db and org.Sc.sgd.db. The org.Hs.eg.db package is Entrez ID centric while org.Sc.sgd.db is ORF centric. It would be nice if all the org.*.*.db packages had a similar data structure and mapping. For now, I would suggest calling the getEnrichedGO function with a list of ORFs using the following syntax. You need to convert the list of Ensembl IDs to ORFs first.

enrichedGO.Cse4 <- getEnrichedGO(orfs, feature_id_type = "entrez_id", orgAnn = "org.Sc.sgd.db", maxP = 0.05, multiAdj = TRUE, minGOterm = 5, multiAdjMethod = "BH")

Best regards,

Julie

On 6/10/10 4:15 PM, "Noah Dowell" <noahd@ucla.edu> wrote:
> Hello All,
> [...]

_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
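To get from annotated peaks to the list of ORFs that the suggested getEnrichedGO call takes as its first argument, something like the following may be enough. It is a minimal base R sketch: the data frame is a hypothetical stand-in for the annotated RangedData object, filled with the peak and feature values visible in head(annoPeakChr1data) in the original question.

```r
# Hypothetical stand-in for the annotated peaks; only the columns needed here.
annotated <- data.frame(
  peak    = c("01", "02", "06", "07", "08", "09"),
  feature = c("YAL069W", "YAL067W-A", "YAL062W", "YAL062W", "YAL062W", "YAL062W"),
  stringsAsFactors = FALSE
)

# Several peaks can map to the same gene, so keep each ORF once and drop NAs.
orfs <- unique(as.character(annotated$feature))
orfs <- orfs[!is.na(orfs)]
print(orfs)
```

With the real object, annoPeakChr1data$feature should give the same column, assuming the feature column holds ORF names as it does in the output shown in the question.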

ADD COMMENT • link 13.9 years ago Julie Zhu ★ 4.3k

Hi Noah,

Yeast is difficult because that community has a strong preference for their classic IDs. The same is true in Arabidopsis. This is why those two organism packages have "sgd" and "tair" in their respective package names. I have managed to keep the rest of the org packages Entrez Gene centric, however.

Another thing you can do if you find yourself with a similar problem is to use the org.Sc.sgdENTREZID mapping provided by the org.Sc.sgd.db package. The package may be ORF centric, but you can still map to an Entrez Gene ID if you use the org.Sc.sgdENTREZID mapping.

Marc

On 06/10/2010 01:53 PM, Zhu, Julie wrote:
> Hi Noah,
> [...]
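A short sketch of the org.Sc.sgdENTREZID lookup described above, using the three ORFs from the original question. It assumes org.Sc.sgd.db is installed and uses mget() with ifnotfound = NA, which is the usual way to query an AnnotationDbi map; the variable names are illustrative only, and the lookup is skipped if the package is missing.

```r
# ORFs taken from the annotated peaks shown in the original question.
orfs <- c("YAL069W", "YAL067W-A", "YAL062W")

if (requireNamespace("org.Sc.sgd.db", quietly = TRUE)) {
  library(org.Sc.sgd.db)
  # mget() on the map returns a named list, one element per ORF;
  # ORFs without an Entrez Gene ID come back as NA.
  entrez_list <- mget(orfs, org.Sc.sgdENTREZID, ifnotfound = NA)
  # Flatten to a named character vector, keeping the first ID per ORF.
  entrez_ids <- vapply(entrez_list, function(ids) as.character(ids[1]), character(1))
  print(entrez_ids)
} else {
  message("org.Sc.sgd.db is not installed; skipping the lookup.")
}
```

Entries that come back as NA would need to be dropped before passing the IDs to any function that expects Entrez Gene identifiers.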

ADD REPLY • link 13.9 years ago Marc Carlson ★ 7.2k

Thank you Julie and Marc!

Your suggestion to use a list of ORFs worked great. This package you have created is excellent and has been of tremendous help in my research.

Two questions:

1. Is there a simple subsetting routine or function allowing me to access the specific gene names for each GO term that shows significant enrichment?

2. An unrelated question regarding the findOverlappingPeaks function. I am getting the following error:

> tOverlap <- findOverlappingPeaks(rd1, rd1b, maxgap = 50, multiple = TRUE, NameOfPeaks1 = "tf1", NameOfPeaks2 = "tf2")
Error in dimnames(x) <- dn :
  length of 'dimnames' [2] not equal to array extent

In looking at your example RangedData and comparing it to my rd1 and rd1b objects, I notice obvious differences. Here is the top of each of my RangedData objects:

> head(rd1)
RangedData with 6 rows and 2 value columns across 1 space
        space         ranges |       site      score
  <character>      <IRanges> |   <factor>  <numeric>
1        chrI [   16,   295] | chr1.cher1 56.7815947
2        chrI [ 5995,  6027] | chr1.cher2  0.3204913
3        chrI [12996, 13171] | chr1.cher3  3.9056628
4        chrI [25185, 25245] | chr1.cher4  1.4921349
5        chrI [31578, 31824] | chr1.cher5 21.7538769
6        chrI [32967, 33125] | chr1.cher6 14.2653774
> head(rd1b)
RangedData with 6 rows and 2 value columns across 1 space
        space         ranges |       site      score
  <character>      <IRanges> |   <factor>  <numeric>
1        chrI [   16,   254] | chr1.cher1 47.5533476
2        chrI [ 2731,  2924] | chr1.cher2 18.4670121
3        chrI [ 5908,  5968] | chr1.cher3  0.5301501
4        chrI [12996, 13159] | chr1.cher4  2.1622454
5        chrI [25214, 25231] | chr1.cher5  0.1064385
6        chrI [29935, 29959] | chr1.cher6  0.5132352

I have a "site" column instead of a "name" column and lack any strand info. These RangedData objects worked fine in your annotatePeakInBatch function. I saw the discussion of this error on the message board in regard to the PeakBatch function but I think that is a slightly different case.

Thank you for your help.

Noah

> sessionInfo()
R version 2.11.0 (2010-04-22)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid stats graphics grDevices utils datasets methods base

other attached packages:
 [1] org.Sc.sgd.db_2.4.1                 rtracklayer_1.8.1
 [3] RCurl_1.3-1                         bitops_1.0-4.1
 [5] Starr_1.4.0                         affxparser_1.20.0
 [7] affy_1.26.0                         Ringo_1.12.0
 [9] Matrix_0.999375-38                  lattice_0.18-5
[11] RColorBrewer_1.0-2                  ChIPpeakAnno_1.4.0
[13] limma_3.4.0                         org.Hs.eg.db_2.4.1
[15] GO.db_2.4.1                         RSQLite_0.8-4
[17] DBI_0.2-5                           AnnotationDbi_1.10.0
[19] BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.16.0
[21] GenomicRanges_1.0.1                 Biostrings_2.16.0
[23] IRanges_1.6.0                       multtest_2.4.0
[25] Biobase_2.8.0                       biomaRt_2.4.0

loaded via a namespace (and not attached):
[1] affyio_1.16.0         annotate_1.26.0 genefilter_1.30.0 MASS_7.3-5
[5] preprocessCore_1.10.0 pspline_1.0-14  splines_2.11.0    survival_2.35-8
[9] tools_2.11.0          XML_2.8-1       xtable_1.5-6

On Jun 10, 2010, at 1:53 PM, Zhu, Julie wrote:
> Hi Noah,
> [...]

ADD REPLY • link 13.9 years ago Noah Dowell ▴ 410

Hi Noah,

Thank you very much for the feedback! Please see my comments to your questions below.

Best regards,

Julie

On 6/11/10 8:53 PM, "Noah Dowell" <noahd@ucla.edu> wrote:

> Thank you Julie and Marc! Your suggestion to use a list of ORFs worked great. This package you have created is excellent and has been of tremendous help in my research.
>
> Two questions:
>
> 1. Is there a simple subsetting routine or function allowing me to access the specific gene names for each GO term that shows significant enrichment?

It would be a nice addition. Please let me know if you are interested in contributing the code. Thanks!

> 2. An unrelated question regarding the findOverlappingPeaks function. I am getting the following error:

Your subsequent email indicated that you have resolved this by using the annotatePeakInBatch function. I think the reason you got the following error is that the RangedData does not have strand information. The findOverlappingPeaks function in version 1.4 of ChIPpeakAnno requires strand information to work (FYI, version 1.5.4 dropped this requirement).

Thanks again for sharing your solutions with us!

> > tOverlap <- findOverlappingPeaks(rd1, rd1b, maxgap = 50, multiple = TRUE, NameOfPeaks1 = "tf1", NameOfPeaks2 = "tf2")
> Error in dimnames(x) <- dn :
>   length of 'dimnames' [2] not equal to array extent
> [...]
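For question 1, one possible approach is to intersect the peak-associated ORFs with the gene set of each enriched GO term. The base R sketch below is not a ChIPpeakAnno feature. The GO term IDs and the GO-to-ORF memberships are made-up placeholders; in practice such a list might be built from a GO-to-ORF mapping in org.Sc.sgd.db (for example org.Sc.sgdGO2ALLORFS, converted with as.list()), and the enriched term IDs would come from the getEnrichedGO result.

```r
# ORFs associated with peaks (values from the original question).
peak_orfs <- c("YAL069W", "YAL067W-A", "YAL062W")

# Toy GO-term-to-ORF mapping: placeholder term IDs and invented memberships.
go2orfs <- list(
  "GO:TERM_A" = c("YAL062W", "YAL069W", "ORF_X"),
  "GO:TERM_B" = c("YAL067W-A", "ORF_Y"),
  "GO:TERM_C" = c("ORF_Z")
)

# Hypothetical IDs of the terms reported as significantly enriched.
enriched_terms <- c("GO:TERM_A", "GO:TERM_B")

# For each enriched term, keep only the member ORFs that also carry a peak.
genes_by_term <- lapply(go2orfs[enriched_terms], function(term_orfs) {
  intersect(peak_orfs, term_orfs)
})
print(genes_by_term)
```

The result is a named list with one character vector of ORFs per enriched term, which can be subset by term ID.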

ADD REPLY • link 13.9 years ago Julie Zhu ★ 4.3k

Hello Julie, Thank you for your help and for writing this excellent package. I am a complete amateur when it comes to writing code so I think contributing would be over my head. I did find a non-elegant solution to grabbing the unique and overlapping genes that map to two peak datasets. I will post that routine below in case anyone is following along. # These functions should work for anyone who wants to grab the gene lists that are unique or # overlap between two annotated peak datasets. # TF1 and TF2 are RangedData objects. # x is a local Annotation object created with the function getAnnotation. TF1annoPeakTotal <- annotatePeakInBatch(TF1, AnnotationData = x) TF2annoPeakTotal <- annotatePeakInBatch(TF2, AnnotationData = x) # Column 8 contains the features that map to specific peaks. TF1peakGenes <- TF1annoPeakTotal[,8] TF2peakGenes <- TF2annoPeakTotal[,8] # This step is probably not needed because the intersect function only considers unique entries. TF1peakGenes <- unique(TF1peakGenes) TF2peakGenes <- unique(TF2peakGenes) TF1tf2GeneIntersect <- intersect(TF1peakGenes, TF2peakGenes) TF1OnlyGenes <- setdiff(TF1peakGenes, TF2peakGenes) TF2OnlyGenes <- setdiff(TF2peakGenes, TF1peakGenes) Have a great week. noah On Jun 13, 2010, at 7:53 AM, Zhu, Julie wrote: > Hi Noah, > > Thank you very much for the feedback! > > Please see my comments to your question below. > > Best regards, > > Julie > > > On 6/11/10 8:53 PM, "Noah Dowell" <noahd@ucla.edu> wrote: > >> Thank you Julie and Marc! >> >> Your suggestion to use a list of ORFs worked great. >> >> This package you have created is excellent and has been of tremendous help in my research. >> >> Two questions: >> >> Is there a simple subsetting routine or function allowing me to access the specific gene names for each GO term that shows significant enrichment? > > It would be a nice addition. Please let me know if you are interested in contributing the code. Thanks! >> >> 2. 
An unrelated question regarding the findOverlappingPeaks function. I am getting the following error:
>>
>> Your subsequent email indicated that you have resolved this by using the annotatePeakInBatch function. I think the reason you got the following error is that the RangedData does not have strand information. The findOverlappingPeaks function in version 1.4 of ChIPpeakAnno requires strand information to work (FYI, version 1.5.4 dropped this requirement).
>>
>> Thanks again for sharing your solutions with us!
>>
>> > tOverlap <- findOverlappingPeaks(rd1, rd1b, maxgap= 50, multiple = TRUE, NameOfPeaks1= "tf1", NameOfPeaks2 = "tf2")
>> Error in dimnames(x) <- dn :
>>   length of 'dimnames' [2] not equal to array extent
>>
>> In looking at your example RangedData and comparing to my rd1 and rd1b objects I notice obvious differences. Here is the top of each of my RangedData objects:
>>
>> > head(rd1)
>> RangedData with 6 rows and 2 value columns across 1 space
>>         space         ranges |       site      score
>>   <character>      <iranges> |   <factor>  <numeric>
>> 1        chrI [   16,   295] | chr1.cher1 56.7815947
>> 2        chrI [ 5995,  6027] | chr1.cher2  0.3204913
>> 3        chrI [12996, 13171] | chr1.cher3  3.9056628
>> 4        chrI [25185, 25245] | chr1.cher4  1.4921349
>> 5        chrI [31578, 31824] | chr1.cher5 21.7538769
>> 6        chrI [32967, 33125] | chr1.cher6 14.2653774
>> > head(rd1b)
>> RangedData with 6 rows and 2 value columns across 1 space
>>         space         ranges |       site      score
>>   <character>      <iranges> |   <factor>  <numeric>
>> 1        chrI [   16,   254] | chr1.cher1 47.5533476
>> 2        chrI [ 2731,  2924] | chr1.cher2 18.4670121
>> 3        chrI [ 5908,  5968] | chr1.cher3  0.5301501
>> 4        chrI [12996, 13159] | chr1.cher4  2.1622454
>> 5        chrI [25214, 25231] | chr1.cher5  0.1064385
>> 6        chrI [29935, 29959] | chr1.cher6  0.5132352
>>
>> I have a "site" column instead of a "name" column and lack any strand info. These RangedData objects worked fine in your annotatePeakInBatch function. I saw the discussion of this error on the message board in regard to the PeakBatch function but I think that is a slightly different case.
>>
>> Thank you for your help.
>>
>> Noah
>>
>> > sessionInfo()
>> R version 2.11.0 (2010-04-22)
>> i386-apple-darwin9.8.0
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] grid stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>>  [1] org.Sc.sgd.db_2.4.1                 rtracklayer_1.8.1
>>  [3] RCurl_1.3-1                         bitops_1.0-4.1
>>  [5] Starr_1.4.0                         affxparser_1.20.0
>>  [7] affy_1.26.0                         Ringo_1.12.0
>>  [9] Matrix_0.999375-38                  lattice_0.18-5
>> [11] RColorBrewer_1.0-2                  ChIPpeakAnno_1.4.0
>> [13] limma_3.4.0                         org.Hs.eg.db_2.4.1
>> [15] GO.db_2.4.1                         RSQLite_0.8-4
>> [17] DBI_0.2-5                           AnnotationDbi_1.10.0
>> [19] BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.16.0
>> [21] GenomicRanges_1.0.1                 Biostrings_2.16.0
>> [23] IRanges_1.6.0                       multtest_2.4.0
>> [25] Biobase_2.8.0                       biomaRt_2.4.0
>>
>> loaded via a namespace (and not attached):
>> [1] affyio_1.16.0         annotate_1.26.0 genefilter_1.30.0 MASS_7.3-5
>> [5] preprocessCore_1.10.0 pspline_1.0-14  splines_2.11.0    survival_2.35-8
>> [9] tools_2.11.0          XML_2.8-1       xtable_1.5-6
>>
>> On Jun 10, 2010, at 1:53 PM, Zhu, Julie wrote:
>>
>>> Hi Noah,
>>>
>>> Yes, you are right that this is due to the differences between org.Hs.eg.db and org.Sc.sgd.db. The org.Hs.eg.db package is Entrez ID centric while org.Sc.sgd.db is ORF centric. It would be nice if all the org.*.*.db packages had a similar data structure and mapping. For now, I would suggest calling the getEnrichedGO function with a list of ORFs using the following syntax. You need to convert the list of Ensembl IDs to ORFs first.
>>>
>>> enrichedGO.Cse4 <- getEnrichedGO (orfs, feature_id_type="entrez_id", orgAnn="org.Sc.sgd.db", maxP=0.05, multiAdj =TRUE, minGOterm=5, multiAdjMethod="BH")
>>>
>>> Best regards,
>>>
>>> Julie
>>>
>>> On 6/10/10 4:15 PM, "Noah Dowell" <noahd@ucla.edu> wrote:
>>>
>>>> Hello All,
>>>>
>>>> I couldn't find a solution to my question in the archives and my attempts have been unsuccessful so hopefully someone has some advice.
>>>>
>>>> I have analyzed my yeast ChIP-chip tiling array data using Starr and converted my list of chip-enriched regions to RangedData to make use of the peakOverlap and GOenrichment functions in ChIPpeakAnno. The annotatePeakInBatch function has worked nicely but I am stuck with the getEnrichedGO function. I think the problem may be due to differences between the org.Hs.eg.db and org.Sc.sgd.db. The org.Hs.eg.db has a mapping of ENSEMBL gene accession numbers to Entrez Gene identifiers, but the org.Sc.sgd.db completely lacks this and uses a mapping to SGD Gene Identifiers. As far as I can tell the getEnrichedGO function calls for a mapping to Entrez Gene ids, thus the error I am showing below.
>>>>
>>>> Does anyone know of a workaround for this?
>>>>
>>>> Thank you for your help.
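For anyone following along, the workaround quoted above (passing a list of ORFs to getEnrichedGO instead of the annotated RangedData) needs a vector of ORF names first. The sketch below shows one way that vector might be built. It uses a plain data.frame called annoPeaks as a hypothetical stand-in for the annotated peaks, filled with the peak and feature values printed by head(annoPeakChr1data) in this thread; how the feature column is extracted from a real RangedData object is an assumption. The getEnrichedGO call is copied from Julie's reply and left commented out because it requires ChIPpeakAnno and org.Sc.sgd.db.

```r
# Hypothetical stand-in for the annotated peaks: a data.frame holding the
# "peak" and "feature" values shown by head(annoPeakChr1data) in this thread.
annoPeaks <- data.frame(
  peak    = c("01", "02", "06", "07", "08", "09"),
  feature = c("YAL069W", "YAL067W-A", "YAL062W",
              "YAL062W", "YAL062W", "YAL062W"),
  stringsAsFactors = FALSE
)

# Several peaks can map to the same ORF, so keep each ORF name only once.
orfs <- unique(as.character(annoPeaks$feature))
print(orfs)

# With ChIPpeakAnno and org.Sc.sgd.db loaded, the call from Julie's reply
# would then be (not run here):
# enrichedGO <- getEnrichedGO(orfs, feature_id_type = "entrez_id",
#                             orgAnn = "org.Sc.sgd.db", maxP = 0.05,
#                             multiAdj = TRUE, minGOterm = 5,
#                             multiAdjMethod = "BH")
```

Collapsing to unique ORFs before the enrichment call keeps a gene with many nearby peaks from being listed more than once in the input.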

13.9 years ago Noah Dowell ▴ 410


Hi Noah,

Thank you very much for the positive feedback and for sharing your solution! Just want to point out that setdiff and intersect will remove redundant elements automatically.

Best regards,

Julie

On 6/14/10 2:04 PM, "Noah Dowell" <noahd@ucla.edu> wrote:

> Hello Julie,
>
> Thank you for your help and for writing this excellent package. I am a complete amateur when it comes to writing code so I think contributing would be over my head. I did find a non-elegant solution to grabbing the unique and overlapping genes that map to two peak datasets. I will post that routine below in case anyone is following along.
>
> # These functions should work for anyone who wants to grab the gene lists that are unique or
> # overlap between two annotated peak datasets.
>
> # TF1 and TF2 are RangedData objects.
> # x is a local Annotation object created with the function getAnnotation.
>
> TF1annoPeakTotal <- annotatePeakInBatch(TF1, AnnotationData = x)
> TF2annoPeakTotal <- annotatePeakInBatch(TF2, AnnotationData = x)
>
> # Column 8 contains the features that map to specific peaks.
>
> TF1peakGenes <- TF1annoPeakTotal[,8]
> TF2peakGenes <- TF2annoPeakTotal[,8]
>
> # This step is probably not needed because the intersect function only considers unique entries.
>
> TF1peakGenes <- unique(TF1peakGenes)
> TF2peakGenes <- unique(TF2peakGenes)
>
> TF1tf2GeneIntersect <- intersect(TF1peakGenes, TF2peakGenes)
>
> TF1OnlyGenes <- setdiff(TF1peakGenes, TF2peakGenes)
> TF2OnlyGenes <- setdiff(TF2peakGenes, TF1peakGenes)
>
> Have a great week.
>
> noah
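The point in this reply that setdiff and intersect drop repeated entries on their own can be checked in base R. The sketch below uses two short, made-up gene lists (the variable names follow Noah's routine, but the ORF values are illustrative only) and never calls unique().

```r
# Made-up gene lists with repeated ORFs, as happens when several peaks
# map to the same gene. The values are for illustration only.
TF1peakGenes <- c("YAL069W", "YAL062W", "YAL062W", "YAL067W-A")
TF2peakGenes <- c("YAL062W", "YAL062W", "YAL005C")

# Set operations applied directly to the lists, without calling unique().
shared  <- intersect(TF1peakGenes, TF2peakGenes)
TF1only <- setdiff(TF1peakGenes, TF2peakGenes)
TF2only <- setdiff(TF2peakGenes, TF1peakGenes)

print(shared)
print(TF1only)
print(TF2only)

# anyDuplicated() returns 0 when a vector has no repeated elements.
print(c(anyDuplicated(shared), anyDuplicated(TF1only), anyDuplicated(TF2only)))
```

So the explicit unique() step in the routine is harmless but, as Noah suspected, not required.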

13.9 years ago Julie Zhu ★ 4.3k
