Question

ChIPpeakAnno: Inquiry for Worm Analysis

0

Entering edit mode

Julie Zhu ★ 4.3k

@julie-zhu-3596

Last seen 5 months ago

United States

Dear Carlos, The code looks fine except the following line. You would want to use MyPeakList instead of the first 6 records as MyPeakList[1:6,]. AnnotatedPeak = annotatePeakInBatch(MyPeakList[1:6,], AnnotationData=annotation.cds) Please let me know if this resolves the issue. Thanks! Best regards, Julie On 3/1/12 12:13 PM, "Carlos Araya" <claraya@stanford.edu> wrote: Dear Julie, I'm ready to perform GO enrichment analysis on ~292 ChIP-seq datasets and would like to use your ChIPpeakAnno package. However, I've encountered problems that I believe are due to differences in the gene/feature names used. Here is an excerpt of my code: library(ChIPpeakAnno) library(org.Ce.eg.db) # load peak calls and genomic annotations, filter for gene (CDS) annotations: peaks.bed = read.table("YL474_HPL-2_L1_yale_stn_1x2_peaks.bed", header=FALSE) annotation = read.table("in2shape_wbComp.bed", header=TRUE) annotation.cds = annotation[annotation$feature.type == "cds",] summary(annotation.pct) chrm start end feature score strand name gene chrI :4593 Min. : 58 Min. : 779 WBGene00000001: 0 .:30163 -:14976 2L52.1 : 0 WBGene00000001: 0 chrII :5195 1st Qu.: 4687094 1st Qu.: 4688288 WBGene00000002: 0 .: 0 2RSSE.1: 0 WBGene00000002: 0 chrIII:4345 Median : 8229719 Median : 8235458 WBGene00000003: 0 +:15187 2RSSE.2: 0 WBGene00000003: 0 chrIV :5126 Mean : 8415951 Mean : 8419438 WBGene00000004: 0 4R79.2a: 0 WBGene00000004: 0 chrM : 12 3rd Qu.:11924503 3rd Qu.:11929206 WBGene00000005: 0 4R79.2b: 0 WBGene00000005: 0 chrV :6763 Max. :20915455 Max. :20922707 (Other) : 0 (Other): 0 (Other) : 0 chrX :4129 NA's :30163 NA's :30163 NA's :30163 # convert bed tables into ranged data: MyPeakList = BED2RangedData(peaks.bed) annotation.cds = BED2RangedData(annotation.cds) # annotate peaks: AnnotatedPeak = annotatePeakInBatch(MyPeakList[1:6,], AnnotationData=annotation.cds) as.data.frame(AnnotatedPeak) space start end width names peak strand feature start_position end_position insideFeature distancetoFeature shortestDistance 1 I 16921 17165 245 MACS_peak_2 00006 MACS_peak_2 + 00006 11618 16804 downstream 5303 117 2 I 110632 110850 219 MACS_peak_4 00018 MACS_peak_4 + 00018 111036 112316 upstream -404 186 3 I 536767 536967 201 MACS_peak_6 00070 MACS_peak_6 + 00070 537125 541634 upstream -358 158 4 I 3680 4064 385 MACS_peak_1 02345 MACS_peak_1 - 02345 4221 10230 downstream 6550 157 5 I 108389 108741 353 MACS_peak_3 02354 MACS_peak_3 - 02354 101213 108161 upstream -228 228 6 I 315156 315412 257 MACS_peak_5 02372 MACS_peak_5 - 02372 310981 314992 upstream -164 164 fromOverlappingOrNearest 1 NearestStart 2 NearestStart 3 NearestStart 4 NearestStart 5 NearestStart 6 NearestStart # execute enrichment analysis: enrichedGO = getEnrichedGO(AnnotatedPeak, orgAnn = "org.Ce.eg.db", maxP = 0.01, multiAdj = TRUE, minGOterm = 10, multiAdjMethod = "BH" ) Error in if (class(go.ids) != "matrix" | dim(go.ids)[2] < 4) { : argument is of length zero I've highlighted in blue the outputs of distinct commands and in red the error that I obtain upon attempting GO analysis. Any ideas as to how I should annotate my features (which codes I should use) or how to prepare the objects in R for the enrichment analysis? Please note that in my annotation BED file the WormBase identifiers for each gene are also available (under the "gene" column). I noticed there is a "org.Ce.egWORMBASE" object in the "org.Ce.eg.db" library which may be useful, but I do not understand how to use it convert between gene IDs. Kind regards, Carlos L. Araya, PhD Stanford University | Department of Genetics 300 Pasteur Dr. #M348 | Stanford, CA 94305 m: 206.661.3648 | e: claraya@stanford.edu [[alternative HTML version deleted]]

Annotation GO annotate convert ChIPpeakAnno Annotation GO annotate convert ChIPpeakAnno • 1.0k views

ADD COMMENT • link 12.2 years ago • updated 12.1 years ago Julie Zhu ★ 4.3k

score 0 · Answer 1 · 2012-03-01

Dear Carlo, Thanks for sharing the working code! Please do not hesitate to contact me and cc bioconductor if you have further questions. Best regards, Julie On 3/1/12 1:09 PM, "Carlos Araya" <claraya@stanford.edu> wrote: Dear Julie, I've made the change that you recommended and also changed the source of the annotations as below. This works now! mart = useMart(biomart="ensembl", dataset="celegans_gene_ensembl") mart.TSS = getAnnotation(mart, featureType = "TSS") MyPeakList = BED2RangedData(peaks.bed) AnnotatedPeak = annotatePeakInBatch(MyPeakList, AnnotationData=mart.TSS) enrichedGO = getEnrichedGO(AnnotatedPeak, orgAnn = "org.Ce.eg.db", maxP = 0.01, multiAdj = TRUE, minGOterm = 10, multiAdjMethod = "BH" ) Thanks! I must admit though; I'll probably have more questions later. =] Carlos L. Araya, PhD Stanford University | Department of Genetics 300 Pasteur Dr. #M348 | Stanford, CA 94305 m: 206.661.3648 | e: claraya@stanford.edu On Thu, Mar 1, 2012 at 9:31 AM, Zhu, Lihua (Julie) <julie.zhu@umassmed.edu> wrote: Dear Carlos, The code looks fine except the following line. You would want to use MyPeakList instead of the first 6 records as MyPeakList[1:6,]. AnnotatedPeak = annotatePeakInBatch(MyPeakList[1:6,], AnnotationData=annotation.cds) Please let me know if this resolves the issue. Thanks! Best regards, Julie On 3/1/12 12:13 PM, "Carlos Araya" <claraya@stanford.edu <http:="" claraya@stanford.edu=""> > wrote: Dear Julie, I'm ready to perform GO enrichment analysis on ~292 ChIP-seq datasets and would like to use your ChIPpeakAnno package. However, I've encountered problems that I believe are due to differences in the gene/feature names used. Here is an excerpt of my code: library(ChIPpeakAnno) library(org.Ce.eg.db) # load peak calls and genomic annotations, filter for gene (CDS) annotations: peaks.bed = read.table("YL474_HPL-2_L1_yale_stn_1x2_peaks.bed", header=FALSE) annotation = read.table("in2shape_wbComp.bed", header=TRUE) annotation.cds = annotation[annotation$feature.type == "cds",] summary(annotation.pct) chrm start end feature score strand name gene chrI :4593 Min. : 58 Min. : 779 WBGene00000001: 0 .:30163 -:14976 2L52.1 : 0 WBGene00000001: 0 chrII :5195 1st Qu.: 4687094 1st Qu.: 4688288 WBGene00000002: 0 .: 0 2RSSE.1: 0 WBGene00000002: 0 chrIII:4345 Median : 8229719 Median : 8235458 WBGene00000003: 0 +:15187 2RSSE.2: 0 WBGene00000003: 0 chrIV :5126 Mean : 8415951 Mean : 8419438 WBGene00000004: 0 4R79.2a: 0 WBGene00000004: 0 chrM : 12 3rd Qu.:11924503 3rd Qu.:11929206 WBGene00000005: 0 4R79.2b: 0 WBGene00000005: 0 chrV :6763 Max. :20915455 Max. :20922707 (Other) : 0 (Other): 0 (Other) : 0 chrX :4129 NA's :30163 NA's :30163 NA's :30163 # convert bed tables into ranged data: MyPeakList = BED2RangedData(peaks.bed) annotation.cds = BED2RangedData(annotation.cds) # annotate peaks: AnnotatedPeak = annotatePeakInBatch(MyPeakList[1:6,], AnnotationData=annotation.cds) as.data.frame(AnnotatedPeak) space start end width names peak strand feature start_position end_position insideFeature distancetoFeature shortestDistance 1 I 16921 17165 245 MACS_peak_2 00006 MACS_peak_2 + 00006 11618 16804 downstream 5303 117 2 I 110632 110850 219 MACS_peak_4 00018 MACS_peak_4 + 00018 111036 112316 upstream -404 186 3 I 536767 536967 201 MACS_peak_6 00070 MACS_peak_6 + 00070 537125 541634 upstream -358 158 4 I 3680 4064 385 MACS_peak_1 02345 MACS_peak_1 - 02345 4221 10230 downstream 6550 157 5 I 108389 108741 353 MACS_peak_3 02354 MACS_peak_3 - 02354 101213 108161 upstream -228 228 6 I 315156 315412 257 MACS_peak_5 02372 MACS_peak_5 - 02372 310981 314992 upstream -164 164 fromOverlappingOrNearest 1 NearestStart 2 NearestStart 3 NearestStart 4 NearestStart 5 NearestStart 6 NearestStart # execute enrichment analysis: enrichedGO = getEnrichedGO(AnnotatedPeak, orgAnn = "org.Ce.eg.db", maxP = 0.01, multiAdj = TRUE, minGOterm = 10, multiAdjMethod = "BH" ) Error in if (class(go.ids) != "matrix" | dim(go.ids)[2] < 4) { : argument is of length zero I've highlighted in blue the outputs of distinct commands and in red the error that I obtain upon attempting GO analysis. Any ideas as to how I should annotate my features (which codes I should use) or how to prepare the objects in R for the enrichment analysis? Please note that in my annotation BED file the WormBase identifiers for each gene are also available (under the "gene" column). I noticed there is a "org.Ce.egWORMBASE" object in the "org.Ce.eg.db" library which may be useful, but I do not understand how to use it convert between gene IDs. Kind regards, Carlos L. Araya, PhD Stanford University | Department of Genetics 300 Pasteur Dr. #M348 | Stanford, CA 94305 m: 206.661.3648 <tel:206.661.3648> | e: claraya@stanford.edu <http: claraya@stanford.edu=""> [[alternative HTML version deleted]]

score 0 · Answer 2 · 2012-03-30

Dear Carlos, That is a very good question. Currently It does not have the capability to pass in a level parameter. However, you could filter the results after the analysis is done before filtering the p-value or adjusting the p-value. In the future, we could add level information to the result set for quick filtering. Best regards, Julie On 3/30/12 12:38 PM, "Carlos Araya" <claraya@stanford.edu> wrote: Dear Lihua, I hope this email finds you well. I am wondering whether there is a way to restrict broad/high-level and poorly-informative GO terms such as "intracellular part" in ChIPpeakAnno? Kind regards, Carlos L. Araya, PhD Stanford University | Department of Genetics 300 Pasteur Dr. #M348 | Stanford, CA 94305 m: 206.661.3648 | e: claraya@stanford.edu On Thu, Mar 1, 2012 at 10:21 AM, Zhu, Lihua (Julie) <julie.zhu@umassmed.edu> wrote: Dear Carlo, Thanks for sharing the working code! Please do not hesitate to contact me and cc bioconductor if you have further questions. Best regards, Julie On 3/1/12 1:09 PM, "Carlos Araya" <claraya@stanford.edu <http:="" claraya@stanford.edu=""> > wrote: Dear Julie, I've made the change that you recommended and also changed the source of the annotations as below. This works now! mart = useMart(biomart="ensembl", dataset="celegans_gene_ensembl") mart.TSS = getAnnotation(mart, featureType = "TSS") MyPeakList = BED2RangedData(peaks.bed) AnnotatedPeak = annotatePeakInBatch(MyPeakList, AnnotationData=mart.TSS) enrichedGO = getEnrichedGO(AnnotatedPeak, orgAnn = "org.Ce.eg.db", maxP = 0.01, multiAdj = TRUE, minGOterm = 10, multiAdjMethod = "BH" ) Thanks! I must admit though; I'll probably have more questions later. =] Carlos L. Araya, PhD Stanford University | Department of Genetics 300 Pasteur Dr. #M348 | Stanford, CA 94305 m: 206.661.3648 <tel:206.661.3648> | e: claraya@stanford.edu <http: claraya@stanford.edu=""> On Thu, Mar 1, 2012 at 9:31 AM, Zhu, Lihua (Julie) <julie.zhu@umassmed.edu <http:="" julie.zhu@umassmed.edu=""> > wrote: Dear Carlos, The code looks fine except the following line. You would want to use MyPeakList instead of the first 6 records as MyPeakList[1:6,]. AnnotatedPeak = annotatePeakInBatch(MyPeakList[1:6,], AnnotationData=annotation.cds) Please let me know if this resolves the issue. Thanks! Best regards, Julie On 3/1/12 12:13 PM, "Carlos Araya" <claraya@stanford.edu <http:="" claraya@stanford.edu=""> <http: claraya@stanford.edu=""> > wrote: Dear Julie, I'm ready to perform GO enrichment analysis on ~292 ChIP-seq datasets and would like to use your ChIPpeakAnno package. However, I've encountered problems that I believe are due to differences in the gene/feature names used. Here is an excerpt of my code: library(ChIPpeakAnno) library(org.Ce.eg.db) # load peak calls and genomic annotations, filter for gene (CDS) annotations: peaks.bed = read.table("YL474_HPL-2_L1_yale_stn_1x2_peaks.bed", header=FALSE) annotation = read.table("in2shape_wbComp.bed", header=TRUE) annotation.cds = annotation[annotation$feature.type == "cds",] summary(annotation.pct) chrm start end feature score strand name gene chrI :4593 Min. : 58 Min. : 779 WBGene00000001: 0 .:30163 -:14976 2L52.1 : 0 WBGene00000001: 0 chrII :5195 1st Qu.: 4687094 1st Qu.: 4688288 WBGene00000002: 0 .: 0 2RSSE.1: 0 WBGene00000002: 0 chrIII:4345 Median : 8229719 Median : 8235458 WBGene00000003: 0 +:15187 2RSSE.2: 0 WBGene00000003: 0 chrIV :5126 Mean : 8415951 Mean : 8419438 WBGene00000004: 0 4R79.2a: 0 WBGene00000004: 0 chrM : 12 3rd Qu.:11924503 3rd Qu.:11929206 WBGene00000005: 0 4R79.2b: 0 WBGene00000005: 0 chrV :6763 Max. :20915455 Max. :20922707 (Other) : 0 (Other): 0 (Other) : 0 chrX :4129 NA's :30163 NA's :30163 NA's :30163 # convert bed tables into ranged data: MyPeakList = BED2RangedData(peaks.bed) annotation.cds = BED2RangedData(annotation.cds) # annotate peaks: AnnotatedPeak = annotatePeakInBatch(MyPeakList[1:6,], AnnotationData=annotation.cds) as.data.frame(AnnotatedPeak) space start end width names peak strand feature start_position end_position insideFeature distancetoFeature shortestDistance 1 I 16921 17165 245 MACS_peak_2 00006 MACS_peak_2 + 00006 11618 16804 downstream 5303 117 2 I 110632 110850 219 MACS_peak_4 00018 MACS_peak_4 + 00018 111036 112316 upstream -404 186 3 I 536767 536967 201 MACS_peak_6 00070 MACS_peak_6 + 00070 537125 541634 upstream -358 158 4 I 3680 4064 385 MACS_peak_1 02345 MACS_peak_1 - 02345 4221 10230 downstream 6550 157 5 I 108389 108741 353 MACS_peak_3 02354 MACS_peak_3 - 02354 101213 108161 upstream -228 228 6 I 315156 315412 257 MACS_peak_5 02372 MACS_peak_5 - 02372 310981 314992 upstream -164 164 fromOverlappingOrNearest 1 NearestStart 2 NearestStart 3 NearestStart 4 NearestStart 5 NearestStart 6 NearestStart # execute enrichment analysis: enrichedGO = getEnrichedGO(AnnotatedPeak, orgAnn = "org.Ce.eg.db", maxP = 0.01, multiAdj = TRUE, minGOterm = 10, multiAdjMethod = "BH" ) Error in if (class(go.ids) != "matrix" | dim(go.ids)[2] < 4) { : argument is of length zero I've highlighted in blue the outputs of distinct commands and in red the error that I obtain upon attempting GO analysis. Any ideas as to how I should annotate my features (which codes I should use) or how to prepare the objects in R for the enrichment analysis? Please note that in my annotation BED file the WormBase identifiers for each gene are also available (under the "gene" column). I noticed there is a "org.Ce.egWORMBASE" object in the "org.Ce.eg.db" library which may be useful, but I do not understand how to use it convert between gene IDs. Kind regards, Carlos L. Araya, PhD Stanford University | Department of Genetics 300 Pasteur Dr. #M348 | Stanford, CA 94305 m: 206.661.3648 <tel:206.661.3648> <tel:206.661.3648 <tel:206.661.3648=""> > | e: claraya@stanford.edu <http: claraya@stanford.edu=""> <http: claraya@stanford.edu=""> [[alternative HTML version deleted]]