ChIPpeakAnno: Inquiry for Worm Analysis
2
0
Entering edit mode
Julie Zhu ★ 4.3k
@julie-zhu-3596
Last seen 3 months ago
United States
Dear Carlos, The code looks fine except the following line. You would want to use MyPeakList instead of the first 6 records as MyPeakList[1:6,]. AnnotatedPeak = annotatePeakInBatch(MyPeakList[1:6,], AnnotationData=annotation.cds) Please let me know if this resolves the issue. Thanks! Best regards, Julie On 3/1/12 12:13 PM, "Carlos Araya" <claraya@stanford.edu> wrote: Dear Julie, I'm ready to perform GO enrichment analysis on ~292 ChIP-seq datasets and would like to use your ChIPpeakAnno package. However, I've encountered problems that I believe are due to differences in the gene/feature names used. Here is an excerpt of my code: library(ChIPpeakAnno) library(org.Ce.eg.db) # load peak calls and genomic annotations, filter for gene (CDS) annotations: peaks.bed = read.table("YL474_HPL-2_L1_yale_stn_1x2_peaks.bed", header=FALSE) annotation = read.table("in2shape_wbComp.bed", header=TRUE) annotation.cds = annotation[annotation$feature.type == "cds",] summary(annotation.pct) chrm start end feature score strand name gene chrI :4593 Min. : 58 Min. : 779 WBGene00000001: 0 .:30163 -:14976 2L52.1 : 0 WBGene00000001: 0 chrII :5195 1st Qu.: 4687094 1st Qu.: 4688288 WBGene00000002: 0 .: 0 2RSSE.1: 0 WBGene00000002: 0 chrIII:4345 Median : 8229719 Median : 8235458 WBGene00000003: 0 +:15187 2RSSE.2: 0 WBGene00000003: 0 chrIV :5126 Mean : 8415951 Mean : 8419438 WBGene00000004: 0 4R79.2a: 0 WBGene00000004: 0 chrM : 12 3rd Qu.:11924503 3rd Qu.:11929206 WBGene00000005: 0 4R79.2b: 0 WBGene00000005: 0 chrV :6763 Max. :20915455 Max. :20922707 (Other) : 0 (Other): 0 (Other) : 0 chrX :4129 NA's :30163 NA's :30163 NA's :30163 # convert bed tables into ranged data: MyPeakList = BED2RangedData(peaks.bed) annotation.cds = BED2RangedData(annotation.cds) # annotate peaks: AnnotatedPeak = annotatePeakInBatch(MyPeakList[1:6,], AnnotationData=annotation.cds) as.data.frame(AnnotatedPeak) space start end width names peak strand feature start_position end_position insideFeature distancetoFeature shortestDistance 1 I 16921 17165 245 MACS_peak_2 00006 MACS_peak_2 + 00006 11618 16804 downstream 5303 117 2 I 110632 110850 219 MACS_peak_4 00018 MACS_peak_4 + 00018 111036 112316 upstream -404 186 3 I 536767 536967 201 MACS_peak_6 00070 MACS_peak_6 + 00070 537125 541634 upstream -358 158 4 I 3680 4064 385 MACS_peak_1 02345 MACS_peak_1 - 02345 4221 10230 downstream 6550 157 5 I 108389 108741 353 MACS_peak_3 02354 MACS_peak_3 - 02354 101213 108161 upstream -228 228 6 I 315156 315412 257 MACS_peak_5 02372 MACS_peak_5 - 02372 310981 314992 upstream -164 164 fromOverlappingOrNearest 1 NearestStart 2 NearestStart 3 NearestStart 4 NearestStart 5 NearestStart 6 NearestStart # execute enrichment analysis: enrichedGO = getEnrichedGO(AnnotatedPeak, orgAnn = "org.Ce.eg.db", maxP = 0.01, multiAdj = TRUE, minGOterm = 10, multiAdjMethod = "BH" ) Error in if (class(go.ids) != "matrix" | dim(go.ids)[2] < 4) { : argument is of length zero I've highlighted in blue the outputs of distinct commands and in red the error that I obtain upon attempting GO analysis. Any ideas as to how I should annotate my features (which codes I should use) or how to prepare the objects in R for the enrichment analysis? Please note that in my annotation BED file the WormBase identifiers for each gene are also available (under the "gene" column). I noticed there is a "org.Ce.egWORMBASE" object in the "org.Ce.eg.db" library which may be useful, but I do not understand how to use it convert between gene IDs. Kind regards, Carlos L. Araya, PhD Stanford University | Department of Genetics 300 Pasteur Dr. #M348 | Stanford, CA 94305 m: 206.661.3648 | e: claraya@stanford.edu [[alternative HTML version deleted]]
Annotation GO annotate convert ChIPpeakAnno Annotation GO annotate convert ChIPpeakAnno • 698 views
ADD COMMENT
0
Entering edit mode
Julie Zhu ★ 4.3k
@julie-zhu-3596
Last seen 3 months ago
United States
Dear Carlo, Thanks for sharing the working code! Please do not hesitate to contact me and cc bioconductor if you have further questions. Best regards, Julie On 3/1/12 1:09 PM, "Carlos Araya" <claraya@stanford.edu> wrote: Dear Julie, I've made the change that you recommended and also changed the source of the annotations as below. This works now! mart = useMart(biomart="ensembl", dataset="celegans_gene_ensembl") mart.TSS = getAnnotation(mart, featureType = "TSS") MyPeakList = BED2RangedData(peaks.bed) AnnotatedPeak = annotatePeakInBatch(MyPeakList, AnnotationData=mart.TSS) enrichedGO = getEnrichedGO(AnnotatedPeak, orgAnn = "org.Ce.eg.db", maxP = 0.01, multiAdj = TRUE, minGOterm = 10, multiAdjMethod = "BH" ) Thanks! I must admit though; I'll probably have more questions later. =] Carlos L. Araya, PhD Stanford University | Department of Genetics 300 Pasteur Dr. #M348 | Stanford, CA 94305 m: 206.661.3648 | e: claraya@stanford.edu On Thu, Mar 1, 2012 at 9:31 AM, Zhu, Lihua (Julie) <julie.zhu@umassmed.edu> wrote: Dear Carlos, The code looks fine except the following line. You would want to use MyPeakList instead of the first 6 records as MyPeakList[1:6,]. AnnotatedPeak = annotatePeakInBatch(MyPeakList[1:6,], AnnotationData=annotation.cds) Please let me know if this resolves the issue. Thanks! Best regards, Julie On 3/1/12 12:13 PM, "Carlos Araya" <claraya@stanford.edu <http:="" claraya@stanford.edu=""> > wrote: Dear Julie, I'm ready to perform GO enrichment analysis on ~292 ChIP-seq datasets and would like to use your ChIPpeakAnno package. However, I've encountered problems that I believe are due to differences in the gene/feature names used. Here is an excerpt of my code: library(ChIPpeakAnno) library(org.Ce.eg.db) # load peak calls and genomic annotations, filter for gene (CDS) annotations: peaks.bed = read.table("YL474_HPL-2_L1_yale_stn_1x2_peaks.bed", header=FALSE) annotation = read.table("in2shape_wbComp.bed", header=TRUE) annotation.cds = annotation[annotation$feature.type == "cds",] summary(annotation.pct) chrm start end feature score strand name gene chrI :4593 Min. : 58 Min. : 779 WBGene00000001: 0 .:30163 -:14976 2L52.1 : 0 WBGene00000001: 0 chrII :5195 1st Qu.: 4687094 1st Qu.: 4688288 WBGene00000002: 0 .: 0 2RSSE.1: 0 WBGene00000002: 0 chrIII:4345 Median : 8229719 Median : 8235458 WBGene00000003: 0 +:15187 2RSSE.2: 0 WBGene00000003: 0 chrIV :5126 Mean : 8415951 Mean : 8419438 WBGene00000004: 0 4R79.2a: 0 WBGene00000004: 0 chrM : 12 3rd Qu.:11924503 3rd Qu.:11929206 WBGene00000005: 0 4R79.2b: 0 WBGene00000005: 0 chrV :6763 Max. :20915455 Max. :20922707 (Other) : 0 (Other): 0 (Other) : 0 chrX :4129 NA's :30163 NA's :30163 NA's :30163 # convert bed tables into ranged data: MyPeakList = BED2RangedData(peaks.bed) annotation.cds = BED2RangedData(annotation.cds) # annotate peaks: AnnotatedPeak = annotatePeakInBatch(MyPeakList[1:6,], AnnotationData=annotation.cds) as.data.frame(AnnotatedPeak) space start end width names peak strand feature start_position end_position insideFeature distancetoFeature shortestDistance 1 I 16921 17165 245 MACS_peak_2 00006 MACS_peak_2 + 00006 11618 16804 downstream 5303 117 2 I 110632 110850 219 MACS_peak_4 00018 MACS_peak_4 + 00018 111036 112316 upstream -404 186 3 I 536767 536967 201 MACS_peak_6 00070 MACS_peak_6 + 00070 537125 541634 upstream -358 158 4 I 3680 4064 385 MACS_peak_1 02345 MACS_peak_1 - 02345 4221 10230 downstream 6550 157 5 I 108389 108741 353 MACS_peak_3 02354 MACS_peak_3 - 02354 101213 108161 upstream -228 228 6 I 315156 315412 257 MACS_peak_5 02372 MACS_peak_5 - 02372 310981 314992 upstream -164 164 fromOverlappingOrNearest 1 NearestStart 2 NearestStart 3 NearestStart 4 NearestStart 5 NearestStart 6 NearestStart # execute enrichment analysis: enrichedGO = getEnrichedGO(AnnotatedPeak, orgAnn = "org.Ce.eg.db", maxP = 0.01, multiAdj = TRUE, minGOterm = 10, multiAdjMethod = "BH" ) Error in if (class(go.ids) != "matrix" | dim(go.ids)[2] < 4) { : argument is of length zero I've highlighted in blue the outputs of distinct commands and in red the error that I obtain upon attempting GO analysis. Any ideas as to how I should annotate my features (which codes I should use) or how to prepare the objects in R for the enrichment analysis? Please note that in my annotation BED file the WormBase identifiers for each gene are also available (under the "gene" column). I noticed there is a "org.Ce.egWORMBASE" object in the "org.Ce.eg.db" library which may be useful, but I do not understand how to use it convert between gene IDs. Kind regards, Carlos L. Araya, PhD Stanford University | Department of Genetics 300 Pasteur Dr. #M348 | Stanford, CA 94305 m: 206.661.3648 <tel:206.661.3648> | e: claraya@stanford.edu <http: claraya@stanford.edu=""> [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Julie Zhu ★ 4.3k
@julie-zhu-3596
Last seen 3 months ago
United States
Dear Carlos, That is a very good question. Currently It does not have the capability to pass in a level parameter. However, you could filter the results after the analysis is done before filtering the p-value or adjusting the p-value. In the future, we could add level information to the result set for quick filtering. Best regards, Julie On 3/30/12 12:38 PM, "Carlos Araya" <claraya@stanford.edu> wrote: Dear Lihua, I hope this email finds you well. I am wondering whether there is a way to restrict broad/high-level and poorly-informative GO terms such as "intracellular part" in ChIPpeakAnno? Kind regards, Carlos L. Araya, PhD Stanford University | Department of Genetics 300 Pasteur Dr. #M348 | Stanford, CA 94305 m: 206.661.3648 | e: claraya@stanford.edu On Thu, Mar 1, 2012 at 10:21 AM, Zhu, Lihua (Julie) <julie.zhu@umassmed.edu> wrote: Dear Carlo, Thanks for sharing the working code! Please do not hesitate to contact me and cc bioconductor if you have further questions. Best regards, Julie On 3/1/12 1:09 PM, "Carlos Araya" <claraya@stanford.edu <http:="" claraya@stanford.edu=""> > wrote: Dear Julie, I've made the change that you recommended and also changed the source of the annotations as below. This works now! mart = useMart(biomart="ensembl", dataset="celegans_gene_ensembl") mart.TSS = getAnnotation(mart, featureType = "TSS") MyPeakList = BED2RangedData(peaks.bed) AnnotatedPeak = annotatePeakInBatch(MyPeakList, AnnotationData=mart.TSS) enrichedGO = getEnrichedGO(AnnotatedPeak, orgAnn = "org.Ce.eg.db", maxP = 0.01, multiAdj = TRUE, minGOterm = 10, multiAdjMethod = "BH" ) Thanks! I must admit though; I'll probably have more questions later. =] Carlos L. Araya, PhD Stanford University | Department of Genetics 300 Pasteur Dr. #M348 | Stanford, CA 94305 m: 206.661.3648 <tel:206.661.3648> | e: claraya@stanford.edu <http: claraya@stanford.edu=""> On Thu, Mar 1, 2012 at 9:31 AM, Zhu, Lihua (Julie) <julie.zhu@umassmed.edu <http:="" julie.zhu@umassmed.edu=""> > wrote: Dear Carlos, The code looks fine except the following line. You would want to use MyPeakList instead of the first 6 records as MyPeakList[1:6,]. AnnotatedPeak = annotatePeakInBatch(MyPeakList[1:6,], AnnotationData=annotation.cds) Please let me know if this resolves the issue. Thanks! Best regards, Julie On 3/1/12 12:13 PM, "Carlos Araya" <claraya@stanford.edu <http:="" claraya@stanford.edu=""> <http: claraya@stanford.edu=""> > wrote: Dear Julie, I'm ready to perform GO enrichment analysis on ~292 ChIP-seq datasets and would like to use your ChIPpeakAnno package. However, I've encountered problems that I believe are due to differences in the gene/feature names used. Here is an excerpt of my code: library(ChIPpeakAnno) library(org.Ce.eg.db) # load peak calls and genomic annotations, filter for gene (CDS) annotations: peaks.bed = read.table("YL474_HPL-2_L1_yale_stn_1x2_peaks.bed", header=FALSE) annotation = read.table("in2shape_wbComp.bed", header=TRUE) annotation.cds = annotation[annotation$feature.type == "cds",] summary(annotation.pct) chrm start end feature score strand name gene chrI :4593 Min. : 58 Min. : 779 WBGene00000001: 0 .:30163 -:14976 2L52.1 : 0 WBGene00000001: 0 chrII :5195 1st Qu.: 4687094 1st Qu.: 4688288 WBGene00000002: 0 .: 0 2RSSE.1: 0 WBGene00000002: 0 chrIII:4345 Median : 8229719 Median : 8235458 WBGene00000003: 0 +:15187 2RSSE.2: 0 WBGene00000003: 0 chrIV :5126 Mean : 8415951 Mean : 8419438 WBGene00000004: 0 4R79.2a: 0 WBGene00000004: 0 chrM : 12 3rd Qu.:11924503 3rd Qu.:11929206 WBGene00000005: 0 4R79.2b: 0 WBGene00000005: 0 chrV :6763 Max. :20915455 Max. :20922707 (Other) : 0 (Other): 0 (Other) : 0 chrX :4129 NA's :30163 NA's :30163 NA's :30163 # convert bed tables into ranged data: MyPeakList = BED2RangedData(peaks.bed) annotation.cds = BED2RangedData(annotation.cds) # annotate peaks: AnnotatedPeak = annotatePeakInBatch(MyPeakList[1:6,], AnnotationData=annotation.cds) as.data.frame(AnnotatedPeak) space start end width names peak strand feature start_position end_position insideFeature distancetoFeature shortestDistance 1 I 16921 17165 245 MACS_peak_2 00006 MACS_peak_2 + 00006 11618 16804 downstream 5303 117 2 I 110632 110850 219 MACS_peak_4 00018 MACS_peak_4 + 00018 111036 112316 upstream -404 186 3 I 536767 536967 201 MACS_peak_6 00070 MACS_peak_6 + 00070 537125 541634 upstream -358 158 4 I 3680 4064 385 MACS_peak_1 02345 MACS_peak_1 - 02345 4221 10230 downstream 6550 157 5 I 108389 108741 353 MACS_peak_3 02354 MACS_peak_3 - 02354 101213 108161 upstream -228 228 6 I 315156 315412 257 MACS_peak_5 02372 MACS_peak_5 - 02372 310981 314992 upstream -164 164 fromOverlappingOrNearest 1 NearestStart 2 NearestStart 3 NearestStart 4 NearestStart 5 NearestStart 6 NearestStart # execute enrichment analysis: enrichedGO = getEnrichedGO(AnnotatedPeak, orgAnn = "org.Ce.eg.db", maxP = 0.01, multiAdj = TRUE, minGOterm = 10, multiAdjMethod = "BH" ) Error in if (class(go.ids) != "matrix" | dim(go.ids)[2] < 4) { : argument is of length zero I've highlighted in blue the outputs of distinct commands and in red the error that I obtain upon attempting GO analysis. Any ideas as to how I should annotate my features (which codes I should use) or how to prepare the objects in R for the enrichment analysis? Please note that in my annotation BED file the WormBase identifiers for each gene are also available (under the "gene" column). I noticed there is a "org.Ce.egWORMBASE" object in the "org.Ce.eg.db" library which may be useful, but I do not understand how to use it convert between gene IDs. Kind regards, Carlos L. Araya, PhD Stanford University | Department of Genetics 300 Pasteur Dr. #M348 | Stanford, CA 94305 m: 206.661.3648 <tel:206.661.3648> <tel:206.661.3648 <tel:206.661.3648=""> > | e: claraya@stanford.edu <http: claraya@stanford.edu=""> <http: claraya@stanford.edu=""> [[alternative HTML version deleted]]
ADD COMMENT

Login before adding your answer.

Traffic: 238 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6