Question

ChIPpeakAnno


Paolo Kunderfranco ▴ 350

@paolo-kunderfranco-5158

Last seen 6.9 years ago

Hi,

We used the R package ChIPpeakAnno to annotate some ChIP-seq coordinates with respect to the TSS, using the script pasted below this message. I was wondering what exactly the categories in the pie chart mean. More precisely, I would like to know what "feature" refers to in the following:

• upstream: peak resides upstream of the feature;
• downstream: peak resides downstream of the feature;
• inside: peak resides inside the feature;
• overlapStart: peak overlaps with the start of the feature;
• overlapEnd: peak overlaps with the end of the feature;
• includeFeature: peak includes the feature entirely

Many thanks,
Paolo

##############################################
# find the nearest TSS for the peaks
##############################################
test.rangedData = BED2RangedData(test.bed)
setwd=(system.file("data", package ="ChIPpeakAnno"))
data(TSS.mouse.NCBIM37)
annotatedPeak = annotatePeakInBatch(test.rangedData, AnnotationData=TSS.mouse.NCBIM37)
as.data.frame(annotatedPeak)
a<- as.data.frame(annotatedPeak)
write.table(a,file="annotatedPeakList.xls", sep="\t", col.names=TRUE, row.names=FALSE)
write.table(a,file="annotatedPeakList.bed", sep="\t", col.names=TRUE, row.names=FALSE)
#addGeneIDs(annotatedPeak,"org.Mm.eg.db",c("symbol"))
#addGeneIDs(annotatedPeak$feature,"org.Mm.eg.db",c("symbol"))
library("org.Mm.eg.db")
b<- addGeneIDs(annotatedPeak,"org.Mm.eg.db",c("symbol"))
c<- as.data.frame(b)
write.table(c,file="annotatedPeakList_GeneId.xls", sep="\t", col.names=TRUE, row.names=FALSE)
write.table(c,file="annotatedPeakList_GeneId.bed", sep="\t", col.names=TRUE, row.names=FALSE)
##############################################
# Plot the distribution of the peaks relative to the TSS
# Gives a birds-eye view of the peak distribution relative to the genomic features of interest.
##############################################
data(annotatedPeak)
y = annotatedPeak$distancetoFeature[!is.na(annotatedPeak$distancetoFeature) &
    annotatedPeak$fromOverlappingOrNearest == "NearestStart"]
a<-hist(y, xlab="Distance To Nearest TSS", main="", breaks=100000, xlim=c(-2e+06, 2e+06),col='blue')
png('distribution of the peaks relative to the TSS.png')
plot(a, col='blue', main="", xlab="Distance To Nearest TSS", xlim=c(-1e+06, 1e+06))
dev.off()
temp = as.data.frame(annotatedPeak)
#plot(density(y),main="",col='blue')
png('density of the peaks relative to the TSS.png')
plot(density(y),main="",col='blue',xlim=c(-1e+07, 1e+07))
dev.off()
y = annotatedPeak$distancetoFeature[!is.na(annotatedPeak$distancetoFeature) &
    annotatedPeak$fromOverlappingOrNearest == "NearestStart" &
    abs(annotatedPeak$distancetoFeature) <10000]
#pie(main="",table(temp[as.character(temp$fromOverlappingOrNearest) == "Overlapping" | (as.character(temp$fromOverlappingOrNearest) == "NearestStart" & !temp$peak %in% temp[as.character(temp$fromOverlappingOrNearest) == "Overlapping", ]$peak) ,]$insideFeature))
png('distribution of the peaks relative to the CDS.png')
pie(main="",table(temp[as.character(temp$fromOverlappingOrNearest) == "Overlapping" |
    (as.character(temp$fromOverlappingOrNearest) == "NearestStart" &
     !temp$peak %in% temp[as.character(temp$fromOverlappingOrNearest) == "Overlapping", ]$peak) ,]$insideFeature))
dev.off()

annotate ChIPpeakAnno • 1.6k views

updated 10.0 years ago by Julie Zhu ★ 4.3k • written 10.0 years ago by Paolo Kunderfranco ▴ 350

score 0 · Answer 1 · 2014-04-23

Paolo,

Feature here refers to the gene. Does that make sense? Thanks!

BTW, I noticed that you are using TSS.mouse.NCBIM37 (mm9). If your peaks are in mm10 coordinates, you would need to use TSS.mouse.GRCm38. You can also obtain your own annotation using the getAnnotation function, as detailed at http://grokbase.com/t/r/bioconductor/1336g2xnvc/bioc-build-tss-from-biomart-for-chippeakanno-how-to-decide-the-mm9-and-mm10-assembly

Best regards,
Julie
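To make the six categories concrete, here is a minimal base-R sketch of how a single peak could be labelled relative to a single gene. The function name classify_peak and its arguments are hypothetical and are not part of ChIPpeakAnno. The treatment of exact boundary ties (for example, a peak identical to the gene is called "inside") is an assumption of this sketch and may differ from what annotatePeakInBatch does. The point it illustrates is that "start" and "upstream" follow the gene's strand rather than the raw coordinates.

```r
# Hypothetical helper (not a ChIPpeakAnno function): label one peak relative
# to one feature (gene), using the category names from the question.
classify_peak <- function(peak_start, peak_end, feat_start, feat_end, strand = "+") {
  stopifnot(peak_start <= peak_end, feat_start <= feat_end,
            strand %in% c("+", "-"))

  # Step 1: describe the peak in plain coordinate terms.
  if (peak_start >= feat_start && peak_end <= feat_end) {
    return("inside")               # assumption: an exact match counts as inside
  } else if (peak_start <= feat_start && peak_end >= feat_end) {
    return("includeFeature")
  } else if (peak_end < feat_start) {
    label <- "left"                # entirely at lower coordinates
  } else if (peak_start > feat_end) {
    label <- "right"               # entirely at higher coordinates
  } else if (peak_start < feat_start) {
    label <- "overlapLeft"         # crosses the lower-coordinate edge
  } else {
    label <- "overlapRight"        # crosses the higher-coordinate edge
  }

  # Step 2: translate coordinate terms into strand-aware terms. On the minus
  # strand the gene starts at its higher coordinate, so everything flips.
  plus  <- c(left = "upstream", right = "downstream",
             overlapLeft = "overlapStart", overlapRight = "overlapEnd")
  minus <- c(left = "downstream", right = "upstream",
             overlapLeft = "overlapEnd", overlapRight = "overlapStart")
  unname(if (strand == "+") plus[label] else minus[label])
}

# A made-up gene spanning 1000-2000:
classify_peak(100, 500, 1000, 2000, "+")    # "upstream"
classify_peak(900, 1100, 1000, 2000, "+")   # "overlapStart"
classify_peak(1200, 1300, 1000, 2000, "+")  # "inside"
classify_peak(900, 2100, 1000, 2000, "+")   # "includeFeature"
classify_peak(100, 500, 1000, 2000, "-")    # "downstream"
```

The coordinates above are invented for illustration. With real data the labels come from the insideFeature column that annotatePeakInBatch returns, which is what the pie chart in the question tabulates.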