Question

ChIPpeakAnno - Request of information

0

Entering edit mode

Julie Zhu ★ 4.3k

@julie-zhu-3596

Last seen 6 months ago

United States

Dear Federico, Yes. You can ChIPpeakAnno to annotate your peaks with refseq genes/transcripts. You could also use any custom annotation file you have. One option for obtaining the refseq genes/transcripts would be using GenomicFeatures package. I am ccing Bioconductor listserv for others to chime in. library(GenomicFeatures) supportedUCSCtables() Best regards, Julie On 3/29/12 3:58 AM, "Federico Marini" <f.marini@imb-mainz.de> wrote: Dear Prof. Zhu, My name is Federico Marini, and I started recently my activity as PhD student at the Institute of Molecular Biology in Mainz, Germany. I contact you to request additional information regarding one of your published works, namely "ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data", which I had the pleasure to read in detail. I would like to get perform the annotation referring to the refseq genes/transcripts, instead of the ensembl ones. Could it be possible to obtain a similar object for this, and if so, what could be the best/smartest way to do it? Since it could be a very precious resource for the needs of our lab, this is why I chose to contact you directly. Thanks in advance for your time and attention. Best regards, Federico Marini [[alternative HTML version deleted]]

Annotation annotate ChIPpeakAnno Annotation annotate ChIPpeakAnno • 817 views

ADD COMMENT • link updated 12.1 years ago by Ou, Jianhong ★ 1.3k • written 12.1 years ago by Julie Zhu ★ 4.3k

score 0 · Answer 1 · 2012-03-29

Hi Federico, The feature column is the rownames of annotationData. so you could try: rownames(RDucscmm9)<-RDucscmm9$tx_name before annotatePeakInBatch. Yours sincerely, Jianhong Ou jianhong.ou@umassmed.edu<mailto:jianhong.ou@umassmed.edu> On Mar 29, 2012, at 3:12 PM, Federico Marini wrote: Hi Jianhong, I tried the intermediate step you suggested and it actually returned no error, just the warning message: Warning message: In annotatePeakInBatch(peaksRDt[1:10, ], AnnotationData = RDucscmm9, : Please use select instead of multiple! but this is minor. So, first of all thank you one more time for solving this issue. Just one more thing, and then I think I really am good to go: the table I downloaded should ideally report the NM_ and NR_ identifiers for the transcripts, but in the final output what I see is this (in bold the column of interest) RangedData with 14 rows and 9 value columns across 1 space space ranges | peak strand <factor> <iranges> | <character> <character> MACS_peak_1 01511 chr1 [3024414, 3024497] | MACS_peak_1 - MACS_peak_10 01515 chr1 [4346000, 4346141] | MACS_peak_10 - MACS_peak_2 01511 chr1 [3115427, 3115503] | MACS_peak_2 - MACS_peak_3 01511 chr1 [3145725, 3145836] | MACS_peak_3 - MACS_peak_4 01511 chr1 [3332349, 3332698] | MACS_peak_4 - MACS_peak_5 01513 chr1 [3566535, 3566719] | MACS_peak_5 - MACS_peak_6 01512 chr1 [3726359, 3726485] | MACS_peak_6 - MACS_peak_7 01512 chr1 [3795201, 3795271] | MACS_peak_7 - MACS_peak_8 01515 chr1 [4204115, 4204357] | MACS_peak_8 - MACS_peak_9 01515 chr1 [4283090, 4283190] | MACS_peak_9 - MACS_peak_10 01514 chr1 [4346000, 4346141] | MACS_peak_10 - MACS_peak_4 01512 chr1 [3332349, 3332698] | MACS_peak_4 - MACS_peak_5 01512 chr1 [3566535, 3566719] | MACS_peak_5 - MACS_peak_9 01514 chr1 [4283090, 4283190] | MACS_peak_9 - feature start_position end_position insideFeature <character> <numeric> <numeric> <character> MACS_peak_1 01511 01511 3195985 3205713 downstream MACS_peak_10 01515 01515 4333588 4350395 inside MACS_peak_2 01511 01511 3195985 3205713 downstream MACS_peak_3 01511 01511 3195985 3205713 downstream MACS_peak_4 01511 01511 3195985 3205713 upstream MACS_peak_5 01513 01513 3638392 3648985 downstream MACS_peak_6 01512 01512 3204563 3661579 upstream MACS_peak_7 01512 01512 3204563 3661579 upstream MACS_peak_8 01515 01515 4333588 4350395 downstream MACS_peak_9 01515 01515 4333588 4350395 downstream MACS_peak_10 01514 01514 4280927 4399322 inside MACS_peak_4 01512 01512 3204563 3661579 inside MACS_peak_5 01512 01512 3204563 3661579 inside MACS_peak_9 01514 01514 4280927 4399322 inside distancetoFeature shortestDistance fromOverlappingOrNearest <numeric> <numeric> <character> MACS_peak_1 01511 181299 171488 NearestStart MACS_peak_10 01515 4395 4254 NearestStart MACS_peak_2 01511 90286 80482 NearestStart MACS_peak_3 01511 59988 50149 NearestStart MACS_peak_4 01511 -126636 126636 NearestStart MACS_peak_5 01513 82450 71673 NearestStart MACS_peak_6 01512 -64780 64780 NearestStart MACS_peak_7 01512 -133622 133622 NearestStart MACS_peak_8 01515 146280 129231 NearestStart MACS_peak_9 01515 67305 50398 NearestStart MACS_peak_10 01514 53322 53181 Overlapping MACS_peak_4 01512 329230 127786 Overlapping MACS_peak_5 01512 95044 94860 Overlapping MACS_peak_9 01514 116232 2163 Overlapping So, no(t yet) the TranscriptID as I was planning (while it appears regularly ENSMUSG00000064842 or others when I use the TSS you provided). Is some step still missing? I also tried playing around with the vals and columns parameters in the transcripts step, but it didn't work. Best regards, Federico On 29/03/12 18:52, Ou, Jianhong wrote: Hi Federico, Could you please have a try RDucscmm9$strand<-as(RDucscmm9$strand,"factor") before using annotatePeakInBatch, and tell me how about the results. Yours sincerely, Jianhong Ou jianhong.ou@umassmed.edu<mailto:jianhong.ou@umassmed.edu> On Mar 29, 2012, at 12:42 PM, Zhu, Lihua (Julie) wrote: Dear Federico, The following code works. So RDucscmm9<- as(ucscmm9, "RangedData") should work as well. It complains about ranges being NA. Jianhong will help you to debug and get back to you. Thanks! gr <- GRanges(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), ranges = IRanges(1:10, width = 10:1, names = head(letters,10)), strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)), score = 1:10, GC = seq(1, 0, length=10)) as(gr, RangedData) Best regards, Julie On 3/29/12 11:27 AM, "Federico Marini" <f.marini@imb- mainz.de<x-msg:="" 458="" f.marini@imb-mainz.de="">> wrote: Dear Julie, Thanks first of all for the positive and prompt answer. I just have one additional question: when retrieving the TranscriptDb object, these are the steps I took. source("http://bioconductor.org/biocLite.R" <http: bioconductor.org="" bioclite.r=""> ) biocLite("GenomicFeatures") library(GenomicFeatures) supportedUCSCtables() mm9KG <-makeTranscriptDbFromUCSC(genome = "mm9",tablename = "knownGene") ucscmm9<- transcripts(mm9KG) annotPeaksUCSC<-annotatePeakInBatch(peaksRDt[1:10,],AnnotationData=ucs cmm9,output="both",multiple=FALSE,maxgap=0) This returns an error since annotatePeakInBatch requires a RangedData object for the annotation, but my "ucscmm9" is a GRanges object; still, http://wiki.fhcrc.org/bioc/Transcript_Annotation_API_Spec specified that I would have a RangedData object as the output for the transcripts function - but apparently it didn't work like that. I thought one way to make it work is to us RDucscmm9<- as (ucscmm9, "RangedData") but I get this as an output Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : solving row 1: range cannot be determined from the supplied arguments (too many NAs) I tried googling around for this, but sadly no solution was found (yet). Is there a tested and smart way to do this - and get something which is under all aspects comparable to the TSS.mouse.NCBIM37 annotation? Thanks again for the attention. Best regards, Federico On 29/03/12 16:43, Zhu, Lihua (Julie) wrote: Re: ChIPpeakAnno - Request of information Dear Federico, Yes. You can ChIPpeakAnno to annotate your peaks with refseq genes/transcripts. You could also use any custom annotation file you have. One option for obtaining the refseq genes/transcripts would be using GenomicFeatures package. I am ccing Bioconductor listserv for others to chime in. library(GenomicFeatures) supportedUCSCtables() Best regards, Julie On 3/29/12 3:58 AM, "Federico Marini" <f.marini@imb- mainz.de<x-msg:="" 458="" f.marini@imb-mainz.de="">> wrote: Dear Prof. Zhu, My name is Federico Marini, and I started recently my activity as PhD student at the Institute of Molecular Biology in Mainz, Germany. I contact you to request additional information regarding one of your published works, namely "ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data", which I had the pleasure to read in detail. I would like to get perform the annotation referring to the refseq genes/transcripts, instead of the ensembl ones. Could it be possible to obtain a similar object for this, and if so, what could be the best/smartest way to do it? Since it could be a very precious resource for the needs of our lab, this is why I chose to contact you directly. Thanks in advance for your time and attention. Best regards, Federico Marini -- Federico Marini Signaling to Chromatin Laboratory Institute of Molecular Biology gGmbH (IMB) http://www.imb-mainz.de<http: www.imb-mainz.de=""/> Tel: +49 (0) 6131 39 21462 [[alternative HTML version deleted]]