ChIPpeakAnno - Request of information
1
0
Entering edit mode
Julie Zhu ★ 4.3k
@julie-zhu-3596
Last seen 6 months ago
United States
Dear Federico, Yes. You can ChIPpeakAnno to annotate your peaks with refseq genes/transcripts. You could also use any custom annotation file you have. One option for obtaining the refseq genes/transcripts would be using GenomicFeatures package. I am ccing Bioconductor listserv for others to chime in. library(GenomicFeatures) supportedUCSCtables() Best regards, Julie On 3/29/12 3:58 AM, "Federico Marini" <f.marini@imb-mainz.de> wrote: Dear Prof. Zhu, My name is Federico Marini, and I started recently my activity as PhD student at the Institute of Molecular Biology in Mainz, Germany. I contact you to request additional information regarding one of your published works, namely "ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data", which I had the pleasure to read in detail. I would like to get perform the annotation referring to the refseq genes/transcripts, instead of the ensembl ones. Could it be possible to obtain a similar object for this, and if so, what could be the best/smartest way to do it? Since it could be a very precious resource for the needs of our lab, this is why I chose to contact you directly. Thanks in advance for your time and attention. Best regards, Federico Marini [[alternative HTML version deleted]]
Annotation annotate ChIPpeakAnno Annotation annotate ChIPpeakAnno • 817 views
ADD COMMENT
0
Entering edit mode
Ou, Jianhong ★ 1.3k
@ou-jianhong-4539
Last seen 2 days ago
United States
Hi Federico, The feature column is the rownames of annotationData. so you could try: rownames(RDucscmm9)<-RDucscmm9$tx_name before annotatePeakInBatch. Yours sincerely, Jianhong Ou jianhong.ou@umassmed.edu<mailto:jianhong.ou@umassmed.edu> On Mar 29, 2012, at 3:12 PM, Federico Marini wrote: Hi Jianhong, I tried the intermediate step you suggested and it actually returned no error, just the warning message: Warning message: In annotatePeakInBatch(peaksRDt[1:10, ], AnnotationData = RDucscmm9, : Please use select instead of multiple! but this is minor. So, first of all thank you one more time for solving this issue. Just one more thing, and then I think I really am good to go: the table I downloaded should ideally report the NM_ and NR_ identifiers for the transcripts, but in the final output what I see is this (in bold the column of interest) RangedData with 14 rows and 9 value columns across 1 space space ranges | peak strand <factor> <iranges> | <character> <character> MACS_peak_1 01511 chr1 [3024414, 3024497] | MACS_peak_1 - MACS_peak_10 01515 chr1 [4346000, 4346141] | MACS_peak_10 - MACS_peak_2 01511 chr1 [3115427, 3115503] | MACS_peak_2 - MACS_peak_3 01511 chr1 [3145725, 3145836] | MACS_peak_3 - MACS_peak_4 01511 chr1 [3332349, 3332698] | MACS_peak_4 - MACS_peak_5 01513 chr1 [3566535, 3566719] | MACS_peak_5 - MACS_peak_6 01512 chr1 [3726359, 3726485] | MACS_peak_6 - MACS_peak_7 01512 chr1 [3795201, 3795271] | MACS_peak_7 - MACS_peak_8 01515 chr1 [4204115, 4204357] | MACS_peak_8 - MACS_peak_9 01515 chr1 [4283090, 4283190] | MACS_peak_9 - MACS_peak_10 01514 chr1 [4346000, 4346141] | MACS_peak_10 - MACS_peak_4 01512 chr1 [3332349, 3332698] | MACS_peak_4 - MACS_peak_5 01512 chr1 [3566535, 3566719] | MACS_peak_5 - MACS_peak_9 01514 chr1 [4283090, 4283190] | MACS_peak_9 - feature start_position end_position insideFeature <character> <numeric> <numeric> <character> MACS_peak_1 01511 01511 3195985 3205713 downstream MACS_peak_10 01515 01515 4333588 4350395 inside MACS_peak_2 01511 01511 3195985 3205713 downstream MACS_peak_3 01511 01511 3195985 3205713 downstream MACS_peak_4 01511 01511 3195985 3205713 upstream MACS_peak_5 01513 01513 3638392 3648985 downstream MACS_peak_6 01512 01512 3204563 3661579 upstream MACS_peak_7 01512 01512 3204563 3661579 upstream MACS_peak_8 01515 01515 4333588 4350395 downstream MACS_peak_9 01515 01515 4333588 4350395 downstream MACS_peak_10 01514 01514 4280927 4399322 inside MACS_peak_4 01512 01512 3204563 3661579 inside MACS_peak_5 01512 01512 3204563 3661579 inside MACS_peak_9 01514 01514 4280927 4399322 inside distancetoFeature shortestDistance fromOverlappingOrNearest <numeric> <numeric> <character> MACS_peak_1 01511 181299 171488 NearestStart MACS_peak_10 01515 4395 4254 NearestStart MACS_peak_2 01511 90286 80482 NearestStart MACS_peak_3 01511 59988 50149 NearestStart MACS_peak_4 01511 -126636 126636 NearestStart MACS_peak_5 01513 82450 71673 NearestStart MACS_peak_6 01512 -64780 64780 NearestStart MACS_peak_7 01512 -133622 133622 NearestStart MACS_peak_8 01515 146280 129231 NearestStart MACS_peak_9 01515 67305 50398 NearestStart MACS_peak_10 01514 53322 53181 Overlapping MACS_peak_4 01512 329230 127786 Overlapping MACS_peak_5 01512 95044 94860 Overlapping MACS_peak_9 01514 116232 2163 Overlapping So, no(t yet) the TranscriptID as I was planning (while it appears regularly ENSMUSG00000064842 or others when I use the TSS you provided). Is some step still missing? I also tried playing around with the vals and columns parameters in the transcripts step, but it didn't work. Best regards, Federico On 29/03/12 18:52, Ou, Jianhong wrote: Hi Federico, Could you please have a try RDucscmm9$strand<-as(RDucscmm9$strand,"factor") before using annotatePeakInBatch, and tell me how about the results. Yours sincerely, Jianhong Ou jianhong.ou@umassmed.edu<mailto:jianhong.ou@umassmed.edu> On Mar 29, 2012, at 12:42 PM, Zhu, Lihua (Julie) wrote: Dear Federico, The following code works. So RDucscmm9<- as(ucscmm9, "RangedData") should work as well. It complains about ranges being NA. Jianhong will help you to debug and get back to you. Thanks! gr <- GRanges(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), ranges = IRanges(1:10, width = 10:1, names = head(letters,10)), strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)), score = 1:10, GC = seq(1, 0, length=10)) as(gr, “RangedData”) Best regards, Julie On 3/29/12 11:27 AM, "Federico Marini" <f.marini@imb- mainz.de<x-msg:="" 458="" f.marini@imb-mainz.de="">> wrote: Dear Julie, Thanks first of all for the positive and prompt answer. I just have one additional question: when retrieving the TranscriptDb object, these are the steps I took. source("http://bioconductor.org/biocLite.R" <http: bioconductor.org="" bioclite.r=""> ) biocLite("GenomicFeatures") library(GenomicFeatures) supportedUCSCtables() mm9KG <-makeTranscriptDbFromUCSC(genome = "mm9",tablename = "knownGene") ucscmm9<- transcripts(mm9KG) annotPeaksUCSC<-annotatePeakInBatch(peaksRDt[1:10,],AnnotationData=ucs cmm9,output="both",multiple=FALSE,maxgap=0) This returns an error since annotatePeakInBatch requires a RangedData object for the annotation, but my "ucscmm9" is a GRanges object; still, http://wiki.fhcrc.org/bioc/Transcript_Annotation_API_Spec specified that I would have a RangedData object as the output for the transcripts function - but apparently it didn't work like that. I thought one way to make it work is to us RDucscmm9<- as (ucscmm9, "RangedData") but I get this as an output Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : solving row 1: range cannot be determined from the supplied arguments (too many NAs) I tried googling around for this, but sadly no solution was found (yet). Is there a tested and smart way to do this - and get something which is under all aspects comparable to the TSS.mouse.NCBIM37 annotation? Thanks again for the attention. Best regards, Federico On 29/03/12 16:43, Zhu, Lihua (Julie) wrote: Re: ChIPpeakAnno - Request of information Dear Federico, Yes. You can ChIPpeakAnno to annotate your peaks with refseq genes/transcripts. You could also use any custom annotation file you have. One option for obtaining the refseq genes/transcripts would be using GenomicFeatures package. I am ccing Bioconductor listserv for others to chime in. library(GenomicFeatures) supportedUCSCtables() Best regards, Julie On 3/29/12 3:58 AM, "Federico Marini" <f.marini@imb- mainz.de<x-msg:="" 458="" f.marini@imb-mainz.de="">> wrote: Dear Prof. Zhu, My name is Federico Marini, and I started recently my activity as PhD student at the Institute of Molecular Biology in Mainz, Germany. I contact you to request additional information regarding one of your published works, namely "ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data", which I had the pleasure to read in detail. I would like to get perform the annotation referring to the refseq genes/transcripts, instead of the ensembl ones. Could it be possible to obtain a similar object for this, and if so, what could be the best/smartest way to do it? Since it could be a very precious resource for the needs of our lab, this is why I chose to contact you directly. Thanks in advance for your time and attention. Best regards, Federico Marini -- Federico Marini Signaling to Chromatin Laboratory Institute of Molecular Biology gGmbH (IMB) http://www.imb-mainz.de<http: www.imb-mainz.de=""/> Tel: +49 (0) 6131 39 21462 [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Hi Jianhong, Perfect. With this I still keep the biological information I needed, and I will probably just need to match this "uc007aeu.1" and similar to the values in the kgXref table, which I can import separately (using the field #*kgID* mRNA spID spDisplayID geneSymbol refseq protAcc description to join the tables), but feel free to correct me if I am wrong. I will furthermore try all these commands and see if there could be any more ways to make the data look complete and beautiful! Thanks, again and again, for the prompt support. Best regards, Federico On 29/03/12 21:45, Ou, Jianhong wrote: > Hi Federico, > > The feature column is the rownames of annotationData. so you could try: > rownames(RDucscmm9)<-RDucscmm9$tx_name > before annotatePeakInBatch. > > Yours sincerely, > > Jianhong Ou > > jianhong.ou@umassmed.edu <mailto:jianhong.ou@umassmed.edu> > > > On Mar 29, 2012, at 3:12 PM, Federico Marini wrote: > >> Hi Jianhong, >> >> I tried the intermediate step you suggested and it actually returned >> no error, just the warning message: >> >> Warning message: >> In annotatePeakInBatch(peaksRDt[1:10, ], AnnotationData = >> RDucscmm9, : >> Please use select instead of multiple! >> >> but this is minor. So, first of all thank you one more time for >> solving this issue. Just one more thing, and then I think I really am >> good to go: the table I downloaded should ideally report the NM_ and >> NR_ identifiers for the transcripts, but in the final output what I >> see is this (in bold the column of interest) >> >> RangedData with 14 rows and 9 value columns across 1 space >> space ranges | >> peak strand >> <factor> <iranges> | <character> <character> >> MACS_peak_1 01511 chr1 [3024414, 3024497] | >> MACS_peak_1 - >> MACS_peak_10 01515 chr1 [4346000, 4346141] | >> MACS_peak_10 - >> MACS_peak_2 01511 chr1 [3115427, 3115503] | >> MACS_peak_2 - >> MACS_peak_3 01511 chr1 [3145725, 3145836] | >> MACS_peak_3 - >> MACS_peak_4 01511 chr1 [3332349, 3332698] | >> MACS_peak_4 - >> MACS_peak_5 01513 chr1 [3566535, 3566719] | >> MACS_peak_5 - >> MACS_peak_6 01512 chr1 [3726359, 3726485] | >> MACS_peak_6 - >> MACS_peak_7 01512 chr1 [3795201, 3795271] | >> MACS_peak_7 - >> MACS_peak_8 01515 chr1 [4204115, 4204357] | >> MACS_peak_8 - >> MACS_peak_9 01515 chr1 [4283090, 4283190] | >> MACS_peak_9 - >> MACS_peak_10 01514 chr1 [4346000, 4346141] | >> MACS_peak_10 - >> MACS_peak_4 01512 chr1 [3332349, 3332698] | >> MACS_peak_4 - >> MACS_peak_5 01512 chr1 [3566535, 3566719] | >> MACS_peak_5 - >> MACS_peak_9 01514 chr1 [4283090, 4283190] | >> MACS_peak_9 - >> feature start_position end_position >> insideFeature >> <character> <numeric> <numeric> <character> >> MACS_peak_1 01511 *01511* 3195985 3205713 downstream >> MACS_peak_10 01515 *01515* 4333588 4350395 inside >> MACS_peak_2 01511 *01511* 3195985 3205713 downstream >> MACS_peak_3 01511 *01511* 3195985 3205713 downstream >> MACS_peak_4 01511 *01511* 3195985 3205713 upstream >> MACS_peak_5 01513 *01513* 3638392 3648985 downstream >> MACS_peak_6 01512 *01512* 3204563 3661579 upstream >> MACS_peak_7 01512 *01512* 3204563 3661579 upstream >> MACS_peak_8 01515 *01515* 4333588 4350395 downstream >> MACS_peak_9 01515 *01515* 4333588 4350395 downstream >> MACS_peak_10 01514 *01514* 4280927 4399322 inside >> MACS_peak_4 01512 *01512* 3204563 3661579 inside >> MACS_peak_5 01512 *01512* 3204563 3661579 inside >> MACS_peak_9 01514 *01514* 4280927 4399322 inside >> distancetoFeature shortestDistance >> fromOverlappingOrNearest >> <numeric> <numeric> <character> >> MACS_peak_1 01511 181299 171488 >> NearestStart >> MACS_peak_10 01515 4395 4254 >> NearestStart >> MACS_peak_2 01511 90286 80482 >> NearestStart >> MACS_peak_3 01511 59988 50149 >> NearestStart >> MACS_peak_4 01511 -126636 126636 >> NearestStart >> MACS_peak_5 01513 82450 71673 >> NearestStart >> MACS_peak_6 01512 -64780 64780 >> NearestStart >> MACS_peak_7 01512 -133622 133622 >> NearestStart >> MACS_peak_8 01515 146280 129231 >> NearestStart >> MACS_peak_9 01515 67305 50398 >> NearestStart >> MACS_peak_10 01514 53322 >> 53181 Overlapping >> MACS_peak_4 01512 329230 >> 127786 Overlapping >> MACS_peak_5 01512 95044 >> 94860 Overlapping >> MACS_peak_9 01514 116232 >> 2163 Overlapping >> >> So, no(t yet) the TranscriptID as I was planning (while it appears >> regularly ENSMUSG00000064842 or others when I use the TSS you >> provided). Is some step still missing? I also tried playing around >> with the vals and columns parameters in the transcripts step, but it >> didn't work. >> >> Best regards, >> >> Federico >> >> >> On 29/03/12 18:52, Ou, Jianhong wrote: >>> Hi Federico, >>> >>> Could you please have a try >>> RDucscmm9$strand<-as(RDucscmm9$strand,"factor") >>> before using annotatePeakInBatch, and tell me how about the results. >>> >>> Yours sincerely, >>> >>> Jianhong Ou >>> >>> jianhong.ou@umassmed.edu <mailto:jianhong.ou@umassmed.edu> >>> >>> >>> On Mar 29, 2012, at 12:42 PM, Zhu, Lihua (Julie) wrote: >>> >>>> Dear Federico, >>>> >>>> The following code works. So RDucscmm9<- as(ucscmm9, "RangedData") >>>> should work as well. It complains about ranges being NA. >>>> >>>> Jianhong will help you to debug and get back to you. Thanks! >>>> >>>> gr <- >>>> GRanges(seqnames = >>>> Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), >>>> ranges = >>>> IRanges(1:10, width = 10:1, names = head(letters,10)), >>>> strand = >>>> Rle(strand(c("-", "+", "*", "+", "-")), >>>> c(1, 2, 2, 3, 2)), >>>> score = 1:10, >>>> GC = seq(1, 0, length=10)) >>>> >>>> as(gr, “RangedData”) >>>> >>>> Best regards, >>>> >>>> Julie >>>> >>>> On 3/29/12 11:27 AM, "Federico Marini" <f.marini@imb-mainz.de>>>> <x-msg: 458="" f.marini@imb-mainz.de="">> wrote: >>>> >>>>> Dear Julie, >>>>> >>>>> Thanks first of all for the positive and prompt answer. >>>>> >>>>> I just have one additional question: when retrieving the >>>>> TranscriptDb object, these are the steps I took. >>>>> >>>>> >>>>>> source("http://bioconductor.org/biocLite.R" >>>>>> <http: bioconductor.org="" bioclite.r=""> ) >>>>>> biocLite("GenomicFeatures") >>>>>> >>>>>> library(GenomicFeatures) >>>>>> supportedUCSCtables() >>>>>> mm9KG <-makeTranscriptDbFromUCSC(genome = "mm9",tablename = >>>>>> "knownGene") >>>>>> >>>>>> ucscmm9<- transcripts(mm9KG) >>>>>> annotPeaksUCSC<-annotatePeakInBatch(peaksRDt[1:10,],AnnotationD ata=ucscmm9,output="both",multiple=FALSE,maxgap=0) >>>>>> >>>>> >>>>> This returns an error since annotatePeakInBatch requires a >>>>> RangedData object for the annotation, but my "ucscmm9" is a >>>>> GRanges object; still, >>>>> http://wiki.fhcrc.org/bioc/Transcript_Annotation_API_Spec >>>>> specified that I would have a RangedData object as the output for >>>>> the transcripts function - but apparently it didn't work like that. >>>>> >>>>> I thought one way to make it work is to us >>>>> >>>>>> RDucscmm9<- as (ucscmm9, "RangedData") >>>>>> >>>>> but I get this as an output >>>>> >>>>>> Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = >>>>>> "IRanges") : >>>>>> solving row 1: range cannot be determined from the supplied >>>>>> arguments (too many NAs) >>>>>> >>>>> I tried googling around for this, but sadly no solution was found >>>>> (yet). Is there a tested and smart way to do this - and get >>>>> something which is under all aspects comparable to the >>>>> TSS.mouse.NCBIM37 annotation? >>>>> >>>>> Thanks again for the attention. >>>>> >>>>> Best regards, >>>>> >>>>> Federico >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 29/03/12 16:43, Zhu, Lihua (Julie) wrote: >>>>>> Re: ChIPpeakAnno - Request of information Dear Federico, >>>>>> >>>>>> Yes. You can ChIPpeakAnno to annotate your peaks with refseq >>>>>> genes/transcripts. You could also use any custom annotation file >>>>>> you have. >>>>>> >>>>>> One option for obtaining the refseq genes/transcripts would be >>>>>> using GenomicFeatures package. I am ccing Bioconductor listserv >>>>>> for others to chime in. >>>>>> >>>>>> library(GenomicFeatures) >>>>>> supportedUCSCtables() >>>>>> >>>>>> Best regards, >>>>>> >>>>>> Julie >>>>>> >>>>>> >>>>>> On 3/29/12 3:58 AM, "Federico Marini" <f.marini@imb-mainz.de>>>>>> <x-msg: 458="" f.marini@imb-mainz.de="">> wrote: >>>>>> >>>>>> >>>>>>> Dear Prof. Zhu, >>>>>>> >>>>>>> My name is Federico Marini, and I started recently my activity >>>>>>> as PhD student at the Institute of Molecular Biology in Mainz, >>>>>>> Germany. >>>>>>> >>>>>>> I contact you to request additional information regarding one >>>>>>> of your published works, namely "ChIPpeakAnno: a Bioconductor >>>>>>> package to annotate ChIP-seq and ChIP-chip data", which I had >>>>>>> the pleasure to read in detail. I would like to get perform the >>>>>>> annotation referring to the refseq genes/transcripts, instead of >>>>>>> the ensembl ones. Could it be possible to obtain a similar >>>>>>> object for this, and if so, what could be the best/smartest way >>>>>>> to do it? >>>>>>> >>>>>>> Since it could be a very precious resource for the needs of >>>>>>> our lab, this is why I chose to contact you directly. >>>>>>> >>>>>>> Thanks in advance for your time and attention. >>>>>>> >>>>>>> Best regards, >>>>>>> >>>>>>> Federico Marini >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>> >> >> -- >> *Federico Marini* >> Signaling to Chromatin Laboratory >> Institute of Molecular Biology gGmbH (IMB) >> http://www.imb-mainz.de <http: www.imb-mainz.de=""/> >> Tel: +49 (0) 6131 39 21462 > -- *Federico Marini* Signaling to Chromatin Laboratory Institute of Molecular Biology gGmbH (IMB) http://www.imb-mainz.de Tel: +49 (0) 6131 39 21462 [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 687 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6