About ChIPpeakAnno

Lucia Peixoto

Thanks Julie, this does the job. A "shortestDistance" option would be very helpful in the future. At least in my (mouse) data, peaks near very long genes with multiple splice forms often get assigned to the wrong gene based only on the "canonical" TSS.

Lucia

--
Lucia Peixoto PhD
Postdoctoral Research Fellow
Laboratory of Dr. Ted Abel
Department of Biology
School of Arts and Sciences
University of Pennsylvania

"Think boldly, don't be afraid of making mistakes, don't miss small details, keep your eyes open, and be modest in everything except your aims." Albert Szent-Gyorgyi

On Sat, Mar 22, 2014 at 12:12 PM, Zhu, Lihua (Julie) <julie.zhu@umassmed.edu> wrote:

Lucia,

Alternatively, you could call annotatePeakInBatch twice with two different FeatureLocForDistance options, combine the results, and then filter on the shortestDistance column.

    NearestTSS <- annotatePeakInBatch(peakRange, AnnotationData = TSS.mouse.NCBIM37)
    NearestGeneEnd <- annotatePeakInBatch(peakRange, FeatureLocForDistance = "geneEnd",
                                          AnnotationData = TSS.mouse.NCBIM37)
    NearestGene <- rbind(as.data.frame(NearestTSS), as.data.frame(NearestGeneEnd))

Then filter NearestGene to select, for each peak, the gene with the smaller shortestDistance.

If needed, we could wrap this up and add a "shortestDistance" option to FeatureLocForDistance. Please let us know. Thanks!

Best regards,
Julie

On 3/20/14 6:26 PM, "Lihua Julie Zhu" <julie.zhu@umassmed.edu> wrote:

Lucia,

Yes, by default the function returns the gene with the shortest distance from the peak start to the TSS. You could try setting output = "both", maxgap = 5000, PeakLocForDistance = "middle" to have the function output all genes that are within 5 kb of the middle of the peak. For detailed parameter settings, please type help(annotatePeakInBatch) in an R session.

Hope this helps.

Best regards,
Julie

On 3/20/14 4:11 PM, "Lucia Peixoto" <luciap@iscb.org> wrote:

Hi Julie,

I have run ChIPpeakAnno without any size constraints to see what happened. It seemed to be running fine, but when I looked at my positive controls I realized that it is not annotating all the intragenic peaks as "inside".

For example, I have a peak at chr15 89378450 89379100 (mm9), and although it falls inside a gene, it is annotated as upstream of the gene right downstream from it.

Any idea what the problem could be? Is it because I am using TSS as the annotation file, and this peak is closer to the TSS of the next gene even though it is still intragenic? Is there any way to keep this from happening and get true intragenic calls?

Here is my R code:

    library(ChIPpeakAnno)
    data(TSS.mouse.NCBIM37)  # mouse gene/TSS annotation shipped with ChIPpeakAnno (NCBIM37 = mm9)
    myPeakList <- read.table("DESonoseq_All.bed")
    peakRange <- BED2RangedData(myPeakList)  # convert BED columns to a RangedData object
    annotatedPeak <- annotatePeakInBatch(peakRange, AnnotationData = TSS.mouse.NCBIM37)
    as.data.frame(annotatedPeak)

Thanks,
Lucia

On Fri, Mar 7, 2014 at 2:56 PM, Zhu, Lihua (Julie) <Julie.Zhu@umassmed.edu> wrote:

Lucia,

If you type help(annotatePeakInBatch), you will see that there is a parameter "output" with three options. By default it is set to "nearestStart", which reports the nearest features without any distance constraint. If you set "output" to one of the other two options, the distance cutoff can be set by specifying "maxgap", e.g., 5000 for 5 kb. Please let me know if this answers your questions.

Best regards,
Julie

On 3/7/14 2:18 PM, "Lucia Peixoto" <luciap@iscb.org> wrote:

Hi,

This is my first time using the package, so maybe this is a naive question. What is the distance cutoff used to find the "nearest feature (gene, exon, miRNA, etc.)"? Or is there none, so that I can filter on it after the mapping?

Thanks
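The "filter NearestGene" step in the 3/22 reply can be done in base R by sorting on distance within each peak and keeping the first row per peak. The sketch below uses a small made-up data frame standing in for NearestGene; only the shortestDistance column name comes from the thread, while "peak" and "feature" are assumed to mirror the annotatePeakInBatch output columns.

```r
# Hypothetical stand-in for NearestGene = rbind(NearestTSS, NearestGeneEnd).
# Values are invented; "peak" and "feature" column names are assumptions.
NearestGene <- data.frame(
  peak             = c("peak1", "peak1", "peak2", "peak2"),
  feature          = c("geneA", "geneB", "geneC", "geneC"),
  shortestDistance = c(12000, 300, 450, 8000),
  stringsAsFactors = FALSE
)

# Sort by peak, then by absolute distance (abs() is a precaution in case the
# distances are signed), and keep the first, i.e. closest, row for each peak.
ord  <- NearestGene[order(NearestGene$peak, abs(NearestGene$shortestDistance)), ]
best <- ord[!duplicated(ord$peak), ]
rownames(best) <- NULL
print(best)
```

Ties are broken by whichever row comes first after sorting; if two genes are equally close and you want both, keep all rows whose distance equals the per-peak minimum instead of using duplicated().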

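The 3/20 and 3/7 replies both touch on where the distance is measured from (peak start by default, or the middle via PeakLocForDistance = "middle") and on applying a cutoff such as 5 kb. The sketch below illustrates both ideas in plain R on toy coordinates. It is not ChIPpeakAnno's code, and the peaks and TSS positions are invented; it simply shows that a peak can fall outside 5 kb when measured from its start but inside when measured from its middle, and how a cutoff can be applied after the mapping.

```r
# Hypothetical peaks, each already paired with the TSS it was matched to.
# Distances are absolute, so strand is ignored in this sketch.
peaks <- data.frame(
  peak  = c("p1", "p2", "p3"),
  start = c(1000, 20000, 50000),
  end   = c(1600, 21000, 58000),
  tss   = c(6200, 26000, 70000),
  stringsAsFactors = FALSE
)

peaks$mid        <- (peaks$start + peaks$end) %/% 2   # integer midpoint
peaks$dist_start <- abs(peaks$tss - peaks$start)      # default-style: from peak start
peaks$dist_mid   <- abs(peaks$tss - peaks$mid)        # like PeakLocForDistance = "middle"

# Post-hoc cutoff, analogous in spirit to maxgap = 5000
within5kb <- peaks[peaks$dist_mid <= 5000, ]
print(within5kb)

# Within ChIPpeakAnno itself, the thread's suggestion was:
# annotatePeakInBatch(peakRange, AnnotationData = TSS.mouse.NCBIM37,
#                     output = "both", maxgap = 5000, PeakLocForDistance = "middle")
```

Here p1 is 5,200 bp from its TSS when measured from the peak start but 4,900 bp when measured from the middle, so only the middle-based filter keeps it.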

Julie Zhu

Lucia,

Great! output = "shortestDistance" has been added to annotatePeakInBatch in the new release. Please let me know if you encounter any problems. Many thanks for the feedback and the great suggestion!

Best regards,
Julie

On 4/15/14 10:26 AM, "Lucia Peixoto" <luciap@iscb.org> wrote:

Thanks Julie, this does the job. A "shortestDistance" option would be very helpful in the future. At least in my (mouse) data, peaks near very long genes with multiple splice forms often get assigned to the wrong gene based only on the "canonical" TSS.

Lucia

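The idea behind the new option, as discussed in this thread, is to pick the gene that is closest by either end rather than by TSS alone, which is what helps with long genes. The plain-R sketch below shows that idea on two invented plus-strand genes; it is an illustration under those assumptions, not the ChIPpeakAnno implementation, and for minus-strand genes the TSS would be the end coordinate instead.

```r
# Hypothetical genes (plus strand assumed, so TSS = start).
# geneA is long and contains the peak; geneB starts just downstream.
genes <- data.frame(
  gene  = c("geneA", "geneB"),
  start = c(100000, 305000),
  end   = c(300000, 320000),
  stringsAsFactors = FALSE
)
peak_start <- 289000
peak_end   <- 290000
mid <- (peak_start + peak_end) / 2

dist_tss <- abs(genes$start - mid)   # distance to each TSS
dist_end <- abs(genes$end - mid)     # distance to each gene end
shortest <- pmin(dist_tss, dist_end) # closer of the two ends, per gene

nearest_tss    <- genes$gene[which.min(dist_tss)]  # "geneB": TSS-only view
nearest_either <- genes$gene[which.min(shortest)]  # "geneA": either-end view
cat("nearest by TSS:", nearest_tss, "\n")
cat("nearest by either end:", nearest_either, "\n")
```

This mirrors the situation described for chr15: by TSS alone the peak is attributed to the downstream gene, while considering the gene end attributes it to the long gene that actually contains it.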
Julie Zhu

Lucia,

"includeFeature" means that the peak spans the entire feature (the peak is wider than the feature), and "inside" means that the peak lies within the feature (the peak is narrower than the feature). "overlapEnd" means that the peak overlaps the end of the feature (the gene end if the feature is a gene).

I hope you do not mind my cc'ing the Bioconductor list to benefit others with similar questions. Thanks!

Best regards,
Julie

On 4/15/14 11:34 AM, "Lucia Peixoto" <luciap@iscb.org> wrote:

Hi Julie,

I have been meaning to ask you this for a long time: what is the difference between "includeFeature" and "inside"? When annotating peaks relative to the TSS, does "overlapEnd" mean the end of the TSS or the end of the gene?

Thanks,
Lucia

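The definitions above can be made concrete with a small classifier. The sketch below is a hypothetical helper written for this thread, not a ChIPpeakAnno function: it assumes a plus-strand feature, uses the labels "upstream", "inside", "includeFeature" and "overlapEnd" from the discussion, and adds "overlapStart" and "downstream" by analogy. ChIPpeakAnno's own handling of strand and boundary ties may differ.

```r
# Hypothetical helper: classify peak [ps, pe] relative to a plus-strand
# feature [fs, fe]. For minus-strand features the roles of start and end swap.
classify_peak <- function(ps, pe, fs, fe) {
  if (pe < fs) return("upstream")                   # peak entirely before the feature
  if (ps > fe) return("downstream")                 # peak entirely after the feature
  if (ps <= fs && pe >= fe) return("includeFeature") # peak spans the whole feature
  if (ps >= fs && pe <= fe) return("inside")         # peak lies within the feature
  if (ps < fs) return("overlapStart")                # peak straddles the feature start
  "overlapEnd"                                       # peak straddles the feature end
}

# Toy peaks against a single feature spanning 1000-5000
peaks <- data.frame(start = c(100, 6000, 900, 2000, 800, 4500),
                    end   = c(500, 7000, 6000, 3000, 1500, 5500))
calls <- mapply(classify_peak, peaks$start, peaks$end,
                MoreArgs = list(fs = 1000, fe = 5000))
print(calls)
```

The order of the checks matters: "includeFeature" is tested before "inside" so that a peak with exactly the feature's coordinates is reported as including it.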