Question

ChIPseeker: Error in annotatePeak

0

Entering edit mode

@antonio-miguel-de-jesus-domingues-5182

Last seen 11 weeks ago

Germany

Hi all, I was trying to ChIPseeker for the first time to annotate some peaks with the genomic features, but got the following error: peakAnno <- annotatePeak(peaks_gr, tssRegion=c(-300, 0), as="GRanges", TranscriptDb=txdb_hg19, annoDb="org.Hs.eg.db") Error in IRanges:::normalizeSingleBracketSubscript(i, x) : subscript contains NAs or out of bounds indices The peaks originated from Sicer which I converted to GRanges and then add the seqlengths independently. A tad convoluted but got the job done (or so I thought). Bellow the full code. I searched for the error but could not find out what is going wrong. Help is appreciated. Alternatively, would any one point me to another way to annotate a list of peaks with genomic features (utr's, exons, introns, extragenic, ..)? Cheers, António ###################### ## Peaks ###################### sicer2RangedData <-function(path.to.file) { # load the file bed = read.table(path.to.file, header=F, sep="\t", stringsAsFactors=FALSE)[,c(1:3)] # generate GRanges object myrange <- BED2RangedData(bed) return(myrange) } peaks_file="Tox-W50-G250-islands-summary-FDR0.01_50tags_2fold.bed" peaks <- sicer2RangedData(peaks_file) ######################### ## ChIPseeker likes a proper GRanges obj ######################### ## chr size: hg19_exons <- exons(txdb_ens) peaks_gr <- as(peaks, "GRanges") idx <- c( # where are the proper chromosomes? grep("^\\d", names(seqlengths(hg19_exons))), grep("^X", names(seqlengths(hg19_exons))), grep("^Y", names(seqlengths(hg19_exons))) ) chrlen <- seqlengths(hg19_exons)[idx] # here they are chrlen_sorted <- chrlen[sort(names(seqlengths(hg19_exons)[idx]))] seqlengths(peaks_gr) <- chrlen_sorted ###################### ## Annotate ###################### peakAnno <- annotatePeak(peaks_gr, tssRegion=c(-300, 0), as="GRanges", TranscriptDb=txdb_hg19, annoDb="org.Hs.eg.db") ########### sessionInfo() R version 3.1.0 (2014-04-10) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel grid stats graphics grDevices utils datasets [8] methods base other attached packages: [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0 [2] ChIPseeker_1.0.2 [3] org.Hs.eg.db_2.14.0 [4] GenomicFeatures_1.16.0 [5] GenomicRanges_1.16.2 [6] ChIPpeakAnno_2.12.1 [7] AnnotationDbi_1.26.0 [8] GenomeInfoDb_1.0.2 [9] Biobase_2.24.0 [10] RSQLite_0.11.4 [11] DBI_0.2-7 [12] Biostrings_2.32.0 [13] XVector_0.4.0 [14] IRanges_1.21.43 [15] BiocGenerics_0.10.0 [16] VennDiagram_1.6.5 [17] biomaRt_2.20.0 [18] RColorBrewer_1.0-5 [19] plyr_1.8.1 [20] ggplot2_0.9.3.1 loaded via a namespace (and not attached): [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.6.0 [4] bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0 [7] caTools_1.17 codetools_0.2-8 colorspace_1.2-4 [10] digest_0.6.4 fail_1.2 foreach_1.4.2 [13] gdata_2.13.3 GenomicAlignments_1.0.0 GO.db_2.14.0 [16] gplots_2.13.0 gtable_0.1.2 gtools_3.4.0 [19] iterators_1.0.7 KernSmooth_2.23-12 lattice_0.20-29 [22] limma_3.20.1 MASS_7.3-33 Matrix_1.1-3 [25] multtest_2.20.0 munsell_0.4.2 proto_0.3-10 [28] Rcpp_0.11.1 RCurl_1.95-4.1 reshape2_1.4 [31] Rsamtools_1.16.0 rtracklayer_1.23.22 scales_0.2.4 [34] sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 [37] stringr_0.6.2 survival_2.37-7 tools_3.1.0 [40] XML_3.98-1.1 zlibbioc_1.10.0 -- António Miguel de Jesus Domingues, PhD Postdoctoral researcher Deep Sequencing Group - SFB655 Biotechnology Center (Biotec) Technische Universität Dresden Fetscherstraße 105 01307 Dresden Phone: +49 (351) 458 82362 Email: antonio.domingues(at)biotec.tu-dresden.de -- The Unbearable Lightness of Molecular Biology [[alternative HTML version deleted]]

Sequencing GO annotate ChIPseeker Sequencing GO annotate ChIPseeker • 3.1k views

ADD COMMENT • link 10.0 years ago António Miguel de Jesus Domingues ▴ 490

score 0 · Answer 1 · 2014-05-09

Hi all, so with the help of Guangchuang (ChIPseeker's developer) the problem was indirectly solved by reading in the file directly in the function annotatePeak. I also works with the function readPeakFile btw. The issue itself, is due to the different chomosome notations when reading with my function or directly - one makes ensembl like notations, whereas ChIPseeker creates UCSC type. This seems to conflict internally in the annotatePeak. For future reference, see bellow for the full discussion and debbuging between myself and Guangchuang. AntÃ³nio -------- Original Message -------- Subject: Re: ChIPseeker: Error in annotatePeak Date: Fri, 9 May 2014 09:58:18 +0800 From: Yu, Guangchuang <gcyu@connect.hku.hk> To: AntÃ³nio domingues <amjdomingues@gmail.com> Dear AntÃ³nio, Great! Problem solved. I will add a testing condition on seqname, and give user useful information when this case occurred. Thanks for your feedback! Best Regards, Guangchuang On Fri, May 9, 2014 at 12:51 AM, AntÃ³nio domingues <amjdomingues@gmail.com <mailto:amjdomingues@gmail.com="">> wrote: Dear Guangchuang, I did: peaks_cs <- readPeakFile(peaks_file) compare(peaks_gr, peaks_cs) and the main difference is: Warning message: In .Seqinfo.mergexy(x, y) : Each of the 2 combined objects has sequence levels not in the other: - in 'x': 1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22, 3, 4, 5, 6, 7, 8, 9, X, Y - in 'y': chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr20, chr21, chr22, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrX, chrY Make sure to always combine/compare objects based on the same reference so it seems that the object created with readPeakFile have "chr" but not the one I created. I would assume that since my transcriptbd comes from ensembl, via biomart, this would not be a problem. AntÃ³nio -- AntÃ³nio Miguel de Jesus Domingues, PhD Postdoctoral researcher Deep Sequencing Group - SFB655 Biotechnology Center (Biotec) Technische UniversitÃ¤t Dresden FetscherstraÃe 105 01307 Dresden Phone:+49 (351) 458 82362 <tel:%2b49%20%28351%29%20458%2082362> Email: antonio.domingues(at)biotec.tu-dresden.de <http: biotec="" .tu-dresden.de=""> -- The Unbearable Lightness of Molecular Biology On 05/08/2014 06:32 PM, Yu, Guangchuang wrote: > Dear AntÃ³nio, > > Can you compare the GRanges object you created and the one from > readPeakFile provided by ChIPseeker? > > > Bests, > Guangchuang On Fri, May 9, 2014 at 12:30 AM, AntÃ³nio domingues <amjdomingues@gmail.com <mailto:amjdomingues@gmail.com="">> wrote: Thanks for looking into this. Here it goes the result of traceback: > peakAnno <- annotatePeak(peaks_gr, + tssRegion=c(-3000, 0), + as="GRanges", + TranscriptDb=txdb_hg19, + annoDb="org.Hs.eg.db" + ) >> preparing features information... 2014-05-08 06:28:40 PM >> identifying nearest features... 2014-05-08 06:28:40 PM Error in IRanges:::normalizeSingleBracketSubscript(i, x) : subscript contains NAs or out of bounds indices > traceback() 8: stop("subscript contains NAs or out of bounds indices") 7: IRanges:::normalizeSingleBracketSubscript(i, x) 6: IRanges:::extractROWS(x, i) 5: IRanges:::extractROWS(x, i) 4: features[ps.idx] 3: features[ps.idx] 2: getNearestFeatureIndicesAndDistancespeak.gr <http: peak.gr="">, features) 1: annotatePeak(peaks_gr, tssRegion = c(-3000, 0), as = "GRanges", TranscriptDb = txdb_hg19, annoDb = "org.Hs.eg.db") AntÃ³nio -- AntÃ³nio Miguel de Jesus Domingues, PhD Postdoctoral researcher Deep Sequencing Group - SFB655 Biotechnology Center (Biotec) Technische UniversitÃ¤t Dresden FetscherstraÃe 105 01307 Dresden Phone:+49 (351) 458 82362 <tel:%2b49%20%28351%29%20458%2082362> Email: antonio.domingues(at)biotec.tu-dresden.de <http: biotec.tu-="" dresden.de=""> -- The Unbearable Lightness of Molecular Biology On 05/08/2014 06:26 PM, Yu, Guangchuang wrote: when the error was occurred, run the traceback() function, and show me the output. On Fri, May 9, 2014 at 12:22 AM, AntÃ³nio domingues <amjdomingues@gmail.com <mailto:amjdomingues@gmail.com="">> wrote: Hi Guangchuang, (skip to the end if want only the conclusion to the story) I might be missing something but the error tells me that there might me NAs in my GRanges (which there aren't): summary(as.data.frame(peaks_gr)) seqnames start end width 1 :1063 Min. : 50200 Min. : 52449 Min. : 200 6 :1036 1st Qu.: 36269338 1st Qu.: 36270812 1st Qu.: 1100 2 : 992 Median : 70613075 Median : 70614399 Median : 1500 3 : 884 Mean : 80254593 Mean : 80256357 Mean : 1766 4 : 754 3rd Qu.:115672575 3rd Qu.:115674324 3rd Qu.: 2150 5 : 745 Max. :249167000 Max. :249169199 Max. :13300 (Other):7362 strand score +: 0 Min. :1 -: 0 1st Qu.:1 *:12836 Median :1 Mean :1 3rd Qu.:1 Max. :1 One other alternative would be some confusion with the strand, since by default the function BED2RangedData makes strand as "+" in the absence of information. Very unlikely, but still I tried it: strand(peaks_gr) <- "*" and... >> preparing features information... 2014-05-08 05:59:14 PM >> identifying nearest features... 2014-05-08 05:59:14 PM Error in IRanges:::normalizeSingleBracketSubscript(i, x) : subscript contains NAs or out of bounds indices so at this point I was none the wiser. Then I read the file directly from the function: peakAnno <- annotatePeak("Tox-W50-G250-islands-summary-FDR0.01_50tags_2fold.bed", tssRegion=c(-300, 0), # as="GRanges", TranscriptDb=txdb_hg19, annoDb="org.Hs.eg.db" ) it worked without a glitch! it still does not explain the issue when creating a GRanges outside annotatePeak. Best, AntÃ³nio -- AntÃ³nio Miguel de Jesus Domingues, PhD Postdoctoral researcher Deep Sequencing Group - SFB655 Biotechnology Center (Biotec) Technische UniversitÃ¤t Dresden FetscherstraÃe 105 01307 Dresden Phone:+49 (351) 458 82362 <tel:%2b49%20%28351%29%20458%2082362> Email: antonio.domingues(at)biotec.tu-dresden.de <http: biotec.tu-="" dresden.de=""> -- The Unbearable Lightness of Molecular Biology Hi, Error in IRanges:::normalizeSingleBracketSubscript(i, x) : subscript contains NAs or out of bounds indices This line already point out the issue. Please check your bed file. Bests, Guangchuang On Thu, May 8, 2014 at 11:08 PM, AntÃ³nio domingues <amjdomingues@gmail.com <mailto:amjdomingues@gmail.com="">> wrote: Hi all, I was trying to ChIPseeker for the first time to annotate some peaks with the genomic features, but got the following error: peakAnno <- annotatePeak(peaks_gr, tssRegion=c(-300, 0), as="GRanges", TranscriptDb=txdb_hg19, annoDb="org.Hs.eg.db") Error in IRanges:::normalizeSingleBracketSubscript(i, x) : subscript contains NAs or out of bounds indices The peaks originated from Sicer which I converted to GRanges and then add the seqlengths independently. A tad convoluted but got the job done (or so I thought). Bellow the full code. I searched for the error but could not find out what is going wrong. Help is appreciated. Alternatively, would any one point me to another way to annotate a list of peaks with genomic features (utr's, exons, introns, extragenic, ..)? Cheers, AntÃ³nio ###################### ## Peaks ###################### sicer2RangedData <-function(path.to.file) { # load the file bed = read.table(path.to.file, header=F, sep="\t", stringsAsFactors=FALSE)[,c(1:3)] # generate GRanges object myrange <- BED2RangedData(bed) return(myrange) } peaks_file="Tox-W50-G250-islands-summary-FDR0.01_50tags_2fold.bed" peaks <- sicer2RangedData(peaks_file) ######################### ## ChIPseeker likes a proper GRanges obj ######################### ## chr size: hg19_exons <- exons(txdb_ens) peaks_gr <- as(peaks, "GRanges") idx <- c( # where are the proper chromosomes? grep("^\\d", names(seqlengths(hg19_exons))), grep("^X", names(seqlengths(hg19_exons))), grep("^Y", names(seqlengths(hg19_exons))) ) chrlen <- seqlengths(hg19_exons)[idx] # here they are chrlen_sorted <- chrlen[sort(names(seqlengths(hg19_exons)[idx]))] seqlengths(peaks_gr) <- chrlen_sorted ###################### ## Annotate ###################### peakAnno <- annotatePeak(peaks_gr, tssRegion=c(-300, 0), as="GRanges", TranscriptDb=txdb_hg19, annoDb="org.Hs.eg.db") ########### sessionInfo() R version 3.1.0 (2014-04-10) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel grid stats graphics grDevices utils datasets [8] methods base other attached packages: [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0 [2] ChIPseeker_1.0.2 [3] org.Hs.eg.db_2.14.0 [4] GenomicFeatures_1.16.0 [5] GenomicRanges_1.16.2 [6] ChIPpeakAnno_2.12.1 [7] AnnotationDbi_1.26.0 [8] GenomeInfoDb_1.0.2 [9] Biobase_2.24.0 [10] RSQLite_0.11.4 [11] DBI_0.2-7 [12] Biostrings_2.32.0 [13] XVector_0.4.0 [14] IRanges_1.21.43 [15] BiocGenerics_0.10.0 [16] VennDiagram_1.6.5 [17] biomaRt_2.20.0 [18] RColorBrewer_1.0-5 [19] plyr_1.8.1 [20] ggplot2_0.9.3.1 loaded via a namespace (and not attached): [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.6.0 [4] bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0 [7] caTools_1.17 codetools_0.2-8 colorspace_1.2-4 [10] digest_0.6.4 fail_1.2 foreach_1.4.2 [13] gdata_2.13.3 GenomicAlignments_1.0.0 GO.db_2.14.0 [16] gplots_2.13.0 gtable_0.1.2 gtools_3.4.0 [19] iterators_1.0.7 KernSmooth_2.23-12 lattice_0.20-29 [22] limma_3.20.1 MASS_7.3-33 Matrix_1.1-3 [25] multtest_2.20.0 munsell_0.4.2 proto_0.3-10 [28] Rcpp_0.11.1 RCurl_1.95-4.1 reshape2_1.4 [31] Rsamtools_1.16.0 rtracklayer_1.23.22 scales_0.2.4 [34] sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 [37] stringr_0.6.2 survival_2.37-7 tools_3.1.0 [40] XML_3.98-1.1 zlibbioc_1.10.0 -- AntÃ³nio Miguel de Jesus Domingues, PhD Postdoctoral researcher Deep Sequencing Group - SFB655 Biotechnology Center (Biotec) Technische UniversitÃ¤t Dresden FetscherstraÃe 105 01307 Dresden Phone:+49 (351) 458 82362 <tel:%2b49%20%28351%29%20458%2082362> Email: antonio.domingues(at)biotec.tu-dresden.de <http: biotec="" .tu-dresden.de=""> -- The Unbearable Lightness of Molecular Biology -- --~--~---------~--~----~------------~-------~--~----~ Guangchuang Yu, PhD Candidate School of Biological Sciences Rm 7N-07, Kadoorie Biological Sciences Bldg The University of Hong Kong Hong Kong SAR, China www: http://ygc.name -~----------~----~----~----~------~----~------~--~--- [[alternative HTML version deleted]]