ChipPeakAnno / GRanges isolate peaks that do NOT overlap
1
0
Entering edit mode
José LÓPEZ ▴ 110
@jose-lopez-5444
Last seen 4.9 years ago
Dear Julie, I am using findOverlappingPeaks and makeVennDiagram to compare 2 set of peaks (A and B). I get overlapping peaks of A with B ($Peaks1withOverlaps) (9992 see below). My question is to how can I get those peaks specific to A (13268) and those specific to B (2640)? I also tried with subsetByOverlaps from (GenomicRanges package) and get the intersect between A and B (9992 as with ChiPpeakAnno functions). Then, to identify peaks that do not overlap tried to isolate those A regions that were not present in the overlap with B, but get an error (see below)... > noOlap = peaks_AcH3_Ranged[!peaks_A_Ranged %in% peaks_B_Ranged] Error in peaks_AcH3_Ranged[!peaks_A_Ranged %in% peaks_B_Ranged] : selecting spaces: invalid subscript type Thank you in advance for your suggestions, Jose R version 2.15.0 (2012-03-30) Copyright (C) 2012 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. [R.app GUI 1.51 (6148) x86_64-apple-darwin9.8.0] [History restored from /Users/joselopez/.Rapp.history] #load ChiPpeakAnno > library (ChIPpeakAnno) Loading required package: gplots Loading required package: gtools Loading required package: gdata gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED. gdata: Unable to load perl libaries needed by read.xls() gdata: to support 'XLSX' (Excel 2007+) files. gdata: Run the function 'installXLSXsupport()' gdata: to automatically download and install the perl gdata: libaries needed to support Excel XLS and XLSX formats. Attaching package: ‘gdata’ The following object(s) are masked from ‘package:stats’: nobs The following object(s) are masked from ‘package:utils’: object.size Loading required package: caTools Loading required package: bitops Loading required package: grid Loading required package: KernSmooth KernSmooth 2.23 loaded Copyright M. P. Wand 1997-2009 Loading required package: MASS Attaching package: ‘gplots’ The following object(s) are masked from ‘package:stats’: lowess Loading required package: BiocGenerics Attaching package: ‘BiocGenerics’ The following object(s) are masked from ‘package:gdata’: combine The following object(s) are masked from ‘package:stats’: xtabs The following object(s) are masked from ‘package:base’: anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, get, intersect, lapply, Map, mapply, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int, rownames, sapply, setdiff, table, tapply, union, unique Loading required package: biomaRt Loading required package: multtest Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'. Attaching package: ‘multtest’ The following object(s) are masked from ‘package:gplots’: wapply Loading required package: IRanges Attaching package: ‘IRanges’ The following object(s) are masked from ‘package:gplots’: space The following object(s) are masked from ‘package:caTools’: runmean The following object(s) are masked from ‘package:gdata’: trim Loading required package: Biostrings Loading required package: BSgenome Loading required package: GenomicRanges Loading required package: BSgenome.Ecoli.NCBI.20080805 Loading required package: GO.db Loading required package: AnnotationDbi Attaching package: ‘AnnotationDbi’ The following object(s) are masked from ‘package:MASS’: select Loading required package: DBI Loading required package: org.Hs.eg.db Loading required package: limma Warning messages: 1: package ‘gtools’ was built under R version 2.15.1 2: package ‘gdata’ was built under R version 2.15.1 3: package ‘KernSmooth’ was built under R version 2.15.1 4: package ‘MASS’ was built under R version 2.15.1 5: package ‘IRanges’ was built under R version 2.15.1 6: package ‘GenomicRanges’ was built under R version 2.15.1 7: replacing previous import ‘space’ when loading ‘IRanges’ #Generate GRanges objects > peaks_A = read.table("data_A.txt") > dim(peaks_A) [1] 23260 16 > peaks_A_Ranged = BED2RangedData(peaks_A,header=F) > peaks_A_Ranged RangedData with 23260 rows and 2 value columns across 21 spaces space ranges | strand score <factor> <iranges> | <numeric> <numeric> 00001 1 [3658000, 3663999] | 1 1 00002 1 [4481600, 4483799] | 1 1 00003 1 [4486200, 4488199] | 1 1 00004 1 [4561200, 4562399] | 1 1 00005 1 [4774000, 4776599] | 1 1 00006 1 [4796800, 4799399] | 1 1 00007 1 [4846400, 4849799] | 1 1 00008 1 [5007000, 5013799] | 1 1 00009 1 [5072800, 5075999] | 1 1 ... ... ... ... ... ... 23252 X [165110000, 165112999] | 1 1 23253 X [165474200, 165475399] | 1 1 23254 X [165755800, 165759199] | 1 1 23255 X [166317200, 166320199] | 1 1 23256 X [166416000, 166421999] | 1 1 23257 Y [ 234000, 234799] | 1 1 23258 Y [ 346400, 347799] | 1 1 23259 Y [ 581200, 582799] | 1 1 23260 Y [ 621400, 623599] | 1 1 > peaks_B = read.table("data_B.txt") > dim(peaks_B) [1] 12632 16 > peaks_B_Ranged = BED2RangedData(peaks_B,header=F) > peaks_B_Ranged RangedData with 12632 rows and 2 value columns across 20 spaces space ranges | strand score <factor> <iranges> | <numeric> <numeric> 00001 1 [3660400, 3662799] | 1 1 00002 1 [4775200, 4777199] | 1 1 00003 1 [4797000, 4798399] | 1 1 00004 1 [6204200, 6205799] | 1 1 00005 1 [7076400, 7080599] | 1 1 00006 1 [9288400, 9290999] | 1 1 00007 1 [9538200, 9541999] | 1 1 00008 1 [9764400, 9766399] | 1 1 00009 1 [9998600, 9999999] | 1 1 ... ... ... ... ... ... 12624 X [149932200, 149934799] | 1 1 12625 X [151695800, 151697199] | 1 1 12626 X [152057200, 152059599] | 1 1 12627 X [154255600, 154257999] | 1 1 12628 X [159173200, 159175599] | 1 1 12629 X [160707200, 160709599] | 1 1 12630 X [162676400, 162677599] | 1 1 12631 X [166356400, 166384999] | 1 1 12632 X [166416000, 166421999] | 1 1 #find overlaps A,B. > t1 = findOverlappingPeaks (peaks_A_Ranged, peaks_B_Ranged, maxgap = 0, NameOfPeaks1 = "A", NameOfPeaks2 = "B", select = "all") There were 20 warnings (use warnings() to see them) > overlappingPeaks = as.data.frame(t1$Peaks1withOverlaps) > head(overlappingPeaks) space start end width names strand 1 1 3658000 3663999 6000 00001 + 2 1 4774000 4776599 2600 00005 + 3 1 4796800 4799399 2600 00006 + 4 1 6202800 6207199 4400 00010 + 5 1 7076200 7081599 5400 00014 + 6 1 9285400 9290999 5600 00017 + > dim(overlappingPeaks) [1] 9992 6 > write.table(as.data.frame(t1$Peaks1withOverlaps), file="overlappingPeaks.txt", sep="\t", row.names=F) #perform venn diagram > makeVennDiagram(RangedDataList(peaks_A_Ranged, peaks_B_Ranged), NameOfPeaks=c("A","B"), maxgap = 0, totalTest = 3E4, cex =1, counts.col='red', useFeature=F) $p.value [1] 1.455914e-08 $vennCounts A B Counts [1,] 0 0 4100 [2,] 0 1 2640 [3,] 1 0 13268 [4,] 1 1 9992 attr(,"class") [1] "VennCounts" Warning message: In findOverlappingPeaks(Peaks[[1]], Peaks[[2]], NameOfPeaks1 = NameOfPeaks[1], : Please use select instead of multiple! > dev.copy2eps(file='VennDiagram.eps') > sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] grid stats graphics grDevices utils datasets methods base other attached packages: [1] ChIPpeakAnno_2.4.0 limma_3.12.1 org.Hs.eg.db_2.7.1 [4] GO.db_2.7.1 RSQLite_0.11.1 DBI_0.2-5 [7] AnnotationDbi_1.18.1 BSgenome.Ecoli.NCBI. 20080805_1.3.17 BSgenome_1.24.0 [10] GenomicRanges_1.8.9 Biostrings_2.24.1 IRanges_1.14.4 [13] multtest_2.12.0 Biobase_2.16.0 biomaRt_2.12.0 [16] BiocGenerics_0.2.0 gplots_2.11.0 MASS_7.3-19 [19] KernSmooth_2.23-8 caTools_1.13 bitops_1.0-4.1 [22] gdata_2.11.0 gtools_2.7.0 loaded via a namespace (and not attached): [1] RCurl_1.91-1 splines_2.15.0 stats4_2.15.0 survival_2.36-14 XML_3.9-4 [[alternative HTML version deleted]]
GO GUI BSgenome BSgenome GO GUI BSgenome BSgenome • 2.0k views
ADD COMMENT
0
Entering edit mode
Julie Zhu ★ 4.3k
@julie-zhu-3596
Last seen 5 months ago
United States
Jose, Here is an example to get the non-overlapping peaks. peaks1 = RangedData(IRanges(start=c(1543200,1557200,1563000,1569800,167889600), end=c(1555199,1560599,1565199,1573799,167893599),names=c("p1","p2","p3 ","p4","p5")), strand=as.integer(1),space=c(6,6,6,6,5)) peaks2 = RangedData(IRanges(start=c(1549800,1554400,1565000,1569400,167888600, 1000), end=c(1550599,1560799,1565399,1571199,167888999,1200),names=c("f1","f2 ","f3","f4","f5", "f6")), strand=as.integer(1),space=c(6,6,6,6,5,8)) t1 =findOverlappingPeaks(peaks1, peaks2, maxgap=1000, NameOfPeaks1="TF1", NameOfPeaks2="TF2", select="all") t1$Peaks1withOverlaps t1$Peaks2withOverlaps peaks1[!rownames(peaks1) %in% rownames(t1$Peaks1withOverlaps),] peaks2[!rownames(peaks2) %in% rownames(t1$Peaks2withOverlaps),] Best regards, Julie On 8/30/12 12:40 PM, "José LÓPEZ" <jose.lopez@umh.es> wrote: Dear Julie, I am using findOverlappingPeaks and makeVennDiagram to compare 2 set of peaks (A and B). I get overlapping peaks of A with B ($Peaks1withOverlaps) (9992 see below). My question is to how can I get those peaks specific to A (13268) and those specific to B (2640)? I also tried with subsetByOverlaps from (GenomicRanges package) and get the intersect between A and B (9992 as with ChiPpeakAnno functions). Then, to identify peaks that do not overlap tried to isolate those A regions that were not present in the overlap with B, but get an error (see below)... > noOlap = peaks_AcH3_Ranged[!peaks_A_Ranged %in% peaks_B_Ranged] Error in peaks_AcH3_Ranged[!peaks_A_Ranged %in% peaks_B_Ranged] : selecting spaces: invalid subscript type Thank you in advance for your suggestions, Jose R version 2.15.0 (2012-03-30) Copyright (C) 2012 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. [R.app GUI 1.51 (6148) x86_64-apple-darwin9.8.0] [History restored from /Users/joselopez/.Rapp.history] #load ChiPpeakAnno > library (ChIPpeakAnno) Loading required package: gplots Loading required package: gtools Loading required package: gdata gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED. gdata: Unable to load perl libaries needed by read.xls() gdata: to support 'XLSX' (Excel 2007+) files. gdata: Run the function 'installXLSXsupport()' gdata: to automatically download and install the perl gdata: libaries needed to support Excel XLS and XLSX formats. Attaching package: ‘gdata’ The following object(s) are masked from ‘package:stats’: nobs The following object(s) are masked from ‘package:utils’: object.size Loading required package: caTools Loading required package: bitops Loading required package: grid Loading required package: KernSmooth KernSmooth 2.23 loaded Copyright M. P. Wand 1997-2009 Loading required package: MASS Attaching package: ‘gplots’ The following object(s) are masked from ‘package:stats’: lowess Loading required package: BiocGenerics Attaching package: ‘BiocGenerics’ The following object(s) are masked from ‘package:gdata’: combine The following object(s) are masked from ‘package:stats’: xtabs The following object(s) are masked from ‘package:base’: anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, get, intersect, lapply, Map, mapply, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int, rownames, sapply, setdiff, table, tapply, union, unique Loading required package: biomaRt Loading required package: multtest Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'. Attaching package: ‘multtest’ The following object(s) are masked from ‘package:gplots’: wapply Loading required package: IRanges Attaching package: ‘IRanges’ The following object(s) are masked from ‘package:gplots’: space The following object(s) are masked from ‘package:caTools’: runmean The following object(s) are masked from ‘package:gdata’: trim Loading required package: Biostrings Loading required package: BSgenome Loading required package: GenomicRanges Loading required package: BSgenome.Ecoli.NCBI.20080805 Loading required package: GO.db Loading required package: AnnotationDbi Attaching package: ‘AnnotationDbi’ The following object(s) are masked from ‘package:MASS’: select Loading required package: DBI Loading required package: org.Hs.eg.db Loading required package: limma Warning messages: 1: package ‘gtools’ was built under R version 2.15.1 2: package ‘gdata’ was built under R version 2.15.1 3: package ‘KernSmooth’ was built under R version 2.15.1 4: package ‘MASS’ was built under R version 2.15.1 5: package ‘IRanges’ was built under R version 2.15.1 6: package ‘GenomicRanges’ was built under R version 2.15.1 7: replacing previous import ‘space’ when loading ‘IRanges’ #Generate GRanges objects > peaks_A = read.table("data_A.txt") > dim(peaks_A) [1] 23260 16 > peaks_A_Ranged = BED2RangedData(peaks_A,header=F) > peaks_A_Ranged RangedData with 23260 rows and 2 value columns across 21 spaces space ranges | strand score <factor> <iranges> | <numeric> <numeric> 00001 1 [3658000, 3663999] | 1 1 00002 1 [4481600, 4483799] | 1 1 00003 1 [4486200, 4488199] | 1 1 00004 1 [4561200, 4562399] | 1 1 00005 1 [4774000, 4776599] | 1 1 00006 1 [4796800, 4799399] | 1 1 00007 1 [4846400, 4849799] | 1 1 00008 1 [5007000, 5013799] | 1 1 00009 1 [5072800, 5075999] | 1 1 ... ... ... ... ... ... 23252 X [165110000, 165112999] | 1 1 23253 X [165474200, 165475399] | 1 1 23254 X [165755800, 165759199] | 1 1 23255 X [166317200, 166320199] | 1 1 23256 X [166416000, 166421999] | 1 1 23257 Y [ 234000, 234799] | 1 1 23258 Y [ 346400, 347799] | 1 1 23259 Y [ 581200, 582799] | 1 1 23260 Y [ 621400, 623599] | 1 1 > peaks_B = read.table("data_B.txt") > dim(peaks_B) [1] 12632 16 > peaks_B_Ranged = BED2RangedData(peaks_B,header=F) > peaks_B_Ranged RangedData with 12632 rows and 2 value columns across 20 spaces space ranges | strand score <factor> <iranges> | <numeric> <numeric> 00001 1 [3660400, 3662799] | 1 1 00002 1 [4775200, 4777199] | 1 1 00003 1 [4797000, 4798399] | 1 1 00004 1 [6204200, 6205799] | 1 1 00005 1 [7076400, 7080599] | 1 1 00006 1 [9288400, 9290999] | 1 1 00007 1 [9538200, 9541999] | 1 1 00008 1 [9764400, 9766399] | 1 1 00009 1 [9998600, 9999999] | 1 1 ... ... ... ... ... ... 12624 X [149932200, 149934799] | 1 1 12625 X [151695800, 151697199] | 1 1 12626 X [152057200, 152059599] | 1 1 12627 X [154255600, 154257999] | 1 1 12628 X [159173200, 159175599] | 1 1 12629 X [160707200, 160709599] | 1 1 12630 X [162676400, 162677599] | 1 1 12631 X [166356400, 166384999] | 1 1 12632 X [166416000, 166421999] | 1 1 #find overlaps A,B. > t1 = findOverlappingPeaks (peaks_A_Ranged, peaks_B_Ranged, maxgap = 0, NameOfPeaks1 = "A", NameOfPeaks2 = "B", select = "all") There were 20 warnings (use warnings() to see them) > overlappingPeaks = as.data.frame(t1$Peaks1withOverlaps) > head(overlappingPeaks) space start end width names strand 1 1 3658000 3663999 6000 00001 + 2 1 4774000 4776599 2600 00005 + 3 1 4796800 4799399 2600 00006 + 4 1 6202800 6207199 4400 00010 + 5 1 7076200 7081599 5400 00014 + 6 1 9285400 9290999 5600 00017 + > dim(overlappingPeaks) [1] 9992 6 > write.table(as.data.frame(t1$Peaks1withOverlaps), file="overlappingPeaks.txt", sep="\t", row.names=F) #perform venn diagram > makeVennDiagram(RangedDataList(peaks_A_Ranged, peaks_B_Ranged), NameOfPeaks=c("A","B"), maxgap = 0, totalTest = 3E4, cex =1, counts.col='red', useFeature=F) $p.value [1] 1.455914e-08 $vennCounts A B Counts [1,] 0 0 4100 [2,] 0 1 2640 [3,] 1 0 13268 [4,] 1 1 9992 attr(,"class") [1] "VennCounts" Warning message: In findOverlappingPeaks(Peaks[[1]], Peaks[[2]], NameOfPeaks1 = NameOfPeaks[1], : Please use select instead of multiple! > dev.copy2eps(file='VennDiagram.eps') > sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] grid stats graphics grDevices utils datasets methods base other attached packages: [1] ChIPpeakAnno_2.4.0 limma_3.12.1 org.Hs.eg.db_2.7.1 [4] GO.db_2.7.1 RSQLite_0.11.1 DBI_0.2-5 [7] AnnotationDbi_1.18.1 BSgenome.Ecoli.NCBI.20080805_1.3.17 BSgenome_1.24.0 [10] GenomicRanges_1.8.9 Biostrings_2.24.1 IRanges_1.14.4 [13] multtest_2.12.0 Biobase_2.16.0 biomaRt_2.12.0 [16] BiocGenerics_0.2.0 gplots_2.11.0 MASS_7.3-19 [19] KernSmooth_2.23-8 caTools_1.13 bitops_1.0-4.1 [22] gdata_2.11.0 gtools_2.7.0 loaded via a namespace (and not attached): [1] RCurl_1.91-1 splines_2.15.0 stats4_2.15.0 survival_2.36-14 XML_3.9-4 [[alternative HTML version deleted]]
ADD COMMENT

Login before adding your answer.

Traffic: 666 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6