How to search a .narrowPeak file for a list of specific genes using the GenomicRanges package
@fce3b503
Last seen 21 months ago
United States

I'm doing a ChIP-seq analysis for the first time and have some basic questions. I successfully ran macs2 callpeak and have a .narrowPeak file that I can load into IGV. I also have an .xls file with the names of specific genes we are interested in. In IGV I can type in a gene name and see whether my TF binds there, but the list is over 5000 genes long, so doing this manually isn't an option. I've been told to look into the GenomicRanges package but would love some direction. I need output that lists each gene with a 0/1 column indicating whether the TF binds that gene somewhere in my .narrowPeak file.

Thanks in advance for any help, Stacy

sessionInfo( )
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6.2

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] readxl_1.4.1                             openxlsx_4.2.5.1                         dplyr_1.0.10                            
 [4] fastqcr_0.1.2                            msigdbr_7.5.1                            clusterProfiler_4.6.0                   
 [7] ggupset_0.3.0                            UpSetR_1.4.0                             ChIPseeker_1.34.1                       
[10] rtracklayer_1.58.0                       org.Hs.eg.db_3.16.0                      TxDb.Hsapiens.UCSC.hg38.knownGene_3.16.0
[13] GenomicFeatures_1.50.3                   AnnotationDbi_1.60.0                     Rsamtools_2.14.0                        
[16] Biostrings_2.66.0                        XVector_0.38.0                           ChIPQC_1.34.0                           
[19] BiocParallel_1.32.5                      DiffBind_3.8.3                           SummarizedExperiment_1.28.0             
[22] Biobase_2.58.0                           MatrixGenerics_1.10.0                    matrixStats_0.63.0                      
[25] GenomicRanges_1.50.2                     GenomeInfoDb_1.34.4                      IRanges_2.32.0                          
[28] S4Vectors_0.36.1                         BiocGenerics_0.44.0                      ggplot2_3.4.0                           

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                                tidyselect_1.2.0                          RSQLite_2.2.20                           
  [4] htmlwidgets_1.6.0                         grid_4.2.2                                scatterpie_0.1.8                         
  [7] munsell_0.5.0                             codetools_0.2-18                          interp_1.1-3                             
 [10] systemPipeR_2.4.0                         withr_2.5.0                               colorspace_2.0-3                         
 [13] GOSemSim_2.24.0                           filelock_1.0.2                            knitr_1.41                               
 [16] DOSE_3.24.2                               labeling_0.4.2                            bbmle_1.0.25                             
 [19] GenomeInfoDbData_1.2.9                    mixsqp_0.3-48                             hwriter_1.3.2.1                          
 [22] polyclip_1.10-4                           bit64_4.0.5                               farver_2.1.1                             
 [25] downloader_0.4                            coda_0.19-4                               vctrs_0.5.1                              
 [28] treeio_1.22.0                             TxDb.Rnorvegicus.UCSC.rn4.ensGene_3.2.2   generics_0.1.3                           
 [31] xfun_0.36                                 gson_0.0.9                                BiocFileCache_2.6.0                      
 [34] R6_2.5.1                                  apeglm_1.20.0                             graphlayouts_0.8.4                       
 [37] invgamma_1.1                              locfit_1.5-9.6                            bitops_1.0-7                             
 [40] cachem_1.0.6                              fgsea_1.24.0                              gridGraphics_0.5-1                       
 [43] DelayedArray_0.24.0                       assertthat_0.2.1                          vroom_1.6.0                              
 [46] BiocIO_1.8.0                              scales_1.2.1                              ggraph_2.1.0                             
 [49] enrichplot_1.18.3                         gtable_0.3.1                              tidygraph_1.2.2                          
 [52] rlang_1.0.6                               splines_4.2.2                             lazyeval_0.2.2                           
 [55] selectr_0.4-2                             yaml_2.3.6                                reshape2_1.4.4                           
 [58] TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2 qvalue_2.30.0                             tools_4.2.2                              
 [61] ggplotify_0.1.0                           ellipsis_0.3.2                            gplots_3.1.3                             
 [64] jquerylib_0.1.4                           RColorBrewer_1.1-3                        Rcpp_1.0.9                               
 [67] plyr_1.8.8                                progress_1.2.2                            zlibbioc_1.44.0                          
 [70] purrr_1.0.0                               RCurl_1.98-1.9                            prettyunits_1.1.1                        
 [73] deldir_1.0-6                              viridis_0.6.2                             ashr_2.2-54                              
 [76] cowplot_1.1.1                             chipseq_1.48.0                            ggrepel_0.9.2                            
 [79] magrittr_2.0.3                            data.table_1.14.6                         TxDb.Hsapiens.UCSC.hg18.knownGene_3.2.2  
 [82] truncnorm_1.0-8                           mvtnorm_1.1-3                             SQUAREM_2021.1                           
 [85] amap_0.8-19                               TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2   evaluate_0.19                            
 [88] hms_1.1.2                                 patchwork_1.1.2                           HDO.db_0.99.1                            
 [91] XML_3.99-0.13                             emdbook_1.3.12                            jpeg_0.1-10                              
 [94] gridExtra_2.3                             compiler_4.2.2                            biomaRt_2.54.0                           
 [97] bdsmatrix_1.3-6                           tibble_3.1.8                              KernSmooth_2.23-20                       
[100] crayon_1.5.2                              shadowtext_0.1.2                          htmltools_0.5.4                          
[103] tzdb_0.3.0                                ggfun_0.0.9                               tidyr_1.2.1                              
[106] aplot_0.1.9                               DBI_1.1.3                                 tweenr_2.0.2                             
[109] dbplyr_2.2.1                              MASS_7.3-58.1                             rappdirs_0.3.3                           
[112] boot_1.3-28.1                             babelgene_22.9                            readr_2.1.3                              
[115] ShortRead_1.56.1                          Matrix_1.5-3                              cli_3.5.0                                
[118] parallel_4.2.2                            igraph_1.3.5                              pkgconfig_2.0.3                          
[121] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2   GenomicAlignments_1.34.0                  numDeriv_2016.8-1.1                      
[124] TxDb.Celegans.UCSC.ce6.ensGene_3.2.2      xml2_1.3.3                                ggtree_3.6.2                             
[127] bslib_0.4.2                               rvest_1.0.3                               yulab.utils_0.0.6                        
[130] stringr_1.5.0                             digest_0.6.31                             cellranger_1.1.0                         
[133] rmarkdown_2.19                            fastmatch_1.1-3                           tidytree_0.4.2                           
[136] restfulr_0.0.15                           GreyListChIP_1.30.0                       curl_4.3.3                               
[139] gtools_3.9.4                              rjson_0.2.21                              lifecycle_1.0.3                          
[142] nlme_3.1-161                              jsonlite_1.8.4                            viridisLite_0.4.1                        
[145] limma_3.54.0                              BSgenome_1.66.1                           fansi_1.0.3                              
[148] pillar_1.8.1                              lattice_0.20-45                           Nozzle.R1_1.1-1.1                        
[151] KEGGREST_1.38.0                           fastmap_1.1.0                             httr_1.4.4                               
[154] plotrix_3.8-2                             GO.db_3.16.0                              glue_1.6.2                               
[157] zip_2.2.2                                 png_0.1-8                                 bit_4.0.5                                
[160] sass_0.4.4                                ggforce_0.4.1                             stringi_1.7.8                            
[163] blob_1.2.3                                TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0 latticeExtra_0.6-30                      
[166] caTools_1.18.2                            memoise_2.0.1                             irlba_2.3.5.1                            
[169] ape_5.6-2
ChIPSeq PeakDetection • 1.6k views

https://www.biostars.org/p/9550136/#9550147

Why spread the same question over multiple communities? It is a simple overlap operation; if you provide some example data, as requested at Biostars, this probably comes down to a one-liner.


Because you told me to look into GenomicRanges, which I did, and now I'm asking on the page for the GenomicRanges package. What kind of examples do you need? I have a .narrowPeak file and an .xls with a column of gene names (like 'MCIDAS'). I need to somehow find where the two match and get output that adds a column to the .xls (or a tab-delimited file) with a 1 or 0 indicating whether each of those genes is bound in the .narrowPeak file.
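For illustration only, the overlap described above might look roughly like the sketch below. It uses small made-up ranges so it runs on its own: the coordinates, the gene symbols (GENE_A and so on), and the column names `gene` and `bound` are all assumptions, not anything from the real data. With real data, the peaks would come from the .narrowPeak file and the gene coordinates from an annotation package such as a TxDb object.

```r
library(GenomicRanges)

# Toy stand-in for the peaks in the .narrowPeak file (hypothetical coordinates).
peaks <- GRanges("chr1", IRanges(start = c(100, 5000), end = c(300, 5200)))

# Toy gene coordinates with a symbol for each range (hypothetical).
gene_ranges <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(250, 2000, 100), end = c(900, 3000, 800)),
  symbol   = c("GENE_A", "GENE_B", "GENE_C")
)

# Toy stand-in for the gene column of the .xls file.
gene_list <- data.frame(gene = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"))

# A gene counts as "bound" here if its range overlaps at least one peak.
bound_symbols <- gene_ranges$symbol[overlapsAny(gene_ranges, peaks)]

# 1 if the gene overlaps a peak, 0 otherwise (including genes with no coordinates).
gene_list$bound <- as.integer(gene_list$gene %in% bound_symbols)
print(gene_list)
```

Whether "bound" should mean an overlap with the gene body, the promoter, or some window around the TSS is a separate decision; this sketch only tests overlap with the gene range itself.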

Vince Schulz ▴ 160
@vince-schulz-3553
Last seen 9 weeks ago
United States

Try the annotatePeak function from the ChIPseeker package. It reports the nearest gene for each peak and whether the region is exonic, intergenic, etc. You can then use match(), %in%, or other methods to match your gene list against the annotated peaks.
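A minimal sketch of that suggestion, under several assumptions: the genome is hg38 (the TxDb and org.Hs.eg.db packages appear in the sessionInfo above), the peaks are two made-up chr1 ranges rather than the real file, and the names `gene_list`, `gene`, and `bound` are hypothetical. With real data, `ChIPseeker::readPeakFile()` could load the .narrowPeak file in place of the toy ranges.

```r
library(ChIPseeker)
library(GenomicRanges)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)

# Toy peaks (hypothetical coordinates); real peaks would come from the
# .narrowPeak file, e.g. via readPeakFile().
peaks <- GRanges("chr1", IRanges(start = c(1000000, 2000000), width = 200))

# Annotate each peak with its nearest gene; annoDb adds a SYMBOL column.
anno <- annotatePeak(
  peaks,
  TxDb      = TxDb.Hsapiens.UCSC.hg38.knownGene,
  annoDb    = "org.Hs.eg.db",
  tssRegion = c(-3000, 3000),
  verbose   = FALSE
)
anno_df <- as.data.frame(anno)

# Toy stand-in for the gene column of the .xls file.
gene_list <- data.frame(gene = c("MCIDAS", "NOT_A_REAL_GENE"))

# 1 if the gene symbol was assigned to at least one peak, 0 otherwise.
gene_list$bound <- as.integer(gene_list$gene %in% anno_df$SYMBOL)
print(gene_list)
```

Note that annotatePeak assigns each peak to a single nearest gene, so a gene can get 0 here even if a peak lies close to it but closer to a neighbouring gene.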


Thank you, Vince. Very helpful.
