Genes in ChIPseeker annotation do not exist in the organism
Sam ▴ 10
@sam-21502
Jerusalem

Note: I have also posted this question on GitHub, which seems the more appropriate place, but I do not know how to remove this post.

I used ChIPseeker to annotate mouse ChIP-seq data (aligned to GRCm38). I am interested in the distribution of all the reads, not only of the peaks. For this purpose, I downsampled the alignment BAM files to 1M reads and converted them to BED format (I hope this is kosher). The TxDb was created from the Ensembl database.

Problem: The geneChr column in the annotated output (which should reflect the chromosome of the nearest gene) makes no sense: the chromosome numbers it reports do not exist in mouse. See below.

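library(GenomicFeatures)  # makeTxDbFromBiomart
library(ChIPseeker)       # readPeakFile, annotatePeak

txdb <- makeTxDbFromBiomart(dataset="mmusculus_gene_ensembl")

file_list <- list(WT = "1.bed", CKO = "2.bed")
# The loading step was not shown in the post; readPeakFile() is assumed here
files <- lapply(file_list, readPeakFile)

# Check that the chromosome names in the files are OK
unique(as.data.frame(files[[1]])$seqnames)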

[1] 1          2          3          4          5          6          7          8          9         
[10] 10         11         12         13         14         15         16         17         18        
[19] 19         X          Y          MT         GL456233.1 GL456211.1 JH584304.1 GL456379.1 GL456216.1
[28] GL456393.1 GL456366.1 GL456383.1 GL456360.1 GL456378.1 GL456389.1 GL456370.1 GL456390.1 GL456394.1
[37] GL456392.1 GL456396.1 GL456368.1
39 Levels ...

files_anno <- lapply(files, annotatePeak, TxDb=txdb, tssRegion = c(-3000,3000), verbose=TRUE)


# Why are the gene chromosomes reported like this?
unique(as.data.frame(files_anno[[1]])$geneChr)

 [1]  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95 103  97 139
[26] 100
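
One possible reading of these numbers, offered as an untested guess rather than a confirmed diagnosis: the 22 consecutive values from 74 to 95 match the count of primary mouse chromosomes (1 to 19, X, Y, MT). That is what one would see if geneChr held integer factor codes, i.e. positions in seqlevels(txdb), instead of the chromosome names. The base-R sketch below only illustrates that mechanism. The 73 `scaffold_` names are hypothetical placeholders, not the real Ensembl seqlevels.

```r
# Hypothetical level set: 73 placeholder scaffolds, then the primary chromosomes.
# (Assumption for illustration only; the real order comes from seqlevels(txdb).)
seq_levels <- c(paste0("scaffold_", 1:73), 1:19, "X", "Y", "MT")

# Chromosome names stored as a factor over that level set
gene_chr <- factor(c("1", "X", "MT"), levels = seq_levels)

# Coercing a factor to integer yields the level positions, not the names
codes <- as.integer(gene_chr)
print(codes)

# Indexing the level vector with the codes recovers the names
print(seq_levels[codes])
```

If the guess holds for the real data, something along the lines of `seqlevels(txdb)[as.data.frame(files_anno[[1]])$geneChr]` should return recognisable chromosome names. If it returns NA or unrelated scaffolds, the explanation is wrong.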


sessionInfo()

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] diffloop_1.12.0        GenomicFeatures_1.36.4 AnnotationDbi_1.46.1   Biobase_2.44.0        
 [5] GenomicRanges_1.36.1   GenomeInfoDb_1.20.0    IRanges_2.18.3         S4Vectors_0.22.1      
 [9] BiocGenerics_0.30.0    ChIPseeker_1.20.0     

loaded via a namespace (and not attached):
  [1] fgsea_1.10.1                            colorspace_2.0-0                       
  [3] ellipsis_0.3.1                          ggridges_0.5.2                         
  [5] qvalue_2.16.0                           XVector_0.24.0                         
  [7] base64enc_0.1-3                         rstudioapi_0.13                        
  [9] farver_2.0.3                            urltools_1.7.3                         
 [11] graphlayouts_0.7.1                      ggrepel_0.8.2                          
 [13] bit64_4.0.5                             xml2_1.3.2                             
 [15] codetools_0.2-18                        splines_3.6.3                          
 [17] GOSemSim_2.10.0                         knitr_1.28                             
 [19] polyclip_1.10-0                         jsonlite_1.7.1                         
 [21] Rsamtools_2.0.3                         gridBase_0.4-7                         
 [23] GO.db_3.8.2                             ggforce_0.3.2                          
 [25] readr_1.4.0                             BiocManager_1.30.10                    
 [27] compiler_3.6.3                          httr_1.4.2                             
 [29] rvcheck_0.1.8                           Matrix_1.2-18                          
 [31] limma_3.40.6                            TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [33] tweenr_1.0.1                            htmltools_0.5.0                        
 [35] prettyunits_1.1.1                       tools_3.6.3                            
 [37] igraph_1.2.6                            gtable_0.3.0                           
 [39] glue_1.4.2                              GenomeInfoDbData_1.2.1                 
 [41] reshape2_1.4.4                          DO.db_2.9                              
 [43] dplyr_1.0.2                             fastmatch_1.1-0                        
 [45] Rcpp_1.0.5                              enrichplot_1.4.0                       
 [47] vctrs_0.3.5                             Biostrings_2.52.0                      
 [49] rtracklayer_1.44.4                      iterators_1.0.13                       
 [51] ggraph_2.0.4                            xfun_0.13                              
 [53] stringr_1.4.0                           lifecycle_0.2.0                        
 [55] gtools_3.8.2                            statmod_1.4.35                         
 [57] XML_3.99-0.3                            Sushi_1.22.0                           
 [59] DOSE_3.10.2                             edgeR_3.26.8                           
 [61] zoo_1.8-8                               europepmc_0.4                          
 [63] zlibbioc_1.30.0                         MASS_7.3-53                            
 [65] scales_1.1.1                            tidygraph_1.2.0                        
 [67] hms_0.5.3                               SummarizedExperiment_1.14.1            
 [69] RColorBrewer_1.1-2                      yaml_2.2.1                             
 [71] curl_4.3                                pbapply_1.4-3                          
 [73] memoise_1.1.0                           gridExtra_2.3                          
 [75] ggplot2_3.3.2                           UpSetR_1.4.0                           
 [77] biomaRt_2.40.5                          triebeard_0.3.0                        
 [79] stringi_1.5.3                           RSQLite_2.2.1                          
 [81] foreach_1.5.1                           plotrix_3.7-8                          
 [83] caTools_1.18.0                          boot_1.3-25                            
 [85] BiocParallel_1.18.1                     rlang_0.4.8                            
 [87] pkgconfig_2.0.3                         matrixStats_0.57.0                     
 [89] bitops_1.0-6                            evaluate_0.14                          
 [91] lattice_0.20-41                         purrr_0.3.4                            
 [93] labeling_0.4.2                          GenomicAlignments_1.20.1               
 [95] cowplot_1.1.0                           bit_4.0.4                              
 [97] tidyselect_1.1.0                        plyr_1.8.6                             
 [99] magrittr_2.0.1                          R6_2.5.0                               
[101] gplots_3.1.0                            generics_0.1.0                         
[103] DelayedArray_0.10.0                     DBI_1.1.0                              
[105] pillar_1.4.6                            RCurl_1.98-1.2                         
[107] tibble_3.0.4                            crayon_1.3.4                           
[109] KernSmooth_2.23-18                      rmarkdown_2.1                          
[111] viridis_0.5.1                           progress_1.2.2                         
[113] locfit_1.5-9.4                          grid_3.6.3                             
[115] data.table_1.13.2                       blob_1.2.1                             
[117] digest_0.6.27                           tidyr_1.1.2                            
[119] gridGraphics_0.5-0                      munsell_0.5.0                          
[121] viridisLite_0.3.0                       ggplotify_0.0.5
ChIPseeker • 729 views