Question

Peak annotation

0

Entering edit mode

michela ▴ 20

@216f0d39

Last seen 2.6 years ago

Netherlands

I'm performing PeakAnnotation but when I do this command it say that there's no gene and I don't understand why because I see it

> peakAnno <- annotatePeak(peakchr, tssRegion=c(-500, 500),
                         TxDb=txdb, annoDb = "org.Ce.eg.db")

> peak.anno <- as.data.frame(peakAnno)

> peak.anno$geneId
   [1] "WBGene00023193.1" "WBGene00022276.1" "WBGene00022276.1" "WBGene00022276.1" "WBGene00022276.1"
   [6] "WBGene00022278.1" "WBGene00022275.1" "WBGene00000812.1" "WBGene00021681.1" "WBGene00004274.1"
  [11] "WBGene00018774.1" "WBGene00018955.1" "WBGene00016905.1" "WBGene00016906.1" "WBGene00002077.1"


> pathway1 <- enrichPathway(peak.anno$geneId)

--> No gene can be mapped....
--> Expected input gene ID: 51520,58484,23198,475,4361,6678
--> return NULL...

HELP ChIC.data PeakDetection ChipOnChipData • 2.3k views

ADD COMMENT • link 2.8 years ago • updated 2.7 years ago michela ▴ 20

score 0 · Answer 1 · 2022-10-07

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 5 hours ago

United States

You are using WormBase IDs (which I know because I googled one of them), and annotatePeak is telling you that it expects a different type of ID (and gives you examples that you could have googled yourself to see what kind of IDs those are). So you need to convert the IDs, which means you will use the AnnotationDbi package which has a vignette you should read.

I happen to know that those IDs are NCBI Gene IDs, so here's how you do it.

> wbids <- scan("clipboard", "c", sep = " ")
Read 5 items
> wbids
[1] "WBGene00023193.1" "WBGene00022276.1" "WBGene00022276.1" "WBGene00022276.1"
[5] "WBGene00022276.1"
> ncbids <- mapIds(org.Ce.eg.db, wbids, "ENTREZID","WORMBASE")
Error in .testForValidKeys(x, keys, keytype, fks) : 
  None of the keys entered are valid keys for 'WORMBASE'. Please use the keys method to see a listing of valid arguments.

## Huh. Got an error. It says 
## None of the keys entered are valid keys for 'WORMBASE'. Please use the keys method to see a listing of valid arguments.
## That's a HINT!

> head(keys(org.Ce.eg.db, "WORMBASE"))
[1] "WBGene00022277" "WBGene00022276" "WBGene00022278" "WBGene00022279"
[5] "WBGene00021677" "WBGene00021678"

## Oh, I get it

> wbids <- gsub("\\.[0-9]+$", "", wbids)
> wbids
[1] "WBGene00023193" "WBGene00022276" "WBGene00022276" "WBGene00022276"
[5] "WBGene00022276"
> ncbids <- mapIds(org.Ce.eg.db, wbids, "ENTREZID","WORMBASE")
'select()' returned 1:1 mapping between keys and columns
> ncbids
WBGene00023193 WBGene00022276 WBGene00022276 WBGene00022276 WBGene00022276 
            NA       "171591"       "171591"       "171591"       "171591"

Now you have NCBI Gene IDs and can proceed.

ADD COMMENT • link 2.8 years ago James W. MacDonald 68k

0

Entering edit mode

ncbids <- mapIds(org.Ce.eg.db, wbids, "ENTREZID","WORMBASE") Error in .testForValidKeys(x, keys, keytype, fks) : None of the keys entered are valid keys for 'WORMBASE'. Please use the keys method to see a listing of valid arguments.

When I do this command I have this error and I don't know how to solve it, I don't understand why in both ways you sent me I always get the same error

ADD REPLY • link 2.7 years ago michela ▴ 20

0

Entering edit mode

Did you carefully read James's answer? He got the same error as you report now, and he showed this was due to the version identifier included in the IDs you provided (= the number after the period). You should thus strip these of (= the period + any number following this), and James also provided code to do this with gsub...

Or did you do this, and it still doesn't work in your hands? Then you should provide code... otherwise you would not be able to obtain any furher help...

ADD REPLY • link 2.7 years ago Guido Hooiveld ★ 4.1k

0

Entering edit mode

Yes I saw that we had the same error, but even with the second code I get the same error

ADD REPLY • link 2.7 years ago michela ▴ 20

0

Entering edit mode

Then you should post all relevant code that results in this error, including head(wbids) and class(wbids), and your sessionInfo().

ADD REPLY • link 2.7 years ago Guido Hooiveld ★ 4.1k

0

Entering edit mode

wbids <- gsub("\.[0-9]+$", "", peak.anno$geneId)

head(wbids) [1] "WBGene00199422" "WBGene00201274" "WBGene00219692" "WBGene00219692" "WBGene00018233" "WBGene00018233"

class(wbids) [1] "character"

ncbids <- mapIds(org.Ce.eg.db, wbids, "ENTREZID","WORMBASE") 'select()' returned 1:1 mapping between keys and columns

pathway1 <- enrichPathway(as.data.frame(peakAnno)$geneId) --> No gene can be mapped.... --> Expected input gene ID: 79109,5091,54600,347688,5337,3563 --> return NULL...

I think I solved the problem but do not understand why if I do the command pathway1 <- enrichPathway(as.data.frame(peakAnno)$geneId) it keeps telling me that there are no genes

sessionInfo() R version 4.1.3 (2022-03-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale: [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 LC_MONETARY=Italian_Italy.1252 [4] LC_NUMERIC=C LC_TIME=Italian_Italy.1252

attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] dplyr_1.0.10 ensembldb_2.18.4
[3] AnnotationFilter_1.18.0 biomaRt_2.50.3
[5] rtracklayer_1.54.0 ChIPpeakAnno_3.28.1
[7] org.Hs.eg.db_3.14.0 ReactomePA_1.38.0
[9] org.Ce.eg.db_3.14.0 clusterProfiler_4.2.2
[11] TxDb.Celegans.UCSC.ce11.ensGene_3.12.0 GenomicFeatures_1.46.5
[13] AnnotationDbi_1.56.2 Biobase_2.54.0
[15] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
[17] IRanges_2.28.0 S4Vectors_0.32.4
[19] BiocGenerics_0.40.0 ChIPseeker_1.30.3

loaded via a namespace (and not attached): [1] utf8_1.2.2 tidyselect_1.1.2
[3] RSQLite_2.2.17 grid_4.1.3
[5] BiocParallel_1.28.3 scatterpie_0.1.8
[7] munsell_0.5.0 codetools_0.2-18
[9] statmod_1.4.37 withr_2.5.0
[11] colorspace_2.0-3 GOSemSim_2.20.0
[13] filelock_1.0.2 knitr_1.40
[15] rstudioapi_0.14 DOSE_3.20.1
[17] MatrixGenerics_1.6.0 labeling_0.4.2
[19] GenomeInfoDbData_1.2.7 polyclip_1.10-0
[21] bit64_4.0.5 farver_2.1.1
[23] downloader_0.4 vctrs_0.4.1
[25] treeio_1.18.1 generics_0.1.3
[27] lambda.r_1.2.4 xfun_0.33
[29] BiocFileCache_2.2.1 regioneR_1.26.1
[31] R6_2.5.1 graphlayouts_0.8.1
[33] locfit_1.5-9.6 bitops_1.0-7
[35] cachem_1.0.6 fgsea_1.20.0
[37] gridGraphics_0.5-1 DelayedArray_0.20.0
[39] assertthat_0.2.1 BiocIO_1.4.0
[41] scales_1.2.1 ggraph_2.0.6
[43] enrichplot_1.14.2 gtable_0.3.1
[45] tidygraph_1.2.2 rlang_1.0.5
[47] splines_4.1.3 lazyeval_0.2.2
[49] checkmate_2.1.0 BiocManager_1.30.18
[51] yaml_2.3.5 reshape2_1.4.4
[53] ggimage_0.3.1 backports_1.4.1
[55] qvalue_2.26.0 RBGL_1.70.0
[57] tools_4.1.3 ggplotify_0.1.0
[59] ggplot2_3.3.6 ellipsis_0.3.2
[61] gplots_3.1.3 RColorBrewer_1.1-3
[63] Rcpp_1.0.9 plyr_1.8.7
[65] progress_1.2.2 zlibbioc_1.40.0
[67] purrr_0.3.4 RCurl_1.98-1.8
[69] prettyunits_1.1.1 pbapply_1.5-0
[71] viridis_0.6.2 zoo_1.8-11
[73] SummarizedExperiment_1.24.0 ggrepel_0.9.1
[75] magrittr_2.0.3 data.table_1.14.2
[77] futile.options_1.0.1 magick_2.7.3
[79] DO.db_2.9 reactome.db_1.77.0
[81] ProtGenerics_1.26.0 matrixStats_0.62.0
[83] hms_1.1.2 patchwork_1.1.2
[85] evaluate_0.17 XML_3.99-0.10
[87] VennDiagram_1.7.3 gridExtra_2.3
[89] ggupset_0.3.0 compiler_4.1.3
[91] tibble_3.1.8 KernSmooth_2.23-20
[93] crayon_1.5.2 shadowtext_0.1.2
[95] htmltools_0.5.3 tzdb_0.3.0
[97] ggfun_0.0.7 tidyr_1.2.1
[99] aplot_0.1.7 DBI_1.1.3
[101] tweenr_2.0.2 formatR_1.12
[103] dbplyr_2.2.1 MASS_7.3-55
[105] rappdirs_0.3.3 boot_1.3-28
[107] readr_2.1.3 Matrix_1.5-1
[109] cli_3.3.0 parallel_4.1.3
[111] igraph_1.3.4 pkgconfig_2.0.3
[113] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicAlignments_1.30.0
[115] foreach_1.5.2 xml2_1.3.3
[117] InteractionSet_1.22.0 ggtree_3.2.1
[119] multtest_2.50.0 XVector_0.34.0
[121] yulab.utils_0.0.5 stringr_1.4.1
[123] digest_0.6.29 graph_1.72.0
[125] Biostrings_2.62.0 rmarkdown_2.17
[127] fastmatch_1.1-3 tidytree_0.4.1
[129] edgeR_3.36.0 diffloop_1.22.0
[131] restfulr_0.0.15 curl_4.3.2
[133] Rsamtools_2.10.0 gtools_3.9.3
[135] graphite_1.40.0 rjson_0.2.21
[137] lifecycle_1.0.3 nlme_3.1-155
[139] jsonlite_1.8.0 futile.logger_1.4.3
[141] limma_3.50.3 BSgenome_1.62.0
[143] viridisLite_0.4.1 fansi_1.0.3
[145] pillar_1.8.1 lattice_0.20-45
[147] survival_3.4-0 KEGGREST_1.34.0
[149] fastmap_1.1.0 httr_1.4.4
[151] plotrix_3.8-2 GO.db_3.14.0
[153] glue_1.6.2 iterators_1.0.14
[155] Sushi_1.32.0 png_0.1-7
[157] bit_4.0.4 ggforce_0.3.4
[159] stringi_1.7.6 blob_1.2.3
[161] caTools_1.18.2 memoise_2.0.1
[163] ape_5.6-2

ADD REPLY • link 2.7 years ago michela ▴ 20

0

Entering edit mode

your call to enrichPathway does not select the appropriate organism? If you relax the criterion, you get an answer....

> enrichPathway(ncbids, organism="celegans", pvalueCutoff=.35)
#
# over-representation test
#
#...@organism    celegans 
#...@ontology    Reactome 
#...@keytype     ENTREZID 
#...@gene    chr [1:12] "353377" "171591" "171592" "190694" "266817" "3564797" "171601" ...
#...pvalues adjusted by 'BH' with cutoff <0.35 
#...20 enriched terms found
'data.frame':   20 obs. of  9 variables:
 $ ID         : chr  "R-CEL-8863795" "R-CEL-5620916" "R-CEL-8854214" "R-CEL-156590" ...
 $ Description: chr  "Downregulation of ERBB2 signaling" "VxPx cargo-targeting to cilium" "TBC/RABGAPs" "Glutathione conjugation" ...
 $ GeneRatio  : chr  "1/3" "1/3" "1/3" "1/3" ...
 $ BgRatio    : chr  "11/3973" "17/3973" "17/3973" "19/3973" ...
 $ pvalue     : num  0.00829 0.01279 0.01279 0.01428 0.01802 ...
 $ p.adjust   : num  0.0692 0.0692 0.0692 0.0692 0.0692 ...
 $ qvalue     : num  0.0401 0.0401 0.0401 0.0401 0.0401 ...
 $ geneID     : chr  "266817" "171601" "171601" "171619" ...
 $ Count      : int  1 1 1 1 1 1 1 1 1 1 ...
#...Citation
  Guangchuang Yu, Qing-Yu He. ReactomePA: an R/Bioconductor package for
  reactome pathway analysis and visualization. Molecular BioSystems
  2016, 12(2):477-479

ADD REPLY • link 2.7 years ago Vincent J. Carey, Jr. 6.7k

0

Entering edit mode

Thank you!

ADD REPLY • link 2.7 years ago michela ▴ 20

0

Entering edit mode

You wrote: ... but do not understand why if I do the command pathway1 <- enrichPathway(as.data.frame(peakAnno)$geneId) it keeps telling me that there are no genes

Note that as input for the function enrichPathway you used as.data.frame(peakAnno)$geneId. I don't know what geneId is, because you did not indicate in your code how you generated this.... It should be ncbids, or maybe it is something completely different... ???

Also realize that it could be that the ids you use as input are not annotated with reactome! See code below to illustrate this, and that analysis with enrichPathway as such do work for C. Elegans!

> library(ReactomePA)
> library(org.Ce.eg.db)
> library(reactome.db)
> ## some wormbase ids from topic starter, with version stiped of
> wbids <- c("WBGene00199422", "WBGene00201274", "WBGene00219692", "WBGene00219692", "WBGene00018233", "WBGene00018233")
> 
> ## convert into entrezid, since this is the type of id reactome.db expects
> ncbids <- mapIds(org.Ce.eg.db, wbids, "ENTREZID","WORMBASE")
'select()' returned 1:1 mapping between keys and columns
> 
> ## check content; note that only one of these 5 wormbase ids has an ncbi mapping"
> ## WBGene00018233 corresponds to ncbid 353381
> ncbids
WBGene00199422 WBGene00201274 WBGene00219692 WBGene00219692 WBGene00018233 
            NA             NA             NA             NA       "353381" 
WBGene00018233 
      "353381" 
> 
> ## run enrichPathway, with organism specified and no sigificance cutoff
> pathway1 <- enrichPathway(ncbids, organism = "celegans", pvalueCutoff = 1 ) 
--> No gene can be mapped....
--> Expected input gene ID: 179362,176908,176295,184442,176114,172007
--> return NULL...
> 
> ## Mmm, got the same error as TS.
> ## Let's check reactome.db in more detail, since data is used from this db
> keytypes(reactome.db)
[1] "ENTREZID"   "GO"         "PATHID"     "PATHNAME"   "REACTOMEID"
> 
> ## Query ncbids: indeed identical error
> select(reactome.db, keys = ncbids, keytype = "ENTREZID", columns = c( "ENTREZID","PATHID") )
Error in .testForValidKeys(x, keys, keytype, fks) : 
  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.
> 
> ## OK, could it be that this ncbid has no REACTOME annotation?
> ##
> ## Check this by using all C. Elegans ids from (org.Ce.eg.db)
> 
> all.ce.ids <- select(org.Ce.eg.db, keys =keys(org.Ce.eg.db), keytype = "ENTREZID", columns = c("ENTREZID","WORMBASE", "SYMBOL") )
'select()' returned 1:many mapping between keys and columns
> head(all.ce.ids)
  ENTREZID       WORMBASE   SYMBOL
1   171590 WBGene00022277   homt-1
2   171591 WBGene00022276   nlp-40
3   171592 WBGene00022278   rcor-1
4   171593 WBGene00022279   sesn-1
5   171594 WBGene00021677    pgs-1
6   171595 WBGene00021678 Y48G1C.5
>
> ## extract reactome pathways for all C. elegans genes.
> c.elegans.reactome  <- select(reactome.db, keys = keys(org.Ce.eg.db), keytype = "ENTREZID", columns = c( "ENTREZID","PATHID") )
'select()' returned 1:many mapping between keys and columns
> head(c.elegans.reactome)
  ENTREZID        PATHID
1   171590          <NA>
2   171591          <NA>
3   171592          <NA>
4   171593  R-CEL-212436
5   171593 R-CEL-2262752
6   171593 R-CEL-3700989
> 
> ## merge both data.frames
> merged.annotation <- merge(x=all.ce.ids,y=c.elegans.reactome,by.x ="ENTREZID",by.y="ENTREZID",all=TRUE)
> 
> ## check info for ncbid "353381": this gene indeed has NO reactome pathway annotation!
> merged.annotation[merged.annotation$ENTREZID == "353381",] 
      ENTREZID       WORMBASE  SYMBOL PATHID
76262   353381 WBGene00018233 F40E3.6   <NA>
> 
> ## check another, random ncbid "171593": this gene has been annotated to multiple pathways!
> merged.annotation[merged.annotation$ENTREZID == "171593",] 
      ENTREZID       WORMBASE SYMBOL        PATHID
24520   171593 WBGene00022279 sesn-1  R-CEL-212436
24521   171593 WBGene00022279 sesn-1 R-CEL-2262752
24522   171593 WBGene00022279 sesn-1 R-CEL-3700989
24523   171593 WBGene00022279 sesn-1 R-CEL-5628897
24524   171593 WBGene00022279 sesn-1   R-CEL-73857
24525   171593 WBGene00022279 sesn-1   R-CEL-74160
24526   171593 WBGene00022279 sesn-1 R-CEL-8953897
24527   171593 WBGene00022279 sesn-1 R-CEL-9711123
24528   171593 WBGene00022279 sesn-1 R-CEL-9755511
> 
> ## What if enrichPathway is run with (say) 250 genes that do have reactome annoations?
> ## WORKS!
> react.annoatated.genes <- unique(merged.annotation[!is.na(merged.annotation$PATHID),]$ENTREZID)
> pathway2 <- enrichPathway(react.annoatated.genes[1:250], organism = "celegans", pvalueCutoff = 1 ) 
>  pathway2
#
# over-representation test
#
#...@organism    celegans 
#...@ontology    Reactome 
#...@keytype     ENTREZID 
#...@gene        chr [1:250] "13184808" "13184830" "13188188" "13190517" "13190957" ...
#...pvalues adjusted by 'BH' with cutoff <1 
#...28 enriched terms found
'data.frame':   28 obs. of  9 variables:
 $ ID         : chr  "R-CEL-373752" "R-CEL-983169" "R-CEL-983168" "R-CEL-6806834" ...
 $ Description: chr  "Netrin-1 signaling" "Class I MHC mediated antigen processing & presentation" "Antigen processing: Ubiquitination & Proteasome degradation" "Signaling by MET" ...
 $ GeneRatio  : chr  "5/250" "24/250" "20/250" "8/250" ...
 $ BgRatio    : chr  "10/3973" "177/3973" "146/3973" "34/3973" ...
 $ pvalue     : num  0.000184 0.000232 0.000682 0.000951 0.001051 ...
 $ p.adjust   : num  0.0561 0.0561 0.1011 0.1011 0.1011 ...
 $ qvalue     : num  0.0519 0.0519 0.0936 0.0936 0.0936 ...
 $ geneID     : chr  "171722/171988/172135/172174/172233" "171607/171639/171647/171694/171700/171719/171734/171852/171875/171893/171953/171968/171975/172009/172012/172068"| __truncated__ "171607/171647/171700/171719/171734/171852/171875/171893/171953/171968/171975/172009/172012/172068/172139/172145"| __truncated__ "171635/171722/171906/172078/172135/172141/172215/172264" ...
 $ Count      : int  5 24 20 8 21 12 17 17 7 25 ...
#...Citation
  Guangchuang Yu, Qing-Yu He. ReactomePA: an R/Bioconductor package for
  reactome pathway analysis and visualization. Molecular BioSystems
  2016, 12(2):477-479 

>