Having issues mapping gene Ids during ReportingTools::publish() from Bioconductor.
1
0
Entering edit mode
anthony.nash ▴ 20
@anthonynash-14843
Last seen 11 months ago
University of Oxford

Hi all, appreciate any assistance with using Bioconductor's ReportingTools, I'm running into a couple of errors and my Google skills have failed me. This has been posted to Biostars with no traction, then I thought a post to the source would be more appropriate. I have two issues, help with either is hugely appreciated.

First issue

I'm passing in a DESeqResults object into ReportingTools::publish() after first having populated the results object with SYMBOL and ENTREZID from the accompanying ENSEMBL ids using the AnnotationDb::mapIds function.

Unfortunately, I'm getting the following error when I attempt to publish the DESeqResults object and assign annotations:

Error: Ids do not appear to be the correct key for this annotation database.

My results code:

resAmi <- results(dds, contrast = c("Drug","Amitriptyline","DMSO"),alpha=alphaValue, lfcThreshold = 0.58)
resAmi <- subset(resAmi, padj < 0.1

Although I understand the next step is not essential or a hinderance when it comes to generating the HTML reports, I've included it for the sake of completeness: Assigning entrez and symbol IDs using the Ensembl IDs:

ens.str <- substr(rownames(resAmi), 1, 15)
resAmi$symbol <- mapIds(org.Hs.eg.db, keys=ens.str, column="SYMBOL", keytype="ENSEMBL", multiVals="first")
resAmi$entrez <- mapIds(org.Hs.eg.db, keys=ens.str, column="ENTREZID", keytype="ENSEMBL", multiVals="first")

This results in the following DESeqResults object:

                  baseMean log2FoldChange     lfcSE      stat      pvalue        padj      symbol
                 <numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric> <character>
ENSG00000100053   1.851747       -17.2457   3.50063  -4.76078 1.92847e-06 8.94392e-03      CRYBB3
ENSG00000112473 371.466157        23.8466   5.56999   4.17713 2.95207e-05 8.45537e-02     SLC39A7
ENSG00000127362   0.876239       -19.3906   2.71788  -6.92106 4.48269e-12 1.15555e-07      TAS2R3
ENSG00000205238   1.808115        16.5194   3.35897   4.74532 2.08176e-06 8.94392e-03      SPDYE2
ENSG00000234651 119.459775        18.6629   3.55231   5.09046 3.57190e-07 2.30191e-03        BAG6
ENSG00000284548  24.447361        19.3357   4.10352   4.57065 4.86220e-06 1.56672e-02      METTL9
ENSG00000285238  38.491655       -28.3539   4.34045  -6.39884 1.56558e-10 2.01787e-06          NA
ENSG00000285332  10.242617        16.2567   2.54002   6.17189 6.74790e-10 5.79824e-06          NA
ENSG00000286139   1.425090       -16.6164   3.44713  -4.65212 3.28545e-06 1.20989e-02   ARHGAP11B
                     entrez
                <character>
ENSG00000100053        1417
ENSG00000112473        7922
ENSG00000127362       50831
ENSG00000205238      441273
ENSG00000234651        7917
ENSG00000284548       51108
ENSG00000285238          NA
ENSG00000285332          NA
ENSG00000286139       89839

Notice the two NAs in the entrez ID column, I'm not sure what to do with those!

I then go on to publish, ignoring plots and annotation:

des2Report <- HTMLReport(shortName="RNAseq_analysis_with_DESeq2", title="RNA-seq analysis of differential expression using DESeq2 of human iPSC cells exposed to Amitriptyline.", reportDirectory = "./reports/amitriptyline/")
publish(resAmiBak, des2Report, DataSet=dds)
URL <- finish(des2Report)
browseURL(URL)

This works fine. However, I would like URL links using gene annotations and I would like the stripplots showing the expression values for each transcript from the DataSet (dds).

Firstly when I just call:

publish(resAmi, des2Report, lfc=0.58, make.plots=TRUE, DataSet=dds)

I see a table report but there are no expression stripplots. If I then try to include annotation terms:

publish(resAmi, des2Report, annotation.db="org.Hs.eg.db", lfc=0.58, make.plots=TRUE, DataSet=dds)

I'm faced with the error: Ids do not appear to be the correct key for this annotation database.

I'm not 100% certain what publish() uses for the annotation matches, so have also tried manually retrieved the entrez IDs from annotation.db and using them as row names in the DESeqResult object. I took out all instances where the entrez ID was NA. This still throws the same error.

Second issue

I'm putting together my workflow in RStudio / RMarkdown and knitting into HTLM. Does anyone know how to knit the ReportingTools publish() returning object into an existing HTML page? My understanding is that it creates the .html file structure first with a call to the HTMLReport function.

Lots of thanks!

My sessionInfo():

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ReportingTools_2.29.0       knitr_1.29                  org.Hs.eg.db_3.11.4        
 [4] AnnotationDbi_1.51.3        RColorBrewer_1.1-2          pheatmap_1.0.12            
 [7] genefilter_1.71.0           magrittr_1.5                dplyr_1.0.1                
[10] GGally_2.0.0                tidyr_1.1.1                 broom_0.7.0                
[13] ggplot2_3.3.2               vsn_3.57.0                  DESeq2_1.29.8              
[16] SummarizedExperiment_1.19.6 DelayedArray_0.15.4         matrixStats_0.56.0         
[19] Matrix_1.2-18               Biobase_2.49.0              GenomicRanges_1.41.5       
[22] GenomeInfoDb_1.25.10        IRanges_2.23.10             S4Vectors_0.27.12          
[25] BiocGenerics_0.35.4         tximeta_1.7.7              

loaded via a namespace (and not attached):
  [1] backports_1.1.8               GOstats_2.55.0                Hmisc_4.4-0                  
  [4] AnnotationHub_2.21.2          BiocFileCache_1.13.1          plyr_1.8.6                   
  [7] lazyeval_0.2.2                GSEABase_1.51.1               splines_4.0.2                
 [10] BiocParallel_1.23.2           digest_0.6.25                 ensembldb_2.13.1             
 [13] htmltools_0.5.0               GO.db_3.11.4                  fansi_0.4.1                  
 [16] checkmate_2.0.0               memoise_1.1.0                 BSgenome_1.57.5              
 [19] cluster_2.1.0                 limma_3.45.10                 Biostrings_2.57.2            
 [22] annotate_1.67.0               R.utils_2.9.2                 ggbio_1.37.0                 
 [25] askpass_1.1                   bdsmatrix_1.3-4               prettyunits_1.1.1            
 [28] jpeg_0.1-8.1                  colorspace_1.4-1              blob_1.2.1                   
 [31] rappdirs_0.3.1                apeglm_1.11.1                 xfun_0.16                    
 [34] crayon_1.3.4                  RCurl_1.98-1.2                jsonlite_1.7.0               
 [37] tximport_1.17.3               graph_1.67.1                  VariantAnnotation_1.35.3     
 [40] survival_3.2-3                glue_1.4.1                    gtable_0.3.0                 
 [43] zlibbioc_1.35.0               XVector_0.29.3                Rgraphviz_2.33.0             
 [46] scales_1.1.1                  mvtnorm_1.1-1                 DBI_1.1.0                    
 [49] edgeR_3.31.4                  Rcpp_1.0.5                    htmlTable_2.0.1              
 [52] xtable_1.8-4                  progress_1.2.2                emdbook_1.3.12               
 [55] foreign_0.8-80                bit_4.0.4                     OrganismDbi_1.31.0           
 [58] preprocessCore_1.51.0         Formula_1.2-3                 AnnotationForge_1.31.2       
 [61] htmlwidgets_1.5.1             httr_1.4.2                    acepack_1.4.1                
 [64] ellipsis_0.3.1                R.methodsS3_1.8.0             pkgconfig_2.0.3              
 [67] reshape_0.8.8                 XML_3.99-0.5                  farver_2.0.3                 
 [70] nnet_7.3-14                   dbplyr_1.4.4                  locfit_1.5-9.4               
 [73] tidyselect_1.1.0              labeling_0.3                  rlang_0.4.7                  
 [76] reshape2_1.4.4                later_1.1.0.1                 munsell_0.5.0                
 [79] BiocVersion_3.12.0            tools_4.0.2                   cli_2.0.2                    
 [82] generics_0.0.2                RSQLite_2.2.0                 evaluate_0.14                
 [85] stringr_1.4.0                 fastmap_1.0.1                 yaml_2.2.1                   
 [88] bit64_4.0.2                   purrr_0.3.4                   AnnotationFilter_1.13.0      
 [91] RBGL_1.65.0                   mime_0.9                      R.oo_1.23.0                  
 [94] biomaRt_2.45.2                compiler_4.0.2                rstudioapi_0.11              
 [97] png_0.1-7                     curl_4.3                      interactiveDisplayBase_1.27.5
[100] affyio_1.59.0                 PFAM.db_3.11.4                tibble_3.0.3                 
[103] geneplotter_1.67.0            stringi_1.4.6                 highr_0.8                    
[106] GenomicFeatures_1.41.2        lattice_0.20-41               ProtGenerics_1.21.0          
[109] vctrs_0.3.2                   pillar_1.4.6                  lifecycle_0.2.0              
[112] BiocManager_1.30.10           data.table_1.13.0             bitops_1.0-6                 
[115] httpuv_1.5.4                  rtracklayer_1.49.5            hwriter_1.3.2                
[118] latticeExtra_0.6-29           R6_2.4.1                      affy_1.67.0                  
[121] promises_1.1.1                gridExtra_2.3                 dichromat_2.0-0              
[124] MASS_7.3-51.6                 assertthat_0.2.1              openssl_1.4.2                
[127] Category_2.55.0               withr_2.2.0                   GenomicAlignments_1.25.3     
[130] Rsamtools_2.5.3               GenomeInfoDbData_1.2.3        hms_0.5.3                    
[133] rpart_4.1-15                  grid_4.0.2                    coda_0.19-3                  
[136] rmarkdown_2.3                 biovizBase_1.37.0             bbmle_1.0.23.1               
[139] base64enc_0.1-3               numDeriv_2016.8-1.1           shiny_1.5.0
ReportingTools RNA-Seq RMarkdown • 258 views
ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 5 hours ago
United States

The nice thing about ReportingTools is that it will do something reasonable with an input object, without you having to do any extra work. But that is dependent on you doing something reasonable that the authors expected. And having Ensembl IDs and using an OrgDb package isn't one of those things.

Put a different way, the org.Hs.eg.db package has as its central ID NCBI Gene IDs, not Ensembl IDs. So publish is trying to map things using Ensembl IDs in such a manner that you really have to have NCBI Gene IDs, so it fails. I used to make much use of ReportingTools, and have several functions in my affycoretools package intended to allow you to make HTML links for whatever column you like, and to put the little glyphs in the HTML table, but in what I consider a much better format. If you are interested, things like ensemblLinks and nuccorLinks, etc will add the links and makeImages will add the glyphs to a data.frame, and then you can pass that to publish and it will do what you want.

But these days I don't do any of that, because I have found that people much prefer what you get from the Glimma package. Something like glMDPlot makes an interactive plot that people tend to like way more than the HTML tables you get from ReportingTools. So you might take a look at that.

You cannot, so far as I know, get the table into an HTML page generated using rmarkdown if you want that sort of thing, you can use the DT package, but again I think you will like what Glimma does better. But with Glimma you have to add a link to the interactive HTML page as well. I tend to make a table with the contrast name in one column, which is also a link to the MD plot, and in the next column a count of significant genes. So people can see that there are N differentially expressed genes, and a link that will bring up the interactive page.

ADD COMMENT

Login before adding your answer.

Traffic: 161 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6