fawazfebin
Hi, I was performing a differential expression analysis on RNA-seq data from TCGA using edgeR. The results have NAs under gene names and gene symbols, and the corresponding Entrez IDs don't give a valid gene name. What could be wrong? I ran the following command to annotate the gene expression data by Entrez ID.
> gnsOXP <- select(org.Hs.eg.db, keys=rownames(matrix_OXP),columns=c("SYMBOL","GENENAME"), keytype="ENTREZID")
![enter image description here][2]
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
locale:
[1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252 LC_MONETARY=English_India.1252
[4] LC_NUMERIC=C LC_TIME=English_India.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.26.8 limma_3.40.6 org.Hs.eg.db_3.8.2 AnnotationDbi_1.46.1 IRanges_2.18.3
[6] S4Vectors_0.22.1 Biobase_2.44.0 BiocGenerics_0.30.0 TCGAbiolinks_2.15.3
loaded via a namespace (and not attached):
[1] pkgcond_0.1.0 colorspace_1.4-1 selectr_0.4-1 ggsignif_0.6.0
[5] hwriter_1.3.2 testextra_0.1.0.1 XVector_0.24.0 GenomicRanges_1.36.1
[9] rstudioapi_0.10 ggpubr_0.2.3 ggrepel_0.8.1 bit64_0.9-7
[13] xml2_1.2.2 codetools_0.2-16 splines_3.6.1 R.methodsS3_1.7.1
[17] doParallel_1.0.15 DESeq_1.36.0 geneplotter_1.62.0 knitr_1.25
[21] jsonlite_1.6 Rsamtools_2.0.3 broom_0.5.2 km.ci_0.5-2
[25] annotate_1.62.0 R.oo_1.22.0 readr_1.3.1 compiler_3.6.1
[29] httr_1.4.1 backports_1.1.5 assertthat_0.2.1 Matrix_1.2-17
[33] lazyeval_0.2.2 prettyunits_1.0.2 tools_3.6.1 gtable_0.3.0
[37] glue_1.3.1 GenomeInfoDbData_1.2.1 dplyr_0.8.3 ggthemes_4.2.0
[41] ShortRead_1.42.0 Rcpp_1.0.3 vctrs_0.2.2 Biostrings_2.52.0
[45] nlme_3.1-141 rtracklayer_1.44.4 iterators_1.0.12 xfun_0.10
[49] stringr_1.4.0 testthat_2.2.1 rvest_0.3.4 lifecycle_0.1.0
[53] XML_3.98-1.20 postlogic_0.1.0.1 zlibbioc_1.30.0 zoo_1.8-6
[57] scales_1.1.0 aroma.light_3.14.0 hms_0.5.1 SummarizedExperiment_1.14.1
[61] RColorBrewer_1.1-2 curl_4.2 memoise_1.1.0 gridExtra_2.3
[65] KMsurv_0.1-5 ggplot2_3.2.1 downloader_0.4 biomaRt_2.40.5
[69] latticeExtra_0.6-28 stringi_1.4.3 RSQLite_2.1.2 genefilter_1.66.0
[73] foreach_1.4.7 GenomicFeatures_1.36.4 BiocParallel_1.18.1 GenomeInfoDb_1.20.0
[77] rlang_0.4.4 pkgconfig_2.0.3 matrixStats_0.55.0 bitops_1.0-6
[81] lattice_0.20-38 purrr_0.3.3 GenomicAlignments_1.20.1 bit_1.1-14
[85] tidyselect_0.2.5 plyr_1.8.5 magrittr_1.5 R6_2.4.1
[89] generics_0.0.2 DelayedArray_0.10.0 DBI_1.0.0 mgcv_1.8-30
[93] pillar_1.4.3 survival_2.44-1.1 RCurl_1.95-4.12 tibble_2.1.3
[97] EDASeq_2.18.0 crayon_1.3.4 purrrogress_0.1.1 survMisc_0.5.5
[101] progress_1.2.2 locfit_1.5-9.1 grid_3.6.1 sva_3.32.1
[105] data.table_1.12.6 blob_1.2.0 digest_0.6.23 xtable_1.8-4
[109] tidyr_1.0.0 R.utils_2.9.0 munsell_0.5.0 survminer_0.4.6
[113] parsetools_0.1.1
Hi @Gordon Smyth
So how should I report the top differentially expressed genes? Is there any way to recover the Entrez IDs that have expired? Many thanks in advance!
The first NA gene, 33479, is a Drosophila Gene ID. The next two NA genes (47738, 44153), according to NCBI, don't exist (and apparently never have? NCBI will tell you if it's been retired, but says 'Wrong UID' instead). So it appears that your IDs are problematic, and you should go back and figure out where you got them and make sure they are correct.
Ok. Thanks for the guidance!
Would the presence of outliers be a probable reason?
No. This issue has nothing to do with your observed data. The problem has to do with your annotation of the data, where you are saying what underlying gene is being measured. There is no way an outlier would cause you (or a collaborator) to incorrectly annotate your data, saying for instance that one of your genes is a Drosophila gene.
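A quick way to check the annotation rather than the data is to compare the row IDs against the Entrez IDs the human annotation package actually knows about. The following is only a sketch: `check_ids` is a hypothetical helper, not part of any package, and the toy reference set is made up for illustration. With the real packages, the reference set would presumably come from `keys(org.Hs.eg.db, keytype = "ENTREZID")`.

```r
# Hypothetical helper: split IDs into those present in a reference set and those that are not.
check_ids <- function(ids, valid_ids) {
  ids <- as.character(ids)
  known <- ids %in% valid_ids
  list(
    known = ids[known],            # IDs found in the reference set
    unknown = ids[!known],         # IDs to trace back to their source
    fraction_known = mean(known)   # share of IDs that matched
  )
}

# Toy illustration: a tiny stand-in for the real set of human Entrez IDs.
toy_valid <- c("7157", "672", "1956")
res <- check_ids(c("7157", "33479", "672", "47738"), toy_valid)
print(res$unknown)

# With the real annotation (assumes org.Hs.eg.db is installed and
# matrix_OXP is the matrix from the question):
# library(org.Hs.eg.db)
# valid <- keys(org.Hs.eg.db, keytype = "ENTREZID")
# res <- check_ids(rownames(matrix_OXP), valid)
```

If a large share of the row names ends up in `unknown`, that suggests the IDs were not human Entrez IDs to begin with, and the step that produced them is the place to look.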
OK Sir, Great thanks!