'NA' found under Gene Names in the list of top differentially expressed genes found using edgeR
fawazfebin ▴ 60
@fawazfebin-14053

Hi, I was performing a differential expression analysis on RNA-seq data from TCGA using edgeR. The results of the differential expression analysis have NAs under gene names and gene symbols, and the corresponding Entrez IDs don't give a valid gene name. What could be wrong? The following command was run to annotate the gene expression data, using the Entrez IDs as keys.

 > gnsOXP  <- select(org.Hs.eg.db, keys=rownames(matrix_OXP),columns=c("SYMBOL","GENENAME"), keytype="ENTREZID")

![enter image description here][2]
> sessionInfo()

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252    LC_MONETARY=English_India.1252
[4] LC_NUMERIC=C                   LC_TIME=English_India.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.26.8         limma_3.40.6         org.Hs.eg.db_3.8.2   AnnotationDbi_1.46.1 IRanges_2.18.3      
[6] S4Vectors_0.22.1     Biobase_2.44.0       BiocGenerics_0.30.0  TCGAbiolinks_2.15.3 

loaded via a namespace (and not attached):
  [1] pkgcond_0.1.0               colorspace_1.4-1            selectr_0.4-1               ggsignif_0.6.0             
  [5] hwriter_1.3.2               testextra_0.1.0.1           XVector_0.24.0              GenomicRanges_1.36.1       
  [9] rstudioapi_0.10             ggpubr_0.2.3                ggrepel_0.8.1               bit64_0.9-7                
 [13] xml2_1.2.2                  codetools_0.2-16            splines_3.6.1               R.methodsS3_1.7.1          
 [17] doParallel_1.0.15           DESeq_1.36.0                geneplotter_1.62.0          knitr_1.25                 
 [21] jsonlite_1.6                Rsamtools_2.0.3             broom_0.5.2                 km.ci_0.5-2                
 [25] annotate_1.62.0             R.oo_1.22.0                 readr_1.3.1                 compiler_3.6.1             
 [29] httr_1.4.1                  backports_1.1.5             assertthat_0.2.1            Matrix_1.2-17              
 [33] lazyeval_0.2.2              prettyunits_1.0.2           tools_3.6.1                 gtable_0.3.0               
 [37] glue_1.3.1                  GenomeInfoDbData_1.2.1      dplyr_0.8.3                 ggthemes_4.2.0             
 [41] ShortRead_1.42.0            Rcpp_1.0.3                  vctrs_0.2.2                 Biostrings_2.52.0          
 [45] nlme_3.1-141                rtracklayer_1.44.4          iterators_1.0.12            xfun_0.10                  
 [49] stringr_1.4.0               testthat_2.2.1              rvest_0.3.4                 lifecycle_0.1.0            
 [53] XML_3.98-1.20               postlogic_0.1.0.1           zlibbioc_1.30.0             zoo_1.8-6                  
 [57] scales_1.1.0                aroma.light_3.14.0          hms_0.5.1                   SummarizedExperiment_1.14.1
 [61] RColorBrewer_1.1-2          curl_4.2                    memoise_1.1.0               gridExtra_2.3              
 [65] KMsurv_0.1-5                ggplot2_3.2.1               downloader_0.4              biomaRt_2.40.5             
 [69] latticeExtra_0.6-28         stringi_1.4.3               RSQLite_2.1.2               genefilter_1.66.0          
 [73] foreach_1.4.7               GenomicFeatures_1.36.4      BiocParallel_1.18.1         GenomeInfoDb_1.20.0        
 [77] rlang_0.4.4                 pkgconfig_2.0.3             matrixStats_0.55.0          bitops_1.0-6               
 [81] lattice_0.20-38             purrr_0.3.3                 GenomicAlignments_1.20.1    bit_1.1-14                 
 [85] tidyselect_0.2.5            plyr_1.8.5                  magrittr_1.5                R6_2.4.1                   
 [89] generics_0.0.2              DelayedArray_0.10.0         DBI_1.0.0                   mgcv_1.8-30                
 [93] pillar_1.4.3                survival_2.44-1.1           RCurl_1.95-4.12             tibble_2.1.3               
 [97] EDASeq_2.18.0               crayon_1.3.4                purrrogress_0.1.1           survMisc_0.5.5             
[101] progress_1.2.2              locfit_1.5-9.1              grid_3.6.1                  sva_3.32.1                 
[105] data.table_1.12.6           blob_1.2.0                  digest_0.6.23               xtable_1.8-4               
[109] tidyr_1.0.0                 R.utils_2.9.0               munsell_0.5.0               survminer_0.4.6            
[113] parsetools_0.1.1          
edger differential expression TCGA Gene ID • 1.1k views
@gordon-smyth
WEHI, Melbourne, Australia

A small proportion of Entrez Gene IDs have been retired over the years because the definition of the gene has changed. If a Gene ID has been retired, it will no longer be associated with a gene name.
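
To see how many IDs are affected, one option is to pull out the rows of the annotation table where no symbol was found. The sketch below uses a small toy data frame standing in for the output of `select()`; the helper name `unmapped_ids` and the toy table `gns_toy` are made up for illustration, and the NA rows simply reuse the IDs mentioned in this thread.

```r
# Toy stand-in for the data frame returned by AnnotationDbi::select().
# Assumption: the real table (e.g. gnsOXP) has the same three columns.
gns_toy <- data.frame(
  ENTREZID = c("7157", "33479", "672", "47738", "44153"),
  SYMBOL   = c("TP53", NA, "BRCA1", NA, NA),
  GENENAME = c("tumor protein p53", NA, "BRCA1 DNA repair associated", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the Entrez IDs for which no symbol was found.
unmapped_ids <- function(annot) {
  annot$ENTREZID[is.na(annot$SYMBOL)]
}

missing <- unmapped_ids(gns_toy)
print(missing)

# Fraction of IDs without a symbol in the table.
prop_missing <- mean(is.na(gns_toy$SYMBOL))
print(prop_missing)
```

On real data the same helper could be called as `unmapped_ids(gnsOXP)`. A handful of unmapped IDs would be consistent with retired genes; a large fraction would suggest the IDs themselves need checking.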


Hi @Gordon Smyth

So how should I report the top differentially expressed genes? Is there any way to recover the gene names for Entrez IDs that have been retired? Many thanks in advance!


The first NA gene, 33479, is a Drosophila Gene ID. The next two NA genes (47738 and 44153) don't exist according to NCBI, and apparently never have: NCBI will tell you if an ID has been retired, but for these it says 'Wrong UID' instead. So it appears that your IDs are problematic, and you should go back, figure out where you got them, and make sure they are correct.
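
A quick way to run this kind of check in bulk is to compare the IDs against the valid keys of the annotation packages. The following is only a sketch: `check_ids` is a hypothetical helper, the comparison against the fly package `org.Dm.eg.db` is an assumption prompted by the Drosophila ID above, and the lookup is skipped if either package is not installed.

```r
# Hypothetical helper: flag which IDs appear in each set of valid keys.
check_ids <- function(ids, human_keys, fly_keys) {
  data.frame(
    id       = ids,
    in_human = ids %in% human_keys,
    in_fly   = ids %in% fly_keys,
    stringsAsFactors = FALSE
  )
}

# One known human Entrez ID (7157, TP53) plus the three IDs from this thread.
ids <- c("7157", "33479", "47738", "44153")

# Only query the annotation packages if both are installed.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("org.Dm.eg.db", quietly = TRUE)) {
  human_keys <- AnnotationDbi::keys(org.Hs.eg.db::org.Hs.eg.db, keytype = "ENTREZID")
  fly_keys   <- AnnotationDbi::keys(org.Dm.eg.db::org.Dm.eg.db, keytype = "ENTREZID")
  print(check_ids(ids, human_keys, fly_keys))
}
```

For the data in the question, `ids` would be `rownames(matrix_OXP)`. It is also worth looking at `head(rownames(matrix_OXP))` directly to confirm that the row names really are the Entrez IDs supplied with the data.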


Ok. Thanks for the guidance!


Would the presence of outliers be a probable reason?


No. This issue has nothing to do with your observed data. The problem has to do with your annotation of the data, where you are saying what underlying gene is being measured. There is no way an outlier would cause you (or a collaborator) to incorrectly annotate your data, saying for instance that one of your genes is a Drosophila gene.


OK Sir, Great thanks!
