Hi all,
There have been a number of questions asked on this topic (I've linked the relevant threads below), but I'm not sure if anyone has found a working solution. I have run GOSeq on my gene set and am looking to identify the genes in each category (numDEInCat). Here is what I've done so far. After obtaining go_bp as the result of running GOSeq:
head(go_bp)
category over_represented_pvalue under_represented_pvalue numDEInCat numInCat term ontology 1667 GO:0006955 7.257450e-14 1 135 668 immune response BP 6542 GO:0045087 1.235722e-13 1 83 336 innate immune response BP 578 GO:0002376 3.926914e-13 1 229 1369 immune system process BP 2271 GO:0009605 6.242604e-12 1 186 1079 response to external stimulus BP 2272 GO:0009607 5.035819e-11 1 98 478 response to biotic stimulus BP 10415 GO:0098542 7.923474e-11 1 68 286 defense response to other organism BP
So for example, I want to know what those 135 genes are in the first category.
To get the right gene mappings, I did this, where diff_exp_ensembl is just a named vector containing genes and 1 or 0 depending on whether it is differentially expressed or not:
go2gene = getBM(attributes = c("ensembl_gene_id", "mgi_symbol", "go_id"), mart = mouse) ## map from gene name to ensembl go2gene = go2gene[which(go2gene$ensembl_gene_id %in% names(diff_exp_ensembl)),]
Seems reasonable to me. But then when I run this line:
table(diff_exp_ensembl[which(names(diff_exp_ensembl) %in% go2gene[which(go2gene$go_id == "GO:0006955"),]$ensembl_gene_id)])
I obtain:
0 1 122 31
But there should be 135 differentially expressed genes (i.e. 135 should be 1, in the line above) according to GOSeq.
I've been struggling with this for quite a while- all of this seems quite clunky to me and I'm wondering if there's something I'm missing which makes it much easier. Here are the relevant threads I've found. Any tips are much appreciated!
GOseq package: Which genes are included in the "numDEInCat"?
goseq analysis - extracting list of my genes which are DE in the enriched GO category
Can I know the number of genes in enriched GO category using GOseq
How to get list of genes that are DE from the goseq analysis results
sessionInfo()
R version 3.2.4 (2016-03-10) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X 10.9.5 (Mavericks) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] annotate_1.48.0 XML_3.98-1.4 BiocInstaller_1.20.1 org.Hs.eg.db_3.2.3 [5] org.Mm.eg.db_3.2.3 biomaRt_2.26.1 TxDb.Mmusculus.UCSC.mm10.ensGene_3.2.2 GenomicFeatures_1.22.13 [9] goseq_1.22.0 geneLenDataBase_1.6.0 BiasedUrn_1.07 DESeq2_1.10.1 [13] RcppArmadillo_0.6.600.4.0 Rcpp_0.12.4 SummarizedExperiment_1.0.2 GenomicRanges_1.22.4 [17] GenomeInfoDb_1.6.3 reactome.db_1.54.1 PANTHER.db_1.0.3 RSQLite_1.0.0 [21] DBI_0.6-1 AnnotationDbi_1.32.3 IRanges_2.4.8 S4Vectors_0.8.11 [25] Biobase_2.30.0 BiocGenerics_0.16.1 KEGGREST_1.10.1 loaded via a namespace (and not attached): [1] locfit_1.5-9.1 lattice_0.20-33 GO.db_3.2.2 png_0.1-7 Rsamtools_1.22.0 Biostrings_2.38.4 [7] R6_2.1.2 plyr_1.8.3 futile.options_1.0.0 acepack_1.4.1 httr_1.1.0 ggplot2_2.1.0 [13] zlibbioc_1.16.0 curl_0.9.6 rpart_4.1-10 Matrix_1.2-4 splines_3.2.4 BiocParallel_1.4.3 [19] geneplotter_1.48.0 foreign_0.8-66 RCurl_1.95-4.8 munsell_0.4.3 rtracklayer_1.30.4 mgcv_1.8-12 [25] nnet_7.3-12 gridExtra_2.2.1 Hmisc_3.17-3 GenomicAlignments_1.6.3 bitops_1.0-6 grid_3.2.4 [31] nlme_3.1-126 xtable_1.8-2 gtable_0.2.0 scales_0.4.0 XVector_0.10.0 genefilter_1.52.1 [37] latticeExtra_0.6-28 futile.logger_1.4.1 Formula_1.2-1 lambda.r_1.1.7 RColorBrewer_1.1-2 tools_3.2.4
How do you decide if a gene is differentially expressed in diff_exp_ensembl ? The reason of dissagrement might be that a gene is considered DE differently in GOSeq and by your vector (If I understood correctly the question).
Hi Lluis, thanks for your reply. In diff_exp_ensembl, a gene is differentially expressed according to DESeq- i.e. any gene with adjusted p-value less than some cutoff.