Question

Getting Differentially Expressed Genes from GOSeq Categories

0

Entering edit mode

krc3004 ▴ 10

@krc3004-12978

Last seen 5.9 years ago

Hi all,

There have been a number of questions asked on this topic (I've linked the relevant threads below), but I'm not sure if anyone has found a working solution. I have run GOSeq on my gene set and am looking to identify the genes in each category (numDEInCat). Here is what I've done so far. After obtaining go_bp as the result of running GOSeq:

head(go_bp)

category    over_represented_pvalue under_represented_pvalue  numDEInCat  numInCat  term ontology

1667  GO:0006955            7.257450e-14                        1        135      668                    immune response       BP
6542  GO:0045087            1.235722e-13                        1         83      336             innate immune response       BP
578   GO:0002376            3.926914e-13                        1        229     1369              immune system process       BP
2271  GO:0009605            6.242604e-12                        1        186     1079      response to external stimulus       BP
2272  GO:0009607            5.035819e-11                        1         98      478        response to biotic stimulus       BP
10415 GO:0098542            7.923474e-11                        1         68      286 defense response to other organism       BP

So for example, I want to know what those 135 genes are in the first category.

To get the right gene mappings, I did this, where diff_exp_ensembl is just a named vector containing genes and 1 or 0 depending on whether it is differentially expressed or not:

go2gene = getBM(attributes = c("ensembl_gene_id", "mgi_symbol", "go_id"), mart = mouse) ## map from gene name to ensembl
go2gene = go2gene[which(go2gene$ensembl_gene_id %in% names(diff_exp_ensembl)),]

Seems reasonable to me. But then when I run this line:

table(diff_exp_ensembl[which(names(diff_exp_ensembl) %in% go2gene[which(go2gene$go_id == "GO:0006955"),]$ensembl_gene_id)])

I obtain:

 0   1 
122  31

But there should be 135 differentially expressed genes (i.e. 135 should be 1, in the line above) according to GOSeq.

I've been struggling with this for quite a while- all of this seems quite clunky to me and I'm wondering if there's something I'm missing which makes it much easier. Here are the relevant threads I've found. Any tips are much appreciated!

GOseq package: Which genes are included in the "numDEInCat"?

goseq analysis - extracting list of my genes which are DE in the enriched GO category

Can I know the number of genes in enriched GO category using GOseq

How to get list of genes that are DE from the goseq analysis results

sessionInfo()

R version 3.2.4 (2016-03-10)

Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] annotate_1.48.0                        XML_3.98-1.4                           BiocInstaller_1.20.1                   org.Hs.eg.db_3.2.3                    
 [5] org.Mm.eg.db_3.2.3                     biomaRt_2.26.1                         TxDb.Mmusculus.UCSC.mm10.ensGene_3.2.2 GenomicFeatures_1.22.13               
 [9] goseq_1.22.0                           geneLenDataBase_1.6.0                  BiasedUrn_1.07                         DESeq2_1.10.1                         
[13] RcppArmadillo_0.6.600.4.0              Rcpp_0.12.4                            SummarizedExperiment_1.0.2             GenomicRanges_1.22.4                  
[17] GenomeInfoDb_1.6.3                     reactome.db_1.54.1                     PANTHER.db_1.0.3                       RSQLite_1.0.0                         
[21] DBI_0.6-1                              AnnotationDbi_1.32.3                   IRanges_2.4.8                          S4Vectors_0.8.11                      
[25] Biobase_2.30.0                         BiocGenerics_0.16.1                    KEGGREST_1.10.1                       

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.1          lattice_0.20-33         GO.db_3.2.2             png_0.1-7               Rsamtools_1.22.0        Biostrings_2.38.4      
 [7] R6_2.1.2                plyr_1.8.3              futile.options_1.0.0    acepack_1.4.1           httr_1.1.0              ggplot2_2.1.0          
[13] zlibbioc_1.16.0         curl_0.9.6              rpart_4.1-10            Matrix_1.2-4            splines_3.2.4           BiocParallel_1.4.3     
[19] geneplotter_1.48.0      foreign_0.8-66          RCurl_1.95-4.8          munsell_0.4.3           rtracklayer_1.30.4      mgcv_1.8-12            
[25] nnet_7.3-12             gridExtra_2.2.1         Hmisc_3.17-3            GenomicAlignments_1.6.3 bitops_1.0-6            grid_3.2.4             
[31] nlme_3.1-126            xtable_1.8-2            gtable_0.2.0            scales_0.4.0            XVector_0.10.0          genefilter_1.52.1      
[37] latticeExtra_0.6-28     futile.logger_1.4.1     Formula_1.2-1           lambda.r_1.1.7          RColorBrewer_1.1-2      tools_3.2.4

goseq gene ontology biomart annotate category • 2.2k views

ADD COMMENT • link updated 6.8 years ago by b.nota ▴ 360 • written 6.8 years ago by krc3004 ▴ 10

0

Entering edit mode

How do you decide if a gene is differentially expressed in diff_exp_ensembl ? The reason of dissagrement might be that a gene is considered DE differently in GOSeq and by your vector (If I understood correctly the question).

ADD REPLY • link 6.8 years ago Lluís Revilla Sancho ▴ 730

0

Entering edit mode

Hi Lluis, thanks for your reply. In diff_exp_ensembl, a gene is differentially expressed according to DESeq- i.e. any gene with adjusted p-value less than some cutoff.

ADD REPLY • link 6.8 years ago krc3004 ▴ 10

score 0 · Answer 1 · 2017-07-11

In the manual they give an example with stack and getgo function. In your case something like this should work:

allGos <- stack(getgo(diff_exp_ensembl, 'mm10', 'ensGene', fetch.cats = 'GO:BP'))

head(allGos)

edit: "diff_exp_ensembl" should be the list of DE ensembl gene names

To get only the genes in GO:0006955:

genes_of_interest <- subset(allGos, values %in% "GO:0006955")

genes_of_interest