How to use topGO with gene symbols extracted from Illumina probes
Ahdee

Hi all, I have a named vector of p-values, named by gene symbol, that I extracted previously from an Illumina microarray. I'm wondering how to create the topGO object with the proper annotation call. So far I have something like this:

glist <- ko_pk[,4] # these are p-values
names(glist) <- row.names(ko_pk)

sum(topDiffGenes(glist))

sampleGOdata <- new("topGOdata",
                    description = "Simple session", ontology = "BP",
                    allGenes = glist, geneSel = topDiffGenes,
                    nodeSize = 10,
                    annot = annFUN.org)  # ?? which other arguments go here?
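The geneSel argument expects a function that takes the named score vector and returns a logical vector of the same length, marking the "significant" genes. As far as I know, the topDiffGenes function loaded by data(geneList) in the topGO vignette is simply a p < 0.01 threshold. The sketch below is a hypothetical stand-in (mySel, applied to toy p-values made up for illustration) that makes the cutoff explicit, in case your own list needs a different one.

```r
## Hypothetical selection function (not part of topGO): flags genes with p < cutoff.
## The default of 0.01 mirrors the threshold used by the vignette's topDiffGenes.
mySel <- function(allScore, cutoff = 0.01) {
  allScore < cutoff
}

## Toy p-values for illustration only, named as in a gene list
glist <- c(GENE_A = 0.001, GENE_B = 0.2, GENE_C = 0.005, GENE_D = 0.9)

sel <- mySel(glist)
print(sel)       # named logical vector, one entry per gene
print(sum(sel))  # number of genes topGO would treat as significant
```

Since topGO calls geneSel with the score vector only, a different cutoff can be passed through a wrapper, e.g. geneSel = function(p) mySel(p, cutoff = 0.05).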

Thanks in advance.

Ahdee

@james-w-macdonald-5106

You are making things more difficult for yourself. Rather than building a vector of p-values with HUGO gene symbols as the names, you should use the Illumina probe IDs as names together with annFUN.db, just as in the vignette. That way you can follow along with the vignette's code directly.
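For comparison, here is a minimal sketch of the probe-ID route via annFUN.db, using the vignette's example data (hgu95av2.db probe IDs as names). For Illumina data the idea should be the same with the annotation package for your chip; illuminaHumanv4.db in the commented lines is an assumption about which array was used, and it expects ILMN_ probe IDs as the names of your vector.

```r
## Sketch: probe IDs + annFUN.db, following the topGO vignette's example data
library(topGO)
library(hgu95av2.db)

data(geneList)  # named p-values (probe IDs as names) plus topDiffGenes()

GOdata <- new("topGOdata",
              description = "probe IDs via annFUN.db", ontology = "BP",
              allGenes = geneList, geneSel = topDiffGenes,
              nodeSize = 10,
              annot = annFUN.db, affyLib = "hgu95av2.db")

resultFisher <- runTest(GOdata, algorithm = "classic", statistic = "fisher")
print(GenTable(GOdata, classicFisher = resultFisher, topNodes = 10))

## Illumina version (assumed package; pick the .db package matching your chip):
## library(illuminaHumanv4.db)
## GOdata <- new("topGOdata", description = "Illumina probes", ontology = "BP",
##               allGenes = glist, geneSel = topDiffGenes, nodeSize = 10,
##               annot = annFUN.db, affyLib = "illuminaHumanv4.db")
```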

You could use the vector you have, but the help page for annFUN.org() is, like, not very helpful. So I can show you how to use your vector, but without being able to point you to where it explains how or why this is the way to do it. I will use the data from the vignette as an example.

## load stuff

> library(topGO)
> data(geneList)
> library(hgu95av2.db)

## map the probe IDs to gene symbols to get a vector like yours
> z <- select(hgu95av2.db, names(geneList), "SYMBOL")
'select()' returned 1:many mapping between keys and columns
## keep only one symbol per probe
> z <- z[!duplicated(z[,1]),]
> geneList2 <- geneList
> names(geneList2) <- z[,2]

## the original geneList
> head(geneList)
1095_s_at   1130_at   1196_at 1329_s_at 1340_s_at 1342_g_at
1.0000000 1.0000000 0.6223795 0.5412240 1.0000000 1.0000000

## something similar to what you have
> head(geneList2)
      HGF    MAP2K1      RCC1     TERF1       HGF     TERF1
1.0000000 1.0000000 0.6223795 0.5412240 1.0000000 1.0000000
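## ID = "alias" tells annFUN.org that the names are gene symbols/aliases;
## mapping names the org package ("org.Hs.eg" works here, as does "org.Hs.eg.db")
> sampleGOdata <- new("topGOdata", description = "whatevs", ontology = "BP",
+                     allGenes = geneList2, geneSel = topDiffGenes, nodeSize = 10,
+                     annot = annFUN.org, ID = "alias", mapping = "org.Hs.eg")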

Building most specific GOs .....    ( 1566 GO terms found. )

Build GO DAG topology ..........    ( 4215 GO terms and 9916 relations. )

Annotating nodes ...............    ( 225 genes annotated to the GO terms. )

> resultFisher <- runTest(sampleGOdata, "classic","fisher")

             -- Classic Algorithm --

         the algorithm is scoring 776 nontrivial nodes
         parameters:
             test statistic:  fisher
> resultFisher

Description: whatevs
Ontology: BP
'classic' algorithm with the 'fisher' test
797 GO terms scored: 11 terms with p < 0.01
Annotation data:
    Annotated genes: 310
    Significant genes: 46
    Min. no. of genes annotated to a GO: 10
    Nontrivial nodes: 776

Note that I get fewer GO terms this way (compare to the results on page 4 of the vignette), which is probably because gene symbols are poor identifiers for most data analysis: several probes share a symbol (HGF and TERF1 each appear twice above), and symbols can be ambiguous, so fewer genes end up annotated. If you want to do things 'the right way', rely on actual IDs such as the Illumina IDs, or Entrez Gene or Ensembl IDs, which are more likely to be unique.
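If you would rather work with Entrez Gene IDs, a sketch along these lines should work, assuming org.Hs.eg.db is installed. It maps the vignette's probe IDs to Entrez IDs and passes them to annFUN.org with ID = "entrez". Collapsing several probes per gene to their smallest p-value is my own illustrative choice, not something prescribed in this thread.

```r
## Sketch: Entrez Gene IDs + annFUN.org, using the vignette's example data
library(topGO)
library(hgu95av2.db)

data(geneList)  # named p-values (probe IDs as names) plus topDiffGenes()

## map probe IDs to Entrez IDs; keep one mapping per probe and drop unmapped probes
ez <- AnnotationDbi::select(hgu95av2.db, keys = names(geneList),
                            columns = "ENTREZID", keytype = "PROBEID")
ez <- ez[!duplicated(ez$PROBEID) & !is.na(ez$ENTREZID), ]

## several probes can map to the same gene: keep the smallest p-value per gene
## (one possible choice; other summaries are equally defensible)
pv <- geneList[ez$PROBEID]
geneListEZ <- sapply(split(pv, ez$ENTREZID), min)

GOdataEZ <- new("topGOdata", description = "Entrez IDs", ontology = "BP",
                allGenes = geneListEZ, geneSel = topDiffGenes, nodeSize = 10,
                annot = annFUN.org, ID = "entrez", mapping = "org.Hs.eg.db")
```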

This is perfect, thank you.
