Question

Error in topGO analysis

0

Entering edit mode

maria.ratto • 0

@mariaratto-24085

Last seen 3.6 years ago

I'm trying to construct the topGOdata, but I keep getting an error. What am I doing wrong? Thanks for the help.

allmarkers1 <- as.vector(unlist(markers1$gene_1))
print(head(allmarkers1))
allgenes <- as.numeric(c(1:length(allmarkers1)))
names(allgenes) <- allmarkers1
head(allgenes)
[1] "ENSG00000109103:UNC119_negation" "ENSG00000132639:SNAP25_negation"
[3] "ENSG00000166900:STX3_negation"   "ENSG00000067715:SYT1_negation"  
[5] "ENSG00000163618:CADPS_negation"  "ENSG00000175416:CLTB_negation"  
ENSG00000109103:UNC119_negation ENSG00000132639:SNAP25_negation 
                              1                               2 
  ENSG00000166900:STX3_negation   ENSG00000067715:SYT1_negation 
                              3                               4 
 ENSG00000163618:CADPS_negation   ENSG00000175416:CLTB_negation 
                              5                               6 

symbol2go <- read.table(file = "/home/user08/tirocinio/symbol2go.txt", sep = "\t",  header = TRUE, stringsAsFactors = FALSE)

symbol2go <- symbol2go[grep("GO", symbol2go$GO), ]
print(head(symbol2go))


SYMBOL
<chr>

GO
<chr>

1   A0A023HHK9  GO:0055114      
2   A0A023HHL0  GO:0055114      
3   COX1    GO:1902600      
4   COX1    GO:1902600      
5   COX1    GO:0022900      
6   COX1    GO:0022900

GO_terms <- sapply(unique(symbol2go$SYMBOL), function(x){ 
                        rows <- which(symbol2go$SYMBOL == x)
                        symbol2go[rows, 2]
                      })
    print((head(GO_terms)))

$A0A023HHK9
[1] "GO:0055114"

$A0A023HHL0
[1] "GO:0055114"

$COX1
   [1] "GO:1902600" "GO:1902600" "GO:0022900" "GO:0022900" "GO:1902600"
   [6] "GO:1902600" "GO:0022900" "GO:0022900" "GO:1902600" "GO:1902600"
  [11] "GO:0022900" "GO:0022900" "GO:1902600" "GO:1902600" "GO:0022900"
  [16] "GO:0022900" "GO:1902600" "GO:1902600" "GO:0022900" "GO:0022900"
  [21] "GO:1902600" "GO:1902600" "GO:0022900" "GO:0022900" "GO:1902600"
etc.
 [ reached getOption("max.print") -- omitted 2760 entries ]

$SLC6A8
 [1] "GO:0055085" "GO:0055085" "GO:0055085" "GO:0055085" "GO:0055085"
 [6] "GO:0015812" "GO:0055085" "GO:0015881" "GO:0055085" "GO:0015881"
[11] "GO:0055085" "GO:0015881" "GO:0055085" "GO:0015881"

$A0A023JDW0
[1] "GO:0098655"

$A0A023PX70
[1] "GO:0018108" "GO:0018108" "GO:0018108"



top500 <- function(allScore){
          return(allScore < 501)
        }

        x <- top500(geneList)
        sum(x)

[1] 500
GOdata <- new(
  "topGOdata", 
  ontology = "BP", 
  allGenes = allgenes, 
  geneSel= top500,
  annotation = annFUN.gene2GO, 
  gene2GO = GO_terms)

resultFisher <- runTest(GOdata, algorithm = "classic", statistic = "fisher")

> Error in if is.na(index) || index < 0 || index > length(nd))
> stop("vertex is not in graph: ", : missing value where TRUE/FALSE
> needed

    sessionInfo()

R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] org.Hs.eg.db_3.10.0  topGO_2.38.1         SparseM_1.78        
 [4] GO.db_3.10.0         AnnotationDbi_1.48.0 IRanges_2.20.2      
 [7] S4Vectors_0.24.4     Biobase_2.46.0       graph_1.64.0        
[10] BiocGenerics_0.32.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3          knitr_1.29          bit_4.0.4           lattice_0.20-38    
 [5] rlang_0.4.7         blob_1.2.1          tools_3.6.0         grid_3.6.0         
 [9] xfun_0.16           DBI_1.1.0           matrixStats_0.55.0  bit64_4.0.2        
[13] digest_0.6.22       BiocManager_1.30.10 vctrs_0.3.2         memoise_1.1.0      
[17] RSQLite_2.2.0       compiler_3.6.0      pkgconfig_2.0.3

software error topGO go • 1.1k views

ADD COMMENT • link updated 3.6 years ago by Kevin Blighe ★ 3.9k • written 3.6 years ago by maria.ratto • 0

score 2 · Accepted Answer · 2020-09-01

The main problem is that your named vector, allgenes, has names that comprise an Ensembl gene ID and a gene symbol (possiibly HGNC symbol):

[1] "ENSG00000109103:UNC119_negation" "ENSG00000132639:SNAP25_negation"
[3] "ENSG00000166900:STX3_negation"   "ENSG00000067715:SYT1_negation"  
[5] "ENSG00000163618:CADPS_negation"  "ENSG00000175416:CLTB_negation"

There is no way for topGo to intelligibly parse this information; so, you will have to split these into either Ensembl gene IDs or gene symbols.

x <- c("ENSG00000109103:UNC119_negation", "ENSG00000132639:SNAP25_negation",
  "ENSG00000166900:STX3_negation", "ENSG00000067715:SYT1_negation",
  "ENSG00000163618:CADPS_negation", "ENSG00000175416:CLTB_negation")

unlist(lapply(strsplit(x, ':'), function(x) x[1]))
[1] "ENSG00000109103" "ENSG00000132639" "ENSG00000166900" "ENSG00000067715" "ENSG00000163618" "ENSG00000175416"

unlist(lapply(strsplit(x, ':|_'), function(x) x[2]))
[1] "UNC119" "SNAP25" "STX3"   "SYT1"   "CADPS"  "CLTB"

--------------

For a reproducible working example, see my two answers here:

Note the key part where the input IDs are specified (see ID="symbol"):

allGO2genes <- annFUN.org(whichOnto="BP", feasibleGenes=NULL, mapping="org.Hs.eg.db", ID="symbol")

GOdata <- new(
  ...,
  annot = annFUN.GO2genes,
  GO2genes = allGO2genes)

Kevin