Error in topGO analysis
1
0
Entering edit mode
@mariaratto-24085
Last seen 3.6 years ago

I'm trying to construct the topGOdata, but I keep getting an error. What am I doing wrong? Thanks for the help.

allmarkers1 <- as.vector(unlist(markers1$gene_1))
print(head(allmarkers1))
allgenes <- as.numeric(c(1:length(allmarkers1)))
names(allgenes) <- allmarkers1
head(allgenes)
[1] "ENSG00000109103:UNC119_negation" "ENSG00000132639:SNAP25_negation"
[3] "ENSG00000166900:STX3_negation"   "ENSG00000067715:SYT1_negation"  
[5] "ENSG00000163618:CADPS_negation"  "ENSG00000175416:CLTB_negation"  
ENSG00000109103:UNC119_negation ENSG00000132639:SNAP25_negation 
                              1                               2 
  ENSG00000166900:STX3_negation   ENSG00000067715:SYT1_negation 
                              3                               4 
 ENSG00000163618:CADPS_negation   ENSG00000175416:CLTB_negation 
                              5                               6 

symbol2go <- read.table(file = "/home/user08/tirocinio/symbol2go.txt", sep = "\t",  header = TRUE, stringsAsFactors = FALSE)

symbol2go <- symbol2go[grep("GO", symbol2go$GO), ]
print(head(symbol2go))


SYMBOL
<chr>

GO
<chr>

1   A0A023HHK9  GO:0055114      
2   A0A023HHL0  GO:0055114      
3   COX1    GO:1902600      
4   COX1    GO:1902600      
5   COX1    GO:0022900      
6   COX1    GO:0022900

GO_terms <- sapply(unique(symbol2go$SYMBOL), function(x){ 
                        rows <- which(symbol2go$SYMBOL == x)
                        symbol2go[rows, 2]
                      })
    print((head(GO_terms)))

$A0A023HHK9
[1] "GO:0055114"

$A0A023HHL0
[1] "GO:0055114"

$COX1
   [1] "GO:1902600" "GO:1902600" "GO:0022900" "GO:0022900" "GO:1902600"
   [6] "GO:1902600" "GO:0022900" "GO:0022900" "GO:1902600" "GO:1902600"
  [11] "GO:0022900" "GO:0022900" "GO:1902600" "GO:1902600" "GO:0022900"
  [16] "GO:0022900" "GO:1902600" "GO:1902600" "GO:0022900" "GO:0022900"
  [21] "GO:1902600" "GO:1902600" "GO:0022900" "GO:0022900" "GO:1902600"
etc.
 [ reached getOption("max.print") -- omitted 2760 entries ]

$SLC6A8
 [1] "GO:0055085" "GO:0055085" "GO:0055085" "GO:0055085" "GO:0055085"
 [6] "GO:0015812" "GO:0055085" "GO:0015881" "GO:0055085" "GO:0015881"
[11] "GO:0055085" "GO:0015881" "GO:0055085" "GO:0015881"

$A0A023JDW0
[1] "GO:0098655"

$A0A023PX70
[1] "GO:0018108" "GO:0018108" "GO:0018108"



top500 <- function(allScore){
          return(allScore < 501)
        }

        x <- top500(geneList)
        sum(x)

[1] 500
GOdata <- new(
  "topGOdata", 
  ontology = "BP", 
  allGenes = allgenes, 
  geneSel= top500,
  annotation = annFUN.gene2GO, 
  gene2GO = GO_terms)

resultFisher <- runTest(GOdata, algorithm = "classic", statistic = "fisher")

> Error in if is.na(index) || index < 0 || index > length(nd))
> stop("vertex is not in graph: ", : missing value where TRUE/FALSE
> needed

    sessionInfo()

R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] org.Hs.eg.db_3.10.0  topGO_2.38.1         SparseM_1.78        
 [4] GO.db_3.10.0         AnnotationDbi_1.48.0 IRanges_2.20.2      
 [7] S4Vectors_0.24.4     Biobase_2.46.0       graph_1.64.0        
[10] BiocGenerics_0.32.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3          knitr_1.29          bit_4.0.4           lattice_0.20-38    
 [5] rlang_0.4.7         blob_1.2.1          tools_3.6.0         grid_3.6.0         
 [9] xfun_0.16           DBI_1.1.0           matrixStats_0.55.0  bit64_4.0.2        
[13] digest_0.6.22       BiocManager_1.30.10 vctrs_0.3.2         memoise_1.1.0      
[17] RSQLite_2.2.0       compiler_3.6.0      pkgconfig_2.0.3
software error topGO go • 1.1k views
ADD COMMENT
2
Entering edit mode
Kevin Blighe ★ 3.9k
@kevin
Last seen 7 days ago
Republic of Ireland

The main problem is that your named vector, allgenes, has names that comprise an Ensembl gene ID and a gene symbol (possiibly HGNC symbol):

[1] "ENSG00000109103:UNC119_negation" "ENSG00000132639:SNAP25_negation"
[3] "ENSG00000166900:STX3_negation"   "ENSG00000067715:SYT1_negation"  
[5] "ENSG00000163618:CADPS_negation"  "ENSG00000175416:CLTB_negation"

There is no way for topGo to intelligibly parse this information; so, you will have to split these into either Ensembl gene IDs or gene symbols.

x <- c("ENSG00000109103:UNC119_negation", "ENSG00000132639:SNAP25_negation",
  "ENSG00000166900:STX3_negation", "ENSG00000067715:SYT1_negation",
  "ENSG00000163618:CADPS_negation", "ENSG00000175416:CLTB_negation")

unlist(lapply(strsplit(x, ':'), function(x) x[1]))
[1] "ENSG00000109103" "ENSG00000132639" "ENSG00000166900" "ENSG00000067715" "ENSG00000163618" "ENSG00000175416"

unlist(lapply(strsplit(x, ':|_'), function(x) x[2]))
[1] "UNC119" "SNAP25" "STX3"   "SYT1"   "CADPS"  "CLTB"

--------------

For a reproducible working example, see my two answers here:

Note the key part where the input IDs are specified (see ID="symbol"):

allGO2genes <- annFUN.org(whichOnto="BP", feasibleGenes=NULL, mapping="org.Hs.eg.db", ID="symbol")

GOdata <- new(
  ...,
  annot = annFUN.GO2genes,
  GO2genes = allGO2genes)

Kevin

ADD COMMENT
0
Entering edit mode

Thanks a lot, it works now. I don't know how I didn't notice.

ADD REPLY
0
Entering edit mode

Sure thing - no problem

ADD REPLY

Login before adding your answer.

Traffic: 833 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6