Problems with RefNet hanging
Grimes Mark
@grimes-mark-5393
United States

Problems with RefNet

I am getting timeout and other errors when sending a group of genes to RefNet. In summary, simple queries work; more complex ones hang and return errors. The worst part is that my browser becomes unusable for a while afterwards: even when R shows no activity after I press Escape, the internet connection stays tied up somehow for 10-15 min or longer.

Session info pasted below.

Thanks for your help,

Mark

Queries with single genes work. Queries with 10 genes that retrieve no interactions also do not produce errors.
_______________________
Examples of Failures
______________________

> genes
 [1] "TMEM25"       "TAMM41"       "LOC619574"    "HMGN3"        "ZFAND2B"      "CDK5RAP1"     "AHI1"         "TDP2"         "CPOX"         "MRPL16"      
[11] "DNM1L"        "CSNK1A1"      "RAB40B"       "GPC4"         "KBTBD11"      "RGD1563863"   "CYP51"        "RB1"          "MEF2A"        "TCF15"       
[21] "CNO"          "LOC100174910" "FAM69A"       "SPHK2"        "SCYL1"        "SPARC"        "EPHA5"        "RGD1560433"   "LOC501282"    "ANKRD10"     
[31] "ZFP821"     


# After the failures, I tried removing ORFs whose names suggest they are unlikely to have interaction information:

>     genes.1 <- genes[-grep("LOC", genes)]
>     genes.2 <- genes.1[-grep("RGD", genes.1)]
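The `-grep()` idiom above works here only because both patterns match at least one gene. When nothing matches, `grep()` returns `integer(0)`, and indexing with `-integer(0)` drops the whole vector. A slightly safer sketch uses `grepl()` with a single pattern; the `^` anchor assumes the LOC/RGD tokens are prefixes, which holds for this list.

```r
# Gene list from the post above (31 symbols)
genes <- c("TMEM25", "TAMM41", "LOC619574", "HMGN3", "ZFAND2B", "CDK5RAP1", "AHI1",
           "TDP2", "CPOX", "MRPL16", "DNM1L", "CSNK1A1", "RAB40B", "GPC4", "KBTBD11",
           "RGD1563863", "CYP51", "RB1", "MEF2A", "TCF15", "CNO", "LOC100174910",
           "FAM69A", "SPHK2", "SCYL1", "SPARC", "EPHA5", "RGD1560433", "LOC501282",
           "ANKRD10", "ZFP821")

# Pitfall: with no match, grep() gives integer(0) and x[-integer(0)] selects nothing
no_match <- c("DNM1L", "CPOX")
no_match[-grep("LOC", no_match)]   # character(0): every gene silently dropped

# grepl() returns a logical vector, so negating it is safe whether or not anything matches
genes.2 <- genes[!grepl("^(LOC|RGD)", genes)]
length(genes.2)   # 26
```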


>     tbl.2 <- interactions(refnet, species="9606", id=genes.2, speciesExclusive=FALSE, provider=c("gerstein-2012", "BioGrid")) 
List of 2
 $ message: chr "Failed connect to tyersrest.tyerslab.com:8805; Operation timed out"
 $ call   : language function (type, msg, asError = TRUE)  { ...
 - attr(*, "class")= chr [1:4] "COULDNT_CONNECT" "GenericCurlError" "error" "condition"
character(0)
[1] "failed url: http://tyersrest.tyerslab.com:8805/psicquic/webservices/current/search/query/identifier:%28CPOX%20AND%20ANKRD10%29%20AND%20species:9606"
List of 2
 $ message: chr "Failed connect to tyersrest.tyerslab.com:8805; Operation timed out"
 $ call   : language function (type, msg, asError = TRUE)  { ...
 - attr(*, "class")= chr [1:4] "COULDNT_CONNECT" "GenericCurlError" "error" "condition"
character(0)
[1] "failed url: http://tyersrest.tyerslab.com:8805/psicquic/webservices/current/search/query/identifier:%28MRPL16%20AND%20ANKRD10%29%20AND%20species:9606"
List of 2
 $ message: chr "Failed connect to tyersrest.tyerslab.com:8805; Operation timed out"
 $ call   : language function (type, msg, asError = TRUE)  { ...
 - attr(*, "class")= chr [1:4] "COULDNT_CONNECT" "GenericCurlError" "error" "condition"
character(0)
[1] "failed url: http://tyersrest.tyerslab.com:8805/psicquic/webservices/current/search/query/identifier:%28DNM1L%20AND%20ANKRD10%29%20AND%20species:9606"
<ESC>


> traceback()
18: match.fun(FUN)
17: lapply(X = X, FUN = FUN, ...)
16: sapply(paste("str", cl, sep = "."), function(ob) exists(ob, mode = "function", 
        inherits = TRUE))
15: str.default(err)
14: str(err)
13: sprintf("%s, %s", str(err), "server not responding")
12: print(sprintf("%s, %s", str(err), "server not responding"))
11: value[[3L]](cond)
10: tryCatchOne(expr, names, parentenv, handlers[[1L]])
9: tryCatchList(expr, classes, parentenv, handlers)
8: tryCatch({
       txt <- getURL(query.url)
       if (nchar(txt) == 0) 
           return(data.frame(stringsAsFactors = FALSE))
       ftmp <- tempfile()
       write(txt, file = ftmp)
       result <- read.table(file = ftmp, , sep = "\t", header = FALSE, 
           fill = TRUE, quote = "\"", stringsAsFactors = FALSE)
       if (!quiet) 
           .printf("--- %s result: %d %d", query.url, nrow(result), 
               ncol(result))
       return(result)
   }, error = function(err) {
       print(sprintf("%s, %s", str(err), "server not responding"))
       print(sprintf("failed url: %s", query.url))
       return(data.frame(stringsAsFactors = FALSE))
   })
7: .retrieveData(query.url, quiet)
6: .runQuery(base.url, a, b, species, speciesExclusive, type, detectionMethod, 
       publicationID, quiet)
5: interactions(object@psicquic, id, species, speciesExclusive, 
       type, psicquic.providers, detectionMethod, publicationID, 
       quiet)
4: interactions(object@psicquic, id, species, speciesExclusive, 
       type, psicquic.providers, detectionMethod, publicationID, 
       quiet)
3: .interactions(object, id, species, speciesExclusive, type, provider, 
       detectionMethod, publicationID, quiet)
2: interactions(refnet, species = "9606", id = genes.2, speciesExclusive = FALSE, 
       provider = c("gerstein-2012", "BioGrid"))
1: interactions(refnet, species = "9606", id = genes.2, speciesExclusive = FALSE, 
       provider = c("gerstein-2012", "BioGrid"))
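The `character(0)` lines in the log come from the error handler in frame 8: `str(err)` prints the condition (the "List of 2" output) and returns `NULL`, and `sprintf()` returns `character(0)` when any argument has length zero. The sketch below shows one way such a handler could report the message instead, using `conditionMessage()`. It is not RefNet's code; `retrieve_data_sketch` and its `fetch_fn` argument (a stand-in for `getURL`, so the sketch runs offline) are hypothetical.

```r
# Hypothetical sketch: fetch_fn stands in for RCurl::getURL(query.url)
retrieve_data_sketch <- function(query.url, fetch_fn) {
  tryCatch({
    txt <- fetch_fn(query.url)
    if (nchar(txt) == 0) {
      data.frame(stringsAsFactors = FALSE)
    } else {
      # parse the tab-delimited response directly from the string
      read.table(text = txt, sep = "\t", header = FALSE, fill = TRUE,
                 quote = "\"", stringsAsFactors = FALSE)
    }
  }, error = function(err) {
    # conditionMessage() returns a string; str() only prints and returns NULL
    message(sprintf("%s, server not responding", conditionMessage(err)))
    message(sprintf("failed url: %s", query.url))
    data.frame(stringsAsFactors = FALSE)
  })
}
```

Passing a function that calls `stop("Failed connect ...; Operation timed out")` as `fetch_fn` yields a readable message and an empty data.frame rather than `character(0)`.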

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] parallel  splines   grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocInstaller_1.16.0 rgl_0.93.1098        plyr_1.8.1           RefNet_1.0.4         RCurl_1.95-4.3       bitops_1.0-6         AnnotationHub_1.4.0 
 [8] PSICQUIC_1.2.1       biomaRt_2.20.0       Biostrings_2.32.1    XVector_0.4.0        IRanges_1.22.10      GOstats_2.30.0       Category_2.30.0     
[15] Matrix_1.1-4         GO.db_2.14.0         org.Hs.eg.db_2.14.0  RSQLite_1.0.0        DBI_0.3.1            AnnotationDbi_1.26.1 GenomeInfoDb_1.0.2  
[22] Biobase_2.24.0       BiocGenerics_0.10.0  RCytoscape_1.14.0    XMLRPC_0.3-0         graph_1.42.0         devtools_1.6.1       gplots_2.14.2       
[29] Rtsne_0.9            tsne_0.1-2           vegan_2.0-10         permute_0.8-3        RUnit_0.4.27         adegenet_1.4-2       ade4_1.6-2          
[36] cluster_1.15.3       Hmisc_3.14-5         Formula_1.1-2        survival_2.37-7      lattice_0.20-29      stringr_0.6.2        fdrtool_1.2.12      
[43] qgraph_1.2.5        

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3          annotate_1.42.1          AnnotationForge_1.6.1    ape_3.1-4                caTools_1.17.1           colorspace_1.2-4        
 [7] corpcor_1.6.7            digest_0.6.4             ellipse_0.3-8            foreign_0.8-61           gdata_2.13.3             genefilter_1.46.1       
[13] GenomicRanges_1.16.4     ggplot2_1.0.0            glasso_1.8               gridSVG_1.4-0            GSEABase_1.26.0          gtable_0.1.2            
[19] gtools_3.4.1             htmltools_0.2.6          httpuv_1.3.2             httr_0.5                 huge_1.2.6               igraph_0.7.1            
[25] interactiveDisplay_1.2.0 jpeg_0.1-8               KernSmooth_2.23-13       latticeExtra_0.6-26      lavaan_0.5-17            MASS_7.3-35             
[31] matrixcalc_1.0-3         mime_0.2                 mnormt_1.5-1             munsell_0.4.2            nlme_3.1-118             nnet_7.3-8              
[37] pbivnorm_0.5-1           png_0.1-7                proto_0.3-10             psych_1.4.8.11           quadprog_1.5-5           R6_2.0.1                
[43] RBGL_1.40.1              RColorBrewer_1.0-5       Rcpp_0.11.3              reshape2_1.4             rjson_0.2.14             RJSONIO_1.3-0           
[49] rpart_4.1-8              scales_0.2.4             sem_3.1-5                shiny_0.10.2.1           sna_2.3-2                stats4_3.1.1            
[55] tools_3.1.1              XML_3.98-1.1             xtable_1.7-4             zlibbioc_1.10.0         

 

tbl.1 <- interactions(refnet, species="9606", id=genes[1:20], speciesExclusive=FALSE, provider=c( "STRING" ))    
List of 2
 $ message: chr "Failed connect to string.uzh.ch:80; Operation timed out"
 $ call   : language function (type, msg, asError = TRUE)  { ...
 - attr(*, "class")= chr [1:4] "COULDNT_CONNECT" "GenericCurlError" "error" "condition"
character(0)
[1] "failed url: http://string.uzh.ch/psicquic/webservices/current/search/query/identifier:%28DNM1L%20AND%20CSNK1A1%29%20AND%20species:9606"


… (error messages like this repeat with different proteins and providers)
____

# This problem ties up my computer for some time afterwards.  I got this: 
>     mapper <- IDMapper("9606")
connecting to biomart...
Request to BioMart web service failed. Verify if you are still connected to the internet.  Alternatively the BioMart web service is temporarily down.  Check http://www.biomart.org and verify if this website is available.
Error: XML content does not seem to be XML: 
  
Internet problem? My browser becomes unusable for a while afterwards, even when R is not showing any activity.

# A bit later, after the browser starts working again:
>     mapper <- IDMapper("9606")
connecting to biomart...
> mapper
An object of class "IDMapper"
Slot "species":
[1] "9606"

Slot "mart":
Object of class 'Mart':
 Using the ensembl BioMart database
 Using the hsapiens_gene_ensembl dataset

software error
pshannon
@pshannon-6931
United States

Hi Mark,

Thanks for this bug report.  I am glad to see the RefNet (and thus PSICQUIC) packages in use.

I will figure out and fix the browser hang you see, which follows from one of the PSICQUIC servers failing to respond. Your report shows that happening for BioGrid, whose PSICQUIC server at http://tyersrest.tyerslab.com:8805 seems to be the proximate cause of your trouble.

Something to keep in mind is the combinatorial expansion of queries that occurs when, as in your example, many genes are submitted at once. The PSICQUIC protocol supports either one or two gene symbols per query. With one symbol, all interactions involving that gene are returned; with two, only interactions between the two genes are returned. As I say in the PSICQUIC vignette (though perhaps not clearly enough):

6 Retrieve Interactions Among a Set of Genes
If the id argument to the interactions method contains two or more gene symbols, then all interactions among all possible
pairs of those genes will be retrieved. Keep in mind that the number of unique combinations grows larger non-linearly with
the number of genes supplied, and that each unique pair becomes a distinct query to each of the specified providers.
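To make that growth concrete: n genes give choose(n, 2) = n(n-1)/2 unordered pairs, each a separate query to every PSICQUIC provider. The sketch below counts them and builds query strings of the shape seen in the failed URLs in the log (decoded with `utils::URLdecode`). `n_pair_queries` and `pair_queries` are hypothetical helpers; the exact string RefNet builds may differ in detail.

```r
# Pair queries per provider for n genes: choose(n, 2) = n * (n - 1) / 2
n_pair_queries <- function(n) choose(n, 2)
sapply(c(10, 20, 26, 31), n_pair_queries)   # 45 190 325 465

# Hypothetical helper: the unencoded pairwise query strings
pair_queries <- function(genes, species = "9606") {
  if (length(genes) < 2) return(character(0))
  pairs <- combn(genes, 2)   # one column per unordered pair
  sprintf("identifier:(%s AND %s) AND species:%s", pairs[1, ], pairs[2, ], species)
}

pair_queries(c("CPOX", "MRPL16", "ANKRD10"))

# One of the failed URLs from the log, decoded, has the same pair form
utils::URLdecode("identifier:%28CPOX%20AND%20ANKRD10%29%20AND%20species:9606")
```

By contrast, querying each gene on its own costs n requests per provider (26 rather than 325 here).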

 

With your query looking for interactions among 26 genes, this leads to 325 queries to BioGrid (26 × 25 / 2 gene pairs). Apparently that is more than that particular server can handle, and some of the queries stalled.

If the PSICQUIC (and thus RefNet) protocol supported sending three or more genes per query, your query would map to just a single query to BioGrid. Alas, that is not yet offered by the PSICQUIC community.

A good workaround (maybe we could add this to the PSICQUIC and RefNet packages) is to run single-gene queries with the interactions method, combine the resulting data.frames into a single data.frame (plyr's rbind.fill works well for this), and then filter it so that only the interactions you want remain. Note that for your 26 genes, BioGrid reported no interactions among them: your query runs fine for me today (a Saturday, so perhaps the BioGrid load is low), but it returns an empty data.frame.
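A minimal sketch of that divide-and-conquer approach follows. It assumes you supply `query_fn`, a hypothetical wrapper around the real single-gene call (for example `function(g) interactions(refnet, species = "9606", id = g, speciesExclusive = TRUE, provider = "BioGrid")`), and that the two interactor symbols sit in columns named `A.name` and `B.name`; those column names are an assumption, so check `names()` on a real result and adjust.

```r
# Hypothetical helper: single-gene queries, combined, then filtered to pairs
# where both partners are in the input set.
# ASSUMED: col_a / col_b name the interactor-symbol columns of the results.
interactions_among <- function(genes, query_fn, col_a = "A.name", col_b = "B.name") {
  results <- lapply(genes, function(g) {
    # a failed or timed-out query yields an empty data.frame, not an error
    tryCatch(query_fn(g), error = function(err) {
      message(sprintf("query for %s failed: %s", g, conditionMessage(err)))
      data.frame(stringsAsFactors = FALSE)
    })
  })

  # plyr::rbind.fill tolerates differing columns; base rbind is a fallback
  combined <- if (requireNamespace("plyr", quietly = TRUE)) {
    plyr::rbind.fill(results)
  } else {
    do.call(rbind, results)
  }
  if (is.null(combined) || nrow(combined) == 0)
    return(data.frame(stringsAsFactors = FALSE))
  stopifnot(all(c(col_a, col_b) %in% names(combined)))

  # keep rows where both partners are in the gene set (case-insensitive)
  wanted <- toupper(genes)
  keep <- toupper(combined[[col_a]]) %in% wanted &
          toupper(combined[[col_b]]) %in% wanted
  # an A-B interaction comes back from both the A and the B query, so drop repeats
  unique(combined[keep, , drop = FALSE])
}
```

Because failures are caught per gene, one stalled provider costs a single gene's result rather than the whole batch.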

I also recommend that you set "speciesExclusive=TRUE". I think that is what most people want most of the time. The best use of "speciesExclusive=FALSE" is for protein-protein interactions between host and parasite, which some of the PSICQUIC sources report.

When things go awry, or you suspect they might, quiet=FALSE can be helpful in calls to the interactions method.

On Monday I will follow up with some pointers on running the RefNet shiny app, which you might find useful for point-and-click examination of the PubMed abstracts of the papers from which the interactions were inferred, along with the Entrez Gene web pages for the two interacting genes.

Let me know if the separate-query, divide-and-conquer approach sidesteps the BioGrid hang that you see.

I will look for a way to prevent failed queries (hung queries) from hanging up your entire computer.  Sorry about that!

 - Paul

 

Grimes Mark
@grimes-mark-5393
United States

Paul

 

Thanks for your reply.  My normal workflow with web-based interaction queries (STRING, GeneMANIA, etc.) involves copying and pasting a list of gene names and setting the number of non-queried genes returned to 0, 10, 20, etc.  

 

Thanks for looking into the unexpected hang. I never got 26 genes to run; 10 genes sometimes worked, but only when no interactions were returned. The one-at-a-time query method you propose could be written as a loop, with the resulting tables concatenated and filtered afterwards. This might take a long time, but I'll give it a try.

 

I set "speciesExclusive=FALSE" because I am using rat genes for the query (most gene names are conserved, except that the case convention differs), and I would also like to retrieve pathway and genetic interactions known from model-organism studies, which again has been part of my workflow with web-based PPI databases. Perhaps this was a mistake?

 

--Mark