Question

STRINGdb : accessing all types of (sub-)scores from interactions

Wolfgang RAFFELSBERGER ▴ 140

@wolfgang-raffelsberger-2876

France

Dear list,

I'm trying to run protein-protein interaction (PPI) network analysis from R using the package STRINGdb. The package lets me run some 'minimal' analyses, but I have more specific needs: when I extract the interactions between given proteins, only the combined score is returned. For some projects, however, I don't want to use the max of all types of (sub-)scores (like 'Textmining', 'Experiments', 'Co‑expression', etc.), but rather to ignore some of them (like 'Textmining').

Does anyone have a hint on how to access these specific scores (as is possible when running an analysis on the site https://string-db.org/ by manually clicking 'Settings' and using the field 'active interaction sources:')?

Many thanks in advance, Wolfgang Raffelsberger

Here is a tiny example / minimal code:

library(BiocManager)
library("STRINGdb")
string_db <- STRINGdb::STRINGdb$new(version="11.5", species=9606, score_threshold=200, input_directory="")    # most recent
data1 <- data.frame(Gene.name=c("tp53","atm","egfr"), STRING_id=c("9606.ENSP00000269305","9606.ENSP00000278616","9606.ENSP00000275493"))
netwInt1 <- string_db$get_interactions(data1$STRING_id)

head(netwInt1)    # combined score only

STRINGdb$help("get_interactions")  # thus, the function operates with single argument only, no way to specify which scores I'd like to use

## My sessionInfo gives:
sessionInfo( )
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] STRINGdb_2.4.1            BiocManager_1.30.16       EnsDb.Hsapiens.v79_2.99.0 ensembldb_2.16.4          AnnotationFilter_1.16.0   GenomicFeatures_1.44.1   
 [7] GenomicRanges_1.44.0      GenomeInfoDb_1.28.1       org.Hs.eg.db_3.13.0       AnnotationDbi_1.54.1      IRanges_2.26.0            S4Vectors_0.30.0         
[13] Biobase_2.52.0            BiocGenerics_0.38.0

ADD COMMENT • link updated 4.2 years ago by damian.szk ▴ 110 • written 4.2 years ago by Wolfgang RAFFELSBERGER ▴ 140

At the moment it's not possible.

The STRINGdb R package doesn't know about the channel-specific scores. The file with sub-scores tended to time out during Bioconductor testing, due to its size.

For now the only way to get these scores is to use flat-files. Sorry for the inconvenience.
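As a rough sketch of the flat-file route: the per-channel scores are in the "detailed links" download, which for human and version 11.5 should be named something like 9606.protein.links.detailed.v11.5.txt.gz (the file name and the column names used below are assumptions; please check them against the Download page). The helper read_string_subscores() is a made-up name, and the toy file contains invented numbers that only stand in for the real download.

```r
# Sketch: read per-channel scores from a STRING "detailed links" flat file.
# Assumed layout: space-separated, one header line, scores as integers 0..1000.
read_string_subscores <- function(path, ids = NULL) {
  links <- read.table(path, header = TRUE, sep = " ", stringsAsFactors = FALSE)
  if (!is.null(ids)) {
    # keep only edges where both partners are among the proteins of interest
    keep <- links$protein1 %in% ids & links$protein2 %in% ids
    links <- links[keep, , drop = FALSE]
  }
  links
}

# Toy stand-in for the real download (all score values are invented)
toy_file <- tempfile(fileext = ".txt")
writeLines(c(
  "protein1 protein2 neighborhood fusion cooccurence coexpression experimental database textmining combined_score",
  "9606.ENSP00000269305 9606.ENSP00000278616 0 0 0 60 900 800 950 999",
  "9606.ENSP00000269305 9606.ENSP00000275493 0 0 0 50 300 0 900 930",
  "9606.ENSP00000269305 9606.ENSP00000000233 0 0 0 0 0 0 400 400"
), toy_file)

ids <- c("9606.ENSP00000269305", "9606.ENSP00000278616", "9606.ENSP00000275493")
sub1 <- read_string_subscores(toy_file, ids)
sub1[, c("protein1", "protein2", "coexpression", "experimental", "textmining")]
```

With the real file, passing its path instead of toy_file should be enough; read.table can normally read the .gz directly, but the full human file is large, so filtering early is worthwhile.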

Also, the combined score is not calculated as the max of the sub-scores, so if we want to reproduce how the website works (and we would have to, for consistency's sake), it is much more involved (and slower). We are looking into it.
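To illustrate why it is not a maximum, here is a minimal sketch of the combination scheme as I understand it from the STRING FAQ: a prior is removed from each channel, the channels are combined as 1 - prod(1 - s), and the prior is added back once. The prior value of 0.041 and the function name combine_string_scores() are assumptions, and this sketch leaves out the homology correction that the FAQ script, as far as I recall, applies to the co-occurrence and text-mining channels, so it will not reproduce the website exactly.

```r
# Sketch: recombine a chosen subset of STRING channel scores.
# 'scores' are flat-file style integers (0..1000); 'prior' is an assumed value.
combine_string_scores <- function(scores, prior = 0.041) {
  s <- pmax(scores / 1000, prior)           # rescale to 0..1, floor at the prior
  s_no_prior <- (s - prior) / (1 - prior)   # remove the prior from each channel
  combined <- 1 - prod(1 - s_no_prior)      # probabilistic "or" across channels
  combined <- combined * (1 - prior) + prior  # add the prior back once
  round(combined * 1000)
}

# Invented channel scores for a single edge
edge <- c(coexpression = 60, experimental = 900, database = 800, textmining = 950)

all_channels  <- combine_string_scores(edge)
no_textmining <- combine_string_scores(edge[setdiff(names(edge), "textmining")])
c(all = all_channels, without_textmining = no_textmining)
```

Dropping a channel before combining lowers (or leaves unchanged) the result, which is the behaviour one would expect when de-selecting an 'active interaction source' on the website.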

ADD REPLY • link 4.2 years ago damian.szk ▴ 110

Dear Damian, thanks for the explanations. If one day it becomes possible to access the various sub-scores, I'd be glad to use them. You are right: when looking at some examples I realized the combined score is not a simple maximum...

Since one of my collaborators is very much interested in the co-expression part, I've identified some co-expression databases to use for this purpose, and I'm considering building some tools for this specific 'component' and making a small R package. However, some preliminary checks on a few proteins revealed that these scores/results may differ quite a bit from the co-expression component of STRING. Of course, at this point I can't tell which is more "representative". I'll post here if I have more...

Wolfgang

ADD REPLY • link 4.2 years ago Wolfgang RAFFELSBERGER ▴ 140

Dear Wolfgang,

Thanks for the update. STRING coexpression is built on all available array data from GEO, RNA-seq data from Expression Atlas and proteomics data from ProteomeHD (Rappsilber lab). It should be quite comprehensive, with little noise for high-scoring edges.

The sub-scores will come eventually, but first there are other issues to tackle with the STRINGdb R package. In the meantime I would encourage you to use the download files or the API (Help->API). An explanation of how to combine the scores, together with a Python script that does it, is available under Help->FAQ.
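For the API route, a minimal sketch of building a 'network' request from R could look like the following. The endpoint layout, the "%0d" separator between identifiers and the per-channel column names mentioned in the comments (escore, ascore, tscore, ...) are taken from my reading of the API help page and should be treated as assumptions to verify there; string_network_url() is a made-up helper name. The actual download is left commented out because it needs network access.

```r
# Sketch: build a STRING API "network" request URL (tab-separated output).
string_network_url <- function(ids, species = 9606, required_score = 200,
                               base = "https://string-db.org/api/tsv/network") {
  paste0(base,
         "?identifiers=", paste(ids, collapse = "%0d"),  # ids joined by an encoded CR
         "&species=", species,
         "&required_score=", required_score)
}

ids <- c("9606.ENSP00000269305", "9606.ENSP00000278616", "9606.ENSP00000275493")
url1 <- string_network_url(ids)
url1

## Needs network access; column names are assumptions to check against the API help:
# apiInt1 <- read.delim(url1, stringsAsFactors = FALSE)
# apiInt1[, c("preferredName_A", "preferredName_B", "escore", "ascore", "tscore", "score")]
```

Note that the API appears to report scores on a 0 to 1 scale, whereas the flat files use integers from 0 to 1000, so the two should be rescaled before being compared.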

Best, Damian.

ADD REPLY • link 4.2 years ago damian.szk ▴ 110