Hello, I have noticed that the server response time for mapping 15-20 genes/proteins is 40-70 seconds. Is there a way to speed up the process?
# Initialize the STRINGdb object
string_db <- STRINGdb$new(version = "12.0", species = OMcode, score_threshold = score_threshold, network_type = network_type, input_directory = "")
# Map genes in the merged_tab_reaz dataframe to STRING IDs
my_df <- string_db$map(reazionifinale, "GENE", removeUnmappedRows = TRUE)
platform x86_64-w64-mingw32
arch x86_64
os mingw32
crt ucrt
system x86_64, mingw32
status
major 4
minor 3.1
year 2023
month 06
day 16
svn rev 84548
language R
version.string R version 4.3.1 (2023-06-16 ucrt)
nickname Beagle Scouts
It should not take that long.
That said, the method loads all aliases every time you call it, which takes 5 to 10s, so it's not particularly efficient at mapping small protein sets repeatedly.
40-70s is way too long though.
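Since every call to the mapping method pays the alias-loading cost, one way to soften it is to map several small sets in a single call. A minimal sketch, assuming your sets can be combined into one data frame; `map_gene_sets_once`, `gene_name` and `set_name` are hypothetical names used only for illustration:

```r
# Hypothetical helper: map several named gene sets with ONE call to $map(),
# so the alias table is loaded once instead of once per set.
map_gene_sets_once <- function(string_db, gene_sets) {
  stopifnot(is.list(gene_sets), !is.null(names(gene_sets)))

  # Stack all sets into one data frame and remember which set each gene came from.
  combined <- data.frame(
    gene_name = unlist(gene_sets, use.names = FALSE),
    set_name  = rep(names(gene_sets), lengths(gene_sets)),
    stringsAsFactors = FALSE
  )

  # Single mapping call, same arguments as in the thread.
  mapped <- string_db$map(combined, "gene_name", removeUnmappedRows = TRUE)

  # Split the result back into one data frame per set.
  # Sets with no mapped genes will not appear in the result.
  split(mapped, mapped$set_name)
}
```

With an existing `string_db` object, `map_gene_sets_once(string_db, list(pld = c("PLD1", "PLD2"), ept = c("EPT1", "CEPT1")))` would return a list with one mapped data frame per set.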
If it's a new species (one you are mapping for the first time), the STRINGdb package must download the alias file, which may take tens of seconds. Make sure the alias file with all the names is already downloaded by setting input_directory to a directory that does not change between runs. The second time you run the species, it should take less than 10s, more or less independently of your input size.
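A minimal sketch of what a stable input_directory could look like. The cache location and the helper name `new_cached_string_db` are assumptions for illustration; any writable path that survives between R sessions should do. As far as I understand, an empty input_directory makes the package fall back to a temporary directory that disappears with the session, but please treat that as an assumption to verify.

```r
# Assumption: a per-user cache folder; tools::R_user_dir() needs R >= 4.0.
cache_dir <- tools::R_user_dir("STRINGdb", which = "cache")
dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)

# Hypothetical helper: build the STRINGdb object against the persistent folder,
# with the same settings used elsewhere in this thread.
new_cached_string_db <- function(input_directory = cache_dir) {
  library(STRINGdb)
  STRINGdb$new(
    version = "12.0",
    species = 9606,
    score_threshold = 600,
    network_type = "physical",
    input_directory = input_directory
  )
}
```

Calling `string_db <- new_cached_string_db()` and then timing the mapping with `system.time(string_db$map(input_df, "gene_name", removeUnmappedRows = TRUE))` in two separate R sessions should show whether the second run is faster.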
Hope that solves the problem.
Thank you for your prompt reply. Below is the code I am working with. I have added commands to time the most time-consuming steps. The mapping took 33 seconds. I would also point out that getting the interactions is quite time-consuming.
My question is: is there any way to reduce the response time for mapping?
You can try it yourself using the code below:
# Start time monitoring
start_time <- Sys.time()
# Initialize the STRINGdb object
string_db <- STRINGdb$new(version = "12.0", species = 9606, score_threshold = 600, network_type = "physical", input_directory = "")
end_time1 <- Sys.time()
print(paste("Initialize the STRINGdb object:", as.numeric(difftime(end_time1, start_time, units = "secs")), "seconds"))
input <- c("PLD2", "PLD1", "PLD4", "PLD3", "EPT1", "FAM73B", "FAM73A", "CEPT1")
input_df <- data.frame(gene_name = input)
# Map genes in the input_df data frame to STRING IDs
input_mapped <- string_db$map(input_df, my_data_frame_id_col_names = c("gene_name"), removeUnmappedRows = TRUE)
end_time2 <- Sys.time()
print(paste("Map genes in the dataframe:", as.numeric(difftime(end_time2, end_time1, units = "secs")), "seconds"))
# Display input_mapped
print(input_mapped)
By the way, the mapping in the STRINGdb Bioconductor package is local. The R package does not communicate with the STRING server to map the proteins. You should be able to run it without an internet connection.
Again, make sure it does not re-download the mapping data every time you run this script. I cannot tell this from the provided code.
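One rough way to check for re-downloads is to look at what sits in the input directory before and after a run. A minimal sketch; `list_cached_files` is a hypothetical helper, and I am not assuming any particular file names here:

```r
# Hypothetical helper: summarise the files in a directory with their size
# and last-modified time, so two runs can be compared.
list_cached_files <- function(dir) {
  files <- list.files(dir, full.names = TRUE)
  info <- file.info(files)
  data.frame(
    file     = basename(files),
    size_mb  = round(info$size / 1024^2, 2),
    modified = info$mtime,
    stringsAsFactors = FALSE
  )
}
```

If the directory is empty at the start of each session, or the modification times change on every run, the mapping data is most likely being downloaded again.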