How to download all protein-protein interactions of an organism with Bioconductor
hkarakurt ▴ 20
@hkarakurt-12988
Last seen 7 months ago

Hello everyone,

I need to download the whole protein-protein interaction network of Mus musculus. I downloaded it from STRING, but the file is too big to manipulate (about 12 million lines). I need to remove the interactions with a low confidence score, but I cannot open this file with Matlab, R, Python or Excel.

I really need to find this network. Has anyone ever downloaded such a network with an R package?

ppi stringdb mus musculus • 422 views
damian.szk ▴ 20
@damianszk-12963
Last seen 17 days ago
Switzerland

You are right, it's impossible to open such a large file in Excel or any other visual editing tool.

However, R, Python or Matlab can easily parse the file, because for simple parsing the size does not matter: when you read line by line, only one line is held in memory at a time. Just read the file line by line and write out the lines you actually need. Here is a simple Python script that keeps only the high-confidence interactions:

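import gzip

# "rt" opens the gzip file in text mode, so each line is a str (not bytes)
with gzip.open("10090.protein.links.v10.5.txt.gz", "rt") as fh_in, \
        open("pruned_file.tsv", "w") as fh_out:
    next(fh_in)  # skip the header line

    for line in fh_in:
        # split() with no argument splits on any whitespace,
        # so it works whether the columns are separated by spaces or tabs
        score = int(line.split()[-1])  # the combined score is the last column

        if score >= 700:  # only high confidence interactions
            fh_out.write(line)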

This will considerably cut the size of the file. It is probably still too big to open in Excel, but small enough to load all of the remaining interactions into memory in just a few seconds.
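Once the file is pruned, loading it into memory could look roughly like the sketch below. It assumes each line has the form "protein1 protein2 combined_score" separated by whitespace, and that the network is treated as undirected; the function name load_interactions and its min_score parameter are made up for this illustration and are not part of any STRING or Bioconductor tool.

```python
from collections import defaultdict


def load_interactions(path, min_score=700):
    """Read a whitespace-delimited links file into an adjacency map.

    Assumed line layout: protein1 protein2 ... combined_score
    Returns a dict mapping each protein ID to the set of its partners.
    """
    neighbors = defaultdict(set)
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            # skip blank lines and a header row, if one is present
            if len(fields) < 3 or not fields[-1].isdigit():
                continue
            protein_a, protein_b = fields[0], fields[1]
            score = int(fields[-1])
            if score >= min_score:
                # store the edge in both directions (undirected network)
                neighbors[protein_a].add(protein_b)
                neighbors[protein_b].add(protein_a)
    return dict(neighbors)
```

For example, network = load_interactions("pruned_file.tsv") would give you a dictionary in which network[some_protein_id] is the set of its high-confidence partners. Raising min_score (say to 900) lets you tighten the cutoff later without re-reading the original 12 million lines.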

Hope this somehow helps.

 

 
