I am trying to use RpsiXML to parse human interaction data downloaded from IntAct with the goal of building a human PPI network. I have downloaded the file human.zip
from ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psi25/species/
The archive unzips to 172 individual files.
For example, human_01.xml is a 43.2 MB file;
grep "<primaryRef db=\"uniprotkb\"" human_01.xml | wc -l
... gives me 1187 lines. This already seems low - but I don't even get that with RpsiXML:
library("RpsiXML")
intact01xml <- parsePsimi25Interaction("./human/human_01.xml", INTACT.PSIMI25, verbose=FALSE)
length(interactions(intact01xml))  # 2
intactGraph <- psimi25XML2Graph("./human/human_01.xml", INTACT.PSIMI25, type = "interaction", verbose=FALSE)
length(nodes(intactGraph))  # 87
length(edges(intactGraph))  # 87 ... |nodes| == |edges| ???
table(degree(intactGraph))
#         outDegree
# inDegree  0  1  2  3  5
#       0   0 37  5  0  0
#       1   9 12  4  1  0
#       2   4  4  4  0  0
#       3   1  2  0  0  0
#       4   0  0  1  0  0
#       5   1  0  0  0  0
#       7   0  0  0  0  1
#       16  1  0  0  0  0
87 interactions in 43.2 MB of data? Something seems amiss. I might misunderstand what to expect in this set of IntAct files, or how to properly parse the file. Help appreciated.
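
As a possible cross-check that does not depend on RpsiXML, here is a minimal sketch that counts opening tags rather than matching lines (`grep ... | wc -l` counts lines, and one line may hold several elements). It assumes the PSI-MI 2.5 element names `interaction` and `interactor`, and that each opening tag name is followed by whitespace or `>` on the same line. `count_tag` is a hypothetical helper, not part of any tool mentioned above.

```shell
#!/bin/sh
# count_tag FILE TAG: hypothetical helper that counts opening <TAG ...> elements.
# grep -o prints every match on its own line, so several tags on one physical
# line are all counted. The trailing [[:space:]>] keeps <interaction> from
# also matching <interactionList> or <interactionType>.
count_tag() {
    grep -o -E "<$2[[:space:]>]" "$1" | wc -l | tr -d ' '
}

# Usage, assuming the directory layout used in the R code above:
# count_tag ./human/human_01.xml interaction
# count_tag ./human/human_01.xml interactor
```

If these counts are far above what `interactions()` reports, the discrepancy would point at the parsing step rather than at the contents of the file.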