I am trying to use RpsiXML to parse human interaction data downloaded from IntAct, with the goal of building a human PPI network. I have downloaded the file human.zip from ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psi25/species/. The archive unzips to 172 individual files.
For example, human_01.xml is a 43.2 MB file. Running
grep "<primaryRef db=\"uniprotkb\"" human_01.xml | wc -l
... gives me 1187 lines. This already seems low, but I don't even get that many with RpsiXML:
library("RpsiXML")
intact01xml <- parsePsimi25Interaction("./human/human_01.xml",
                                       INTACT.PSIMI25,
                                       verbose = FALSE)
length(interactions(intact01xml)) # 2
intactGraph <- psimi25XML2Graph("./human/human_01.xml",
                                INTACT.PSIMI25,
                                type = "interaction",
                                verbose = FALSE)
length(nodes(intactGraph)) # 87
length(edges(intactGraph)) # 87 ... |nodes| == |edges| ???
table(degree(intactGraph))
#         outDegree
# inDegree  0  1  2  3  5
#       0   0 37  5  0  0
#       1   9 12  4  1  0
#       2   4  4  4  0  0
#       3   1  2  0  0  0
#       4   0  0  1  0  0
#       5   1  0  0  0  0
#       7   0  0  0  0  1
#       16  1  0  0  0  0
87 interactions in 43.2 MB of data? Something seems amiss. I may be misunderstanding what to expect from this set of IntAct files, or how to parse them properly. Help appreciated.
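
For reference, here is a minimal sketch of a parser-independent cross-check along the lines of the grep above. It assumes that PSI-MI 2.5 files store each interaction as an `<interaction id=...>` element and each molecule as an `<interactor id=...>` element; I have not verified this against the IntAct files themselves. The helper names `count_tag` and `tag_histogram` and the tiny sample file are made up for illustration only. It counts occurrences rather than lines, since several elements may share one line.

```shell
#!/bin/sh
# Count occurrences (not lines) of a literal string in a file.
# The trailing space in a pattern such as "<interaction " keeps it from
# matching <interactionList> or <interactionType>.
count_tag() {
    # $1 = literal pattern, $2 = file
    grep -oF -- "$1" "$2" | wc -l | tr -d ' '
}

# Histogram of opening tag names, most frequent first, to see which
# elements account for the bulk of a file.
tag_histogram() {
    grep -oE '<[A-Za-z][A-Za-z0-9]*' "$1" | sort | uniq -c | sort -rn
}

# Made-up, PSI-MI-like sample so the sketch runs on its own.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<entry>
  <interactorList>
    <interactor id="1"/><interactor id="2"/>
    <interactor id="3"/>
  </interactorList>
  <interactionList>
    <interaction id="10"><interactionType/></interaction>
    <interaction id="11"><interactionType/></interaction>
  </interactionList>
</entry>
EOF

n_interactors=$(count_tag '<interactor ' "$sample")
n_interactions=$(count_tag '<interaction ' "$sample")
echo "interactors:  $n_interactors"    # 3 for the sample above
echo "interactions: $n_interactions"   # 2 for the sample above

tag_histogram "$sample" | head -n 5
rm -f "$sample"
```

On the real data the calls would be `count_tag '<interaction ' human_01.xml` and `tag_histogram human_01.xml | head`. If those numbers are far above what RpsiXML reports, the problem is more likely in how I am parsing than in the file contents.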
