Hi Frank,
I am most happily using the rBiopaxParser package, and your vignette,
in order to extract detailed (but topologically simple) interaction
data from the latest Reactome "Homosapiens.owl". Your package offers
great power and convenience.
However, I have run into difficulty with namespaces.
For a simple example, consider this one line from the method
listInstances, found in the file R/selectBiopax.R:
sel = sel & (tolower(biopax$df$class) %in% tolower(stripns(class)))
As parsed from Homosapiens.owl, the class column of biopax$df has
values like these, always containing a namespace prefix:
head(unique(biopax$df$class))
"bp:BiochemicalReaction" "bp:Protein"
"bp:CellularLocationVocabulary" "bp:UnificationXref"
"bp:ProteinReference" "bp:BioSource"
Because the namespace is stripped from "bp:Protein" only on the right
hand side of the %in% clause, the resulting "Protein" cannot match the
biopax$df$class values, which keep the "bp:" prefix as parsed from the
owl file.
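To make the mismatch concrete, here is a minimal sketch on a toy vector. The helper strip_prefix is a hypothetical stand-in written for this example, not the package's stripns, and parsed_class simply imitates the values shown above:

```r
# Toy stand-in for the class column as parsed from the owl file
parsed_class <- c("bp:BiochemicalReaction", "bp:Protein", "bp:ProteinReference")

# Hypothetical helper (not the package's stripns): drop a leading "prefix:"
strip_prefix <- function(x) sub("^[^:]+:", "", x)

wanted <- "bp:Protein"

# Prefix stripped on the right-hand side only: no element matches
sel_one_sided <- tolower(parsed_class) %in% tolower(strip_prefix(wanted))

# Prefix stripped on both sides: the Protein element is selected
sel_both <- tolower(strip_prefix(parsed_class)) %in% tolower(strip_prefix(wanted))
```

With the one-sided comparison nothing is selected; stripping both sides picks out the Protein entry.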
I believe I see similar logic in other places, with these methods
specifically encountered so far:
selectInstances
listPathwayComponents
Namespaces are used with the "property" column as well:
head(table(biopax$df$property), n=3)
          bp:author bp:cellularLocation          bp:comment
              55654               23838              123750
Speaking from the nickel seats, and not claiming to understand all of
the implications: perhaps these could be neatly avoided if your
readBiopax method could optionally eliminate namespaces when reading
in an owl file?
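In the meantime, something along these lines might serve as a workaround after parsing. This is only a sketch under assumptions: it uses a toy data frame in place of biopax$df, it assumes the class and property columns hold plain "prefix:name" strings, and strip_prefix is again a hypothetical helper rather than anything in the package. Other columns are left alone.

```r
# Hypothetical helper: drop a leading "prefix:" from each value
strip_prefix <- function(x) sub("^[^:]+:", "", as.character(x))

# Toy data frame imitating two columns of biopax$df
df <- data.frame(
  class    = c("bp:Protein", "bp:BioSource"),
  property = c("bp:comment", "bp:author"),
  stringsAsFactors = FALSE
)

# Remove the namespace prefix from both columns in place
df$class    <- strip_prefix(df$class)
df$property <- strip_prefix(df$property)
```

I have not checked whether the rest of the package expects the prefixes to be present, so this may trade one mismatch for another.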
Thanks,
- Paul