Question

paxtoolsr how to extract Modificationfeature related to a node

1

Entering edit mode

Guido Leoni ▴ 200

@guido-leoni-3328

Last seen 10.5 years ago

European Union

Dear All

I built a small network with paxtoolsr

genes <- c("AKT1", "MTOR")
t1 <- graphPc(source=genes)

and I correctly saved a biopax file with the results of my query

saveXML(t1,file="parser.owl")

Looking into my network I have (for example for AKT1) several nodes of the same gene corresponding to the protein with different combinations of ModificationFeatures. Each modificated node has a specific ID field

Is there a way to extract Modification feature related to one node?

I try quering the Pathway Commons Web Services with the ID of nodes

query<-getPc("http://purl.org/pc2/4/Protein_7e40725729befee922644c6245676e02")

But for different ID associated to AKT i retrieve always the same ModificationFeature

Any suggestion could be very helpful

Thank you very much

Guido

pathway paxtools • 1.7k views

ADD COMMENT • link 10.5 years ago Guido Leoni ▴ 200

score 1 · Answer 1 · 2014-10-22

This example should pull out information about protein modification features for a given gene. It uses the traverse() command in paxtoolsr. Since the results from traverse is an XML document you need to use XPath with xpathSApply().

library(paxtoolsr)

# Get the protein Uniprot ID for a given HGNC gene symbol
id <- idMapping("AKT1")

# Covert the Uniprot ID to a URI that would be found in the Pathway Commons database
uri <- paste0("http://identifiers.org/uniprot/", id)

# Get URIs for only the ModificationFeatures of the protein
xml <- traverse(uri=uri, path="ProteinReference/entityFeature:ModificationFeature")

# Extract all the URIs
uris <- xpathSApply(xml, "//value/text()", xmlValue)

results <- NULL

# For each URI get the modification position, status, and type
for(i in 1:length(uris)) {
    tmpXml <- traverse(uri=uris[i], path="ModificationFeature/featureLocation:SequenceSite/sequencePosition")
    modPos <- xpathSApply(tmpXml, "//value/text()", xmlValue)

    tmpXml <- traverse(uri=uris[i], path="ModificationFeature/featureLocation:SequenceSite/positionStatus")
    modPosStatus <- xpathSApply(tmpXml, "//value/text()", xmlValue)

    tmpXml <- traverse(uri=uris[i], path="ModificationFeature/modificationType/term")
    modPosType <- xpathSApply(tmpXml, "//value/text()", xmlValue)

    tmpResults <- c(modPos, modPosStatus, modPosType)
    results <- rbind(results, tmpResults)
}

colnames(results) <- c("modificationPosition", "modificationPositionStatus", "modificationType")

score 0 · Answer 2 · 2014-10-23

Thank for your answer

Unfortunately It solve partially my problem. With your solution I obtain all the modificationFeatures related to the gene AKT1.

On the other hand I can have an AKT1 protein that has for example 2 of these modification. My final goal is to have a list of all the proteins annotated for a gene with their modificationFeatures.

For example In the network created for AKT1,MTOR I have the AKT1 proteins:

Protein_e9285d649a9462c9f5994835bdc846d9 with 2 ModificationFeatures( Phosphoserine473 and Phosphothreonine308)

Protein_c3c09444ef92f46585df73f96b89a4b6 with only 1 (Phosphoserine473)

My final goal is to obtain a list with these protein ID and their modificationFeatures.

today I was able to obtain my goal with the code pasted below that utilizes your package combined with another ones. But the problem is that the other package converts the biopax model in a dataframe and this conversion it could be quite long for big networks.

Best

Guido

library(paxtoolsr)
library(rBiopaxParser)
library(RCurl)
#download path of interactions from CPTA2
genes <- c("AKT1","MTOR")
t1 <- graphPc(source=genes,kind="NEIGHBORHOOD",format="BIOPAX")

#save the biopax file
saveXML(t1,file="net.owl")
biopax<-readBiopax("net.owl")

#select the protein name in biopax
protein_name<-selectInstances(biopax,class="Protein")
protein_name_unique<-unique(protein_name$id)

#for each protein name
data <- data.frame()
for (i in 1:length(protein_name_unique)) {
#extract the modification features
protein_feat<-selectInstances(biopax, id = protein_name_unique[i],property="feature" )
protein_name<-selectInstances(biopax,id=protein_name_unique[i],property="name")
protein_feat2<-subset(protein_feat,grepl("ModificationFeature",protein_feat$property_attr_value))
#print(protein_name_unique[i])
#print (nrow(protein_feat2))
#if there are some
if (nrow(protein_feat2)>0) {
#we download the biopax of modification feature from biopax
collect_vect<-character()
for (z in 1:nrow(protein_feat2)) {
print(protein_feat2$id[z])
print(protein_feat2$property_attr_value[z])
uri<-paste("http://www.pathwaycommons.org/pc2/",substr(protein_feat2$property_attr_value[z],2,53),sep="")
print(uri)
tmpXML<-getURI(uri)
tmpresults <- xmlTreeParse(tmpXML, useInternalNodes = TRUE)
saveXML(tmpresults,file="tmp.owl")
biopaxtmp<-readBiopax("tmp.owl")
type_feat<-selectInstances(biopaxtmp,class="SequenceModificationVocabulary",property="term")
position_feat<-selectInstances(biopaxtmp,class="SequenceSite",property="sequencePosition")
string_feat<-paste(position_feat$property_value,type_feat$property_value,sep=" ")
#print (string_feat)
collect_vect<-append(collect_vect,string_feat)
}
sort(collect_vect)
line0<-paste(collect_vect,collapse=";")
line01<-paste(protein_name_unique[i],line0,sep="\t")
line<-paste(protein_name$property_value,line01,sep="\t")
#protein_name_unique[i]
#collect_vect
#print(collect_vect)
write(line,file="myfile.txt",append=TRUE)
}
else {
line<-paste(protein_name$property_value,protein_name_unique[i],sep="\t")
write(line,file="myfile.txt",append=TRUE)

}


}