Question: KEGGSOAP questions: how to handle multiple annotations in R data frames, and why does a "for" loop only use one annotation.
13.0 years ago, ALAN SMITH wrote:
Hello,

I am attempting to use R to query KEGG in order to find cpd IDs from neutral masses, and eventually to link these cpd IDs to the pathways they are part of. I have several questions, listed after the example R code that shows the problems. At the end is what I think the ideal result would look like, which I cannot achieve.

######### session info ##################
> sessionInfo()
R version 2.4.0 (2006-10-03)
i386-pc-mingw32

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"  "base"

other attached packages:
    KEGG KEGGSOAP    SSOAP    RCurl      XML
 "1.8.1"  "1.9.1"  "0.4-0"  "0.8-0"  "1.2-0"
####################################################################

# Example R code for the problem I am having #
cpdID<-c(1,2,3,4,5,6)
mass<-c(129.0426, 147.0532, 208.0848, 220.0848, 204.0899, 777.0317)
RT<-c(1,2,3,4,5,6)
ppmerror<-c(4,11,75,7,21,55)
floatmass=NULL
for (i in 1:length(cpdID)) {
  floatmass[i]<-if(ppmerror[i]<10) {1e-5*mass[i]} else{(ppmerror[i]/10^6)*mass[i]}
}
testdata<-as.data.frame(cbind(cpdID, mass, RT, ppmerror, floatmass))

library(KEGGSOAP)
library(KEGG)
KEGGID=NULL
for (i in 1:length(testdata$cpdID)) {
  KEGGID[i]<-(search.compounds.by.mass(testdata$mass[i], testdata$floatmass[i]))
}
KEGGID
tt<-cbind(KEGGID,testdata)
### this cbind does not work: the vectors are different sizes ####

### the objects below contain the full query results for each output in the loop above ####
a<-t(as.data.frame(search.compounds.by.mass(129.0426,0.00129)))
b<-t(as.data.frame(search.compounds.by.mass(147.0532,0.00161)))
c<-t(as.data.frame(search.compounds.by.mass(208.0848,0.0156)))
d<-t(as.data.frame(search.compounds.by.mass(220.0848,0.0022)))
e<-t(as.data.frame(search.compounds.by.mass(204.0899,0.0042)))
f<-t(as.data.frame(search.compounds.by.mass(777.0317,0.0427)))

Problem #1 (probably has to do with how R works)
Each queried mass, except the last (which has none), has more than one annotation. Why does R fill in only one value in the KEGGID loop (the first annotation returned by the query, truncating the rest)? How can I produce an output that allows all the annotations for each mass to be attached back to the table testdata, using a loop that cycles through testdata?

Problem #2
What does R consider the output value of a KEGG query to be? To solve Problem #1 I need some way to fill in the missing annotation where nothing is returned from KEGG. Currently this row is skipped, which makes the output too short. How can I write an if/else statement (or something similar) to fill in NA or a phrase like "no hit" when no annotation is present? I was thinking of something like the loop below, but I don't know what "X" should be in the if statement.

KEGGID2=NULL
for (i in 1:length(testdata$cpdID)) {
  KEGGID2[i]<-if((search.compounds.by.mass(testdata$mass[i], testdata$floatmass[i]))==X) {search.compounds.by.mass(testdata$mass[i], testdata$floatmass[i])} else{NA}
}

# Ideal result, which I cannot achieve with my knowledge of R #
a11<-c(1, 129.0426, 10, 4, 0.001290426, "cpd:C01877", "cpd:C01879", "cpd:C02237", "cpd:C02238", "cpd:C04281", "cpd:C04282", "blank", "blank", "blank", "blank", "blank")
b11<-c(2, 147.0532, 15, 11, 0.001617585, "cpd:C00025", "cpd:C00217", "cpd:C00302", "cpd:C00979", "cpd:C03618", "cpd:C03790", "cpd:C05574", "cpd:C05938", "cpd:C05941", "cpd:C12269", "blank")
c11<-c(3, 208.0848, 20, 75, 0.015606360, "cpd:C00328", "cpd:C01484", "cpd:C01718", "cpd:C02381", "cpd:C05610", "cpd:C05647", "cpd:C06487", "cpd:C09816", "cpd:C11433", "cpd:C11690", "cpd:C15589")
d11<-c(4, 220.0848, 4, 7, 0.002200848, "cpd:C00643", "cpd:C01017", "cpd:C09985", "blank", "blank", "blank", "blank", "blank", "blank", "blank", "blank")
e11<-c(5, 204.0899, 7, 21, 0.004285888, "cpd:C00078", "cpd:C00525", "cpd:C00806", "cpd:C07839", "cpd:C10743", "cpd:C10968", "cpd:C14916", "blank", "blank", "blank", "blank")
f11<-c(6, 777.0317, 11, 55, 0.042736744, "no hit", "blank", "blank", "blank", "blank", "blank", "blank", "blank", "blank", "blank", "blank")
ideal<-rbind(a11,b11,c11,d11,e11,f11)
colnames(ideal)<-c("cpdID","mass","RT","ppmerror","floatmass","cpd1","cpd2","cpd3","cpd4","cpd5","cpd6","cpd7","cpd8","cpd9","cpd10","cpd11")
ideal

Thank you,
Alan Smith
University of Wisconsin-Madison
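
For illustration, here is a minimal sketch of one way the two problems could be approached. It is not a tested KEGGSOAP solution. The function fake_search() is a hypothetical stand-in for search.compounds.by.mass(), returning IDs copied from the ideal table above, and it assumes that a query with no hits comes back as a zero-length result. The idea is that an assignment like KEGGID[i] <- value can hold only one element per slot of a plain vector, so each query result is stored in a list element instead. Testing length(res) == 0 then plays the role of "X" in Problem #2.

```r
# Hypothetical stand-in for search.compounds.by.mass(); the real function
# needs the KEGG web service. The IDs are copied from the post.
fake_search <- function(mass, tol) {
  known <- list(
    "129.0426" = c("cpd:C01877", "cpd:C01879", "cpd:C02237",
                   "cpd:C02238", "cpd:C04281", "cpd:C04282"),
    "220.0848" = c("cpd:C00643", "cpd:C01017", "cpd:C09985")
  )
  res <- known[[as.character(mass)]]
  # Assumption: a query without hits returns a zero-length vector
  if (is.null(res)) character(0) else res
}

testdata <- data.frame(
  cpdID     = c(1, 4, 6),
  mass      = c(129.0426, 220.0848, 777.0317),
  floatmass = c(0.00129, 0.0022, 0.0427)
)

# A list can hold results of any length, one element per row of testdata
hits <- vector("list", nrow(testdata))
for (i in seq_len(nrow(testdata))) {
  res <- fake_search(testdata$mass[i], testdata$floatmass[i])
  # length(res) == 0 is TRUE for NULL, character(0) and empty lists
  hits[[i]] <- if (length(res) == 0) "no hit" else as.character(unlist(res))
}

# Pad every result with "blank" up to the longest one, then bind as rows
width  <- max(sapply(hits, length))
padded <- do.call(rbind, lapply(hits, function(x) {
  c(x, rep("blank", width - length(x)))
}))
colnames(padded) <- paste0("cpd", seq_len(width))

# One row per mass, with the annotations as extra columns
result <- cbind(testdata, padded, stringsAsFactors = FALSE)
print(result)
```

If the goal is to link each compound to its pathways later, a long layout with one row per mass and compound pair might be easier to work with than padded columns, but the wide layout above matches the ideal table in the question.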