I tried to annotate a codelink file using h20kcod.db with the following script.
library(codelink)
library(h20kcod.db)
library(GEOquery)
f = list.files(pattern = "TXT")
codset = readCodelinkSet(filename = f[[1]])
probes.sel <- as.character(fData(codset)$probeName[1:100])
select(h20kcod.db, keys = probes.sel, columns = c("SYMBOL", "GENENAME"))
The first rows of the Codelink data look like this:
    probeName           probeType  logicalRow  logicalCol  meanSNR
1   NM_012429.1_PROBE1  DISCOVERY  1           9            0.7156364
2   NM_003980.2_PROBE1  DISCOVERY  1           10           1.7474178
3   AY044449_PROBE1     DISCOVERY  1           11           1.4991542
4   NM_005015.1_PROBE1  DISCOVERY  1           12           7.0496070
5   AB037823_PROBE1     DISCOVERY  1           13          10.1904870
6   NM_032986.1_PROBE1  DISCOVERY  1           14           1.0015290
7   AB032981_PROBE1     DISCOVERY  1           16           1.4521567
8   NM_001907.1_PROBE1  DISCOVERY  1           17           1.7437525
9   AB037745_PROBE1     DISCOVERY  1           18           0.9601950
10  NM_007039.1_PROBE1  DISCOVERY  1           19           0.9245058
I get this error message:
Error in .testForValidKeys(x, keys, keytype) : None of the keys entered are valid keys for 'PROBEID'. Please use the keys method to see a listing of valid arguments.
What is going wrong here? Thanks.
Perhaps you can help us. You have an error message that is intended to explain the problem and point to a solution. Can you explain what you don't understand about the error message so we can improve it?
I understood that none of the keys are valid (I used the probeNames as keys to search h20kcod.db for annotation). So I think I need keys that match the annotation package in order to get each probe name annotated. Probe IDs are valid keys, but unfortunately I do not have them in the data frame above, so I don't know what I should use as a valid key.
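The error message suggests using the `keys` method. A minimal sketch for inspecting what h20kcod.db accepts, assuming the package is installed; `keytypes()`, `columns()` and `keys()` are the standard AnnotationDbi accessors, and the exact lists returned depend on the installed package version.

```r
library(h20kcod.db)  # Bioconductor annotation package for the Codelink array

# Which kinds of identifiers can be passed as the 'keytype' argument of select()?
keytypes(h20kcod.db)

# Which fields can be requested through the 'columns' argument?
columns(h20kcod.db)

# A few valid keys of the default keytype (PROBEID), to compare with probeName
head(keys(h20kcod.db, keytype = "PROBEID"))
```

Comparing these keys with the probeName column shows quickly whether the two use the same identifier format.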
OK, so what are those probeNames in that table? Do you perhaps think you could use those somehow?
I think that if I am able to get the probe IDs for those probeNames, then it should work. I am not sure about this, but I think that is what needs to be done. Am I right?
Well hypothetically, yes. I don't know anything about the data you have in hand, so I can't say if you can get the probe IDs or not. But you are missing my point, which is to say that you might already have everything you need, so long as you are willing to think about what you are doing. So here is the first probeName from your file:
NM_012429.1_PROBE1
That is a concatenation of two things: NM_012429.1 and _PROBE1. Do you know what that first thing is? If not, why don't you try googling it?
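Splitting a probeName into its two parts needs only base R string functions. A minimal sketch, assuming every probeName ends in an underscore followed by `PROBE` and a number, as in the rows shown above:

```r
# A few probeName values copied from the table above
probe_names <- c("NM_012429.1_PROBE1", "AY044449_PROBE1", "NM_005015.1_PROBE1")

# The accession is everything before the trailing "_PROBE<n>" suffix
accession <- sub("_PROBE[0-9]+$", "", probe_names)

# The suffix is the "PROBE<n>" part after the last underscore
probe_suffix <- sub("^.*_(PROBE[0-9]+)$", "\\1", probe_names)

data.frame(probe_names, accession, probe_suffix)
```

Note that not every accession is a RefSeq ID: `NM_` prefixes are RefSeq, while IDs such as `AY044449` look like plain GenBank accessions.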
If I am not wrong, it's the REFSEQ key in h20kcod.db. But I have not seen the .1 (or anything like it) there; maybe I have missed something.
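The `.1` is the version number of the RefSeq record (accession.version). Annotation packages typically store the unversioned accession, so the version has to be dropped before the IDs can match. A minimal sketch, assuming the version is always a dot followed by digits at the end of the ID:

```r
accession <- c("NM_012429.1", "AY044449", "NM_005015.1")

# Remove a trailing ".<digits>" version; IDs without a version are left unchanged
refseq <- sub("\\.[0-9]+$", "", accession)
refseq
```

Both suffixes can also be stripped from the raw probeName in one step with `sub("(\\.[0-9]+)?_PROBE[0-9]+$", "", probe_names)`.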
Exactly! Now we are getting somewhere. So if you have the RefSeq IDs for each row, then how would you get the HUGO symbol and gene name?
As a hint, the RefSeq ID is something specific to the organism (H. sapiens, in this case), not the array used. So it is not likely to be a useful key for the h20kcod.db package.
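Following that hint, an organism-level package is one place to look up RefSeq IDs. A sketch assuming org.Hs.eg.db is installed and the `_PROBE1` and version suffixes have already been removed; the three IDs are taken from the table above. The key change from the original script is passing `keytype = "REFSEQ"`, so that `select()` no longer treats the keys as probe IDs.

```r
library(org.Hs.eg.db)  # H. sapiens organism annotation package (Bioconductor)

refseq <- c("NM_012429", "NM_003980", "NM_005015")

# keytype tells select() that the keys are RefSeq accessions, not probe IDs
ann <- AnnotationDbi::select(org.Hs.eg.db,
                             keys = refseq,
                             columns = c("SYMBOL", "GENENAME"),
                             keytype = "REFSEQ")
ann
```

Keys that are not found come back with `NA` symbols instead of raising an error, as long as at least one key is valid. GenBank-style IDs such as `AY044449` will not match on `REFSEQ`; the `ACCNUM` keytype of org.Hs.eg.db may be worth trying for those.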
Using the RefSeq IDs, I can get the probe IDs, gene symbols, gene names, and indeed anything else from h20kcod.db.
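A sketch of that lookup against h20kcod.db itself, assuming (as described in this thread) that the package exposes a `REFSEQ` keytype; this is worth confirming with `keytypes(h20kcod.db)` first. The probe names are copied from the table above rather than read from the Codelink file, so the example runs on its own.

```r
library(h20kcod.db)  # Codelink annotation package (Bioconductor)

probe_names <- c("NM_012429.1_PROBE1", "NM_003980.2_PROBE1", "AY044449_PROBE1",
                 "NM_005015.1_PROBE1", "NM_032986.1_PROBE1", "NM_001907.1_PROBE1")

# Drop the optional ".<version>" and the "_PROBE<n>" suffix in one step
refseq <- sub("(\\.[0-9]+)?_PROBE[0-9]+$", "", probe_names)

# Query by RefSeq ID and ask for the probe ID alongside symbol and gene name
ann <- AnnotationDbi::select(h20kcod.db,
                             keys = unique(refseq),
                             columns = c("PROBEID", "SYMBOL", "GENENAME"),
                             keytype = "REFSEQ")
ann
```

With real data, `probe_names` would come from `as.character(fData(codset)$probeName)` as in the original script. One RefSeq ID can map to several rows, so check for duplicates before merging the result back onto the expression data.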
But as you mentioned, it's not a reliable key to depend on. What about
or
both of which are present in my data file.
I don't recall saying RefSeq IDs were not reliable, but whatever you like. It's up to you.
Nice way of solving my problem :) Thanks a lot.
Thanks, James, for your contribution and suggestion. Indeed, this could be a possible approach. I suggested another one in my response, and I also stole your suggestion and added it there.