I recently posted a question regarding identifying GO terms that were mapped to GOslims and received a great answer on how to do this.
However, in the grand scheme of things, I really want to find out which genes/transcripts are mapped to a set of GOslims. Since GSEABase is able to map GO terms to GOslims, it seems like identifying corresponding gene/transcript IDs should be feasible. Unfortunately, I have no idea how to proceed.
Suppose I have an input file with GO terms and corresponding gene/transcript IDs (note: the actual format/layout is unimportant; I'm confident I can rearrange things to match whatever input format GSEABase needs):
GO:0016874 TRINITY_DN8634_c0_g1
GO:0016874 TRINITY_DN9386_c0_g1
GO:0019693 TRINITY_DN10297_c1_g1
GO:0019693 TRINITY_DN12835_c0_g1
GO:0019693 TRINITY_DN1421_c0_g1
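Whatever tool ends up doing the work, the first step is grouping the transcript IDs by GO term, which is conceptually what a collection with one gene set per GO term would hold. A minimal sketch in plain Python (not GSEABase), assuming whitespace-separated columns exactly as shown above; `read_go_annotations` is a hypothetical helper name:

```python
from collections import defaultdict

# The five example lines from the question: GO term, then gene/transcript ID.
EXAMPLE = """\
GO:0016874 TRINITY_DN8634_c0_g1
GO:0016874 TRINITY_DN9386_c0_g1
GO:0019693 TRINITY_DN10297_c1_g1
GO:0019693 TRINITY_DN12835_c0_g1
GO:0019693 TRINITY_DN1421_c0_g1
"""

def read_go_annotations(lines):
    """Group gene/transcript IDs by GO term.

    Assumes each usable line holds a GO ID followed by a gene/transcript ID,
    separated by whitespace; lines with fewer than two fields are skipped.
    """
    go_to_genes = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        go_id, gene_id = fields[0], fields[1]
        go_to_genes[go_id].add(gene_id)
    return dict(go_to_genes)

annotations = read_go_annotations(EXAMPLE.splitlines())
for go_id, genes in sorted(annotations.items()):
    print(go_id, sorted(genes))
```

Using sets means a transcript listed twice under the same GO term is only counted once.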
I know how to map the GO terms to GOslims, as well as identify which GO terms map to GOslims.
Now, I am hoping there's a "simple" solution built into GSEABase that will allow me to subset gene/transcript IDs from any given GOslim.
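Once the GO-term-to-GOslim mapping is in hand, getting transcripts per GOslim is a join through the GO term. The sketch below is plain Python rather than a GSEABase call, and assumes the GO-to-slim mapping is already available as a dictionary (for example, exported from the existing mapping step). `genes_per_slim` is a hypothetical helper, and `SLIM_A`/`SLIM_B` are placeholders, not real GOslim IDs:

```python
from collections import defaultdict

def genes_per_slim(go_to_genes, go_to_slims):
    """Collect gene/transcript IDs under each GOslim term.

    go_to_genes: GO ID -> set of gene/transcript IDs (the input annotations).
    go_to_slims: GO ID -> set of GOslim IDs that GO term maps to
                 (assumed to come from the existing GO-to-GOslim step).
    GO terms with no slim mapping are ignored.
    """
    slim_to_genes = defaultdict(set)
    for go_id, genes in go_to_genes.items():
        for slim_id in go_to_slims.get(go_id, ()):
            slim_to_genes[slim_id].update(genes)
    return dict(slim_to_genes)

go_to_genes = {
    "GO:0016874": {"TRINITY_DN8634_c0_g1", "TRINITY_DN9386_c0_g1"},
    "GO:0019693": {"TRINITY_DN10297_c1_g1", "TRINITY_DN12835_c0_g1",
                   "TRINITY_DN1421_c0_g1"},
}
# Hypothetical mapping: placeholder slim IDs, not taken from any real GO slim.
go_to_slims = {
    "GO:0016874": {"SLIM_A"},
    "GO:0019693": {"SLIM_A", "SLIM_B"},
}

result = genes_per_slim(go_to_genes, go_to_slims)
for slim_id, genes in sorted(result.items()):
    print(slim_id, len(genes), sorted(genes))
```

Because one GO term can map to several slim terms, the per-slim sets can overlap.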
Would anyone have any suggestions as to how to approach this within GSEABase? It seems like creating a GeneSetCollection (or a GOCollection?) would be the first step, but I'm not familiar enough with R and GSEABase to figure out where to go from there. Heck, I might not even be able to figure out how to construct a GeneSetCollection properly from the information provided above...
I can probably wrangle some command-line tools to accomplish this task, but it would be nicer (and probably easier?) to do it all within GSEABase.
Oh, one last thing to add: all of our work is with non-model organisms with limited or non-existent genomic resources. So, although I do have UniProt IDs associated with these gene/transcript IDs, I won't be able to tap into the standard ENTREZ/PMID/etc. databases.
Thanks so much for this! This is very useful.
Unfortunately, I still need some hand-holding to figure out how to work with a GeneSetCollection and pass the necessary info to the goSlim function (and, in turn, get the GO terms and/or gene/transcript IDs pulled out of the GOslim mappings).
If I'm interpreting all this correctly, you could modify your mappedIds() function from the other post as follows:

Modifying the example of GO annotations you gave above as follows:
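The modified mappedIds() code is not reproduced here, but the underlying idea can be sketched. Assuming the usual GO slim convention that an annotation counts toward a slim term when its GO term is that slim term or one of its offspring (descendants) in the GO graph, which is the relationship GO.db's OFFSPRING maps encode in R, the genes under each slim term can be collected as below. This is plain Python, not the GSEABase or GO.db API; `offspring`, `mapped_gene_ids`, the toy graph, and all term and gene names are hypothetical placeholders:

```python
from collections import deque

def offspring(children, term):
    """All descendants of `term` in a GO-like DAG (breadth-first search).

    children: parent ID -> iterable of direct child IDs.
    """
    seen = set()
    queue = deque(children.get(term, ()))
    while queue:
        child = queue.popleft()
        if child not in seen:
            seen.add(child)
            queue.extend(children.get(child, ()))
    return seen

def mapped_gene_ids(slim_ids, children, go_to_genes):
    """For each slim term, collect genes annotated to it or to any offspring."""
    result = {}
    for slim in slim_ids:
        terms = offspring(children, slim) | {slim}
        genes = set()
        for term in terms:
            genes |= go_to_genes.get(term, set())
        result[slim] = genes
    return result

# Hypothetical toy DAG and annotations (placeholders, not real GO IDs).
# T3 has two parents (T2 and SLIM_2), as terms often do in GO.
children = {
    "SLIM_1": ["T1", "T2"],
    "T2": ["T3"],
    "SLIM_2": ["T3"],
}
go_to_genes = {
    "T1": {"gene_a"},
    "T3": {"gene_b", "gene_c"},
    "SLIM_2": {"gene_d"},
}

slim_genes = mapped_gene_ids(["SLIM_1", "SLIM_2"], children, go_to_genes)
for slim, genes in sorted(slim_genes.items()):
    print(slim, sorted(genes))
```

Because GO is a DAG rather than a tree, a gene annotated to a term with several parents lands under several slim terms, so per-slim gene counts will not sum to the total number of genes.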
We can follow up the example you started in your previous post, as follows:
Note that I'm using the gsc object with the annotations built in the way I described in my answer, from the updated GO annotations in this comment.

I have no idea how I missed this response, but it's amazing! Thank you so much!!