If you want to add extra information to a post, use the ADD COMMENT button, not Add Answer: you are adding a comment, not an answer.
There are a few misconceptions in your post. First, gene annotations (like mappings between GO terms and, say, Ensembl IDs) aren't tied to a genome build, because they have nothing to do with the locations of genes on the genome (which is what the genome build provides). Which Ensembl IDs exist, which GO terms exist, and how they map to each other is quite fluid and is updated regularly (unlike the genome builds, which in general have big releases years apart, with smaller updates in between). So if you use an archived version of the Biomart data, you get the mappings that existed at that point in time (like 2014, which is a long time ago!), and you miss out on any updates made since then.
When you provide Ensembl IDs to goseq, what happens under the hood is that your IDs are mapped to Entrez Gene IDs, which are then mapped to GO IDs. That will affect your results, as you invariably 'lose' genes by doing that, because the two annotation services (EBI and NCBI) don't necessarily agree on what is and isn't a gene. Do note that this is orthogonal to the question at hand, which is which GO terms are attached to a given gene: if you lose data because you can't map Ensembl IDs to Entrez Gene IDs, that has nothing to do with the mapping from those IDs to GO IDs.
If you want to use Ensembl IDs, you are better off providing goseq with a gene2cat data.frame that specifies the Ensembl ID -> GO ID mappings you want to use (based off of data from Biomart, which keeps the mappings within EBI's databases).
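As a rough sketch of what building such a gene2cat data.frame could look like: the helper name make_gene2cat and the toy IDs below are made up for illustration, and the commented-out biomaRt and goseq calls assume a human dataset and an existing pwf object from nullp(), so adjust them to your organism and analysis.

```r
# Hypothetical helper (not part of goseq or biomaRt): tidy a two-column
# Ensembl ID -> GO ID table into a data.frame suitable for gene2cat.
make_gene2cat <- function(bm) {
  # Genes with no GO annotation come back with an empty (or NA) go_id; drop them.
  keep <- !is.na(bm$go_id) & bm$go_id != ""
  bm <- bm[keep, c("ensembl_gene_id", "go_id")]
  # Remove repeated gene/term pairs.
  bm <- unique(bm)
  rownames(bm) <- NULL
  bm
}

# Toy stand-in for a Biomart query result (made-up IDs, for illustration only).
toy_bm <- data.frame(
  ensembl_gene_id = c("ENSG00000000001", "ENSG00000000001",
                      "ENSG00000000002", "ENSG00000000003",
                      "ENSG00000000001"),
  go_id = c("GO:0000001", "GO:0000002", "", "GO:0000001", "GO:0000001"),
  stringsAsFactors = FALSE
)
gene2cat <- make_gene2cat(toy_bm)

# With a live connection, the table would come from Biomart instead
# (assumption: human genes, current Ensembl release):
# library(biomaRt)
# mart <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")
# bm <- getBM(attributes = c("ensembl_gene_id", "go_id"), mart = mart)
# gene2cat <- make_gene2cat(bm)
#
# Then pass it to goseq, where pwf is the output of nullp() on your
# Ensembl-ID-named gene vector:
# res <- goseq(pwf, gene2cat = gene2cat)
```

Doing it this way keeps the whole analysis in Ensembl IDs, so no genes are dropped in an Ensembl -> Entrez Gene ID conversion, and the GO mappings are whatever Biomart holds at the time you run the query.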