If you want to add extra information to a post, use the ADD COMMENT button, not Add Answer, because you aren't adding an answer; you are adding a comment.
There are a few misconceptions in your post. First, gene annotations (like mappings between GO and, say, Ensembl IDs) aren't tied to a genome build, because they have nothing to do with the locations of genes on the genome (which is what the genome build provides). What Ensembl IDs there are, what GO terms there are, and how they map to each other is quite fluid and is updated regularly (unlike the genome builds, which in general have big releases years apart, with smaller updates in between). So if you use an archived version of the Biomart data, you get the mappings that were extant at that point in time (like 2014, which is a long time ago!), and you miss out on any updates that may have occurred since then.
When you provide Ensembl IDs to goseq, what happens under the hood is that your IDs are mapped to Entrez Gene IDs and then mapped to the GO IDs. That will affect your results, as you invariably 'lose' genes by doing that, because the two annotation services (EBI and NCBI) don't necessarily agree on what is and isn't a gene. Do note that this is orthogonal to the question at hand, which is what GO terms are appended to a given gene: if you lose data because you can't map from Ensembl IDs to Gene IDs, that has nothing to do with the mapping from those IDs to the GO IDs.
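To make the 'lost genes' point concrete, here is a toy sketch in base R. All IDs and tables below are made up for illustration, and this is not what goseq actually runs internally; it only shows how a gene with no Entrez counterpart drops out of a two-step mapping while a direct Ensembl -> GO table keeps it.

```r
# Toy mapping tables; every ID here is hypothetical.
ens2entrez <- data.frame(
  ensembl = c("ENSG_A", "ENSG_B"),          # ENSG_C has no Entrez counterpart
  entrez  = c("101", "102"),
  stringsAsFactors = FALSE
)
entrez2go <- data.frame(
  entrez = c("101", "102"),
  go     = c("GO:0000001", "GO:0000002"),
  stringsAsFactors = FALSE
)
ens2go <- data.frame(                        # direct Ensembl -> GO table
  ensembl = c("ENSG_A", "ENSG_B", "ENSG_C"),
  go      = c("GO:0000001", "GO:0000002", "GO:0000002"),
  stringsAsFactors = FALSE
)

my_genes <- c("ENSG_A", "ENSG_B", "ENSG_C")

# Two-step route: Ensembl -> Entrez -> GO
two_hop <- merge(ens2entrez, entrez2go, by = "entrez")
two_hop <- two_hop[two_hop$ensembl %in% my_genes, ]

# Genes that never reach a GO term via the two-step route
lost <- setdiff(my_genes, two_hop$ensembl)

# Direct route: Ensembl -> GO
direct <- ens2go[ens2go$ensembl %in% my_genes, ]
```

Here `lost` holds the one gene that fell out at the Ensembl -> Entrez step, even though the direct table has a GO term for it.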
If you want to use Ensembl IDs, you are better off providing goseq with a gene2cat data.frame that specifies the Ensembl ID -> GO ID mappings you want to use (based off of data from Biomart, which keeps the mappings within EBI's databases).
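A rough sketch of what that could look like. The helper `make_gene2cat` is a hypothetical name, and the toy table stands in for what you would pull from Biomart; the commented lines at the end assume you already have a `pwf` object from `nullp()` and that the mart and dataset names match the Ensembl release you want.

```r
# Hypothetical helper: tidy a two-column Ensembl/GO table for use as gene2cat.
make_gene2cat <- function(bm) {
  # Drop genes with no GO annotation (Biomart returns an empty string for those)
  keep <- !is.na(bm$go_id) & bm$go_id != ""
  bm <- bm[keep, c("ensembl_gene_id", "go_id")]
  # Remove duplicated gene/term pairs
  unique(bm)
}

# Toy stand-in for a Biomart result; the IDs are made up.
bm <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_A", "ENSG_B", "ENSG_C"),
  go_id           = c("GO:0000001", "GO:0000001", "", "GO:0000002"),
  stringsAsFactors = FALSE
)
gene2cat <- make_gene2cat(bm)

# With real data, something along these lines (not run here):
# library(biomaRt)
# library(goseq)
# mart <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")
# bm   <- getBM(attributes = c("ensembl_gene_id", "go_id"), mart = mart)
# res  <- goseq(pwf, gene2cat = make_gene2cat(bm))
```

Passing `gene2cat` yourself means goseq uses exactly the Ensembl -> GO pairs in that table instead of going through Entrez Gene IDs.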
I found that using org.Hs.egENSEMBL returns the GRCh38 ESIDs, while my former analyses used hg19 annotations. Therefore I first got all the Entrez IDs for each GO term and then used biomaRt to retrieve the hg19 ESIDs:
If I only use the GO filter in biomaRt to retrieve the ESIDs for GO terms, the results contain only a few gene IDs, and I don't know why: