clusterProfiler does exactly what you have done, and then passes those data on to the fgsea package, which then apparently filters out some of the genes. It's not clear to me when or why that happens, but maybe the maintainer of fgsea will provide an answer.
I should also point out that using Ensembl Gene IDs in this context is problematic. The Ensembl Gene IDs will first be mapped to NCBI Gene IDs, and then those will be used to map to GO terms. This is unavoidable if you use an OrgDb, because the underlying database has NCBI Gene IDs as the central key, so any query between tables is mapped via the NCBI Gene ID. Anyway, the first step is quite frail, as EBI/EMBL and NCBI do not agree on which genes are the same, so you will silently drop genes at that first step.
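To make the silent loss at that first step concrete, here is a toy sketch in base R. The IDs (`ENS_A`, `101`, `termA`, ...) and both lookup tables are made up; they only stand in for the OrgDb's Ensembl-to-NCBI and NCBI-to-GO tables. The real lookup goes through `select()`/`mapIds()`, not this code.

```r
## Toy sketch: hypothetical IDs and tables, NOT real annotation data.
## Hypothetical Ensembl -> NCBI Gene ID table; ENS_C has no entry at all.
ens2ncbi <- c(ENS_A = "101", ENS_B = "102", ENS_D = "104")
## Hypothetical NCBI Gene ID -> GO table, keyed on the NCBI Gene ID
ncbi2go <- list("101" = c("termA", "termB"), "102" = "termC", "104" = "termA")

my_genes <- c("ENS_A", "ENS_B", "ENS_C", "ENS_D")

## Hop 1: Ensembl -> NCBI. Unknown IDs come back as NA.
ncbi_ids <- setNames(ens2ncbi[my_genes], my_genes)
kept <- my_genes[!is.na(ncbi_ids)]
dropped <- setdiff(my_genes, kept)

## Hop 2: NCBI -> GO, only for the genes that survived hop 1
go_by_gene <- lapply(ncbi_ids[kept], function(id) ncbi2go[[id]])

message(length(dropped), " of ", length(my_genes),
        " genes dropped before the GO lookup: ",
        paste(dropped, collapse = ", "))
```

Nothing downstream ever sees `ENS_C`; unless you count the NAs yourself, the only trace is a shorter result.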
As a super simple example, here's how many NCBI Gene IDs in org.Mm.eg.db have no Ensembl Gene ID, and how many of those Ensembl itself does map:
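```r
library(org.Mm.eg.db)
library(biomaRt)
## map every NCBI Gene ID in org.Mm.eg.db to its Ensembl Gene ID(s)
mapper <- mapIds(org.Mm.eg.db, keys(org.Mm.eg.db), "ENSEMBL", "ENTREZID", multiVals = "list")
## 'select()' returned 1:many mapping between keys and columns
## NCBI Gene IDs for which NCBI provides no Ensembl Gene ID
nagns <- names(mapper)[sapply(mapper, function(x) all(is.na(x)))]
## proportion of NCBI Gene IDs that do have an Ensembl Gene ID
1 - length(nagns) / length(mapper)
## how many of those NCBI Gene IDs does EBI/EMBL map?
mart <- useEnsembl("ensembl", "mmusculus_gene_ensembl")
ensmap <- getBM(c("ensembl_gene_id", "entrezgene_id"), "entrezgene_id", nagns, mart)
head(ensmap)
##      ensembl_gene_id entrezgene_id
## 1 ENSMUSG00000102439         14246
## 2 ENSMUSG00000121135         15365
## 3 ENSMUSG00000070645         19702
## 4 ENSMUSG00000107355        100740
## 5 ENSMUSG00000120083         67644
## 6 ENSMUSG00000085385         68108
dim(ensmap)
## [1] 217   2
```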
So NCBI maps only ~45% of its Gene IDs to Ensembl Gene IDs, and for the ones it doesn't map, Ensembl returns 217 mappings. You can do the converse (use biomaRt to map Ensembl Gene IDs to NCBI Gene IDs) and you will get a different set of genes that do and don't map.
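The disagreement between the two sources can be sketched the same way. Again, the IDs and both tables below are hypothetical stand-ins (`ncbi_view` for what the OrgDb says, `ensembl_view` for what biomaRt says); the only point is that each source asserts links the other doesn't.

```r
## Toy sketch: hypothetical IDs, NOT real annotation data.
ncbi_view <- data.frame(entrez  = c("101", "102", "103"),
                        ensembl = c("ENS_A", "ENS_B", NA),
                        stringsAsFactors = FALSE)
ensembl_view <- data.frame(entrez  = c("101", "103", "105"),
                           ensembl = c("ENS_A", "ENS_C", "ENS_E"),
                           stringsAsFactors = FALSE)

## NCBI Gene IDs that NCBI leaves unmapped but Ensembl does map
ncbi_unmapped <- ncbi_view$entrez[is.na(ncbi_view$ensembl)]
rescued <- ensembl_view[ensembl_view$entrez %in% ncbi_unmapped, ]

## "entrez ensembl" pairs asserted by one source (NA rows ignored)
pairs_of <- function(df) {
  df <- df[!is.na(df$ensembl), ]
  paste(df$entrez, df$ensembl)
}
only_ncbi    <- setdiff(pairs_of(ncbi_view), pairs_of(ensembl_view))
only_ensembl <- setdiff(pairs_of(ensembl_view), pairs_of(ncbi_view))

print(rescued)
print(only_ncbi)
print(only_ensembl)
```

With real data, `ncbi_view` would come from `mapIds()` on the OrgDb and `ensembl_view` from `getBM()`, as in the session above; neither `only_*` set is empty, which is why the choice of source changes which genes survive.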
Anyway, long story short, you would be better served if you could work with NCBI Gene IDs throughout.