Hi there!
I am trying to use the biomaRt package to find the Ensembl IDs for my genes, which are given as HGNC symbols. I am iterating row by row because some genes apparently do not have an Ensembl ID, so I wanted to check them individually. However, I keep getting a server error. I have tried different mirrors (www, asia and useast), but none of them seems to work. My code looks fine to me, but maybe I am just unable to spot the error?
mart <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl", host = "https://www.ensembl.org")
all_migraine$ensembl_id <- NA
for (i in 1:nrow(all_migraine)) {
  ensembl_matrix <- getBM(attributes = 'ensembl_gene_id',
                          filters = 'hgnc_symbol',
                          values = all_migraine$mappedGenes[i],
                          mart = mart)
  ensembl_id <- ensembl_matrix[1, 1]
  all_migraine$ensembl_id[i] <- ifelse(!is.na(ensembl_id), ensembl_id, NA)
}
Error: biomaRt has encountered an unknown server error. HTTP error code: 405
Please report this on the Bioconductor support site at https://support.bioconductor.org/
Consider trying one of the Ensembl mirrors (for more details look at ?useEnsembl)
Session info:
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
I am new to bioinformatics, so any suggestion on what may be the issue would totally help! Thanks a bunch!
I should also point out that the Biomart server can be a bit temperamental, and sometimes it's difficult enough to connect even once. Trying to connect repeatedly will only exacerbate that issue.
Yes, the Biomart service seems to be really bad this week and it's affecting all the mirror sites too. Unfortunately there's nothing biomaRt can do if Ensembl's servers aren't working very well.
As James says, you really don't want to be querying Ensembl BioMart one gene at a time. It will take forever and is very prone to failure and/or getting your IP banned. I'm not even sure how it has managed to return a 405 error. That code indicates a request made with an HTTP method the server doesn't support, which biomaRt shouldn't be able to do.
One thing I will point out from James' answer is that if you query for an HGNC symbol that doesn't have a matching Ensembl ID it won't return an NA - it will just be dropped silently. This is because Ensembl BioMart is totally centred around the Ensembl IDs. Again, as James pointed out, you want to make sure you're returning both the query column and the thing you're looking for. You can then use this to make sure you can also identify the HGNC symbols that don't have a match. Here's a small example.
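A minimal sketch of that batched approach, assuming the `all_migraine` data frame and `mappedGenes` column from the question. The helper names `fetch_mapping` and `attach_ensembl_ids` are made up for illustration and are not part of biomaRt; only `useEnsembl()` and `getBM()` are. The online calls are left commented out because they depend on the Ensembl servers being reachable.

```r
# Hypothetical helper: one batched query for all symbols.
# Requesting 'hgnc_symbol' as an attribute returns the query column
# alongside the Ensembl ID, so matches can be traced back to the input.
fetch_mapping <- function(symbols, mart) {
  biomaRt::getBM(attributes = c("hgnc_symbol", "ensembl_gene_id"),
                 filters = "hgnc_symbol",
                 values = unique(symbols),
                 mart = mart)
}

# Hypothetical helper: attach the first Ensembl ID found for each symbol
# (mirroring the [1, 1] in the original loop). Symbols that BioMart
# dropped silently end up as NA.
attach_ensembl_ids <- function(df, mapping, symbol_col = "mappedGenes") {
  first_hit <- mapping[!duplicated(mapping$hgnc_symbol), , drop = FALSE]
  idx <- match(df[[symbol_col]], first_hit$hgnc_symbol)
  df$ensembl_id <- first_hit$ensembl_gene_id[idx]
  df
}

# Usage (needs a working connection to Ensembl):
# mart <- biomaRt::useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")
# mapping <- fetch_mapping(all_migraine$mappedGenes, mart)
# all_migraine <- attach_ensembl_ids(all_migraine, mapping)
# all_migraine$mappedGenes[is.na(all_migraine$ensembl_id)]  # symbols with no match
```

Keeping only the first hit is a simplification. If a symbol maps to several Ensembl IDs, it may be better to inspect `mapping` directly rather than pick one arbitrarily.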
Thank you everyone! Those are really good suggestions, and now I know a little more. I will try what you've recommended and see if I can make it work!
You can always use an OrgDb to do the mapping as well, which will not have an issue with access to an online resource. However, there are still trade-offs. The OrgDb packages are NCBI-centric, which means any mapping of HGNC symbol to Ensembl Gene ID will actually be HGNC -> NCBI Gene ID -> Ensembl Gene ID. It's the last step that can be problematic, as mapping between NCBI and Ensembl IDs is not necessarily consistent. But it is an available resource.
As an example, ABCF1 maps to a single NCBI Gene ID (23), but then it maps to 7 Ensembl Gene IDs. But if you go to genenames.org, it says the mapping is to a single Ensembl Gene ID, apparently because they have manually curated it and someone says that's the one.
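A rough sketch of the OrgDb route, assuming the org.Hs.eg.db and AnnotationDbi packages are installed. The helper names `orgdb_mapping` and `ensembl_ids_per_symbol` are hypothetical; `AnnotationDbi::select()` and the `SYMBOL`, `ENTREZID` and `ENSEMBL` columns are what the package provides. The exact number of Ensembl IDs returned for a given symbol may differ between annotation releases.

```r
# Hypothetical helper: offline lookup of gene symbols in org.Hs.eg.db.
# One-to-many mappings come back as multiple rows per symbol.
orgdb_mapping <- function(symbols) {
  AnnotationDbi::select(org.Hs.eg.db::org.Hs.eg.db,
                        keys = unique(symbols),
                        keytype = "SYMBOL",
                        columns = c("ENTREZID", "ENSEMBL"))
}

# Hypothetical helper: count distinct Ensembl IDs per symbol so that
# ambiguous mappings can be reviewed by hand. Symbols without any
# Ensembl ID do not appear in the result.
ensembl_ids_per_symbol <- function(mapping) {
  hits <- unique(mapping[!is.na(mapping$ENSEMBL), c("SYMBOL", "ENSEMBL")])
  vapply(split(hits$ENSEMBL, hits$SYMBOL), length, integer(1))
}

# Usage (needs org.Hs.eg.db installed):
# mapping <- orgdb_mapping(all_migraine$mappedGenes)
# n_ids <- ensembl_ids_per_symbol(mapping)
# n_ids[n_ids > 1]  # symbols mapping to more than one Ensembl ID
```

Symbols flagged this way could then be checked against genenames.org, which, as noted above, may list a single curated Ensembl ID.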