Retrieving true orthologs between species with biomaRt
Colin • 0

Hi,

I am conducting an analysis comparing zebrafish and mouse data. To that end, I need to identify orthologs precisely in order to evaluate conservation. I relied on biomaRt to convert my gene sets, since I perform my analyses in R. Here is the code for my linked-dataset (LDS) object:


library(biomaRt)

zebra <- useMart("ensembl", dataset = "drerio_gene_ensembl", host = "https://feb2021.archive.ensembl.org")
mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl", host = "https://feb2021.archive.ensembl.org")

zebraMouseOrtholog <- getLDS(
  attributes = c("ensembl_gene_id", "external_gene_name", "description",
                 "mmusculus_homolog_orthology_type", "mmusculus_homolog_orthology_confidence"),
  mart = zebra,
  attributesL = c("ensembl_gene_id", "external_gene_name", "description",
                  "drerio_homolog_orthology_type", "drerio_homolog_orthology_confidence"),
  martL = mouse
)


However, this table returns both orthologs (confirmed by ZFIN) and other homologs. The homolog_orthology_type attribute doesn't give direct access to true orthologs. I am thinking of using homolog_orthology_confidence to select the genes with a high chance of being orthologs, but I am not sure whether that is the best way to do it.
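For illustration, the filtering idea described above could look roughly like the sketch below. It runs on a small made-up table rather than on real Ensembl output: the column names (`orthology_type`, `confidence`) and the gene names are assumptions, and the real `getLDS()` result will have different column headers. It also assumes that orthology type is coded as `ortholog_one2one`, `ortholog_one2many` or `ortholog_many2many`, and confidence as 0 (low) or 1 (high), which should be checked against the archive being queried.

```r
# Toy stand-in for the getLDS() result. Column names and values are
# illustrative assumptions, not real Ensembl output.
orthologs <- data.frame(
  zebrafish_gene = c("geneA", "geneB1", "geneB2", "geneC"),
  mouse_gene     = c("GeneA", "GeneB", "GeneB", "GeneC"),
  orthology_type = c("ortholog_one2one", "ortholog_one2many",
                     "ortholog_one2many", "ortholog_one2one"),
  confidence     = c(1, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Keep only one-to-one pairs that are also flagged as high confidence.
strict <- subset(orthologs,
                 orthology_type == "ortholog_one2one" & confidence == 1)
print(strict)
```

Combining both criteria is the strictest option; whether it is too strict for a zebrafish/mouse comparison is a judgment call, since it discards every gene with more than one co-ortholog.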

I've seen that there is an attribute that reports the percentage of orthology, but I can't find it in my mart.
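One way to look for such an attribute is to search the attribute table that `listAttributes()` returns (a data frame with `name`, `description` and `page` columns). The sketch below uses a hypothetical helper, `find_attributes()`, and a small hand-written table so that it runs without a network connection. The attribute names in the toy table follow the pattern I believe Ensembl uses for percent identity (`mmusculus_homolog_perc_id` and `mmusculus_homolog_perc_id_r1`), but both the names and the descriptions are assumptions to verify against the Feb 2021 archive.

```r
# Hypothetical helper: search an attribute table (as returned by
# biomaRt::listAttributes()) by name or description.
find_attributes <- function(attr_table, pattern) {
  hits <- grepl(pattern, attr_table$name, ignore.case = TRUE) |
    grepl(pattern, attr_table$description, ignore.case = TRUE)
  attr_table[hits, , drop = FALSE]
}

# Toy table for illustration only. With a live connection one would use
# attr_table <- listAttributes(zebra) instead.
attr_table <- data.frame(
  name = c("ensembl_gene_id",
           "mmusculus_homolog_orthology_type",
           "mmusculus_homolog_perc_id",
           "mmusculus_homolog_perc_id_r1"),
  description = c("Gene stable ID",
                  "Mouse homology type",
                  "%id. target Mouse gene identical to query gene",
                  "%id. query gene identical to target Mouse gene"),
  page = c("feature_page", "homologs", "homologs", "homologs"),
  stringsAsFactors = FALSE
)

perc <- find_attributes(attr_table, "perc_id")
print(perc)
```

The `page` column is worth checking too, because biomaRt does not allow attributes from different pages to be mixed in a single query.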

Thank you for your time,


I've noticed that while most mouse genes return only 1 or 2 orthologs, only 5% of the genes in the biomaRt table return more than 2, up to 21 in the most extreme case (the Reg1 genes, for example). Does anyone know why this happens? Is it more likely a misannotation, or is there a biological explanation?
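The per-gene tally described above can be reproduced with a few lines of base R. The sketch below uses a made-up table of mouse/zebrafish pairs; the column names and gene names are assumptions, not real output. As for the biology, one plausible (but unconfirmed here) source of one-to-many relationships is gene duplication, including the teleost-specific genome duplication and lineage-specific expansions of gene families.

```r
# Toy table of mouse/zebrafish ortholog pairs (illustrative values only).
pairs <- data.frame(
  mouse_gene     = c("GeneA", "GeneB", "GeneB", "GeneC", "GeneC", "GeneC"),
  zebrafish_gene = c("a", "b1", "b2", "c1", "c2", "c3"),
  stringsAsFactors = FALSE
)

# Drop duplicated rows first so that repeated pairs do not inflate the counts.
pairs <- unique(pairs)

# Number of distinct zebrafish partners per mouse gene.
counts <- vapply(split(pairs$zebrafish_gene, pairs$mouse_gene),
                 length, integer(1))

# Mouse genes with more than two partners, and the fraction they represent.
many <- names(counts)[counts > 2]
frac_many <- mean(counts > 2)

print(counts)
print(many)
print(frac_many)
```

Cross-tabulating these counts against the orthology type column would show whether the genes with many partners are all annotated as one-to-many or many-to-many.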

@james-w-macdonald-5106

The biomaRt package simply queries data available at the Ensembl Biomart server, using the available infrastructure. Your question has to do with the underlying data, and how Ensembl decides to map genes between species. That's off-topic for this site, which is intended for questions pertaining to the functionality of Bioconductor packages. You would be better off asking Ensembl directly, or failing that maybe somebody on biostars.org might have an opinion.