Hi,
I am conducting a comparative analysis between zebrafish and mouse data. To evaluate conservation, I need to identify orthologs precisely. Since I perform my analyses in R, I relied on biomaRt to convert my gene sets. Here is the code for my `getLDS` query:
```r
library(biomaRt)

# Connect to the February 2021 Ensembl archive for both species
zebra <- useMart("ensembl", dataset = "drerio_gene_ensembl", host = "https://feb2021.archive.ensembl.org")
mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl", host = "https://feb2021.archive.ensembl.org")

# Link zebrafish genes to their mouse homologs across the two datasets
zebraMouseOrtholog <- getLDS(
  attributes = c("ensembl_gene_id", "external_gene_name", "description",
                 "mmusculus_homolog_orthology_type", "mmusculus_homolog_orthology_confidence"),
  mart = zebra,
  attributesL = c("ensembl_gene_id", "external_gene_name", "description",
                  "drerio_homolog_orthology_type", "drerio_homolog_orthology_confidence"),
  martL = mouse
)
```
However, the resulting table contains both true orthologs (confirmed by ZFIN) and other homologs. The attribute `homolog_orthology_type` doesn't give direct access to the true orthologs. I am considering using `homolog_orthology_confidence` to keep only the genes with a high chance of being orthologs, but I am not sure this is the best way to do it.
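To make the idea concrete, here is a minimal sketch of the filter I have in mind. It runs on a toy data frame rather than the real query result, and the column names (`zebra_gene`, `mouse_gene`, `orthology_type`, `orthology_confidence`) are hypothetical stand-ins for whatever `getLDS` actually returns. It assumes Ensembl's usual coding, where the type is one of `ortholog_one2one`, `ortholog_one2many` or `ortholog_many2many` and the confidence is 1 (high) or 0 (low).

```r
# Toy stand-in for the getLDS() result; gene names and column names are made up
orth <- data.frame(
  zebra_gene = c("geneA", "geneB", "geneC", "geneD"),
  mouse_gene = c("GeneA", "GeneB1", "GeneC", "GeneD"),
  orthology_type = c("ortholog_one2one", "ortholog_one2many",
                     "ortholog_many2many", "ortholog_one2one"),
  orthology_confidence = c(1, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Strict set: one-to-one relationships flagged as high confidence
strict <- subset(orth,
                 orthology_type == "ortholog_one2one" & orthology_confidence == 1)

# Looser set: any orthology type, as long as the confidence flag is high
high_conf <- subset(orth, orthology_confidence == 1)

print(strict)
print(high_conf)
```

The strict set is the most conservative choice, but it would discard genes duplicated in the zebrafish lineage, so the looser set may be more appropriate depending on the question.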
I've seen that there is an attribute that displays the percentage of orthology, but I can't find it in my mart.
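One way to look for it would be to search the attribute table that `listAttributes()` returns, which has `name`, `description` and `page` columns. The sketch below uses a small helper, `find_homolog_attributes`, which is a hypothetical name of my own and not part of biomaRt. The toy table only imitates a few rows, and the name `mmusculus_homolog_perc_id` is what I would expect the percent identity attribute to be called, which needs to be checked against the real mart.

```r
# Hypothetical helper: keep attributes whose name starts with "<prefix>_homolog_"
find_homolog_attributes <- function(attrs, prefix) {
  attrs[grepl(paste0("^", prefix, "_homolog_"), attrs$name), ]
}

# Toy imitation of a listAttributes() result (names and pages are assumptions)
toy_attrs <- data.frame(
  name = c("ensembl_gene_id",
           "mmusculus_homolog_orthology_type",
           "mmusculus_homolog_perc_id"),
  description = c("Gene stable ID",
                  "Mouse homology type",
                  "Percent identity to mouse gene"),
  page = c("feature_page", "homologs", "homologs"),
  stringsAsFactors = FALSE
)

hits <- find_homolog_attributes(toy_attrs, "mmusculus")
print(hits$name)

# With a live connection, the same helper would be applied to the real table:
# find_homolog_attributes(listAttributes(zebra), "mmusculus")
```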
Thank you for your time,
I've figured out that while most mouse genes return only 1 or 2 orthologs, about 5% of the genes in the biomaRt table return more than 2, up to 21 in the most extreme case (the Reg1 genes, for example). Would anyone have an idea why this happens, and whether it is more likely a misannotation or has a biological explanation?
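For reference, this is roughly how I counted orthologs per mouse gene. It is a sketch on made-up gene pairs, and the column names `mouse_gene` and `zebra_gene` are hypothetical, not the ones biomaRt returns.

```r
# Toy mouse/zebrafish pairs; the first row is duplicated on purpose
pairs <- data.frame(
  mouse_gene = c("GeneA", "GeneA", "GeneA", "GeneA", "GeneB", "GeneC", "GeneC"),
  zebra_gene = c("a1", "a1", "a2", "a3", "b1", "c1", "c2"),
  stringsAsFactors = FALSE
)

# Drop duplicated pairs so each ortholog is counted once per mouse gene
pairs <- unique(pairs)

# Number of distinct zebrafish orthologs for each mouse gene
n_per_gene <- table(pairs$mouse_gene)

# Fraction of mouse genes with more than 2 orthologs, and the largest count
frac_multi <- mean(n_per_gene > 2)
max_count <- max(n_per_gene)

print(n_per_gene)
print(frac_multi)
print(max_count)
```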