Transcript biotypes downloaded with the biomaRt R package are not categorized the same way as on the Ensembl website
llmiao • 0

Hi,

I downloaded all transcripts and their biotype annotation for a human dataset through the biomaRt R package. The transcript biotypes listed include protein coding, processed transcript, retained intron, pseudogene, processed pseudogene, etc. When I looked at the Ensembl website's description of transcript biotypes (https://useast.ensembl.org/info/genome/genebuild/biotypes.html), I found that retained intron is a subcategory of processed transcript, just as processed pseudogene is a subcategory of pseudogene. So it looks as if the biotypes listed by biomaRt are not categorized the same way as in the Ensembl database. Does anyone know how the biotypes in biomaRt are assigned to each transcript? Is there a reference with details on how to interpret the biotypes in biomaRt?

Thanks!

Lingling

annotation • 798 views

Why do you say the annotation returned by biomaRt is different from what you see in the browser? Can you provide an example?

From reading the biotypes page you link to, my assumption would be that if something is annotated as 'Retained intron' then it is implicitly also annotated as 'Long non-coding RNA (lncRNA)' and 'Processed transcript'.

The results returned by biomaRt are retrieved directly from Ensembl, so the assignment of annotation is already done and should be consistent with results found via other methods of accessing Ensembl data like the browser.
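
To make the "implicit parent" reading concrete, here is a minimal Python sketch. The `PARENT` table is an assumption built only from the two parent/child pairs named in the original question; it is not an official Ensembl or biomaRt data structure, and the helper name `lineage` is hypothetical.

```python
# Hypothetical parent table, built only from the two parent/child pairs named
# in the original question; NOT an official Ensembl or biomaRt data structure.
PARENT = {
    "retained intron": "processed transcript",
    "processed pseudogene": "pseudogene",
}


def lineage(biotype):
    """Return the biotype followed by every broader category above it."""
    chain = [biotype]
    # Walk upwards until the current label has no recorded parent.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


print(lineage("retained intron"))
print(lineage("protein coding"))
```

Under this assumption a transcript labelled "retained intron" carries only the most specific label, and the broader category is recovered by lookup rather than stored alongside it.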


Thanks for your reply. The difference is not about a specific gene; it is about how the biotypes are categorized. I found that some transcripts are annotated as "processed transcript" in biomaRt, while others are annotated as "retained intron". I assume the ones annotated as "retained intron" are also "processed transcript". But for the transcripts annotated as "processed transcript", does this mean they are unclassified processed transcripts that cannot be placed in one of the other processed transcript sub-categories? Similarly, some transcripts are annotated as "pseudogene" while others are annotated as "processed pseudogene", which I assume are also "pseudogene". I may have to focus on the transcripts annotated as "processed transcript", but I don't know how to interpret them, since I also have others annotated as "retained intron", which is clearer.
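
One way to look at this in practice is to group transcripts under their broadest category while keeping the original label, so that the ones carrying only the parent label stand out. The sketch below is in Python and rests on assumptions: the `PARENT` table contains only the two relationships described in this thread, the transcript IDs are made up, and `group_by_top_category` is a hypothetical helper, not part of biomaRt.

```python
from collections import defaultdict

# Hypothetical parent table limited to the relationships described in this
# thread; it is an assumption, not an Ensembl or biomaRt data structure.
PARENT = {
    "retained intron": "processed transcript",
    "processed pseudogene": "pseudogene",
}


def group_by_top_category(records):
    """Group (transcript_id, biotype) pairs under their broadest category."""
    groups = defaultdict(list)
    for transcript_id, biotype in records:
        top = biotype
        # Climb to the broadest category recorded for this label.
        while top in PARENT:
            top = PARENT[top]
        groups[top].append((transcript_id, biotype))
    return dict(groups)


# Made-up transcript IDs, for illustration only.
records = [
    ("tx1", "processed transcript"),
    ("tx2", "retained intron"),
    ("tx3", "processed pseudogene"),
    ("tx4", "pseudogene"),
]
groups = group_by_top_category(records)

# Transcripts whose label is the parent category itself, i.e. with no finer label.
parent_only = [tid for tid, label in groups["processed transcript"]
               if label == "processed transcript"]
print(parent_only)
```

Whether the parent-only transcripts really are the "unclassified" ones is exactly the open question here; the sketch only separates them so they can be inspected.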
