
Question: Ortholog conversions of non-model organisms not present in BioMart, Ensembl or KEGG
laural710 wrote (5 months ago):

Hi 

I am wondering whether I can get some advice. I have read through previous posts on this topic but could not find anything that helped.

I am working on rainbow trout (Oncorhynchus mykiss) and have got to the point where I have a list of DEGs (~8,000) that I would like to run through KEGG. I have checked the KEGG organism list and rainbow trout (taxid 8022) is not there, so the first thing I need to do is find orthologous genes in a species that is. Rainbow trout is also not available in BioMart or Ensembl.

Previous studies have used zebrafish or Atlantic salmon (KEGG code sasa), both of which are in the KEGG database. I have tried various recommended online programmes to find a conversion tool, but my species is not listed in any of them. The closest I found to working is bioDBnet, but it only lists the species to convert to, not my species.

I have managed to build an annotation package for rainbow trout in R, but it does not contain KEGG information and I'm now stuck. Starting from the original RefSeq IDs, I have converted to various other ID types, including GID, Entrez ID and UniProt, but none of them has worked. Is there a package designed for non-model organisms that actually includes non-model species?
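If an ortholog table is available (however it was built, for example from BLAST hits against S. salar), turning a DEG list into KEGG gene identifiers is essentially a join. A minimal R sketch, assuming a hypothetical table with columns `omy_id` (rainbow trout RefSeq IDs) and `sasa_geneid` (S. salar NCBI Gene IDs), and assuming KEGG keys S. salar genes as `sasa:<NCBI Gene ID>`; all IDs below are made up:

```r
# Hypothetical inputs: DEG IDs and an O. mykiss -> S. salar ortholog table
degs <- c("XM_0001", "XM_0002", "XM_0003")
orthologs <- data.frame(
  omy_id      = c("XM_0001", "XM_0003", "XM_0009"),
  sasa_geneid = c("100001", "100003", "100009"),
  stringsAsFactors = FALSE
)

# Keep only DEGs that have an ortholog, then prefix with the KEGG organism code
# (assumption: KEGG gene IDs for this organism look like "<org>:<NCBI Gene ID>")
to_kegg_ids <- function(ids, orthologs, org = "sasa") {
  hit <- orthologs[orthologs$omy_id %in% ids, ]
  unique(paste0(org, ":", hit$sasa_geneid))
}

kegg_ids <- to_kegg_ids(degs, orthologs)
kegg_ids
#> [1] "sasa:100001" "sasa:100003"

# Fraction of DEGs that could be carried over to S. salar
frac_mapped <- mean(degs %in% orthologs$omy_id)
```

The resulting IDs could then be looked up with KEGGREST, for example `keggLink("pathway", kegg_ids)`, which needs network access and is not run here. DEGs without an ortholog simply drop out, which is why the mapped fraction is worth reporting alongside any enrichment result.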

As a last-ditch effort I have submitted some of my sequences to GhostKOALA, but is there a package in R that can do this?

Any help would be very much appreciated. 

L

modified 5 months ago by James W. MacDonald • written 5 months ago by laural710
Answer: Ortholog conversions of non-model organisms not present in BioMart, Ensembl or KEGG
James W. MacDonald (United States) wrote (5 months ago):

The annotation data in Bioconductor are, as a rule, simply repackaged versions of existing data. In general this does not include many non-model species because, as you have found, there just isn't much data out there.

I recently did an RNA-Seq analysis of O. mykiss (using the Salmon aligner, ha!), and found that there really wasn't much difference between the number of reads aligning to the S. salar transcriptome and the number aligning to the O. mykiss transcriptome. So we ended up aligning to the better-annotated S. salar transcriptome, which allowed us to do GO and KEGG analyses downstream.

If the number of reads aligning to the 'wrong' transcriptome had been much different, I probably would have done something slightly different: align to the O. mykiss transcriptome, then try to map the transcripts to their S. salar equivalents using BLAST. That adds some complexity, because a given gene may have different numbers of transcripts in each species, so the cross-species mapping is not one-to-one. Since the alignments to Salmo seemed OK, we just went with the cross-species alignment.
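The BLAST route needs some rule for handling several transcripts per gene. A minimal R sketch, assuming BLAST tabular output (`-outfmt 6`) of O. mykiss transcripts searched against S. salar transcripts, plus a hypothetical transcript-to-gene table: keep the highest-bitscore hit per transcript, then keep the best of those per gene. This is a simple best-hit heuristic, not full orthology inference (reciprocal best hits would be stricter), and all IDs are made up:

```r
# Standard column names for BLAST -outfmt 6; with a real file, e.g.:
# hits <- read.delim("omy_vs_sasa.tsv", header = FALSE, col.names = blast_cols)
blast_cols <- c("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                "qstart", "qend", "sstart", "send", "evalue", "bitscore")

# Toy hits (only the columns used below)
hits <- data.frame(
  qseqid   = c("omy_tx1", "omy_tx1", "omy_tx2", "omy_tx3"),
  sseqid   = c("sasa_txA", "sasa_txB", "sasa_txA", "sasa_txC"),
  evalue   = c(1e-50, 1e-20, 1e-40, 1e-30),
  bitscore = c(250, 120, 200, 180),
  stringsAsFactors = FALSE
)
# Hypothetical transcript-to-gene table for O. mykiss
tx2gene <- data.frame(
  tx   = c("omy_tx1", "omy_tx2", "omy_tx3"),
  gene = c("omy_g1", "omy_g1", "omy_g2"),
  stringsAsFactors = FALSE
)

# Keep the highest-bitscore hit for each query transcript
best_hit_per_query <- function(hits) {
  hits <- hits[order(hits$qseqid, -hits$bitscore), ]
  hits[!duplicated(hits$qseqid), c("qseqid", "sseqid", "evalue", "bitscore")]
}

# Collapse to one S. salar match per O. mykiss gene
# (the best bitscore across all of that gene's transcripts)
collapse_to_gene <- function(best, tx2gene) {
  best$gene <- tx2gene$gene[match(best$qseqid, tx2gene$tx)]
  best <- best[order(best$gene, -best$bitscore), ]
  best[!duplicated(best$gene), c("gene", "sseqid", "bitscore")]
}

best <- best_hit_per_query(hits)
gene_map <- collapse_to_gene(best, tx2gene)
gene_map
```

Here both omy_g1 transcripts hit sasa_txA, so the gene collapses to a single match; picking by bitscore rather than e-value avoids ties among very small e-values.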


Thanks for that, and it's good to know someone else has had the same issues with this species, even though it's ironically a model species!

Aligning to S. salar was an option, but I have managed to get as far as GO enrichment using clusterProfiler, after building a rainbow trout annotation package via AnnotationForge, so I'd like to continue with the rainbow trout genome if possible, even though it's been a struggle. I've got some results from GhostKOALA, but I wondered: with your alignment to the S. salar transcriptome, what proportion of your genes had KEGG annotation? Mine is around 22%, but I have nothing to compare it against.
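One way to make a figure like 22% comparable between runs is to compute it the same way every time. A minimal R sketch, assuming the GhostKOALA result is a two-column, tab-separated table of query ID and assigned KO, with the KO blank when nothing was assigned (as in its user_ko.txt download); the column names and toy rows are made up:

```r
# With a real file, something like (assumed format: query <tab> KO):
# ko <- read.delim("user_ko.txt", header = FALSE, col.names = c("query", "ko"),
#                  colClasses = "character")
ko <- data.frame(
  query = c("q1", "q2", "q3", "q4", "q5"),
  ko    = c("K00001", "", "K00002", NA, ""),
  stringsAsFactors = FALSE
)

# Fraction of distinct queries with a non-empty KO assignment
annotation_rate <- function(ko) {
  assigned <- !is.na(ko$ko) & nzchar(ko$ko)
  length(unique(ko$query[assigned])) / length(unique(ko$query))
}

rate <- annotation_rate(ko)
sprintf("%.1f%% of queries have a KO", 100 * rate)
```

Counting distinct query IDs keeps queries that appear on more than one line from inflating either side of the ratio.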

written 5 months ago by laural710

KEGG has pretty much all of them:

```r
> library(KEGGREST)
> zz <- keggList("sasa")
> length(zz)
[1] 55214
## read in Salmon alignments using tximport and compare
> sum(row.names(counts$counts) %in% gsub("sasa:", "", names(zz)))/nrow(counts$counts)
[1] 0.9980598
```
written 5 months ago by James W. MacDonald
