Question

Ortholog conversions of non-model organisms not present in BioMart, Ensembl or KEGG

laural710 • 0

@laural710-14567

Hi

I am wondering whether I can get some advice. I have read through the previous posts on this topic and could not find anything that helped.

I am working on rainbow trout and have gotten to the point where I have a list of DEGs (~8,000) that I would like to run through KEGG. I have checked the species list at KEGG, and rainbow trout (taxid = 8022) is not present, so the first thing I need to do is find orthologs of the rainbow trout genes in a species that KEGG does cover. Rainbow trout is also not available in BioMart or Ensembl.

Previous studies have used zebrafish or salmon (sasa), which are present in the KEGG database. I have checked various recommended online programs for a conversion tool, but my species is not listed in any of them. The closest I could find to something that works is bioDBnet, and it only has the species I would convert to, not my species.

I have managed to build an annotation package for rainbow trout within R, but it does not have KEGG information, and I'm struggling now. I have converted the original RefSeq IDs to various other IDs, including GID, Entrez ID, UniProt etc., but nothing has worked. Is there a specific package for non-model organisms that actually contains non-model species?

I have submitted some of my sequences to GhostKOALA as a last-ditch effort, but is there a package within R that can do this?

Any help would be very much appreciated.

L

rnaseq_analysis KEGG nonmodel species • 1.3k views

updated 5.6 years ago by James W. MacDonald 65k • written 5.6 years ago by laural710 • 0

score 1 · Answer 1 · 2018-09-18

James W. MacDonald 65k

@james-w-macdonald-5106

United States

The annotation data in Bioconductor is, as a rule, simply a repackaging of existing data. In general this does not include many non-model species because (as you have found) there just isn't much data out there.

I recently did an RNA-Seq analysis of O. mykiss (using the salmon aligner, ha!), and I found that there really wasn't much difference between the number of reads that aligned to the S. salar transcriptome and the number that aligned to the O. mykiss transcriptome, so we ended up aligning to the better-annotated transcriptome, which allowed us to do GO and KEGG analyses on the back end.

If the number of reads aligning to the 'wrong' transcriptome had been much different, I probably would have done something slightly different: aligning to the O. mykiss transcriptome and then trying to map the transcripts to their S. salar equivalents using BLAST. That brings some added complexity, because a given gene may have different numbers of transcripts in the two species, so the mapping is not always one-to-one. Since the alignments to Salmo seemed OK, we just went with the cross-species alignment.
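For anyone who does go the BLAST route, here is a rough sketch of the mapping step, not something we actually ran. It assumes the BLAST results have already been read into a data frame with the usual tabular columns (qseqid, sseqid, pident, evalue, bitscore); the transcript IDs below are made up, and picking the single highest-bitscore hit per query is just one simple rule, not the only reasonable one.

```r
# Toy BLAST tabular results (O. mykiss queries vs. S. salar subjects).
# All IDs and scores here are invented for illustration.
hits <- data.frame(
  qseqid   = c("omy_tx1", "omy_tx1", "omy_tx2", "omy_tx3", "omy_tx3"),
  sseqid   = c("sasa_txA", "sasa_txB", "sasa_txA", "sasa_txC", "sasa_txD"),
  pident   = c(97.1, 88.4, 95.0, 99.2, 91.3),
  evalue   = c(1e-150, 1e-80, 1e-120, 0, 1e-90),
  bitscore = c(540, 310, 480, 620, 350),
  stringsAsFactors = FALSE
)

# Sort so the highest-bitscore hit comes first within each query,
# then keep only that first (best) hit per O. mykiss transcript.
hits <- hits[order(hits$qseqid, -hits$bitscore), ]
best <- hits[!duplicated(hits$qseqid), ]

# Flag S. salar transcripts that are the best hit for more than one
# O. mykiss transcript (the many-to-one case mentioned above).
n_per_subject <- table(best$sseqid)
many_to_one <- names(n_per_subject)[as.vector(n_per_subject) > 1]

print(best[, c("qseqid", "sseqid", "bitscore")])
print(many_to_one)
```

In a real analysis you would probably also filter on e-value or percent identity before taking the best hit, and decide how to collapse counts for the many-to-one transcripts.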

5.6 years ago James W. MacDonald 65k

Thanks for that. It's good to know someone else has had the same issues with this species, even though it's ironically a model species!

That was an option, but I managed to get to the point of GO enrichment using clusterProfiler and a rainbow trout annotation package built via AnnotationForge, so I'd like to continue with the rainbow trout genome if possible, even though it's been a struggle. I've managed to get some results via GhostKOALA, but I wondered: with your alignment to the S. salar transcriptome, what was your annotation coverage like for KEGG? Mine is in the region of 22%, but I don't have anything to benchmark that against.

5.6 years ago laural710 • 0

KEGG has pretty much all of them:

> library(KEGGREST)
> zz <- keggList("sasa")
> length(zz)
[1] 55214
## read in salmon alignments using tximport (into 'counts') and compare
> sum(row.names(counts$counts) %in% gsub("sasa:", "", names(zz)))/nrow(counts$counts)
[1] 0.9980598
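
Once the gene IDs line up with KEGG's like that, a pathway over-representation test is straightforward. Below is a minimal sketch using a plain hypergeometric test on toy data. The gene IDs, pathway names, DE list and universe are all invented; in practice the gene-to-pathway table could be built from something like keggLink("pathway", "sasa") in KEGGREST, but that step is an assumption and is not shown here.

```r
# Toy gene-to-pathway table; in a real analysis this would come from KEGG.
# All gene IDs and pathway names below are hypothetical.
gene2path <- data.frame(
  gene    = c("g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"),
  pathway = c("pathA", "pathA", "pathA", "pathB", "pathB", "pathA", "pathB", "pathB"),
  stringsAsFactors = FALSE
)
universe <- paste0("g", 1:20)         # all genes that were tested
deg      <- c("g1", "g2", "g3", "g9") # hypothetical DE genes

# One-sided hypergeometric test for a single pathway:
# probability of seeing at least k DE genes in the pathway by chance.
enrich_one <- function(p) {
  in_path <- intersect(unique(gene2path$gene[gene2path$pathway == p]), universe)
  k <- length(intersect(deg, in_path))
  phyper(k - 1, length(in_path), length(universe) - length(in_path),
         length(deg), lower.tail = FALSE)
}

pvals <- sapply(unique(gene2path$pathway), enrich_one)
print(p.adjust(pvals, method = "BH"))
```

The choice of universe matters a lot here: it should be the genes that were actually tested and have a KEGG ID, not every gene KEGG lists for the organism.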

5.6 years ago James W. MacDonald 65k