GO/KEGG enrichment for non-model bacteria - help for a beginner

Hi everyone, I have just started my RNA-seq analysis and I am on the last step.

My organism is a new strain of the genus Aminobacter, closely related to Aminobacter aminovorans KTC2477 (KEGG entry T04342, aak 83263). In fact it is the same species but, unlike the reference strain, my strain acquired a symbiotic island from another organism (Mesorhizobium sp.), and the insertion is located on the chromosome. My strain is the first Aminobacter strain able to perform symbiosis with plants, and it can also survive under abiotic stress.

I want to know which genes and pathways are involved in the abiotic stress response. To that end, I performed RNA-seq under abiotic stress. I have the complete genome sequence and I annotated it using the RAST server. The server used the KTC2477 genome as reference, and both files (FASTA and GTF) were used for the analysis. The differential expression analysis was done with DESeq2.

This is how my GTF file looks. Next to the ID attribute (ID=fig|83263.11.peg.1) is the description (Name):

##gff-version 3
tig00000001  FIG  CDS  190   300   .  +  1  ID=fig|83263.11.peg.1;Name=hypothetical protein
tig00000001  FIG  CDS  892   1479  .  +  1  ID=fig|83263.11.peg.2;Name=hypothetical protein
tig00000001  FIG  CDS  1839  1967  .  -  0  ID=fig|83263.11.peg.3;Name=hypothetical protein
tig00000001  FIG  CDS  1971  2294  .  +  0  ID=fig|83263.11.peg.4;Name=hypothetical protein
tig00000001  FIG  CDS  2322  2594  .  +  0  ID=fig|83263.11.peg.5;Name=Dipeptide transport system permease protein DppB (TC 3.A.1.5.2)

and my DESeq2 results look like this:

DataFrame with 663 rows and 6 columns
                              baseMean     log2FoldChange             lfcSE
                             <numeric>          <numeric>         <numeric>
fig|83263.11.peg.3018 1470.74139158347  -4.56799911487077 0.209938589442722
fig|83263.11.peg.6033 499.052105615201   4.93572771540093 0.271157509696713
fig|83263.11.peg.2326 1561.17740754287  -4.09525319727112 0.236243487651701
fig|83263.11.peg.2325 694.205461173177  -3.85382768516696 0.226404901957689
fig|83263.11.peg.6032 304.943042427634   4.89500515555429 0.314517930030089
...                                ...                ...               ...
fig|83263.11.peg.1111  715.51725115005  0.523448875593969 0.189880233819811
fig|83263.11.peg.3889 177.084420993171 -0.842781044899269 0.305803225177248
fig|83263.11.peg.580  275.712626883396  0.688122866894606 0.249859595748499
fig|83263.11.peg.240  242.504276377507  0.625038523181459 0.227011499884674
fig|83263.11.peg.3003 287.946224525665   0.63544858587411 0.231341793195383

I would like to use clusterProfiler, but as you can see the ID column has names like "fig|83263.11.peg.3018", and it is not possible (for me) to perform GO classification in the usual way. I don't know how to approach this. Can anyone suggest a solution?

I was also wondering: since my organism has an island inserted in its chromosome from another bacterium, could that be a problem for the gene ID annotation?

Any answer will be deeply appreciated!



PS: My organism doesn't have an OrgDb package, and querying AnnotationHub for an OrgDb returned nothing.

bacteria GO KEGG RNAseq

For non-model organisms, it is not possible to do GO/KEGG enrichment directly, because the gene IDs are not already mapped to GO/KEGG terms.

Another approach is to take orthologous genes from the nearest species for which GO/KEGG data are available and do the enrichment on them.
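Once the DE genes carry reference-strain IDs with GO/KEGG terms attached, the enrichment itself is an over-representation test. Below is a minimal sketch of that test (a one-sided hypergeometric test) using only the Python standard library. The gene IDs (`ref_0001`, ...) and term labels (`TERM_A`, `TERM_B`) are made up for illustration, and the p-values are not corrected for multiple testing. In R, clusterProfiler's `enricher()` accepts a user-supplied TERM2GENE table for the same purpose.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of seeing at least k term genes among n DE genes,
    drawn from a background of N genes of which K carry the term."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrich(de_genes, background, term2genes):
    """Return (term, p_value, k, K) tuples sorted by p-value (unadjusted)."""
    de = set(de_genes) & set(background)   # only DE genes that are in the background
    N, n = len(background), len(de)
    results = []
    for term, genes in term2genes.items():
        members = set(genes) & set(background)
        K = len(members)
        if K == 0:
            continue                        # term not represented in the background
        k = len(members & de)
        results.append((term, hypergeom_sf(k, N, K, n), k, K))
    return sorted(results, key=lambda r: r[1])


# Toy data: hypothetical IDs and term assignments, for illustration only.
background = {f"ref_{i:04d}" for i in range(1, 11)}
term2genes = {
    "TERM_A": {"ref_0001", "ref_0002", "ref_0003"},
    "TERM_B": {"ref_0004", "ref_0005", "ref_0006", "ref_0007", "ref_0008"},
}
de_genes = {"ref_0001", "ref_0002", "ref_0003"}

results = enrich(de_genes, background, term2genes)
for term, p, k, K in results:
    print(f"{term}\tp={p:.4g}\t{k}/{K} term genes are DE")
```

The background should be all genes that were tested for differential expression and have a mapping, not the whole genome, otherwise the p-values are too optimistic.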


Yes, the strain ATCC2477 is the closest one. My problem is specifically about the IDs in my DE file. They look like "fig|83263.11.peg.3018". Are they correct for the analysis? Can I use them for enrichment? From what I have read, I should annotate my IDs first, but I don't know how to do that. Sorry for the questions, this is my first analysis.


If you are using AnnotationHub for OrgDb retrieval, then check here whether your strain is present: https://annotationhub.bioconductor.org/species. Then you can check whether the IDs belong to that strain. If yes, you can go ahead with enrichment using clusterProfiler as you mentioned.

If not, then you need to take the sequences for those IDs from your data and do an orthology search. In that case the sequences will be mapped to IDs of that strain. Take those orthologous mapped IDs for enrichment.
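As a rough illustration of the mapping step, here is a minimal sketch that assumes the protein sequences were searched against the reference strain with BLAST and saved in the default 12-column tabular format (outfmt 6). It keeps the best-scoring hit per query. The subject IDs (`REF_A`, ...), the alignment values, and the identity/e-value thresholds are made up for illustration, and a one-directional best hit is only an approximation of orthology (a reciprocal best-hit check would be stricter).

```python
def best_hits(blast_lines, min_identity=70.0, max_evalue=1e-10):
    """Map each query ID to its best subject ID by bitscore.

    Expects BLAST tabular lines with the default outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    The thresholds are arbitrary placeholders, not recommended values.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                       # skip blank and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if identity < min_identity or evalue > max_evalue:
            continue                       # hit too weak to trust
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Toy BLAST output: hypothetical subject IDs and alignment values.
blast_lines = [
    "fig|83263.11.peg.3018\tREF_A\t98.5\t300\t4\t0\t1\t300\t1\t300\t1e-150\t550",
    "fig|83263.11.peg.3018\tREF_B\t75.0\t280\t70\t2\t1\t280\t5\t284\t1e-60\t220",
    "fig|83263.11.peg.6033\tREF_C\t45.0\t120\t60\t3\t1\t120\t1\t118\t1e-5\t50",
]

mapping = best_hits(blast_lines)
print(mapping)
```

Genes with no hit passing the thresholds stay unmapped and drop out of the enrichment. That may well include genes on the island acquired from another bacterium, so it is worth checking how many DE genes are lost at this step.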

Hope this helps.


I used AnnotationHub, and my organism does not have an OrgDb.
I will try the orthology search. Thanks for your advice!

