NCBI Locus ID to gene ID conversion
KMS ▴ 20
@de2afa3c
Last seen 18 months ago
Croatia

Dear all, I would be grateful if anybody who knows the answer could share it.

I took the NCBI assembly file from the RefSeq FTP server to build the genome index, and after StringTie/featureCounts I got locus tag IDs, which I am unable to convert with any software or tool. I would appreciate any suggestions; I am frustrated after searching without success. I want to perform Gene Ontology (GO) analysis or GSEA once I have the gene_ID.

Thanks

LocusIDtogeneIDconversion • 3.1k views

Have you found a solution to this? I am facing the same issue. Kindly help.


Did you read all posts in this thread? If so, then hopefully it should have become clear that you will have to provide sufficient information, more than you have given so far. I would also suggest you start a new thread specific to your question.

@james-w-macdonald-5106
Last seen 6 minutes ago
United States

You are being unnecessarily mysterious. What species? What do you mean by 'locus tag ID'? Do you have any example IDs that people could use to test?

When posting questions, try to put yourself in the shoes of a potential answerer and think of what they might need in order to answer your question.


Dear James, In an RNA-seq data analysis I used an assembly file (.fna, from the FTP server) to build the genome index for Streptomyces, and then used its GFF/GTF file for transcript read counting with StringTie. The output file contains locus tags such as FQ762_RS31685, which come from the GTF file. I want the gene_id for further Gene Ontology analysis and GSEA, but I cannot find out how to convert the locus tags into gene_ids. Thanks for the suggestion. I face the same thing with Ensembl, where a similar kind of gene_id is present that again cannot be converted into an actual gene_ID.


Which files exactly? Please be more clear! I am not an expert on Streptomyces, but apparently there have been genomes of > 2840 different Streptomyces variants sequenced... See: https://www.ncbi.nlm.nih.gov/data-hub/genome/?taxon=1883

And as you can see here, for example, for Streptomyces abikoensis the locus tags have not been annotated to GeneIDs. https://www.ncbi.nlm.nih.gov/genome/browse/#!/proteins/95026/992294%7CStreptomyces%20abikoensis/

If such mapping data is also not available for your Streptomyces variant, then, AFAIK, there won't be any Bioconductor package that will allow you to do this....
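
For anyone who wants to check this on their own annotation, here is a minimal Python sketch. It assumes an NCBI-style GFF3 file in which `gene` rows carry a `locus_tag` attribute and, only when NCBI has assigned one, a `Dbxref=GeneID:...` entry in column 9. The function names and the example rows are made up for illustration and are not taken from any real Streptomyces annotation.

```python
def parse_gff3_attributes(field):
    """Split a GFF3 column-9 string ('key=value;key=value') into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def locus_tag_to_geneid(gff_lines):
    """Map locus_tag -> GeneID string (or None) for every 'gene' feature."""
    mapping = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip headers, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = parse_gff3_attributes(cols[8])
        tag = attrs.get("locus_tag")
        if tag is None:
            continue
        gene_id = None
        # Dbxref may hold several comma-separated cross-references.
        for xref in attrs.get("Dbxref", "").split(","):
            if xref.startswith("GeneID:"):
                gene_id = xref.split(":", 1)[1]
        mapping[tag] = gene_id
    return mapping


# Hypothetical rows for illustration only.
example = [
    "##gff-version 3",
    "seq1\tRefSeq\tgene\t1\t900\t.\t+\t.\tID=gene-ABC_RS00005;locus_tag=ABC_RS00005;Dbxref=GeneID:12345",
    "seq1\tRefSeq\tgene\t1000\t1900\t.\t-\t.\tID=gene-ABC_RS00010;locus_tag=ABC_RS00010",
]
mapping = locus_tag_to_geneid(example)
unmapped = [tag for tag, gid in mapping.items() if gid is None]
print(mapping)
print("locus tags without a GeneID:", unmapped)
```

On a real file you would pass an open file handle instead of the `example` list. If every locus tag ends up in `unmapped`, the annotation simply does not carry GeneIDs, and no downstream package can recover them from that file alone.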


Dear Guido, Thanks for your reply. I was looking at S. coelicolor, for which there are only 11 assemblies. The one I am looking for has chromosomes instead of the Un shown in your second link: https://www.ncbi.nlm.nih.gov/genome/browse/#!/proteins/1057/705907%7CStreptomyces%20coelicolor%20A3(2)/chromosome/. I also tried another complete genome as the reference; its GTF has gene_ID, but it throws an error that there is no gene_ID in the 9th column.

Thanks
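
One way to narrow down a "no gene_id in the 9th column" complaint is to list the GTF rows that actually lack that attribute. The Python sketch below assumes standard GTF attribute syntax (`key "value";` pairs in column 9). The function name and example rows are hypothetical, and the thread does not say which tool raised the error, so treat this as a diagnostic aid rather than a fix.

```python
def gtf_lines_missing_gene_id(gtf_lines):
    """Return (line_number, feature_type) for rows whose 9th column lacks gene_id."""
    missing = []
    for number, line in enumerate(gtf_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            missing.append((number, "malformed"))
            continue
        # Attribute keys are the first token of each ';'-separated item.
        # Simplification: assumes no ';' inside quoted values.
        keys = [item.strip().split(" ", 1)[0]
                for item in cols[8].split(";") if item.strip()]
        if "gene_id" not in keys:
            missing.append((number, cols[2]))
    return missing


# Hypothetical rows for illustration only.
example = [
    'seq1\tRefSeq\tgene\t1\t900\t.\t+\t.\tgene_id "ABC_RS00005"; transcript_id "";',
    'seq1\tRefSeq\tCDS\t1\t900\t.\t+\t0\ttranscript_id "tx1";',
]
problems = gtf_lines_missing_gene_id(example)
print(problems)
```

If only a few feature types show up in the result, filtering those rows out (or using the GFF3 file instead) may be enough to get past the error.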


Again, you are being mysterious. You downloaded an assembly file from where? Am I supposed to guess? Where did you get the GTF file? By definition, if you are using StringTie you had to get a genome FASTA and GTF file. Telling me you downloaded something you absolutely had to download, without saying where you got it is what I meant when I said 'put yourself in the shoes of the answerer and think of what they need to answer your question'. You are telling me things I already know and omitting things that I cannot possibly know.


Dear James, I am sorry if this again seems mysterious. However, I did mention the bacterium and species. I took the genome FASTA file from the FTP link (Genome > Assembly > FTP directory for GenBank assembly), where all the FASTA, GFF and GTF files are available. I used the .fna file for the genome index and the GFF file for StringTie. In the GFF file the gene_id is given as FQ762_RS31685, which is a locus tag, whereas it should look like SCO111 as a gene ID. There was no problem using these locus tag IDs up to DESeq2, but gene ontology/GSEA requires an actual gene ID. How to convert a locus tag into a gene ID was my question.

Thanks


Some replies in this community seem harsh. OK, you want to point out that the information provided by the OP is incomplete, got it; however, you can do it in a more polite, kinder and more professional manner.

As for the OP, I hope you found a way to do it. I am having a similar issue myself, and if I don't solve it soon, I will open a thread too.
