Gene Database Files (eg gene2accession) complete?
Benjamin Otto ▴ 830
@benjamin-otto-1519
Last seen 8.2 years ago
Dear Bioconductors,

The database files available on the RefSeq, Gene and LocusLink FTP sites evidently do not contain all IDs that can be uniquely identified through the NCBI web interface. Where can I find database files containing the rest?

Query the RefSeq identifier "NM_032722" via NCBI in the Gene database and it returns exactly one hit:

C1orf170
Official Symbol: C1orf170 and Name: chromosome 1 open reading frame 170 [Homo sapiens]
Other Aliases: MGC13275, RP11-54O7.8
Other Designations: hypothetical protein LOC84808
Chromosome: 1; Location: 1p36.33
GeneID: 84808

So I assumed I would be able to find this gene in the current gene2accession or gene2refseq files (from the Gene FTP site) or in the LocusLink LL_tmpl file. None of them contains the identifier. The same is true for the RefSeq files RefSeq-release21.catalog and accession2geneid.

A closer look at the hit reveals that the sequence has been suppressed. Does anybody know whether there is a database file which SHOULD contain this identifier (even though it is suppressed)? My current problem is that, of about 26000 accessions, I can find only around 13000 in the files mentioned above.

Regards,
Benjamin

--
Benjamin Otto
Universitaetsklinikum Eppendorf Hamburg
Institut fuer Klinische Chemie
Martinistrasse 52
20246 Hamburg
@sean-davis-490
Last seen 3 months ago
United States
On Tuesday 20 February 2007 09:48, Benjamin Otto wrote:
> [...] My current problem is, that from about 26000 accessions I can only
> find around 13000 in the above mentioned files.

Do you have access to the sequences? If you do, you may want to simply BLAST them against RefSeq to get the most up-to-date annotation.

Sean
Nianhua Li ▴ 870
@nianhua-li-1606
Last seen 8.2 years ago
Hi Benjamin,

You may want to check out ftp://ftp.ncbi.nih.gov/refseq/special_requests/

The file "suppressed_temporary" contains "NM_032722". gene2accession and gene2refseq (from the Gene FTP site) only contain accessions that can be mapped to genes. If you want to match RefSeq IDs, you should focus on the RefSeq FTP site.

Best,
Nianhua
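To see how many of a list of accessions each downloaded file accounts for, something along the following lines might help. This is only a rough sketch: the function names are made up for illustration, and because the exact column layout of files such as gene2accession or suppressed_temporary is not described in this thread, it does not parse columns at all. It simply scans every line for tokens that look like RefSeq accessions (two capital letters, an underscore, digits) and ignores any trailing version suffix such as ".2".

```python
import re

# Assumption: accessions look like "NM_032722", optionally followed by ".<version>".
ACCESSION_RE = re.compile(r"\b([A-Z]{2}_\d+)(?:\.\d+)?\b")


def accessions_in_lines(lines):
    """Collect every accession-like token (version suffix dropped) from text lines."""
    found = set()
    for line in lines:
        for match in ACCESSION_RE.finditer(line):
            found.add(match.group(1))
    return found


def split_by_presence(query, lines):
    """Return (found, missing): the query accessions that do / do not occur in lines."""
    present = accessions_in_lines(lines)
    wanted = set(query)
    return wanted & present, wanted - present
```

An open file handle can be passed as `lines`, for example `split_by_presence(my_ids, open("suppressed_temporary"))`, where `my_ids` and the local file name are placeholders for your own list and download. Running it once per file, and feeding the `missing` set of one run into the next, shows which file (if any) accounts for each accession.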
Hi Nianhua, Sean,

As for the sequences: unfortunately I don't have them, but I agree that this would be the most accurate solution.

As for the RefSeq FTP site: I had already had a look at it, but I will check the link you sent, Nianhua. Thanks a lot.

Many thanks again and sincere regards,
Benjamin

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch [mailto:bioconductor-bounces at stat.math.ethz.ch] On behalf of Nianhua Li
Sent: 21 February 2007 18:30
To: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Gene Database Files (eg gene2accession) complete?