Benjamin Otto
Dear Bioconductors,
Evidently the database files available from the RefSeq, Gene and LocusLink FTP sites do not contain all IDs that can be uniquely identified via the NCBI web interface. Where can I find database files containing the rest?
Query the RefSeq identifier "NM_032722" in the NCBI Gene database and it will return exactly one hit:
C1orf170
Official Symbol: C1orf170 and Name: chromosome 1 open reading frame 170 [Homo sapiens]
Other Aliases: MGC13275, RP11-54O7.8
Other Designations: hypothetical protein LOC84808
Chromosome: 1; Location: 1p36.33
GeneID: 84808
So I assumed I would be able to find this gene in the current gene2accession or gene2refseq files (from the Gene FTP site) or in the LocusLink LL_tmpl file. None of them contains the identifier. The same is true for the RefSeq RefSeq-release21.catalog and accession2geneid files.
A closer look at the hit reveals that the sequence has been suppressed. Does anybody know whether there is a database file that SHOULD contain this identifier (even though it is suppressed)? My current problem is that, of about 26000 accessions, I can find only around 13000 in the files mentioned above.
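For reference, the lookup I describe can be sketched roughly as follows. This is only a minimal illustration, not the exact script I ran: it assumes the mapping file is tab-separated with comment lines starting with "#", and it compares every field after dropping a trailing version suffix, so it does not depend on a particular column layout. The function names find_accessions and strip_version are made up for this example.

```python
import gzip


def strip_version(accession):
    """Drop a trailing '.N' version suffix, e.g. 'NM_032722.2' -> 'NM_032722'."""
    base, dot, version = accession.rpartition(".")
    return base if dot and version.isdigit() else accession


def find_accessions(path, wanted):
    """Return the subset of `wanted` that occurs in any tab-separated field of `path`.

    Assumption: the file is plain or gzip-compressed text, one record per line,
    fields separated by tabs, header/comment lines starting with '#'.
    """
    wanted = {strip_version(a) for a in wanted}
    found = set()
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header or comment lines
            for field in line.rstrip("\n").split("\t"):
                key = strip_version(field)
                if key in wanted:
                    found.add(key)
    return found
```

Calling find_accessions on each of the files above with the full accession list, and subtracting the union of the results from that list, gives the set of identifiers that appear in none of them.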
Regards
benjamin
--
Benjamin Otto
Universitaetsklinikum Eppendorf Hamburg
Institut fuer Klinische Chemie
Martinistrasse 52
20246 Hamburg