Dear all,
I am trying to map geneIDs from the S. aureus annotation file to probeIDs.
The problem is that in over 2000 rows, more than 2 geneIDs are listed for the corresponding probeID.
Here is row 3544 of the annotation file as an example:
sa_i10207dr_x_at | 1120534 // gi|1120534|ref|NC_002758.2|NC_002758.2(GI:57634611):629461-632324(+) Staphylococcus aureus subsp. aureus Mu50, GENE=sdrC LOCUS=SAV0561 // ncbi_bacterial // 13 // --- /// 1120535 // gi|1120535|ref|NC_002758.2|NC_002758.2(GI:57634611):632689-636848(+) Staphylococcus aureus subsp. aureus Mu50, GENE=sdrD LOCUS=SAV0562 // ncbi_bacterial // 116 // --- /// 1120536 // gi|1120536|ref|NC_002758.2|NC_002758.2(GI:57634611):637240-640667(+) Staphylococcus aureus subsp. aureus Mu50, GENE=sdrE LOCUS=SAV0563 // ncbi_bacterial // 27 // --- /// 1122655 // gi|1122655|ref|NC_002758.2|NC_002758.2(GI:57634611):2782009-2784642(-) Staphylococcus aureus subsp. aureus Mu50, GENE=clfB PRODUCT=Clumping factor B LOCUS=SAV2630 // ncbi_bacterial // 14 // --- /// 1123324 // gi|1123324|ref|NC_002745.2|NC_002745.2(GI:29165615):605214-608077(+) Staphylococcus aureus subsp. aureus N315, GENE=sdrC LOCUS=SA0519 // ncbi_bacterial // 13 // --- /// 1123325 // gi|1123325|ref|NC_002745.2|NC_002745.2(GI:29165615):608442-612601(+) Staphylococcus aureus subsp. aureus N315, GENE=sdrD LOCUS=SA0520 // ncbi_bacterial // 116 // --- /// 1123326 // gi|1123326|ref|NC_002745.2|NC_002745.2(GI:29165615):612993-616420(+) Staphylococcus aureus subsp. aureus N315, GENE=sdrE LOCUS=SA0521 // ncbi_bacterial // 28 // --- /// 1125352 // gi|1125352|ref|NC_002745.2|NC_002745.2(GI:29165615):2718295-2720928(-) Staphylococcus aureus subsp. aureus N315, GENE=clfB PRODUCT=Clumping factor B LOCUS=SA2423 // ncbi_bacterial // 14 // --- /// 3236072 // gi|3236072|ref|NC_002951.2|NC_002951.2(GI:57650036):635788-639935(+) Staphylococcus aureus subsp. aureus COL, GENE=sdrD PRODUCT=sdrD protein LOCUS=SACOL0609 // ncbi_bacterial // 164 // --- /// 3236073 // gi|3236073|ref|NC_002951.2|NC_002951.2(GI:57650036):640327-643829(+) Staphylococcus aureus subsp. 
aureus COL, GENE=sdrE PRODUCT=sdrE protein LOCUS=SACOL0610 // ncbi_bacterial // 34 // --- /// 3236353 // gi|3236353|ref|NC_002951.2|NC_002951.2(GI:57650036):632578-635423(+) Staphylococcus aureus subsp. aureus COL, GENE=sdrC PRODUCT=sdrC protein LOCUS=SACOL0608 // ncbi_bacterial // 13 // --- /// 3237041 // gi|3237041|ref|NC_002951.2|NC_002951.2(GI:57650036):2711036-2713777(-) Staphylococcus aureus subsp. aureus COL, GENE=clfB PRODUCT=clumping factor B LOCUS=SACOL2652 // ncbi_bacterial // 15 // --- |
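
To make the structure of such a row easier to see, here is a minimal Python sketch that splits the annotation field into one record per gene. It assumes, based only on the row above, that ` /// ` separates gene entries and ` // ` separates the sub-fields within an entry, with the geneID first and the description second. The function name `parse_annotation` and the variable `SAMPLE` are hypothetical, and `SAMPLE` contains just the first two entries of the row.

```python
import re

# First two entries of the example row, copied from the annotation field.
SAMPLE = (
    "1120534 // gi|1120534|ref|NC_002758.2|NC_002758.2(GI:57634611):"
    "629461-632324(+) Staphylococcus aureus subsp. aureus Mu50, "
    "GENE=sdrC LOCUS=SAV0561 // ncbi_bacterial // 13 // --- /// "
    "1120535 // gi|1120535|ref|NC_002758.2|NC_002758.2(GI:57634611):"
    "632689-636848(+) Staphylococcus aureus subsp. aureus Mu50, "
    "GENE=sdrD LOCUS=SAV0562 // ncbi_bacterial // 116 // ---"
)


def parse_annotation(field):
    """Split one annotation field into a list of per-gene records.

    Assumes ' /// ' between gene entries and ' // ' between sub-fields.
    """
    records = []
    for entry in field.split(" /// "):
        parts = [p.strip() for p in entry.split(" // ")]
        # Skip anything that does not look like "geneID // description // ..."
        if len(parts) < 2 or not parts[0]:
            continue
        description = parts[1]
        gene = re.search(r"GENE=(\S+)", description)
        locus = re.search(r"LOCUS=(\S+)", description)
        records.append(
            {
                "gene_id": parts[0],
                "gene": gene.group(1) if gene else None,
                "locus": locus.group(1) if locus else None,
            }
        )
    return records


if __name__ == "__main__":
    for record in parse_annotation(SAMPLE):
        print(record["gene_id"], record["gene"], record["locus"])
```

Applied to the full row, this would list every geneID with its gene name and locus tag, which shows that the entries for this probe set name four different genes (sdrC, sdrD, sdrE, clfB) in three strains (Mu50, N315, COL).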
I would like to know whether it is correct to consider only the first geneID in each row.
I would appreciate any advice.
Nazanin