Annotation package for Codelink 55K Human Array
@agaz-hussain-wani-7620
India

Is there an annotation package for the Codelink 55K Human Array, similar to h10kcod.db, h20kcod.db, or hwgcod.db? How can I annotate data from the Codelink 55K Human Array?

r annotation codelink • 2.7k views
Diego Diez ▴ 760
@diego-diez-4520
Japan

Although they appear as different platforms in NCBI GEO (see here vs. here), I believe the so-called Codelink 55K arrays are the arrays originally named "whole genome". So you can try the Human Whole Genome annotation package, http://bioconductor.org/packages/hwgcod.db/, but let me know if you encounter any problems. Note that the package title also includes the 55K (~55,000) tag, which suggests they might be the same array.
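
As a minimal sketch of that suggestion, the probes can be looked up through the standard AnnotationDbi `select()` interface. This assumes hwgcod.db is already installed (for example with `BiocManager::install("hwgcod.db")`) and that the probe IDs in your data are the Codelink "GE..." names; the two IDs below are only examples.

```r
# Sketch: annotate Codelink probe IDs with hwgcod.db.
# Assumption: hwgcod.db is installed and the IDs below are valid probe names.
library(hwgcod.db)  # also attaches AnnotationDbi

probes <- c("GE766244", "GE521442")  # example probe IDs; replace with your own

# One row per probe/annotation pair; probes without a current gene get NA.
ann <- AnnotationDbi::select(
  hwgcod.db,
  keys    = probes,
  columns = c("SYMBOL", "ENTREZID", "GENENAME"),
  keytype = "PROBEID"
)
ann
```

A probe that maps to more than one gene produces several rows, so the result may need collapsing before it is merged with an expression matrix.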

UPDATE

After further investigation, I can confirm that the so-called Human 55K array is indeed a Human Whole Genome array. All the probes listed in GPL15158 (which corresponds to the 55K definition in GEO) are present in the Whole Genome array defined in GPL2895, and in the Bioconductor annotation package hwgcod.db.

The definition in GPL2895 (whole genome) contains more probes than the 55K array or the Bioconductor package. This is mainly because it contains probes labelled as "MASK" which were not included in the original chip file used to generate the annotation packages. However, the information in those probes is irrelevant in terms of annotation.
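
A check along these lines can be sketched with GEOquery. This is not necessarily how the comparison above was done; it assumes GEOquery and hwgcod.db are installed, that a network connection to GEO is available, and that the `ID` column of the platform table holds the probe names.

```r
# Sketch: how many GPL15158 probe IDs are keys of hwgcod.db?
# Assumptions: network access to GEO; the GPL table's ID column holds probe names.
library(GEOquery)
library(hwgcod.db)

gpl     <- getGEO("GPL15158")             # download the platform definition
gpl_ids <- as.character(Table(gpl)$ID)    # probe IDs as listed in GEO
pkg_ids <- keys(hwgcod.db, keytype = "PROBEID")

frac_present <- mean(gpl_ids %in% pkg_ids)   # fraction of GEO probes in the package
missing_ids  <- setdiff(gpl_ids, pkg_ids)    # probes absent from the package, if any

frac_present
head(missing_ids)
```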

Thanks for your comments. I think there is a mismatch of probes and gene names.

I tried to annotate with hwgcod.db, but the results do not match. For example, in GPL15158 probe GE766244 refers to gene symbol LOC343566, but hwgcod.db returns NA. In the same GPL file, probe GE521442 refers to symbol LOC130951, but from hwgcod.db I got M1AP. There are many other cases like this.

See my updated response, which confirms the arrays are identical. The annotations differ because the packages are effectively re-annotated using the latest gene information (some gene IDs have since been updated or retired). That is one of the points of having the Bioconductor annotations. Indeed, the package annotations could be improved further if the probe sequences were re-matched to the latest genome instead of relying on the original mapping. I may consider creating such packages in the future.
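
To see how many probes are affected, the two annotation sources can be compared side by side. The sketch below uses only base R and hard-codes the two probes quoted in this thread; in practice `gpl_symbols` would come from the GPL15158 table and `pkg_symbols` from hwgcod.db, and the column names and `status` labels are made up for illustration.

```r
# Sketch: classify differences between GEO symbols and package symbols.
# The values are the two examples reported in this thread; names are illustrative.
gpl_symbols <- data.frame(
  PROBEID    = c("GE766244", "GE521442"),
  GPL_SYMBOL = c("LOC343566", "LOC130951"),
  stringsAsFactors = FALSE
)
pkg_symbols <- data.frame(
  PROBEID    = c("GE766244", "GE521442"),
  PKG_SYMBOL = c(NA, "M1AP"),
  stringsAsFactors = FALSE
)

# Join on probe ID, then label each probe by what changed.
cmp <- merge(gpl_symbols, pkg_symbols, by = "PROBEID")
cmp$status <- ifelse(
  is.na(cmp$PKG_SYMBOL), "no current symbol",
  ifelse(cmp$GPL_SYMBOL == cmp$PKG_SYMBOL, "same", "symbol updated")
)
cmp
table(cmp$status)
```

Probes labelled "no current symbol" are the NA cases, and "symbol updated" covers cases like LOC130951 versus M1AP.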
