Question: Annotation package for Codelink 55K Human Array
3.8 years ago
Agaz Hussain Wani wrote:

I want to know if there is an annotation package for the Codelink 55K Human Array, like h10kcod.db, h20kcod.db, or hwgcod.db. How can I annotate data from a Codelink 55K Human Array?

annotation codelink R
written 3.8 years ago by Agaz Hussain Wani
Answer: Annotation package for Codelink 55K Human Array
3.8 years ago
Diego Diez wrote:

Although they appear as different platforms in NCBI GEO (see here vs. here), I believe the so-called Codelink 55K arrays are the arrays originally named "whole genome". So you can try the Human Whole Genome annotation package, but let me know if you encounter any problems. Note that the title of the package also includes the 55K (~55,000) tag, which suggests they might be the same.
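
A minimal sketch of what annotating probe IDs with such a package could look like. It assumes the package in question is hwgcod.db (the whole genome package named in the question) and that it is already installed. The example keys are simply the first few probe IDs stored in the package, not data from a real experiment.

```r
# Assumes hwgcod.db is installed, e.g. via BiocManager::install("hwgcod.db").
library(hwgcod.db)
library(AnnotationDbi)

# List the annotation fields the package offers.
columns(hwgcod.db)

# Take a few probe IDs from the package itself as example keys;
# in practice these would be the probe IDs of your own data.
probes <- head(keys(hwgcod.db, keytype = "PROBEID"))

# One row per probe/gene match; probes without a gene get NA.
annot <- AnnotationDbi::select(
  hwgcod.db,
  keys = probes,
  columns = c("ENTREZID", "SYMBOL", "GENENAME"),
  keytype = "PROBEID"
)
annot
```

With real data, `probes` would be replaced by the probe identifiers of the expression matrix.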


After further investigation, I can confirm that the so-called Human 55K array is indeed a Human Whole Genome array. All the probes listed in GPL15158 (which correspond to the 55K definition in GEO) are present in the Whole Genome array defined in GPL2895, or in the Bioconductor annotation package hwgcod.db.
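
One way to reproduce this kind of check is sketched below. It assumes network access, that the GEOquery package is installed, and that the `ID` column of each GEO platform table holds the probe identifier; it is an illustration, not necessarily how the comparison was originally done.

```r
library(GEOquery)
library(hwgcod.db)
library(AnnotationDbi)

# Download both platform definitions from GEO (needs network access).
gpl_55k <- getGEO("GPL15158")
gpl_wg  <- getGEO("GPL2895")

# Assumption: the "ID" column of each platform table holds the probe ID.
ids_55k <- as.character(Table(gpl_55k)$ID)
ids_wg  <- as.character(Table(gpl_wg)$ID)
ids_pkg <- keys(hwgcod.db, keytype = "PROBEID")

# TRUE if every 55K probe is also found in the other set.
all(ids_55k %in% ids_wg)
all(ids_55k %in% ids_pkg)

# Probes that would contradict the claim, if any.
setdiff(ids_55k, ids_wg)
```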

The definition in GPL2895 (whole genome) contains more probes than the 55K array or the Bioconductor package. This is mainly because it contains probes labelled as "MASK" which were not included in the original chip file used to generate the annotation packages. However, the information in those probes is irrelevant in terms of annotation.

modified 3.7 years ago • written 3.8 years ago by Diego Diez

Thanks for your comments. I think there is a mismatch of probes and gene names.

written 3.8 years ago by Agaz Hussain Wani

I tried to annotate with hwgcod.db, but it gives mismatches. For example, in GPL15158 probe GE766244 refers to gene symbol LOC343566, but hwgcod.db returns NA. In the same GPL file, probe GE521442 refers to symbol LOC130951, but from hwgcod.db I got M1AP, and there are many other cases like this.

modified 3.8 years ago • written 3.8 years ago by Agaz Hussain Wani

See my updated response, which confirms that the arrays are identical. The difference in annotation arises because the packages are effectively re-annotated using the latest gene information (some gene IDs have been updated or eliminated). That is one of the points of having the Bioconductor annotations. Indeed, the package annotations could be improved even further if the probe sequences were rematched to the latest genome (instead of using the original mapping). I may consider creating such packages in the future.
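
To see such differences side by side, something along these lines might help. It is only a sketch: the GPL symbols are typed in by hand from the two examples quoted in this thread, and what the package returns depends on the installed version of hwgcod.db.

```r
library(hwgcod.db)
library(AnnotationDbi)

# Symbols as reported for GPL15158 in this thread (entered by hand).
gpl_symbol <- c(GE766244 = "LOC343566", GE521442 = "LOC130951")

# Current symbol for each probe according to the installed package;
# multiVals = "first" keeps one symbol per probe.
pkg_symbol <- mapIds(
  hwgcod.db,
  keys = names(gpl_symbol),
  column = "SYMBOL",
  keytype = "PROBEID",
  multiVals = "first"
)

# Side-by-side view of the two annotation sources.
comparison <- data.frame(
  probe   = names(gpl_symbol),
  gpl     = unname(gpl_symbol),
  package = unname(pkg_symbol[names(gpl_symbol)])
)
comparison
```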

written 3.7 years ago by Diego Diez
