Question: Annotation package for Codelink 55K Human Array
Agaz Hussain Wani wrote (3.8 years ago):

I want to know if there is an annotation package for the Codelink 55K Human Array, like h10kcod.db, h20kcod.db, or hwgcod.db. How can I annotate data from the Codelink 55K Human Array?

Tags: annotation, codelink, R
Modified 3.8 years ago by Diego Diez; written 3.8 years ago by Agaz Hussain Wani.
Answer: Annotation package for Codelink 55K Human Array
Diego Diez wrote (3.8 years ago):

Although NCBI GEO lists them as different platforms (GPL15158 vs. GPL2895), I believe the so-called Codelink 55K arrays are the arrays originally named "whole genome". So you can try the Human Whole Genome annotation package, hwgcod.db; let me know if you encounter any problems. Note that the package title also includes the 55K (~55,000) tag, which suggests they might be the same.


After further investigation, I can confirm that the so-called Human 55K array is indeed a Human Whole Genome array. All the probes listed in GPL15158 (which corresponds to the 55K definition in GEO) are present in the Whole Genome array defined in GPL2895, and in the Bioconductor annotation package hwgcod.db.

The definition in GPL2895 (whole genome) contains more probes than the 55K array or the Bioconductor package. This is mainly because it contains probes labelled "MASK", which were not included in the original chip file used to generate the annotation package. However, those probes are irrelevant for annotation purposes.

Modified 3.7 years ago; written 3.8 years ago by Diego Diez.
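The subset check described above can be sketched as follows. This is a minimal illustration, not the procedure Diego used. It assumes the probe tables of both GEO platforms have already been parsed into Python collections: the parsing step is not shown, and the function name `compare_platforms` and the probe-type labels are hypothetical.

```python
from typing import Iterable, Mapping


def compare_platforms(
    probes_55k: Iterable[str],
    probe_types_wg: Mapping[str, str],
) -> dict:
    """Check whether every 55K probe is present on the whole-genome platform.

    probes_55k:     probe IDs from the 55K platform table (e.g. GPL15158).
    probe_types_wg: probe ID -> probe type for the whole-genome table
                    (e.g. GPL2895). The type labels are an assumption;
                    only the literal "MASK" label is taken from the thread.
    """
    set_55k = set(probes_55k)
    set_wg = set(probe_types_wg)

    missing = set_55k - set_wg  # 55K probes absent from the whole-genome table
    extra = set_wg - set_55k    # probes found only on the whole-genome table
    # Extra probes that are NOT labelled MASK would need a closer look.
    extra_non_mask = {p for p in extra if probe_types_wg[p] != "MASK"}

    return {
        "is_subset": not missing,
        "missing": sorted(missing),
        "n_extra": len(extra),
        "extra_non_mask": sorted(extra_non_mask),
    }


if __name__ == "__main__":
    # Toy input: the two probe IDs come from this thread; "MASK_0001" and the
    # "regular" label are hypothetical placeholders, not real Codelink values.
    probes_55k = ["GE766244", "GE521442"]
    probe_types_wg = {
        "GE766244": "regular",
        "GE521442": "regular",
        "MASK_0001": "MASK",
    }
    print(compare_platforms(probes_55k, probe_types_wg))
```

If `is_subset` is true and `extra_non_mask` is empty, the larger platform differs only by MASK probes, which matches the situation described in this answer.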

Thanks for your comments. I think there is a mismatch between probes and gene names.

Reply by Agaz Hussain Wani, 3.8 years ago.

I tried to annotate with hwgcod.db, but it gives mismatches. For example, in GPL15158 probe GE766244 refers to gene symbol LOC343566, but hwgcod.db returns NA. In the same GPL file, probe GE521442 refers to symbol LOC130951, but from hwgcod.db I got M1AP, and there are many other cases like this.

Reply by Agaz Hussain Wani, 3.8 years ago (modified 3.8 years ago).
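Mismatches like the ones reported above can be listed systematically. The sketch below assumes both annotations have already been exported as probe-to-symbol mappings, for example the symbol column of the GPL15158 table and a table exported from hwgcod.db in R (the export step is not shown). The function name `symbol_mismatches` is hypothetical, and `None` stands in for R's NA.

```python
from typing import Mapping, Optional


def symbol_mismatches(
    gpl_symbols: Mapping[str, Optional[str]],
    pkg_symbols: Mapping[str, Optional[str]],
) -> list:
    """Return (probe, gpl_symbol, package_symbol) tuples for probes whose symbols differ.

    None represents a missing symbol (NA in R). Probes absent from
    pkg_symbols are treated as unmapped (None). Results are sorted by probe ID.
    """
    mismatches = []
    for probe in sorted(gpl_symbols):
        gpl_sym = gpl_symbols[probe]
        pkg_sym = pkg_symbols.get(probe)  # None if the package has no entry
        if gpl_sym != pkg_sym:
            mismatches.append((probe, gpl_sym, pkg_sym))
    return mismatches


if __name__ == "__main__":
    # The two cases reported in this thread.
    gpl_symbols = {"GE766244": "LOC343566", "GE521442": "LOC130951"}
    pkg_symbols = {"GE766244": None, "GE521442": "M1AP"}
    for probe, gpl_sym, pkg_sym in symbol_mismatches(gpl_symbols, pkg_symbols):
        print(f"{probe}: GPL={gpl_sym} package={pkg_sym}")
```

Listing mismatches this way shows how many probes are affected, but it does not by itself say which annotation is more current.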

See my updated response, which confirms the arrays are identical. The differences in annotation arise because the packages are effectively re-annotated using the latest gene information (some gene IDs have been updated or eliminated). That is one of the points of having the Bioconductor annotations. Indeed, the package annotation could be improved even further if the probe sequences were re-matched to the latest genome (instead of relying on the original mapping). I may consider creating such packages in the future.

Reply by Diego Diez, 3.7 years ago.