Hi, it would greatly help if you mentioned the Agilent array version or, at least, stated the source of the data that you retrieved.
The hugene20sttranscriptcluster.db package is for the Affymetrix HuGene 2.0 ST Array, not for Agilent arrays.
You can likely create your own annotation package via AnnotationForge; for Agilent arrays, however, you can also use biomaRt:
probes <- c("A_14_P125183", "P_310924", "A_24_P5750",
            "P_126531", "P_310922", "P_126524",
            "P_126537", "A_23_P153480", "P_126533", "P_126541")
library(biomaRt)
ensembl <- useMart('ensembl', dataset = 'hsapiens_gene_ensembl')
tables <- listAttributes(ensembl)
tables[grep('agilent', tables[,1]),]
    name                             description                            page
127 agilent_cgh_44b                  AGILENT CGH 44b probe                  feature_page
128 agilent_gpl26966                 AGILENT GPL26966 probe                 feature_page
129 agilent_gpl6848                  AGILENT GPL6848 probe                  feature_page
130 agilent_sureprint_g3_ge_8x60k    AGILENT SurePrint G3 GE 8x60k probe    feature_page
131 agilent_sureprint_g3_ge_8x60k_v2 AGILENT SurePrint G3 GE 8x60k v2 probe feature_page
132 agilent_wholegenome              AGILENT WholeGenome probe              feature_page
133 agilent_wholegenome_4x44k_v1     AGILENT WholeGenome 4x44k v1 probe     feature_page
134 agilent_wholegenome_4x44k_v2     AGILENT WholeGenome 4x44k v2 probe     feature_page
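If it is not obvious which of these attributes corresponds to your array, one option is to count how many of your probes each candidate recognises. The following is only a rough sketch: `count_probe_hits` and its `fetch` argument are hypothetical names of my own, and the commented-out `getBM` wrapper assumes that each Agilent attribute name is also accepted as a filter, which is the case for 'agilent_wholegenome' but which I have not checked for every entry.

```r
# Count, for each candidate attribute, how many of the query probes it matches.
# `fetch` is a hypothetical helper: function(attr, probes) returning a
# data frame whose first column holds the matched probe IDs.
count_probe_hits <- function(probes, attrs, fetch) {
  hits <- vapply(attrs, function(a) {
    res <- fetch(a, probes)
    length(unique(res[[1]]))
  }, integer(1))
  # best-matching attribute first
  sort(hits, decreasing = TRUE)
}

# With biomaRt, `fetch` could wrap getBM (needs network access):
# fetch_bm <- function(a, p) {
#   getBM(attributes = a, filters = a, values = p, mart = ensembl)
# }
# count_probe_hits(probes, grep('agilent', tables$name, value = TRUE), fetch_bm)
```

The attribute with the most hits is the most plausible match, although a custom array may not be represented in Ensembl at all.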
Then find the attribute that matches your array in the first column (name). Once found, use it as both an attribute and the filter in the lookup:
# query Ensembl for the annotation of the probes on the chosen array
annot <- getBM(
  attributes = c('agilent_wholegenome',
                 'wikigene_description',
                 'ensembl_gene_id',
                 'entrezgene_id',
                 'gene_biotype',
                 'external_gene_name'),
  filters = 'agilent_wholegenome',
  values = probes,
  mart = ensembl)

# left join, so that probes without a match are kept as NA rows
annot <- merge(
  x = as.data.frame(probes),
  y = annot,
  by.x = 'probes',
  by.y = 'agilent_wholegenome',
  all.x = TRUE)
annot
   probes       wikigene_description           ensembl_gene_id entrezgene_id gene_biotype   external_gene_name
1  A_14_P125183 <NA>                           <NA>            NA            <NA>           <NA>
2  A_23_P153480 kallikrein related peptidase 5 ENSG00000167754 25818         protein_coding KLK5
3  A_24_P5750   kallikrein related peptidase 2 ENSG00000167751 3817          protein_coding KLK2
4  P_126524     <NA>                           <NA>            NA            <NA>           <NA>
5  P_126531     <NA>                           <NA>            NA            <NA>           <NA>
6  P_126533     <NA>                           <NA>            NA            <NA>           <NA>
7  P_126537     <NA>                           <NA>            NA            <NA>           <NA>
8  P_126541     <NA>                           <NA>            NA            <NA>           <NA>
9  P_310922     <NA>                           <NA>            NA            <NA>           <NA>
10 P_310924     <NA>                           <NA>            NA            <NA>           <NA>
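Because the merge keeps every probe, unmatched ones show up as rows of NA, as most of them do here. A quick way to see how many probes were actually annotated is sketched below; `annot_toy` is just a hand-typed copy of a few rows of the table above so that the snippet runs without a biomaRt query, and with real data you would use `annot` directly.

```r
# Toy copy of four rows of the merged table (values taken from the output above)
annot_toy <- data.frame(
  probes = c("A_14_P125183", "A_23_P153480", "A_24_P5750", "P_126524"),
  ensembl_gene_id = c(NA, "ENSG00000167754", "ENSG00000167751", NA),
  external_gene_name = c(NA, "KLK5", "KLK2", NA),
  stringsAsFactors = FALSE
)

# a probe counts as mapped if it received an Ensembl gene ID
is_mapped <- !is.na(annot_toy$ensembl_gene_id)
n_mapped <- sum(is_mapped)
unmapped_probes <- annot_toy$probes[!is_mapped]

cat(n_mapped, "of", nrow(annot_toy), "probes mapped\n")
```

A large share of unmapped probes is a hint that the chosen attribute may not be the right array, or that those probes are not covered by Ensembl.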
Kevin
The array version is Agilent-032034 VPC Human 180K v3, Kevin.