Question

Convert Agilent Probe ID to gene IDs/Symbols

rohitsatyam102 ▴ 20

Hi @James W. MacDonald. I have these Agilent probe IDs and I wish to convert them to gene IDs or gene symbols. I tried using the AnnotationHub package, but it is failing. The code I used is provided below.

I am not sure how to convert these alphanumeric probe IDs to numeric IDs. Please help.


library(hugene20sttranscriptcluster.db)

genes <- c("A_14_P125183", "P_310924", "A_24_P5750", "P_126531", "P_310922",
  "P_126524", "P_126537", "A_23_P153480", "P_126533", "P_126541")
mapIds(hugene20sttranscriptcluster.db, keys = genes, column = "SYMBOL", keytype = "PROBEID")

It throws the following error:

Error in .testForValidKeys(x, keys, keytype, fks) : 
  None of the keys entered are valid keys for 'PROBEID'. Please use the keys method to see a listing of valid arguments.

The array version is Agilent-032034 VPC Human 180K v3, Kevin.

ADD REPLY • link 3.1 years ago rohitsatyam102 ▴ 20

score 0 · Answer 1 · 2021-03-24

Hi, it would help greatly if you mentioned the Agilent array version or, at least, stated the source of the data that you retrieved.

The hugene20sttranscriptcluster.db package is for the Affymetrix HuGene 2.0 ST Array, so your Agilent probe IDs are not valid keys for it; that is what the error is telling you.
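
As the error message itself suggests, one way to confirm this kind of mismatch is to look at the keys the package does accept. The following is a minimal sketch rather than a definitive recipe: `fraction_valid` is a hypothetical helper written for this illustration, and the Bioconductor part only runs if hugene20sttranscriptcluster.db happens to be installed.

```r
# Hypothetical helper (not part of any package): the share of `ids`
# that appear in `valid_keys`, where both are character vectors.
fraction_valid <- function(ids, valid_keys) {
  mean(ids %in% valid_keys)
}

# A few of the probe IDs from the question.
genes <- c("A_14_P125183", "P_310924", "A_24_P5750")

# Only attempt the real check if the annotation package is available.
if (requireNamespace("hugene20sttranscriptcluster.db", quietly = TRUE)) {
  library(hugene20sttranscriptcluster.db)
  affy_keys <- AnnotationDbi::keys(hugene20sttranscriptcluster.db,
                                   keytype = "PROBEID")
  print(head(affy_keys))                   # what valid PROBEID keys look like
  print(fraction_valid(genes, affy_keys))  # share of our IDs that are valid
}
```

If the reported fraction is zero, as the error above implies, the IDs belong to a different platform and a different annotation source is needed.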

You can probably create your own annotation package via AnnotationForge; for Agilent arrays, however, biomaRt is usually the simpler route, provided that Ensembl carries your array:

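# Agilent probe IDs from the question
probes <- c("A_14_P125183","P_310924","A_24_P5750",
  "P_126531","P_310922","P_126524",
  "P_126537","A_23_P153480","P_126533","P_126541")

# Connect to the Ensembl human gene dataset
library(biomaRt)
ensembl <- useMart('ensembl', dataset = 'hsapiens_gene_ensembl')

# List all available attributes and keep those that mention Agilent
tables <- listAttributes(ensembl)
tables[grep('agilent', tables$name), ]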
                                name                            description
127                  agilent_cgh_44b                  AGILENT CGH 44b probe
128                 agilent_gpl26966                 AGILENT GPL26966 probe
129                  agilent_gpl6848                  AGILENT GPL6848 probe
130    agilent_sureprint_g3_ge_8x60k    AGILENT SurePrint G3 GE 8x60k probe
131 agilent_sureprint_g3_ge_8x60k_v2 AGILENT SurePrint G3 GE 8x60k v2 probe
132              agilent_wholegenome              AGILENT WholeGenome probe
133     agilent_wholegenome_4x44k_v1     AGILENT WholeGenome 4x44k v1 probe
134     agilent_wholegenome_4x44k_v2     AGILENT WholeGenome 4x44k v2 probe
            page
127 feature_page
128 feature_page
129 feature_page
130 feature_page
131 feature_page
132 feature_page
133 feature_page
134 feature_page

Then find the correct array from the first column. Once found, perform the lookup:

# Query Ensembl, using the chosen array attribute as the filter
annot <- getBM(
  attributes = c('agilent_wholegenome',
    'wikigene_description',
    'ensembl_gene_id',
    'entrezgene_id',
    'gene_biotype',
    'external_gene_name'),
  filters = 'agilent_wholegenome',
  values = probes,
  mart = ensembl)

# Left-join onto the full probe list so that unmatched probes are kept as NA rows
annot <- merge(
  x = data.frame(probes = probes),
  y = annot,
  by.x = 'probes',
  by.y = 'agilent_wholegenome',
  all.x = TRUE)

annot
         probes           wikigene_description ensembl_gene_id entrezgene_id
1  A_14_P125183                           <NA>            <NA>            NA
2  A_23_P153480 kallikrein related peptidase 5 ENSG00000167754         25818
3    A_24_P5750 kallikrein related peptidase 2 ENSG00000167751          3817
4      P_126524                           <NA>            <NA>            NA
5      P_126531                           <NA>            <NA>            NA
6      P_126533                           <NA>            <NA>            NA
7      P_126537                           <NA>            <NA>            NA
8      P_126541                           <NA>            <NA>            NA
9      P_310922                           <NA>            <NA>            NA
10     P_310924                           <NA>            <NA>            NA
     gene_biotype external_gene_name
1            <NA>               <NA>
2  protein_coding               KLK5
3  protein_coding               KLK2
4            <NA>               <NA>
5            <NA>               <NA>
6            <NA>               <NA>
7            <NA>               <NA>
8            <NA>               <NA>
9            <NA>               <NA>
10           <NA>               <NA>
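
In the result above, only 2 of the 10 probes (KLK5 and KLK2) were annotated via agilent_wholegenome, so it may be worth checking whether another Agilent attribute recognises more of them. The following is a rough sketch under stated assumptions: `count_probe_hits` is a hypothetical helper written for this illustration, and its `lookup` argument exists only so that the function can be tried offline with a stand-in; by default it calls biomaRt's getBM.

```r
# Hypothetical helper: for each candidate attribute, count how many of
# `probes` come back when that attribute is used as both the returned
# column and the filter. `lookup` defaults to biomaRt::getBM.
count_probe_hits <- function(attrs, probes, mart, lookup = biomaRt::getBM) {
  hits <- vapply(attrs, function(a) {
    res <- lookup(attributes = a, filters = a, values = probes, mart = mart)
    sum(probes %in% res[[a]])
  }, integer(1))
  sort(hits, decreasing = TRUE)  # best-matching attribute first
}

# Usage against Ensembl (needs biomaRt and a network connection):
# library(biomaRt)
# ensembl <- useMart('ensembl', dataset = 'hsapiens_gene_ensembl')
# tables <- listAttributes(ensembl)
# agilent_attrs <- grep('agilent', tables$name, value = TRUE)
# count_probe_hits(agilent_attrs, probes, ensembl)
```

Probes that no attribute recognises are probably not covered by Ensembl's array mappings, in which case the manufacturer's annotation file or an AnnotationForge package would be the fallback.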

Kevin