Question: Getting the same gene for multiple probe IDs
gamal.elkomy wrote (19 months ago):

Hi, all

I am very new to Bioconductor. I am developing a feature selection technique based on pathways.

When I try to get the gene for each Illumina gene-expression probe ID, I get the same gene for more than one probe ID.

For example:

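library(illuminaHumanv4.db)  # Illumina Human v4 annotation package; also attaches AnnotationDbi, which provides mapIds()
# probeID: character vector of Illumina probe IDs, e.g. "ILMN_2325610"
res <- data.frame(Gene = unlist(mapIds(illuminaHumanv4.db, probeID, "SYMBOL", "PROBEID", ifnotfound = list(NA))))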

The gene symbols returned for these probes:

ILMN_2325610   AKT3
ILMN_1733598   AKT3
ILMN_2325612   AKT3
ILMN_1757130   AKT3
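A quick way to see which genes have more than one probe is to tabulate the mapped symbols. The sketch below uses only base R, with a named character vector standing in for the mapIds() output (here just the four AKT3 probes from the example above); with real data you would pass the full result instead.

```r
# Named character vector: names are probe IDs, values are gene symbols,
# i.e. the shape returned by mapIds(..., "SYMBOL", "PROBEID").
probe2gene <- c(ILMN_2325610 = "AKT3", ILMN_1733598 = "AKT3",
                ILMN_2325612 = "AKT3", ILMN_1757130 = "AKT3")

# Drop probes that did not map to any gene (NA symbols).
probe2gene <- probe2gene[!is.na(probe2gene)]

# Count probes per gene and keep the genes targeted by more than one probe.
probes_per_gene <- table(probe2gene)
multi <- names(probes_per_gene)[probes_per_gene > 1]

# List the probe IDs belonging to each multi-probe gene.
multi_probes <- split(names(probe2gene), probe2gene)[multi]
print(multi_probes)
```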

The next step of my analysis is to get the pathways of each gene for the feature selection. Is it correct to get the same gene for more than one probe ID? If so, what is the best way to handle probes with the same gene name in the feature selection?



Usually those probes are distributed across several positions on the microarray, or they bind to different transcripts of the gene. I am not sure of the best method; that depends on what you want to do. But I usually take the mean of them.
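Averaging the probes of each gene can be done in base R by splitting the rows of an expression matrix by gene symbol. This is a minimal sketch: `collapse_by_mean` is a hypothetical helper, the expression values are made up for illustration, and the matrix is assumed to be already normalized and log2-transformed (see the caveats about averaging in the answer below).

```r
# Hypothetical helper: average the rows (probes) of an expression matrix
# that map to the same gene. 'expr' has probes in rows and samples in
# columns; 'genes' gives the gene symbol for each row, in row order.
collapse_by_mean <- function(expr, genes) {
  keep <- !is.na(genes)                      # ignore probes with no gene
  expr <- expr[keep, , drop = FALSE]
  genes <- genes[keep]
  rows_by_gene <- split(seq_len(nrow(expr)), genes)
  # vapply returns a samples x genes matrix; transpose to genes x samples
  t(vapply(rows_by_gene,
           function(i) colMeans(expr[i, , drop = FALSE]),
           numeric(ncol(expr))))
}

# Made-up log2 values, for illustration only (not real data).
expr <- matrix(c(8, 9,
                 6, 7,
                 5, 5),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("probe_a", "probe_b", "probe_c"),
                               c("sample1", "sample2")))
genes <- c("GENE1", "GENE1", "GENE2")

res <- collapse_by_mean(expr, genes)
res
```

For matrices, limma's avereps() performs the same kind of row averaging over duplicated IDs, if limma is already part of the workflow.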

(Comment by Lluís R, 19 months ago)

Hello, Lluis,

Thanks for your reply. I will try taking the mean.

(Reply by gamal.elkomy, 19 months ago)
Andy Lynch (United Kingdom) wrote (19 months ago):

It is expected that you get multiple probes for some genes, and Lluís is correct that, in general, some probes are meant to target specific single transcripts, some a subset of transcripts, and some all transcripts of a gene. In the case of AKT3, it looks as though there are alternative 3' UTRs, and (presumably) since the platform preferentially places probes at the 3' end, this leads to two probes.

Later there is a probe that looks to be targeting an exon that is expected to be expressed in all isoforms, while the fourth probe appears to be spanning an intron and so might be informative for splicing events. In other genes there may be probes that cater for germline polymorphisms and so forth.

In general I would caution against taking the mean of all probes. AKT3 is a tricky case in having two 3' probes. For many genes there is one probe (often the 3'-most one) that provides the best summary, while others can be used to highlight specific transcripts or events. Taking the mean of all probes can dilute the signal of the good summary, and can certainly reduce the dynamic range of the signal, with consequences for comparisons with other genes (see a brief discussion of an example in Dunning et al., PMID: 20688273). Where two probes target distinct subsets of transcripts, there is an argument for adding their signals rather than averaging them (or simply treating them as separate genes in the analysis) if they are to be combined further with more probes in a summary.

Note that there may be no truly satisfactory approach to producing a gene-level summary that applies to all genes in all experiments and tissue types (e.g. probes covering common SNPs may be the best summaries in experiments conducted within one cell line, but problematic across a population of heterogeneous individuals). So my advice is to choose something sensible and then drill into any interesting results at the end to ensure that you have not been misled by this step.
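To make the dilution point concrete, the sketch below uses made-up log2 values for two hypothetical probes of one gene: one that responds across samples and one that stays near background. Averaging them on the log2 scale halves the spread of the responsive probe. The second part illustrates the "add rather than average" option for probes targeting distinct transcript subsets; summing on the linear (un-logged) scale is an assumption about how one would add the signals, and all values are illustrative only.

```r
# Made-up log2 intensities across four samples (illustration only).
responsive <- c(6, 8, 10, 12)  # probe that tracks the gene's expression
flat       <- c(5, 5, 5, 5)    # probe near background, little variation

# Averaging on the log2 scale: the flat probe adds no variation, so the
# summary's range is half that of the responsive probe.
averaged <- (responsive + flat) / 2
diff(range(responsive))  # 6
diff(range(averaged))    # 3

# Probes that target distinct subsets of transcripts: add their signals on
# the linear scale, then return to log2 (assumes the inputs are log2 values).
transcripts_a <- c(7, 7, 9, 9)
transcripts_b <- c(7, 9, 7, 9)
summed <- log2(2^transcripts_a + 2^transcripts_b)
summed
```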

(It almost goes without saying that, if log-transformation and normalization are being performed, one must consider at which stage relative to them one would wish to take the mean.)
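As a small illustration of why the order matters: averaging two probes after log2-transformation gives the log of their geometric mean, whereas averaging raw intensities and then logging gives the log of their arithmetic mean, and the two differ whenever the probes disagree. The intensities below are made up.

```r
# Made-up raw intensities for two probes of the same gene in one sample.
raw <- c(probe1 = 100, probe2 = 1600)

mean_then_log <- log2(mean(raw))  # average raw intensities, then log2
log_then_mean <- mean(log2(raw))  # log2 each probe, then average

mean_then_log  # log2(850), about 9.73
log_then_mean  # log2(400) (log of the geometric mean), about 8.64
```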


Hello Andy,

Thank you so much for your informative reply.

The data set I have was normalized before the analysis using the technique proposed in doi:10.1093/bioinformatics/btq118.

I am now doing a pathway-based analysis. The analysis includes 48,804 probes, some of which do not map to a gene. I first get the gene for each probe, then the pathways for that gene, and I do the feature selection based on the pathways. So all the probes of a gene will be selected in the same group, namely the pathway(s) related to that gene.
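The grouping described here amounts to joining a probe-to-gene table with a gene-to-pathway table. A minimal base-R sketch, assuming both tables are already available as data frames with the column names used in the AKT3 example below (how they are retrieved is left out). The four AKT3 probes and the pathway IDs come from this thread; "unmapped_probe" is a hypothetical probe with no gene.

```r
# Probe-to-gene table; "unmapped_probe" is hypothetical and has no gene.
probe_gene <- data.frame(
  ProbeID  = c("ILMN_2325610", "ILMN_1733598", "ILMN_2325612",
               "ILMN_1757130", "unmapped_probe"),
  geneid   = c("10000", "10000", "10000", "10000", NA),
  geneName = c("AKT3", "AKT3", "AKT3", "AKT3", NA),
  stringsAsFactors = FALSE
)

# Gene-to-pathway table: three of the KEGG pathways listed for AKT3.
gene_pathway <- data.frame(
  geneid  = c("10000", "10000", "10000"),
  Pathway = c("hsa04010", "hsa04150", "hsa05211"),
  stringsAsFactors = FALSE
)

# Inner join: the unmapped probe drops out, and every probe of a gene is
# repeated once for each pathway that gene belongs to.
probe_pathway <- merge(probe_gene, gene_pathway, by = "geneid")

# Feature groups per pathway, counted as probes or as distinct genes.
probes_by_pathway <- split(probe_pathway$ProbeID, probe_pathway$Pathway)
genes_by_pathway  <- lapply(split(probe_pathway$geneid, probe_pathway$Pathway),
                            unique)

lengths(probes_by_pathway)  # 4 probes in each pathway
lengths(genes_by_pathway)   # but only 1 distinct gene
```

One consequence worth checking: since every probe inherits all of its gene's pathways, a gene with four probes contributes four features to each of its pathways, whereas a single-probe gene contributes one. Whether to collapse to one value per gene first is the same judgment call discussed in the answer above.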

What do you suggest in this case?

For one AKT3 probe, the data set looks like this:

geneid geneName ProbeID Pathway

10000 AKT3 ILMN_1757130 hsa04010
10000 AKT3 ILMN_1757130 hsa05211
10000 AKT3 ILMN_1757130 hsa04662
10000 AKT3 ILMN_1757130 hsa04664
10000 AKT3 ILMN_1757130 hsa04150
10000 AKT3 ILMN_1757130 hsa05212
10000 AKT3 ILMN_1757130 hsa04370
10000 AKT3 ILMN_1757130 hsa05223
10000 AKT3 ILMN_1757130 hsa05142
10000 AKT3 ILMN_1757130 hsa05210
10000 AKT3 ILMN_1757130 hsa04666
10000 AKT3 ILMN_1757130 hsa05200
10000 AKT3 ILMN_1757130 hsa05160
10000 AKT3 ILMN_1757130 hsa05220
10000 AKT3 ILMN_1757130 hsa04380
10000 AKT3 ILMN_1757130 hsa04630
10000 AKT3 ILMN_1757130 hsa04510
10000 AKT3 ILMN_1757130 hsa04660
10000 AKT3 ILMN_1757130 hsa04062
10000 AKT3 ILMN_1757130 hsa05214
10000 AKT3 ILMN_1757130 hsa05218
10000 AKT3 ILMN_1757130 hsa05221
10000 AKT3 ILMN_1757130 hsa04620
10000 AKT3 ILMN_1757130 hsa05215
10000 AKT3 ILMN_1757130 hsa04210
10000 AKT3 ILMN_1757130 hsa05222
10000 AKT3 ILMN_1757130 hsa05145
10000 AKT3 ILMN_1757130 hsa04530
10000 AKT3 ILMN_1757130 hsa04920
10000 AKT3 ILMN_1757130 hsa05213
10000 AKT3 ILMN_1757130 hsa04910
10000 AKT3 ILMN_1757130 hsa04012
10000 AKT3 ILMN_1757130 hsa04914
10000 AKT3 ILMN_1757130 hsa04973
10000 AKT3 ILMN_1757130 hsa04722
(Reply by gamal.elkomy, 19 months ago)