Question

Get the same gene for multiple Probe ID

gamal.elkomy • 0

@gamalelkomy-11914


Hi, all

I am very new to Bioconductor. I am developing a feature selection technique based on pathways.

When I try to map Illumina gene expression probe IDs to gene symbols, I get the same gene for more than one probe ID.

For example:

res <- data.frame(Gene = unlist(mapIds(illuminaHumanv4.db, probeID, "SYMBOL", "PROBEID", ifnotfound = list(NA))))

The gene symbols returned for the probes are:

ILMN_2325610 AKT3
ILMN_1733598 AKT3
ILMN_2325612 AKT3
ILMN_1757130 AKT3

The next step of my analysis is to look up the pathways for each gene in order to do the feature selection. Is it correct to get the same gene for more than one probe ID? If so, what is the best way to handle probes with the same gene name in the feature selection?

Regards

illuminahumanv4.db • 2.1k views

updated 8.0 years ago by Andy Lynch ▴ 120 • written 8.0 years ago by gamal.elkomy • 0


Usually those probes are distributed among several points of the microarray, or the probes bind to different transcripts of the gene. I am not sure of the best method; it depends on what you want to do. But I usually take the mean of them.
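As a rough illustration of taking the mean per gene, here is a minimal base R sketch. The expression values, the extra probe `probe_X` and the gene `GENE_X` are made up for the example, and the matrix is assumed to be already log2-transformed and normalized.

```r
# Hypothetical log2 expression matrix: rows are probes, columns are samples.
expr <- matrix(
  c(7.1, 7.3, 7.0,
    9.4, 9.8, 9.1,
    6.2, 6.0, 6.5,
    8.0, 8.2, 7.9,
    5.5, 5.7, 5.4),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("ILMN_2325610", "ILMN_1733598", "ILMN_2325612", "ILMN_1757130", "probe_X"),
    c("s1", "s2", "s3")
  )
)

# Probe-to-symbol mapping, in the same order as the rows of expr
# (in practice this would come from the annotation lookup).
symbol <- c("AKT3", "AKT3", "AKT3", "AKT3", "GENE_X")

# rowsum() adds up the rows that share a symbol; dividing each row by the
# number of probes for that symbol turns the sums into per-gene means.
probe_counts <- table(symbol)
gene_sum <- rowsum(expr, group = symbol)
gene_mean <- gene_sum / as.vector(probe_counts[rownames(gene_sum)])
```

`gene_mean` then has one row per gene symbol and one column per sample. Probes without a symbol (NA) would need to be dropped or labelled before this step.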

Reply • 8.0 years ago • Lluís Revilla Sancho ▴ 760


Hello Lluís,

Thanks for your reply. I will try making the mean.

Reply • 8.0 years ago • gamal.elkomy • 0

score 1 · Answer 1 · 2016-11-26

It is correct that some genes have multiple probes, and Lluís is right that, in general, some probes are meant to target a single specific transcript, some a subset of transcripts, and some all transcripts of a gene. In the case of AKT3, it looks as though there are alternative 3' UTRs, and since the platform preferentially places probes at the 3' end, this (presumably) leads to two probes.

Then there is a probe that looks to be targeting an exon that is expected to be expressed in all isoforms, while the fourth probe appears to span an intron and so might be informative about splicing events. In other genes there may be probes that cater for germline polymorphisms, and so forth.

In general I would caution against taking the mean of all probes. AKT3 is a tricky case in having two 3' probes. For many genes there can be one probe (often the 3'-most one) that provides the best summary, and then others that can be used to highlight specific transcripts or events. Taking the mean of all probes can dilute the signal of the good summary, and can certainly reduce the dynamic range of the signal, with consequences for comparisons with other genes (see a brief discussion of an example in Dunning et al., PMID: 20688273).

Where two probes target distinct subsets of transcripts, there is an argument for adding their signals rather than averaging them (or simply treating them as separate genes in an analysis) if they are to be further combined with more probes in a summary.

Note that there may be no truly satisfactory approach to producing a gene-level summary that applies to all genes in all experiments and tissue types (e.g. probes covering common SNPs may be the best summaries in experiments conducted within one cell line, but problematic across a population of heterogeneous individuals). So my advice is to choose something sensible and then drill into any interesting results at the end to ensure that you have not been misled by this step.
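To make the "adding their signals" option concrete, here is a small base R sketch. The probe names and values are hypothetical, and it assumes the data are stored on the log2 scale, so the addition is done on the raw intensity scale and the result is converted back.

```r
# Hypothetical log2 intensities for two probes that are assumed to target
# distinct subsets of a gene's transcripts (rows: probes, columns: samples).
log2_expr <- rbind(
  probe_a = c(s1 = 8, s2 = 9),
  probe_b = c(s1 = 8, s2 = 7)
)

# Undo the log2 transform, add the two signals per sample, and return to log2.
summed <- log2(colSums(2^log2_expr))
```

Which probes are safe to add together is a per-gene judgment based on what each probe targets; nothing in this sketch decides that for you.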

(It almost goes without saying that one must decide at what stage to take the mean relative to log-transformation and normalization, should they be performed.)
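A toy example of why the stage matters, using made-up raw intensities for two probes of one gene in a single sample: averaging before the log2 transform and averaging after it give different answers, because the mean of the logs is the log of the geometric mean.

```r
# Hypothetical raw (unlogged) intensities for two probes of one gene.
raw <- c(probe_a = 64, probe_b = 1024)

# Mean taken on the raw scale, then log2-transformed.
log_of_mean <- log2(mean(raw))

# Mean taken after the log2 transform: (6 + 10) / 2 = 8.
mean_of_logs <- mean(log2(raw))
```

Here `log_of_mean` is larger than `mean_of_logs`, since on the raw scale the brighter probe dominates the average.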

Entrez ID	Symbol	Probe ID	KEGG pathway
10000	AKT3	ILMN_1757130	hsa04010
10000	AKT3	ILMN_1757130	hsa05211
10000	AKT3	ILMN_1757130	hsa04662
10000	AKT3	ILMN_1757130	hsa04664
10000	AKT3	ILMN_1757130	hsa04150
10000	AKT3	ILMN_1757130	hsa05212
10000	AKT3	ILMN_1757130	hsa04370
10000	AKT3	ILMN_1757130	hsa05223
10000	AKT3	ILMN_1757130	hsa05142
10000	AKT3	ILMN_1757130	hsa05210
10000	AKT3	ILMN_1757130	hsa04666
10000	AKT3	ILMN_1757130	hsa05200
10000	AKT3	ILMN_1757130	hsa05160
10000	AKT3	ILMN_1757130	hsa05220
10000	AKT3	ILMN_1757130	hsa04380
10000	AKT3	ILMN_1757130	hsa04630
10000	AKT3	ILMN_1757130	hsa04510
10000	AKT3	ILMN_1757130	hsa04660
10000	AKT3	ILMN_1757130	hsa04062
10000	AKT3	ILMN_1757130	hsa05214
10000	AKT3	ILMN_1757130	hsa05218
10000	AKT3	ILMN_1757130	hsa05221
10000	AKT3	ILMN_1757130	hsa04620
10000	AKT3	ILMN_1757130	hsa05215
10000	AKT3	ILMN_1757130	hsa04210
10000	AKT3	ILMN_1757130	hsa05222
10000	AKT3	ILMN_1757130	hsa05145
10000	AKT3	ILMN_1757130	hsa04530
10000	AKT3	ILMN_1757130	hsa04920
10000	AKT3	ILMN_1757130	hsa05213
10000	AKT3	ILMN_1757130	hsa04910
10000	AKT3	ILMN_1757130	hsa04012
10000	AKT3	ILMN_1757130	hsa04914
10000	AKT3	ILMN_1757130	hsa04973
10000	AKT3	ILMN_1757130	hsa04722