Getting the same gene for multiple probe IDs
@gamalelkomy-11914
Last seen 8.0 years ago

Hi, all

I am very new to Bioconductor. I am developing a feature selection technique based on pathways.

When I map the probe IDs of an Illumina gene expression data set to gene symbols, I get the same gene for more than one probe ID.

For example:

library(illuminaHumanv4.db)
res <- data.frame(Gene = unlist(mapIds(illuminaHumanv4.db, probeID, "SYMBOL", "PROBEID", ifnotfound = list(NA))))

The gene symbols returned for the probes:

ILMN_2325610   AKT3
ILMN_1733598   AKT3
ILMN_2325612   AKT3
ILMN_1757130   AKT3
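
To see how many probes map to each symbol, the mapping can be tabulated. This is a minimal sketch in base R. The vector `sym` is a hard-coded stand-in for the named character vector that `mapIds` returns, built from the four probes above so that it runs without the annotation package.

```r
# Stand-in for the mapIds() result: names are probe IDs, values are symbols.
sym <- c(ILMN_2325610 = "AKT3", ILMN_1733598 = "AKT3",
         ILMN_2325612 = "AKT3", ILMN_1757130 = "AKT3")

# Number of probes per gene symbol (NA symbols are dropped by table()).
probes_per_gene <- table(sym)

# Probe IDs grouped by gene symbol.
probes_by_gene <- split(names(sym), sym)

probes_per_gene[["AKT3"]]   # 4
```

Genes with a count above one are the ones that need a decision about how to summarize or group their probes.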

The next step of my analysis is to get the pathways of each gene in order to do the feature selection. Is it correct to get the same gene for more than one probe ID? If so, what is the best way to handle probes with the same gene name in the feature selection?

Regards


Usually those probes are distributed among several positions on the microarray, or the probes bind to different transcripts of the gene. I am not sure of the best method; that depends on what you want to do. But I usually take the mean of them.
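
As an illustration of per-gene averaging, here is a minimal sketch in base R. The expression values are hypothetical (made up for the example), and `expr` and `gene` are illustrative names, not objects from the original post. Probes without a gene symbol would need to be dropped first.

```r
# Hypothetical log2 expression values: 4 AKT3 probes x 3 samples.
expr <- matrix(c(8, 9, 10,
                 6, 6, 6,
                 7, 8, 9,
                 5, 5, 5),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("ILMN_2325610", "ILMN_1733598",
                                 "ILMN_2325612", "ILMN_1757130"),
                               c("s1", "s2", "s3")))
gene <- c("AKT3", "AKT3", "AKT3", "AKT3")  # symbol for each row of expr

# rowsum() adds the rows within each gene; dividing by the number of
# probes per gene turns the sums into means (one row per gene).
gene_mean <- rowsum(expr, group = gene) / as.vector(table(gene))

gene_mean["AKT3", ]   # s1 = 6.5, s2 = 7.0, s3 = 7.5
```

The `avereps` function in the limma package does a similar per-ID averaging on a matrix, if that package is already in use.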


Hello, Lluis,

Thanks for your reply. I will try taking the mean.

Andy Lynch ▴ 120
@andy-lynch-6934
Last seen 8 months ago
United Kingdom

It is correct that you should get multiple probes for some genes, and Lluis is correct that in general some are meant to target specific single transcripts, some a subset of transcripts, and some probes intend to target all transcripts of a gene. In the case of AKT3, it looks as though there are alternative 3' UTRs and (presumably) since the platform preferentially places probes in the 3' end, this leads to two probes.

Later there is a probe that looks to be targeting an exon that is expected to be expressed in all isoforms, while the fourth probe appears to be spanning an intron and so might be informative for splicing events. In other genes there may be probes that cater for germline polymorphisms and so forth.

In general I would caution against taking the mean of all probes. AKT3 is a tricky case in having two 3' probes. For many genes there can be one probe (often the 3'-most one) that provides the best summary, and then others that can be used to highlight specific transcripts or events. Taking the mean of all probes can dilute the signal of the good summary, and certainly can reduce the dynamic range of the signal, with consequences for comparisons with other genes (see a brief discussion of an example in Dunning et al., PMID: 20688273). Where two probes target distinct subsets of transcripts there is an argument for adding their signals rather than averaging them (or simply treating them as separate genes in an analysis) if they are to be further combined with more probes in a summary.

Note that there may be no truly satisfactory approach to producing a gene-level summary that applies to all genes in all experiments and tissue types (e.g. probes covering common SNPs may be the best summaries in experiments conducted within one cell line, but problematic across a population of heterogeneous individuals), so my advice is to choose something sensible and then drill into any interesting results at the end to ensure that you have not been misled by this step.

(It almost goes without saying that one must consider at what stage, relative to log-transformation and normalization, if they are performed, one would wish to take the mean.)
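
To make the point about scale concrete, here is a small sketch in base R with two hypothetical log2 intensities for probes of the same gene in one sample. The numbers are invented for illustration. It compares averaging on the log scale, averaging on the raw scale, and adding the raw signals.

```r
# Hypothetical log2 intensities for two probes of one gene in one sample.
log2_a <- 10
log2_b <- 6

# Averaging on the log2 scale (a geometric mean of the raw intensities).
mean_log <- mean(c(log2_a, log2_b))          # 8

# Averaging the raw intensities, then returning to the log2 scale.
mean_raw <- log2(mean(2^c(log2_a, log2_b)))  # log2(544), about 9.09

# Adding the raw intensities, the option mentioned for probes that
# target distinct subsets of transcripts.
sum_raw <- log2(sum(2^c(log2_a, log2_b)))    # log2(1088), about 10.09
```

The three summaries differ by about one log2 unit each here, so the choice of stage is not a cosmetic one when probes of the same gene differ a lot in intensity.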


Hello Andy,

Thank you so much for your informative reply.

The data set I have was normalized before the analysis using the technique proposed in 10.1093/bioinformatics/btq118.

I am now doing an analysis based on pathways. The analysis includes 48,804 probes, some of which do not map to a gene. I first get the gene for each probe, then get the pathways of that gene. I am doing feature selection based on the pathways, so all the probes of a gene will be placed in the same group for each pathway related to that gene.

What do you suggest in this case?

The data for one AKT3 probe will look like this:

geneid geneName ProbeID Pathway

10000 AKT3 ILMN_1757130 hsa04010
10000 AKT3 ILMN_1757130 hsa05211
10000 AKT3 ILMN_1757130 hsa04662
10000 AKT3 ILMN_1757130 hsa04664
10000 AKT3 ILMN_1757130 hsa04150
10000 AKT3 ILMN_1757130 hsa05212
10000 AKT3 ILMN_1757130 hsa04370
10000 AKT3 ILMN_1757130 hsa05223
10000 AKT3 ILMN_1757130 hsa05142
10000 AKT3 ILMN_1757130 hsa05210
10000 AKT3 ILMN_1757130 hsa04666
10000 AKT3 ILMN_1757130 hsa05200
10000 AKT3 ILMN_1757130 hsa05160
10000 AKT3 ILMN_1757130 hsa05220
10000 AKT3 ILMN_1757130 hsa04380
10000 AKT3 ILMN_1757130 hsa04630
10000 AKT3 ILMN_1757130 hsa04510
10000 AKT3 ILMN_1757130 hsa04660
10000 AKT3 ILMN_1757130 hsa04062
10000 AKT3 ILMN_1757130 hsa05214
10000 AKT3 ILMN_1757130 hsa05218
10000 AKT3 ILMN_1757130 hsa05221
10000 AKT3 ILMN_1757130 hsa04620
10000 AKT3 ILMN_1757130 hsa05215
10000 AKT3 ILMN_1757130 hsa04210
10000 AKT3 ILMN_1757130 hsa05222
10000 AKT3 ILMN_1757130 hsa05145
10000 AKT3 ILMN_1757130 hsa04530
10000 AKT3 ILMN_1757130 hsa04920
10000 AKT3 ILMN_1757130 hsa05213
10000 AKT3 ILMN_1757130 hsa04910
10000 AKT3 ILMN_1757130 hsa04012
10000 AKT3 ILMN_1757130 hsa04914
10000 AKT3 ILMN_1757130 hsa04973
10000 AKT3 ILMN_1757130 hsa04722
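
The grouping described above (probe to gene, gene to pathways, one feature group per pathway) can be sketched in base R as follows. It uses three rows from the table above and adds a second AKT3 probe from the original question, on the assumption that every probe of a gene inherits all of that gene's pathways. The names `annot` and `groups` are illustrative only.

```r
# A few rows from the table above (probe ILMN_1757130, gene AKT3).
annot <- data.frame(
  geneid   = "10000",
  geneName = "AKT3",
  ProbeID  = "ILMN_1757130",
  Pathway  = c("hsa04010", "hsa05211", "hsa04662"),
  stringsAsFactors = FALSE
)

# Assumption: a second AKT3 probe gets the same pathways via its gene.
annot <- rbind(annot, transform(annot, ProbeID = "ILMN_2325610"))

# One feature group per pathway: the unique probes whose gene belongs to it.
groups <- lapply(split(annot$ProbeID, annot$Pathway), unique)

groups[["hsa04010"]]   # "ILMN_1757130" "ILMN_2325610"
```

With this layout, probes of the same gene always appear together in each pathway group, so any per-gene summary would have to be applied before the groups are built.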