The next step of my analysis is to get the pathways of each gene in order to do the feature selection. Is it right to get the same gene for more than one probe ID? If it is, what is the best thing to do in the feature selection with probes that have the same gene name?
Usually those probes are located at several positions on the microarray, or the probes bind to different transcripts of the gene. I am not sure what the best method is; that depends on what you want to do. But I usually take the mean of them.
It is correct that you should get multiple probes for some genes, and Lluis is correct that in general some probes are meant to target specific single transcripts, some a subset of transcripts, and some all transcripts of a gene. In the case of AKT3, it looks as though there are alternative 3' UTRs and, since the platform (presumably) preferentially places probes in the 3' end, this leads to two probes.
Later there is a probe that looks to be targeting an exon that is expected to be expressed in all isoforms, while the fourth probe appears to be spanning an intron and so might be informative for splicing events. In other genes there may be probes that cater for germline polymorphisms and so forth.
In general I would caution against taking the mean of all probes. AKT3 is a tricky case in having two 3' probes. For many genes there can be one probe (often the 3'-most one) that provides the best summary, and then others that can be used to highlight specific transcripts or events. Taking the mean of all probes can dilute the signal of the good summary probe, and certainly can reduce the dynamic range of the signal, with consequences for comparisons with other genes (see a brief discussion of an example in Dunning et al. PMID: 20688273). Where two probes target distinct subsets of transcripts there is an argument for adding their signals rather than averaging them (or simply treating them as separate genes in an analysis) if they are to be further combined with more probes in a summary. Note that there may be no truly satisfactory approach to producing a gene-level summary that applies to all genes in all experiments and tissue types (e.g. probes covering common SNPs may be the best summaries in experiments conducted within one cell line, but problematic across a population of heterogeneous individuals), so my advice is to choose something sensible and then drill into any interesting results at the end to ensure that you have not been misled by this step.
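To make the dilution point concrete, here is a minimal Python sketch. The probe names and log2 intensities are invented for illustration, and the two summary functions are hypothetical helpers, not part of any microarray package. It compares the dynamic range of a single responsive probe with the per-sample mean of two probes and with the sum of their signals on the linear scale.

```python
import math
from statistics import mean

# Hypothetical log2 intensities for two probes of one gene across four samples.
# All values are invented for illustration only.
probes = {
    "probe_A": [8.0, 10.0, 8.0, 10.0],  # responsive probe, a good gene summary
    "probe_B": [6.0, 6.0, 6.0, 6.0],    # flat probe, e.g. a rarely expressed transcript
}

def mean_summary(probes):
    """Per-sample mean of the log2 values of all probes."""
    return [mean(vals) for vals in zip(*probes.values())]

def sum_summary(probes):
    """Per-sample sum of the signals on the linear scale, returned as log2."""
    return [math.log2(sum(2.0 ** v for v in vals)) for vals in zip(*probes.values())]

def dynamic_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)

print("probe_A alone:", dynamic_range(probes["probe_A"]))
print("mean of probes:", dynamic_range(mean_summary(probes)))
print("sum of probes:", dynamic_range(sum_summary(probes)))
```

With these made-up numbers the mean halves the range of the responsive probe, while the sum keeps most of it. Real data will behave less cleanly, so treat this only as an illustration of the argument.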
(It almost goes without saying that, if log-transformation and normalization are being performed, one must consider at what stage relative to them one would wish to take the mean.)
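A small Python sketch of why the stage matters, using two invented raw intensities: the mean of the log2 values is the log of the geometric mean, which is not the same as the log2 of the arithmetic mean of the raw values.

```python
import math

# Hypothetical raw (linear-scale) intensities of two probes in one sample.
raw = [64.0, 1024.0]

# Average after log-transformation: equals log2 of the geometric mean.
mean_of_logs = sum(math.log2(x) for x in raw) / len(raw)

# Average before log-transformation: log2 of the arithmetic mean.
log_of_mean = math.log2(sum(raw) / len(raw))

print("mean of log2 values:", mean_of_logs)
print("log2 of mean raw value:", log_of_mean)
```

Averaging on the linear scale is dominated by the brighter probe, so the two orders of operation can give noticeably different gene-level values.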
The data set I have was normalized before the analysis using the technique proposed in doi:10.1093/bioinformatics/btq118.
I am now doing an analysis based on pathways. The analysis includes 48,804 probes, some of which do not match a gene. I first get the gene for each probe, then get the pathways for that gene. I am doing feature selection based on the pathways, so all the probes of a gene will be selected together in the group of each pathway related to that gene.
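As a rough sketch of the grouping I describe, in Python: the probe IDs, the two mapping tables and the function name below are made up for illustration, and the real mappings would come from the array annotation and a pathway database.

```python
from collections import defaultdict

# Hypothetical probe-to-gene annotation; None marks a probe with no matching gene.
probe_to_gene = {
    "probe_1": "AKT3",
    "probe_2": "AKT3",
    "probe_3": "TP53",
    "probe_4": None,
}

# Hypothetical gene-to-pathway table (pathway names are illustrative).
gene_to_pathways = {
    "AKT3": ["PI3K-Akt signaling"],
    "TP53": ["p53 signaling", "Apoptosis"],
}

def group_probes_by_pathway(probe_to_gene, gene_to_pathways):
    """Collect, for each pathway, every probe whose gene belongs to it."""
    groups = defaultdict(list)
    for probe, gene in probe_to_gene.items():
        if gene is None:
            continue  # skip probes that do not match a gene
        for pathway in gene_to_pathways.get(gene, []):
            groups[pathway].append(probe)
    return dict(groups)

pathway_groups = group_probes_by_pathway(probe_to_gene, gene_to_pathways)
for pathway, members in pathway_groups.items():
    print(pathway, members)
```

Here both AKT3 probes end up in the same pathway group, which is exactly the situation my question is about.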
Hello, Lluis,
Thanks for your reply. I will try making the mean.