Question: How to combine expression values of multiple probes for one gene
2.2 years ago, ayanava18 wrote:

I am fairly new to R/Bioconductor and microarray analysis.

I have loaded a GEO series matrix file (GSE2990) from the GEO database into R/Bioconductor. This dataset contains expression values for 22283 probes. I want to get expression values per gene for this dataset. Since in many cases there are multiple probes for an individual gene, I would like to know whether there is a package or R code that can combine the expression values of multiple probes for the same gene. Also, does oneChannelGUI have this feature?


2.2 years ago, svlachavas (Greece/Athens/National Hellenic Research Foundation) wrote:

Dear Ayanava,

Firstly, which specific platform does your experiment use? Secondly, handling duplicate probesets (I believe you mean probesets after normalization), that is, probesets that map to the same gene, is a complex issue: there are many options, and which one is most appropriate remains an open question. For instance, you could use the average or the median value of the duplicated probesets.

On the other hand, in my opinion (and others may well offer other useful or alternative suggestions), each probeset can be treated, in a simplified way, as representing a gene, since every probeset (with its associated probes) interrogates an expressed sequence. Some probesets may not be annotated to any gene, or may match multiple potential target sequences; more importantly, some genes are represented by several probesets, each of which may interrogate a different mRNA transcript of that gene. Thus I believe it is wiser not to average across probesets mapping to the same gene: they may recognize different transcripts (alternative transcripts or splice forms) whose expression is not necessarily correlated. Personally, with the Affymetrix and Illumina oligonucleotide arrays I have worked with, I used the median absolute deviation (MAD), a measure of dispersion that is robust to outliers, and kept for each gene the probeset with the highest MAD. So you can use the following example:

Probesets <- paste0("a", 1:200)  # "fake" probesets

So, in the end, Y is a data.frame containing unique gene symbols, each linked to the probeset with the highest MAD among its duplicates.
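The rest of the original example is not shown above, so here is a minimal base-R sketch of the same idea (not the original code): compute the MAD of each probeset and keep, for every gene symbol, the probeset with the highest MAD. It assumes a numeric matrix `expr` (probesets in rows, samples in columns) and a hypothetical vector `symbols` giving one gene symbol per probeset; both are simulated here, and `X` and `expr_gene` are illustrative names. With a real dataset the symbols would come from the platform annotation (for an ExpressionSet, typically a column of `fData(eset)` whose name depends on the platform).

```r
# Minimal sketch (not the original poster's code): for each gene symbol keep
# the probeset with the highest median absolute deviation (MAD).
# Assumptions: `expr` is a numeric matrix (probesets x samples) with probeset
# IDs as rownames; `symbols` is a hypothetical vector of gene symbols, one per
# row of `expr`. Both are simulated here.
set.seed(1)
Probesets <- paste0("a", 1:200)                               # "fake" probeset IDs
expr <- matrix(rnorm(200 * 10), nrow = 200,
               dimnames = list(Probesets, paste0("s", 1:10)))
symbols <- sample(paste0("gene", 1:120), 200, replace = TRUE)  # fake annotation

# MAD of each probeset (row) across samples
Value <- apply(expr, 1, mad)

# One row per probeset: ID, gene symbol, MAD
X <- data.frame(Probeset = Probesets, Symbol = symbols, Value = Value,
                stringsAsFactors = FALSE)

# Drop unannotated probesets (none in this simulated example)
X <- X[!is.na(X$Symbol) & X$Symbol != "", ]

# Sort by decreasing MAD, then keep the first (highest-MAD) row per symbol
X <- X[order(X$Value, decreasing = TRUE), ]
Y <- X[!duplicated(X$Symbol), ]

# Gene-level matrix: one row per gene, named by symbol
expr_gene <- expr[Y$Probeset, , drop = FALSE]
rownames(expr_gene) <- Y$Symbol
```

Sorting first and then calling `duplicated()` works because `duplicated()` flags every occurrence after the first, so the highest-MAD probeset of each gene is the one that survives.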

To sum up, the best option is certainly more complex, and it also depends on the study and on the tools you use.
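For comparison, the simpler mean/median collapse mentioned at the start of the answer can be sketched in base R as below. The function `collapse_by()` and the toy data are hypothetical; the toy matrix shows how one aberrant probeset pulls a gene's mean but not its median. If you already use limma, its `avereps()` function averages rows that share the same ID.

```r
# Minimal sketch: collapse duplicated probesets to one row per gene by
# applying a summary function (mean or median) column-wise within each gene.
# `collapse_by`, `expr` and `symbols` are hypothetical names for illustration.
collapse_by <- function(expr, symbols, FUN = median) {
  keep <- !is.na(symbols) & symbols != ""              # drop unannotated probesets
  expr <- expr[keep, , drop = FALSE]
  groups <- split(seq_len(nrow(expr)), symbols[keep])  # row indices per gene
  rows <- lapply(groups,
                 function(idx) apply(expr[idx, , drop = FALSE], 2, FUN))
  do.call(rbind, rows)                                 # genes in rows, samples in columns
}

# Toy data: three probesets for geneA (one of them aberrant), one for geneB,
# and one unannotated probeset.
expr <- matrix(c(  1,   2,   3,
                   3,   4,   5,
                 100, 100, 100,
                  10,  20,  30,
                   7,   8,   9),
               nrow = 5, byrow = TRUE,
               dimnames = list(paste0("p", 1:5), c("s1", "s2", "s3")))
symbols <- c("geneA", "geneA", "geneA", "geneB", NA)

avg <- collapse_by(expr, symbols, mean)    # geneA pulled up by the aberrant probeset
med <- collapse_by(expr, symbols, median)  # geneA stays at 3, 4, 5
avg
med
```

Either summary still mixes probesets that may target different transcripts, which is the concern raised in the answer.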





You can also check this post for more "formal" options: A: eset annotation issues, plus generate heatmap with correct gene symbol as row la

(reply written 2.2 years ago by svlachavas)

I also forgot to mention that you can compute the MAD values (the Value argument in the example) from your expression set like this:

library(Biobase)  # provides exprs()
MAD <- apply(exprs(eset), 1, mad)  # eset is your ExpressionSet; MAD of each probeset (row) across samples
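To illustrate why MAD is used here rather than the standard deviation, here is a small example with made-up values for two probesets in six samples. A plain matrix `m` stands in for `exprs(eset)`; `apply(m, 1, ...)` works row-wise exactly as in the line above.

```r
# Two made-up probesets measured in six samples:
# "steady" varies moderately; "outlier" is very stable except for one sample.
m <- rbind(steady  = c(5.0, 5.6, 4.4, 5.8, 4.2, 5.0),
           outlier = c(5.1, 5.3, 5.2, 5.4, 5.0, 12.0))

sds  <- apply(m, 1, sd)   # one aberrant sample makes "outlier" look most variable
mads <- apply(m, 1, mad)  # by MAD, "steady" remains the more variable probeset
sds
mads
```

A single aberrant array is enough for SD to rank the "outlier" probeset as the more variable one, while MAD still ranks "steady" higher, which makes MAD-based selection less sensitive to one bad sample.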

(reply written 2.2 years ago by svlachavas)