Unusual expression level values after normalization in an Affymetrix microarray experiment
@amos-kirilovsky-5407

Dear Bioconductor community,

I'm running a gene expression analysis on a subset of 145 samples (Affymetrix hgu133a) from a lung cancer cohort found in GEO.

I made a strange observation while splitting the cohort in two at the median expression level of each probe set: for several probe sets the cohort was not divided into two subgroups with equal numbers of patients, as it should be. For example, for probe set "212970_at" the cohort was split into 30% of patients below and 70% above the median expression level. I found out that for many patients the expression intensity was exactly equal to the median level. I can't figure out why. I checked, and the raw data are different for each sample.
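The effect described above can be reproduced with a small base R sketch. The data are simulated for illustration only (they are not the GEO cohort), and the number of tied samples is an arbitrary assumption chosen to give roughly a 30/70 split.

```r
# Simulated expression values for one probe set across 145 samples.
# Illustrative data only; not taken from the real cohort.
set.seed(1)
n <- 145
expr <- sort(rnorm(n, mean = 8, sd = 1))

# Assumption for the demo: 56 samples around the middle of the
# distribution all receive exactly the same value (the one at rank 73).
expr[45:100] <- expr[73]

med <- median(expr)

n_below <- sum(expr < med)   # samples strictly below the median
n_tied  <- sum(expr == med)  # samples exactly equal to the median
n_above <- sum(expr > med)   # samples strictly above the median

# A ">= median" rule sends every tied sample to the "high" group,
# so the two groups are no longer of equal size.
group <- ifelse(expr >= med, "high", "low")
print(table(group))
print(c(below = n_below, tied = n_tied, above = n_above))
```

Counting `sum(expr == med)` per probe set is a quick way to see which probe sets are affected before doing any median-based grouping.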

I just import the data with the ReadAffy function from the affy package (1.52.0) and normalize them with the gcrma function from the gcrma package (2.46.0), and that's all.

Have you already observed something similar?

If any information or data is missing please just tell me.

Thank you for your help,

Amos Kirilovsky

microarray gcrma affy hgu133a normalization • 1.6k views

Thank you Wolfgang and Gordon for your answers. The option fast = FALSE in gcrma did the trick. I plotted the intensities of one probe set after running the full gcrma algorithm and the ad hoc approximation. As you can see, many ties (53) were generated with the ad hoc approximation but not with the full algorithm. I'm not a specialist, but the correlation between the two methods doesn't seem very high (R² = 0.77). Should we be worried about that? If so, maybe the fast option should be FALSE by default. Is it possible that the generated ties have a bad influence on some kinds of analysis (e.g. survival)? I haven't yet found any documentation about the differences between the two methods.

I also plotted the log2 PM intensities from the same probe set as before, their median value, and the values from the ad hoc approximation and the full gcrma algorithm.

I would have expected at least a small correlation between the median raw intensities and the gcrma-processed data. But I suppose this is not new. The values from the full algorithm are lower than those from the ad hoc approximation. Is the main difference generated during background subtraction?

Thanks again,

Amos

@gordon-smyth
WEHI, Melbourne, Australia

When you run gcrma(), try setting option fast=FALSE. This will cause it to run the full gcrma algorithm instead of an ad hoc approximation. In my experience, this solves the sort of problems you mention.

The gcrma publications only document the full algorithm. The "fast" option isn't described anywhere and no doubt was only implemented because computers tended to be slower 13 years ago.

You might say that fast=FALSE should be the default now, and I'd agree, but I'm not a gcrma author. I wouldn't use gcrma myself without setting fast=FALSE.
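As a rough sketch of how the fast=FALSE suggestion could be checked on one's own data, the helper below runs gcrma both ways and counts tied values for a single probe set. The function name `compare_gcrma_ties` and its arguments are hypothetical, introduced only for illustration. It assumes the affy, gcrma and Biobase packages are installed along with the CDF and probe annotation packages matching the chip, and that `cel_dir` points at a directory of CEL files.

```r
# Hypothetical helper (not part of affy or gcrma): compare the default
# fast mode with the full algorithm for one probe set.
compare_gcrma_ties <- function(cel_dir, probe = "212970_at") {
  # Read all CEL files found in the given directory.
  raw <- affy::ReadAffy(celfile.path = cel_dir)

  # Default call (fast = TRUE) versus the full gcrma algorithm.
  eset_fast <- gcrma::gcrma(raw)
  eset_full <- gcrma::gcrma(raw, fast = FALSE)

  # Expression values of the chosen probe set across samples.
  x_fast <- Biobase::exprs(eset_fast)[probe, ]
  x_full <- Biobase::exprs(eset_full)[probe, ]

  list(
    # Number of samples whose value repeats an earlier sample's value.
    ties_fast = sum(duplicated(x_fast)),
    ties_full = sum(duplicated(x_full)),
    # Squared Pearson correlation between the two sets of values.
    r_squared = cor(x_fast, x_full)^2
  )
}

# Example call, assuming CEL files in a hypothetical directory "cel_files":
# res <- compare_gcrma_ties("cel_files")
# str(res)
```

Running both modes on the same AffyBatch keeps everything else identical, so any difference in tie counts can be attributed to the fast option.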

@wolfgang-huber-3550
EMBL European Molecular Biology Laborat…

gcrma is a sophisticated and numerically somewhat complex algorithm; it is possible that ties might be induced as you describe.

I'd recommend:

Wolfgang
