Dear Bioconductor community,

I'm performing a gene expression analysis on a subset of 145 samples (Affymetrix hgu133a) from a lung cancer cohort available in GEO.

I made a strange observation when splitting the cohort in two at the median expression level of each probe set: for several probe sets, the cohort was not divided into two subgroups with equal numbers of patients, as it should be. For example, for probe set "212970_at" the split put 30% of patients below the median and 70% above it. I found that for many patients the expression value was exactly equal to the median. I can't figure out why. I checked, and the raw data are different for each sample.
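To illustrate why ties break a median split, here is a minimal sketch in Python (used only for illustration; the actual analysis is in R). The helper `median_split` and the ten expression values are hypothetical, and the choice to put values equal to the median in the "high" group is an assumption, since the post does not say which side ties went to. When several samples share exactly the median value, they all land on the same side, so the groups cannot be equal in size.

```python
import statistics

def median_split(values):
    """Split samples at the median into a 'low' and a 'high' group.

    Hypothetical helper: values strictly below the median go to 'low';
    values equal to or above it go to 'high'. Also returns how many
    samples sit exactly on the median.
    """
    m = statistics.median(values)
    low = [v for v in values if v < m]
    high = [v for v in values if v >= m]
    n_at_median = sum(1 for v in values if v == m)
    return m, low, high, n_at_median

# Ten hypothetical expression values, all distinct: the split is 5 / 5.
distinct = [2.1, 2.5, 3.0, 3.3, 3.8, 4.0, 4.4, 4.9, 5.2, 5.6]
# Ten hypothetical values where four samples share one value (2.4),
# and that shared value is the median: the split becomes 3 / 7.
tied = [1.5, 1.8, 2.0, 2.4, 2.4, 2.4, 2.4, 4.9, 5.2, 5.6]

for name, vals in [("distinct", distinct), ("tied", tied)]:
    m, low, high, n_at = median_split(vals)
    print(f"{name}: median={m}, below={len(low)}, at-or-above={len(high)}, on median={n_at}")
```

Moving the ties to the "low" side instead only flips the imbalance (7 / 3); as long as many samples share the median value, no rule based on the median alone can give two equal groups.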

I simply imported the data with the ReadAffy function from the affy package (1.52.0) and normalized it with the gcrma function (gcrma 2.46.0), nothing else.

Has anyone observed something similar?

If any information or data is missing, please let me know.

Thank you for your help,

Amos Kirilovsky

Thank you, Wolfgang and Gordon, for your answers. The option fast = FALSE in gcrma did the trick. I plotted the intensities of one probe after running both the full gcrma algorithm and the ad hoc approximation. As you can see, the ad hoc approximation generated many ties (53), whereas the full algorithm generated none. I'm not a specialist, but the correlation between the two methods doesn't seem very high (R² = 0.77). Should we be worried about that? If so, perhaps fast should be FALSE by default. Could the generated ties have a bad influence on some kinds of analysis (e.g. survival analysis)? I haven't yet found any documentation about the differences between the two methods.

I also plotted the log2 PM intensities of the same probe set as before, together with their median and the values from the ad hoc approximation and from the full gcrma algorithm.

I would have expected at least a small correlation between the median raw intensities and the gcrma-processed data, but I suppose this is not new. The values from the full algorithm are lower than those from the ad hoc approximation. Is the main difference introduced during the background subtraction?

Thanks again,

Amos