Dear Bioconductor community,
I'm running a gene expression analysis on a subset of 145 samples (Affymetrix hgu133a) from a lung cancer cohort found in GEO.
I made a strange observation while splitting the cohort in two based on the median expression level of each probe: for several probes, the cohort was not divided into two subgroups with equal numbers of patients, as I would expect. For example, for probe "212970_at", 30% of patients fell below the median expression level and 70% above it. I found that for many patients the expression intensity was exactly equal to the median level, and I can't figure out why. I checked, and the raw data are different for each sample.
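To make the observation concrete, here is a minimal sketch with made-up numbers (not my actual data). It is written in Python purely as a toy illustration; my real analysis is in R. It assumes that samples exactly equal to the median are assigned to the upper group, which is only one possible convention. The point is just that when several samples share the median value, a median split can no longer produce two equal halves.

```python
from statistics import median

# Made-up normalized intensities for one probe across 10 samples.
# Four samples share exactly the same value (2.6), mimicking the ties I see.
values = [1.5, 1.8, 2.0, 2.6, 2.6, 2.6, 2.6, 3.9, 4.4, 5.1]

med = median(values)

# Count how the samples fall relative to the median.
below = sum(v < med for v in values)
equal = sum(v == med for v in values)
above = sum(v > med for v in values)

# Assumed convention: ties go to the upper group.
low = below
high = equal + above

print(med)                  # 2.6
print(below, equal, above)  # 3 4 3
print(low, high)            # 3 7
```

With four of ten samples tied at the median, the split is 30% / 70% instead of 50% / 50%, which is the same kind of imbalance I get for "212970_at".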
I simply import the data with the ReadAffy function from the affy package (1.52.0) and normalize it with the gcrma function from the gcrma package (2.46.0), and that's all.
Has anyone observed something similar?
If any information or data is missing, please let me know.
Thank you for your help,