Dear all,
I am new to microarray analysis and have been assigned a rather challenging project. I'd like to re-analyze published microarray data from GEO. Essentially, I would like to do a differential gene expression meta-analysis between diseased and healthy samples, across different experiments, chips and species (mouse, human).
I have more or less just started and found the Bioconductor package GEOquery very useful for importing the data. Unfortunately, I do not understand how the values it returns were produced.
I download the data via
library(GEOquery)
series.hsa.GSE28475 <- getGEO(GEO = "GSE28475", GSEMatrix = TRUE)
and access the expression values via
eset <- series.hsa.GSE28475[[2]]
exprsVals <- exprs(eset)
In parallel I also downloaded the files directly, either via the browser or via
getGEOSuppFiles("GSE28475", fetch_files = TRUE)
In this case the files contain quantile-normalized data (according to the file name). If I compare the values in the files with the values in the expression set from the GEOquery import, they differ.
Therefore I wonder how the data in the expression set from GEOquery were preprocessed in terms of background subtraction, transformation, normalization, etc. Is the preprocessing the same across all series, or is it at least recorded within the expression set?
Later on I would like to do the missing preprocessing steps on my own, using lumi or affy (depending on the platform), then correct for batch effects (ComBat) and eventually use limma for the differential expression analysis.
Thank you very much for all comments and help.
Thank you, Sean, for the quick and helpful answer, especially for the link to the raw data. I was puzzled how GEOquery could return the raw data when the Series entry provides only the normalized data. I always underestimate how many places in GEO there are to "hide" data. Thank you very much.
The Series record contains the "Value" column from the Sample records, which in this case we are told contains the raw data. The "files" attached to the Series record are UNRELATED to the actual data in the Series data values and are included as "extra" info. Confusing--you bet!!
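One way to check what a submitter says about the "Value" column is to look at the sample annotation that comes along with the expression set. The following is only a rough sketch: `processing_info` is a made-up helper name (it is not part of GEOquery), and it assumes the submitter filled in the free-text `data_processing` field(s), which GEO does not standardize and which may be incomplete or missing. The part that talks to GEO needs network access, so it is guarded to run only in an interactive session.

```r
# Hypothetical helper (not a GEOquery function): collect the distinct
# free-text "data_processing" descriptions from a sample annotation table.
processing_info <- function(pheno) {
  cols <- grep("^data_processing", colnames(pheno), value = TRUE)
  unique(pheno[, cols, drop = FALSE])
}

if (interactive()) {  # needs network access to GEO
  library(GEOquery)   # also attaches Biobase (exprs, pData, sampleNames)

  # Same Series and list element as in the question above.
  eset <- getGEO("GSE28475", GSEMatrix = TRUE)[[2]]

  # What the submitter wrote about how "Value" was produced.
  print(processing_info(pData(eset)))

  # Rough sanity check of the scale: raw intensities usually span a much
  # wider range than log2-transformed values.
  print(summary(as.vector(exprs(eset))))

  # The Sample record itself describes its table columns, including "VALUE".
  gsm <- getGEO(sampleNames(eset)[1])
  print(Columns(gsm))
}
```

Since the description is whatever the submitter typed, it can differ between Series and even between Samples within one Series, so it is worth reading for every data set before combining them.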