I am new to microarray analysis and have been assigned a rather challenging project: re-analyzing published microarray data from GEO. Essentially, I would like to do a differential gene expression meta-analysis between diseased and healthy samples, across different experiments, chips and species (mouse, human).
I have only just started and found the Bioconductor package GEOquery very useful for importing the data. Unfortunately, I do not understand how the values it returns were produced.
I download the data via
library(GEOquery)
series.hsa.GSE28475 <- getGEO(GEO = "GSE28475", GSEMatrix = TRUE)
and access the expression values via
eset <- series.hsa.GSE28475[[1]]  # first ExpressionSet in the returned list
exprsVals <- exprs(eset)
In parallel I also downloaded the files directly, either via the browser or via
getGEOSuppFiles("GSE28475", fetch_files = TRUE)
In this case the files contain quantile_normalized data (according to the file name). If I compare the values in the file with the values from the expression set from the GEOquery import, they differ.
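For reference, the comparison I have in mind looks roughly like the sketch below. It uses small made-up matrices instead of the real data: `from_geoquery` and `from_supp` are hypothetical names standing in for `exprs(eset)` and the table read from the supplementary file. It only tests one possible explanation for the difference, namely a log2 transformation.

```r
# Toy stand-ins for the two sources; all values are invented for illustration.
set.seed(1)
raw <- matrix(rexp(5 * 3, rate = 1 / 500) + 50, nrow = 5,
              dimnames = list(paste0("probe", 1:5), paste0("GSM", 1:3)))
from_supp <- raw                    # stands in for the supplementary file
from_geoquery <- log2(raw)[5:1, ]   # stands in for exprs(eset), rows reordered

# Align both matrices on shared probe and sample IDs before comparing.
probes <- intersect(rownames(from_geoquery), rownames(from_supp))
samples <- intersect(colnames(from_geoquery), colnames(from_supp))
a <- from_geoquery[probes, samples]
b <- from_supp[probes, samples]

# Are the values identical, or identical after a log2 transform?
same_values <- isTRUE(all.equal(a, b))
same_after_log2 <- isTRUE(all.equal(a, log2(b)))

# Per-sample correlation on the log2 scale; values near 1 would suggest the
# same underlying data up to a monotone rescaling.
per_sample_cor <- sapply(samples, function(s) cor(a[, s], log2(b[, s])))
```

With the real data, neither check may come out TRUE, which is exactly the situation I am trying to understand.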
I therefore wonder how the data in the expression set returned by GEOquery were preprocessed, in terms of background subtraction, transformation, normalization, etc. Is this preprocessing the same across all series, or is it at least recorded somewhere within the expression set?
Later on, I would like to carry out any missing preprocessing steps myself using lumi or affy (depending on the platform), correct for batch effects with ComBat, and eventually use limma for the differential expression analysis.
Thank you very much for all comments and help.
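To make the planned downstream steps concrete, here is a minimal sketch of the ComBat and limma part, run on simulated data. The sample sheet, the batch labels (`study1`, `study2`) and the expression values are all invented. It assumes the matrix is already normalized and on the log2 scale, and it leaves out the platform-specific preprocessing and the mapping of probes between chips and species.

```r
library(sva)    # provides ComBat
library(limma)  # provides lmFit, eBayes, topTable

set.seed(42)
n_genes <- 200

# Hypothetical sample sheet: two studies (batches), each with both conditions.
pheno <- data.frame(
  condition = factor(rep(c("healthy", "diseased"), times = 6),
                     levels = c("healthy", "diseased")),
  batch = factor(rep(c("study1", "study2"), each = 6))
)

# Simulated log2 expression matrix with an additive shift in the second study
# and a disease effect in the first 10 genes.
expr <- matrix(rnorm(n_genes * 12, mean = 8), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("sample", 1:12)))
in_study2 <- pheno$batch == "study2"
is_diseased <- pheno$condition == "diseased"
expr[, in_study2] <- expr[, in_study2] + 1
expr[1:10, is_diseased] <- expr[1:10, is_diseased] + 2

# Batch correction, passing the condition so ComBat tries to preserve it.
design <- model.matrix(~ condition, data = pheno)
expr_adj <- ComBat(dat = expr, batch = pheno$batch, mod = design)

# Differential expression (diseased vs. healthy) on the adjusted matrix.
fit <- eBayes(lmFit(expr_adj, design))
top <- topTable(fit, coef = "conditiondiseased", number = 10)
```

Whether adjusting the matrix with ComBat first is preferable to including the batch as a covariate in the limma design is one of the points I am unsure about.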