I retrieved the raw expression values from the GEO series supplementary files and normalized them:
treatment1Samples <- c("GSM437316","GSM437286","GSM437305","GSM437297","GSM437277","GSM437282","GSM437269","GSM437302")
treatment2Samples <- c("GSM437147","GSM437138","GSM437311","GSM437218","GSM437114","GSM437234","GSM437165","GSM437299")
sampleNames <- c(treatment1Samples, treatment2Samples)
library(oligo) # provides read.celfiles() and rma()
rawData <- read.celfiles(pathSampleCELfiles)
normData <- rma(rawData)
As the resulting ExpressionSet didn't have the complete sample metadata (phenoData), I retrieved it from GEO with:
library(GEOquery) # provides getGEO()
seriesMatrixFilesList <- getGEO(GEOId, GSEMatrix = TRUE)
# The following selects the data for the platform we want (there are two in this experiment):
for (i in seq_along(seriesMatrixFilesList)) {
  if (as.character(unique(seriesMatrixFilesList[[i]]$platform_id)) == GPLId) {
    seriesMatrixFile <- seriesMatrixFilesList[[i]]
  }
}
pheno <- phenoData(seriesMatrixFile) # gets the complete phenoData of the study
pheno <- pheno[sampleNames, ] # pheno has metadata for all samples, so we select just the samples we want
and then added it to the normalized expression values:
phenoData(normData) <- pheno
The row names of phenoData are the sample names:
rownames(phenoData(normData))
[1] "GSM437316" "GSM437286" "GSM437305" "GSM437297" "GSM437277" "GSM437282" "GSM437269" "GSM437302" "GSM437147" "GSM437138" "GSM437311"
[12] "GSM437218" "GSM437114" "GSM437234" "GSM437165" "GSM437299"
But the column names of assayData are file names:
colnames(exprs(normData))
[1] "GSM437114.CEL.gz" "GSM437138.CEL.gz" "GSM437147.CEL.gz" "GSM437165.CEL.gz" "GSM437218.CEL.gz" "GSM437234.CEL.gz" "GSM437269.CEL.gz"
[8] "GSM437277.CEL.gz" "GSM437282.CEL.gz" "GSM437286.CEL.gz" "GSM437297.CEL.gz" "GSM437299.CEL.gz" "GSM437302.CEL.gz" "GSM437305.CEL.gz"
[15] "GSM437311.CEL.gz" "GSM437316.CEL.gz"
I tried to change the assayData column names to sample names:
newNames <- unlist(lapply(strsplit(colnames(exprs(normData)), '(\\.)|(_)'), function(x) x[1]))
newNames
[1] "GSM437114" "GSM437138" "GSM437147" "GSM437165" "GSM437218" "GSM437234" "GSM437269" "GSM437277" "GSM437282" "GSM437286" "GSM437297"
[12] "GSM437299" "GSM437302" "GSM437305" "GSM437311" "GSM437316"
colnames(exprs(normData)) <- newNames
And got this error:
Error in (function (od, vd) :
object and replacement value dimnames differ
although newNames has the same number of elements as colnames(exprs(normData)).
- How can I solve this?
- Also, does the order of the column names in assayData have to be the same as the order of the row names in phenoData?
Note: Above I didn't do colnames(exprs(normData)) <- rownames(phenoData(normData))
because I wanted to keep the assayData column order, which differs from the row order in phenoData.
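One possible way around this, sketched on toy data: rename the samples with Biobase's sampleNames<- replacement method instead of assigning to colnames(exprs(...)), then reorder the phenoData rows to the assay column order before attaching them. As far as I know, an ExpressionSet is only considered valid when the sample names of assayData and phenoData agree, order included. The toy matrix, the probe names and the group column below are made up for illustration; only the GSM IDs come from the question.

```r
library(Biobase)

# Toy stand-in (hypothetical values) for the normalized data:
# columns are named after the CEL files, as in the question.
celNames <- c("GSM437114.CEL.gz", "GSM437269.CEL.gz", "GSM437316.CEL.gz")
mat <- matrix(seq_len(12) / 10, nrow = 4,
              dimnames = list(paste0("probe", 1:4), celNames))
normData <- ExpressionSet(assayData = mat)

# Toy stand-in for the GEO metadata, with rows in a different order
# from the assay columns (the 'group' column is an assumption).
pheno <- AnnotatedDataFrame(data.frame(
  group = c("treatment1", "treatment2", "treatment1"),
  row.names = c("GSM437316", "GSM437114", "GSM437269")))

# 1. Rename the samples on the object itself; this keeps the
#    assay column order and only strips the file suffix.
sampleNames(normData) <- sub("\\.CEL\\.gz$", "", sampleNames(normData))

# 2. Reorder the metadata rows to the assay column order, then attach them.
pheno <- pheno[sampleNames(normData), ]
phenoData(normData) <- pheno

# 3. Check that the object is internally consistent.
validObject(normData)
```

Doing the rename before attaching the GEO phenoData matters here: sampleNames<- relabels the phenoData rows by position, so applying it after line `phenoData(normData) <- pheno` with rows in a different order would attach the names to the wrong metadata.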
It was easier than I thought. Thank you!