It is possible to construct an ExpressionSet from separate files. Imagine you have three separate files: one for the expression data (e.g. from RMA, as you mentioned), another with probe annotations (like your csv one), and another with sample annotations. For example:
#===========================
# phenoData - sample annotations.
# pdata.txt (comma separated)
#
# id,treatment
# sample1,1
# sample2,1
# sample3,0
# sample4,0
#===========================
# featureData - probe annotations.
# fdata.txt (comma separated)
#
# id,symbol
# probe1,gene1
# probe2,gene2
# probe3,gene3
# probe4,gene4
#===========================
# expression data
#
# exprs.txt (tab delimited)
# sample1 sample2 sample3 sample4
# probe1 10 9 11 8
# probe2 10 11 2 1
# probe3 2 3 12 10
# probe4 1 3 2 1
I assumed the annotation files to be comma-separated and the expression data to be tab-separated, but this does not matter: it only changes the R function used to read them. You can then do something like this:
library(Biobase)
# phenoData:
tmp <- read.csv("pdata.txt", row.names = 1)
pdata <- AnnotatedDataFrame(tmp)
# featureData:
tmp <- read.csv("fdata.txt", row.names = 1)
fdata <- AnnotatedDataFrame(tmp)
# expression data:
tmp <- read.table("exprs.txt")
m <- as.matrix(tmp)
## create ExpressionSet object:
eset <- new("ExpressionSet", exprs = m, phenoData = pdata, featureData = fdata)
pData(eset)
fData(eset)
eset$treatment
The only requirement (I think) is that the sample names and feature names agree between the different files.
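To make that requirement concrete, here is a minimal sketch in base R of how the names could be compared before building the object. The objects `pheno`, `feat` and `m` are toy stand-ins built in memory to mirror the example files above (they are not read from disk), and the reordering step at the end is only one possible way to realign annotations whose rows come in a different order. As far as I know the names have to match in the same order, not just as sets.

```r
# Toy objects mirroring the example files above (built in memory, no files needed).
pheno <- data.frame(treatment = c(1, 1, 0, 0),
                    row.names = paste0("sample", 1:4))
feat <- data.frame(symbol = paste0("gene", 1:4),
                   row.names = paste0("probe", 1:4),
                   stringsAsFactors = FALSE)
m <- matrix(c(10,  9, 11,  8,
              10, 11,  2,  1,
               2,  3, 12, 10,
               1,  3,  2,  1),
            nrow = 4, byrow = TRUE,
            dimnames = list(rownames(feat), rownames(pheno)))

# Samples: columns of the expression matrix vs. rows of the sample annotations.
samples_ok <- identical(colnames(m), rownames(pheno))

# Features: rows of the expression matrix vs. rows of the probe annotations.
features_ok <- identical(rownames(m), rownames(feat))

# If the names are the same but the order differs, index the annotations
# by the matrix names to put them back in the matching order.
shuffled  <- pheno[c(3, 1, 4, 2), , drop = FALSE]
realigned <- shuffled[colnames(m), , drop = FALSE]
```

If either `samples_ok` or `features_ok` is FALSE, it is worth sorting that out before calling `new("ExpressionSet", ...)`.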
Thank you.
So if I wanted to add the csv file as a parameter in the expression: eset <- new("ExpressionSet", exprs=as.matrix(rma.file)), I would read in the csv file using read.csv, then add it like this: eset <- new("ExpressionSet", exprs=as.matrix(rma.file), annotation = csv.file)?
Dear Vani,
Please excuse me, I misread the second part of your question. By the annotation of the specific Affy chip, do you mean the pheno data object, that is, the phenotype of your data? If so, you could use read.csv to read the csv file into a data.frame (read.csv already returns one, so a separate as.data.frame call is not needed). Then, if the object is called dat2, for instance:
phenoData(eset) <- new("AnnotatedDataFrame", data=dat2)
The csv file contains the annotation of the HuGene-1_0-st-v1 Affymetrix chip. So basically it has all the info like Entrez ID, gene symbol etc. Would it still be considered a pheno data object?
No. In my opinion there is no need to load it into R at this stage, as this is your annotation file, which you can use after your statistical analysis to annotate your results. By "pheno data" object I meant the description of your samples, i.e. disease, healthy, cancer, control etc. Moreover, although I have never used this specific Affymetrix platform, you might find the dedicated package for the HuGene platform useful (http://bioconductor.org/packages/release/data/annotation/html/pd.hugene.1.0.st.v1.html)
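As a rough illustration of what "annotate your results afterwards" could look like, here is a small base R sketch. Everything in it is made up for the example: the `results` table, the `annot` table (standing in for what read.csv might return from your annotation file) and the column names `probe_id`, `logFC` and `symbol` are assumptions, so adapt them to whatever your files actually contain.

```r
# Hypothetical results table, e.g. the top probes from a statistical analysis.
results <- data.frame(probe_id = c("probe2", "probe3"),
                      logFC    = c(-3.2, 2.9),
                      stringsAsFactors = FALSE)

# Hypothetical annotation table, as it might look after read.csv().
annot <- data.frame(probe_id = paste0("probe", 1:4),
                    symbol   = paste0("gene", 1:4),
                    stringsAsFactors = FALSE)

# Attach the annotation columns to the results by probe identifier.
# all.x = TRUE keeps every result row even if a probe has no annotation.
annotated <- merge(results, annot, by = "probe_id", all.x = TRUE)
```

The merged table then carries the gene symbol next to each result row, without the annotation file ever having to be part of the ExpressionSet.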
Cool. Thanks.