Question: Creating an Expression Set with a csv file
4.3 years ago by
Vani20
United States
Vani20 wrote:

Hi,

Is it possible to create an ExpressionSet from (1) a text file containing the results of an RMA normalization and (2) a csv file containing the annotation for the specific Affy chip pertaining to the normalized data? If yes, please advise.

Thanks.

annotation eset • 5.4k views
modified 4.3 years ago by Diego Diez730 • written 4.3 years ago by Vani20
Answer: Creating an Expression Set with a csv file
4.3 years ago by
Diego Diez730
Japan
Diego Diez730 wrote:

It is possible to construct an ExpressionSet from separate files. Imagine you have three separate files: one for the expression data (e.g. from RMA, as you mentioned), another with probe annotations (like your csv one) and another with sample annotations. For example:

#===========================
# phenoData - sample annotations.
# pdata.txt (comma separated)
#
# id,treatment
# sample1,1
# sample2,1
# sample3,0
# sample4,0

#===========================
# featureData - probe annotations.
# fdata.txt (comma separated)
#
# id,symbol
# probe1,gene1
# probe2,gene2
# probe3,gene3
# probe4,gene4

#===========================
# expression data
#
# exprs.txt (tab delimited)
# sample1 sample2 sample3 sample4
# probe1 10 9 11 8
# probe2 10 11 2 1
# probe3 2 3 12 10
# probe4 1 3 2 1


I assumed the annotations to be comma separated value files and the expression data to be a tab separated file, but this does not matter: it only changes the R function used to read each file. You can then do something like this:

library(Biobase)

# phenoData:
tmp <- read.csv("pdata.txt", row.names = 1)
pdata <- AnnotatedDataFrame(tmp)

# featureData:
tmp <- read.csv("fdata.txt", row.names = 1)
fdata <- AnnotatedDataFrame(tmp)

# expression data (tab delimited, probe ids in the first column):
tmp <- read.delim("exprs.txt", row.names = 1)
m <- as.matrix(tmp)

## create ExpressionSet object:
eset <- new("ExpressionSet", exprs = m, phenoData = pdata, featureData = fdata)

pData(eset)
fData(eset)
eset$treatment


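The only requirement (I think) is that the sample names and feature names agree between the different files: the column names of the expression matrix should match the row names of the sample annotations, and its row names should match the row names of the probe annotations.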
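
As a minimal sketch of that requirement, the check below uses base R only and inline toy data mirroring the example files. The object names pd, fd and m are just for illustration. It reorders the annotations to follow the expression matrix, in case the files list samples or probes in a different order, and then tests that the names agree.

```r
# Toy data mirroring the example files, read from inline text
# so the sketch runs without any files on disk.
pd <- read.csv(text = "id,treatment
sample1,1
sample2,1
sample3,0
sample4,0", row.names = 1)

fd <- read.csv(text = "id,symbol
probe1,gene1
probe2,gene2
probe3,gene3
probe4,gene4", row.names = 1)

m <- as.matrix(read.delim(text = "sample1\tsample2\tsample3\tsample4
probe1\t10\t9\t11\t8
probe2\t10\t11\t2\t1
probe3\t2\t3\t12\t10
probe4\t1\t3\t2\t1", row.names = 1))

# Reorder the annotations to follow the expression matrix.
# drop = FALSE keeps single-column data frames as data frames.
pd <- pd[colnames(m), , drop = FALSE]
fd <- fd[rownames(m), , drop = FALSE]

# TRUE only if samples and probes line up exactly.
names_agree <- identical(colnames(m), rownames(pd)) &&
  identical(rownames(m), rownames(fd))
```

If names_agree is FALSE, it is probably worth fixing the input files before calling new("ExpressionSet", ...), since the validity check of the object is expected to complain about mismatched sample or feature names.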

Answer: Creating an Expression Set with a csv file
4.3 years ago by
svlachavas740
Greece/Athens/National Hellenic Research Foundation
svlachavas740 wrote:

Dear Vani,

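Suppose, for the first case, that the txt file has a header row with the sample names and that the rows are the probesets.

If the file is in your current directory and you have read it into R (for instance with read.delim) as an object called rma.file, you can then use: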

eset <- new("ExpressionSet", exprs=as.matrix(rma.file))

In a similar way, you can use read.csv for the csv file. You can also check the above functions in more detail, including read.delim.

Thank you.

So if I wanted to add the csv file as a parameter in the expression eset <- new("ExpressionSet", exprs=as.matrix(rma.file)), I would read in the csv file using read.csv and then add it like this: eset <- new("ExpressionSet", exprs=as.matrix(rma.file), annotation = csv.file)?

Dear Vani,

please excuse me, because I misread the second part of your question. By the annotation of the specific Affy chip, do you mean the pheno data object, that is, the phenotype of your data? If so, you could use:

read.csv to read the csv file into a data.frame object (read.csv already returns a data.frame, so a conversion with as.data.frame should not be needed). Then, if the object is called dat2, for instance:

phenoData(eset) <- new("AnnotatedDataFrame", data=dat2)

The csv file contains the annotation of the HuGene-1_0-st-v1 Affymetrix chip, so basically it has all the info like Entrez ID, gene symbol, etc. Would it still be considered a pheno data object?

No. In my opinion there is no need to load it in R, as this is your annotation file, which you could use after your statistical analysis to annotate your results. By "pheno data" object I meant the description of your samples, i.e. disease, healthy, cancer, control, etc. Moreover, although I have never used this specific Affymetrix platform, you might find the platform-specific package for HuGene useful (http://bioconductor.org/packages/release/data/annotation/html/pd.hugene.1.0.st.v1.html)
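
To illustrate what annotating the results afterwards could look like, here is a small base R sketch. Everything in it is hypothetical: the results table, the column names (probe, symbol, entrez) and the inline annotation text, which stands in for whatever read.csv returns for the real chip annotation file. It looks up each probe of the results in the annotation with match(), which keeps the row order of the results.

```r
# Hypothetical results of a statistical analysis, one row per probe.
results <- data.frame(
  probe = c("probe3", "probe1"),
  logFC = c(2.1, -0.4),
  stringsAsFactors = FALSE
)

# Hypothetical chip annotation; in practice this would come from
# read.csv() on the real annotation file.
annot <- read.csv(text = "probe,symbol,entrez
probe1,gene1,101
probe2,gene2,102
probe3,gene3,103", stringsAsFactors = FALSE)

# Position of each result probe in the annotation table
# (NA for probes that are not annotated).
idx <- match(results$probe, annot$probe)

# Add the annotation columns, keeping the order of `results`.
results$symbol <- annot$symbol[idx]
results$entrez <- annot$entrez[idx]
```

Using match() rather than merge() is only a design choice here: merge() would also work, but it may reorder the rows of the results table.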