ExpressionSet from ArrayExpress processed data
@yovanny-izquierdo-nunez-3233
Last seen 9.6 years ago
Dear BioC users,

I'm working with experiments from the ArrayExpress database. Some of them do not provide the CEL files, only the already processed data in a table format (easy to read with read.delim, for instance). The phenoData of the experiment comes separately in the SDRF file. Is there a way to create an ExpressionSet object from these two? The ArrayExpress package only provides functions for creating an AffyBatch object from the raw data and the SDRF, ADF and IDF files, but so far it has nothing to deal with the processed data.

Thanks so much,
Yovanny

Instituto de Biotecnología de las Plantas
Universidad Central "Marta Abreu" de Las Villas
Carretera a Camajuaní km 5½, Santa Clara, Villa Clara, Cuba
Tel: 53 (42) 281257, 281268, 281693
Fax: 53 (42) 281329
Web: http://www.ibp.co.cu
E-Mail: info at ibp.co.cu
ArrayExpress SANTA • 1.8k views
@martin-morgan-1513
Last seen 6 weeks ago
United States
Hi Yovanny,

Yovanny Izquierdo Núñez <yovanny at ibp.co.cu> writes:

> The PhenoData of the experiment comes separately in the sdrf file.
> Is there a way to create an expressionset object from these two?

See 'ExpressionSetIntroduction.pdf' in the Biobase package:

http://bioconductor.org/packages/2.3/bioc/html/Biobase.html

I don't know how to parse the PhenoData into a data.frame, but once that is done you will likely be able to do

    phenoData <- new("AnnotatedDataFrame", data=PhenoData)
    eset <- new("ExpressionSet", exprs=exprs, phenoData=phenoData)

where PhenoData is the data.frame built from the SDRF file and exprs is the matrix of processed expression values.

Martin
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793
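The two-step construction suggested in this answer can be sketched on toy data as follows. This is only an illustration under stated assumptions: exprsMat and pheno are invented stand-ins for the processed-data table and the parsed SDRF file, and the Biobase step is skipped when the package is not installed.

```r
# Toy stand-ins (hypothetical): in practice these would come from the
# processed-data table and the SDRF file of the experiment.
set.seed(1)
exprsMat <- matrix(rnorm(12), nrow = 4,
                   dimnames = list(paste0("probe", 1:4), c("S1", "S2", "S3")))
pheno <- data.frame(Treatment = c("control", "treated", "treated"),
                    row.names = c("S1", "S2", "S3"))

# Invariant required by ExpressionSet: one pheno row per expression column,
# with the same names in the same order.
stopifnot(identical(rownames(pheno), colnames(exprsMat)))

# The Biobase step runs only if the package is available.
if (requireNamespace("Biobase", quietly = TRUE)) {
  library(Biobase)
  phenoData <- new("AnnotatedDataFrame", data = pheno)
  eset <- new("ExpressionSet", exprs = exprsMat, phenoData = phenoData)
  print(dim(eset))
}
```

Checking the row and column names before calling new() tends to give a clearer failure than the validity error raised by the ExpressionSet class itself.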
Hi Martin,

Thank you for your suggestions. Here's an example of how to create a data.frame from an SDRF file as explained in 'ExpressionSetIntroduction.pdf' (provided that the file is in the current working directory):

    pData <- read.table("file.sdrf", row.names = 1, header = TRUE, sep = "\t")

From here it is possible to follow your suggestion.

However, I found that my expression data contains 3 replicates per array, but these are not treated separately in the pData (I have three times as many columns in the expression data as rows in pData). So obviously I get the error:

    > eset <- new("ExpressionSet", exprs=exprs, phenoData=phenoData)
    Error in validObject(.Object) :
      invalid class "ExpressionSet" object: 1: sample numbers differ between assayData and phenoData
    invalid class "ExpressionSet" object: 2: sampleNames differ between assayData and phenoData
    In addition: Warning message:
    In sampleNames(assayData(object)) == sampleNames(phenoData(object)) :
      longer object length is not a multiple of shorter object length

Any ideas on how I can make them match?

Thanks,
Yovanny

________________________________________
From: Martin Morgan [mtmorgan at fhcrc.org]
Sent: Tuesday, 20 January 2009 9:26
To: Yovanny Izquierdo Núñez
CC: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Expressionset from ArrayExpress processed data
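The mismatch described in this reply can be detected before the ExpressionSet is built. The sketch below uses only base R; the SDRF text, the column names of exprsMat and all values are invented for illustration and do not come from the poster's experiment.

```r
# Toy SDRF content (hypothetical values): tab-separated, first column
# used as row names, as in the read.table call quoted in the post.
sdrf_text <- paste(
  "Source Name\tCharacteristics[Genotype]\tDerived Array Data File",
  "A\twild type\tprocessed.txt",
  "B\tmutant\tprocessed.txt",
  sep = "\n"
)
pheno <- read.delim(text = sdrf_text, row.names = 1, check.names = FALSE)

# Toy processed expression matrix with three replicates per array.
exprsMat <- matrix(seq_len(24), nrow = 4,
                   dimnames = list(paste0("probe", 1:4),
                                   c("A1", "B1", "A2", "B2", "A3", "B3")))

# Compare sample counts and names before constructing anything.
n_samples <- ncol(exprsMat)
n_pheno <- nrow(pheno)
unmatched <- setdiff(colnames(exprsMat), rownames(pheno))
if (n_samples != n_pheno || length(unmatched) > 0) {
  message(sprintf(
    "%d expression columns vs %d SDRF rows; %d column names have no SDRF row",
    n_samples, n_pheno, length(unmatched)))
}
```

A check like this reports the two conditions behind the validity error (sample count and sample names) in plain terms, which can make it easier to decide how the phenotype table needs to be expanded.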
Hi,

Audrey (the author of the ArrayExpress package) will probably be able to give a more precise answer, but I believe the ability for this package to return ExpressionSet objects from processed data in AE is planned, and such an implementation may be available in the development branch (check the BioC svn). Audrey?

--Misha
Hi all,

I do not especially want to encourage everyone to use the development version of the package, as I am still working on it and it is unstable. But you can still try version 1.3.6 of the ArrayExpress package. This version does offer functions for processed data (use getAE, then getcolproc, and then procset; see the procset help page). However, if there is a mapping problem between the SDRF file and the expression files, the phenoData will not be created by the ArrayExpress package. This is a very specific situation, and I do not think there is a good automated way to handle it without taking the risk of doing something wrong.

I will be happy to have your feedback on these functions.

Cheers,
Audrey
@martin-morgan-1513
Last seen 6 weeks ago
United States
Yovanny Izquierdo Núñez <yovanny at ibp.co.cu> writes:

> Any ideas of how can I make them match?

Hi Yovanny,

There must be as many rows in pData as there are columns in exprs, and the rows of pData must correspond, in order, to the columns of exprs. If exprs has two arrays A, B and replicates 1, 2, 3, with columns

    A1 B1 A2 B2 A3 B3

then you might try

    pData3 <- cbind(rbind(pData, pData, pData), Replicate=rep(1:3, each=2))

This binds three copies of pData together by row, and then adds a column to indicate which replicate each row represents. Then build the AnnotatedDataFrame from pData3 instead of pData when creating the ExpressionSet.

It might be necessary to adjust the row.names of pData3 to match the colnames of exprs, e.g.,

    row.names(pData3) <- colnames(exprs)

These are just suggestions; you'll have to manipulate pData and exprs in a way that makes sense for the data you actually have.

Hope that helps,
Martin
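The replicate expansion described in this answer can be sketched in base R as follows. The data are toy values, and the column layout (A1 B1 A2 B2 A3 B3, with the array label as a prefix of each column name) is an assumption taken from the example in the answer; real column names may need different handling.

```r
# Toy phenotype table with one row per array (hypothetical values).
pheno <- data.frame(Genotype = c("wild type", "mutant"),
                    row.names = c("A", "B"))

# Toy expression matrix: replicates interleaved as A1 B1 A2 B2 A3 B3.
exprsMat <- matrix(seq_len(24), nrow = 4,
                   dimnames = list(paste0("probe", 1:4),
                                   c("A1", "B1", "A2", "B2", "A3", "B3")))

# Number of replicates implied by the two tables.
n_rep <- ncol(exprsMat) / nrow(pheno)

# Repeat the pheno rows once per replicate, in the same interleaved order,
# and record which replicate each row belongs to.
idx <- rep(seq_len(nrow(pheno)), times = n_rep)
pheno3 <- data.frame(pheno[idx, , drop = FALSE],
                     Replicate = rep(seq_len(n_rep), each = nrow(pheno)))

# Sanity check: the array label in each column name matches the pheno row
# assigned to it, before the row names are overwritten.
stopifnot(identical(sub("[0-9]+$", "", colnames(exprsMat)),
                    rownames(pheno)[idx]))
rownames(pheno3) <- colnames(exprsMat)
print(pheno3)
```

The check before the final assignment matters because overwriting row names silently hides any ordering mistake; pheno3 can then be wrapped in an AnnotatedDataFrame as in the earlier answer.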