custom Affymetrix
@vermesan-oana-4044
Last seen 9.6 years ago
Good morning,

I started using Bioconductor about a month ago. I am working with Affymetrix and Agilent data; I have a complete workflow and I know how to use the packages for these types of data. Recently I found some custom Affymetrix data and I don't know how to process it. I have only an .xls file with the measured genes and a .txt file with the clinical information. Both can be found at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5479. Can someone tell me which packages to use for background correction, normalization, and differential analysis? I also need to do hierarchical clustering. I would really appreciate any help.

Thank you,
Oana
Clustering PROcess • 805 views
@wolfgang-huber-3550
Last seen 11 days ago
EMBL European Molecular Biology Laborat…
Dear Oana,

I am sure you have read the experiment description at the URL you kindly sent us, where it says "The final average log2 values are supplied as supplemental files to this GEO series." This indicates that the data producers consider no further background correction or normalization steps to be necessary.

For differential expression, you might have a look, for instance, at the respective chapters in the book http://www.bioconductor.org/pub/biocases, which discusses the other questions (differential analysis and clustering) you asked.

Best wishes
Wolfgang

--
Wolfgang Huber EMBL http://www.embl.de/research/units/genome_biology/huber
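Editor's note: as a rough illustration of what differential analysis and hierarchical clustering can look like once log2 values are already available, here is a minimal sketch in base R. The matrix is simulated, and every object name (expr, group, pvals and so on) is hypothetical rather than taken from the GEO series. It uses plain per-gene t-tests only to stay self-contained; in practice a dedicated Bioconductor package such as limma is commonly used for this step.

```r
# Minimal sketch with simulated data; all object names are hypothetical.
set.seed(1)

# Stand-in for an already normalized log2 expression matrix:
# 100 features (rows) by 8 samples (columns).
expr <- matrix(rnorm(100 * 8, mean = 8, sd = 1), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("sample", 1:8)))
group <- factor(rep(c("control", "case"), each = 4))

# Shift the first 10 genes in the "case" samples so there is a signal to find.
expr[1:10, group == "case"] <- expr[1:10, group == "case"] + 3

# Per-gene Welch t-test between the two groups,
# followed by Benjamini-Hochberg adjustment for multiple testing.
pvals <- apply(expr, 1, function(x) {
  t.test(x[group == "case"], x[group == "control"])$p.value
})
padj <- p.adjust(pvals, method = "BH")

# Hierarchical clustering of the samples,
# using 1 - Pearson correlation as the distance between samples.
sample_dist <- as.dist(1 - cor(expr))
hc <- hclust(sample_dist, method = "average")
clusters <- cutree(hc, k = 2)

# Features with the smallest adjusted p-values.
head(sort(padj))
```

Calling plot(hc) would draw the dendrogram. The choice of distance and linkage here is only one reasonable option and should be adapted to the data.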
I want to thank you for the prompt answer.

I know that the data at that URL have been preprocessed; it was just an example of custom Affymetrix data. I have found other custom Affymetrix datasets with raw data that I need to preprocess (such as http://www.mdl.dk/Publications_sup7.htm). I also have the book that you mentioned, but its examples are on Affymetrix data and I can't use them on my dataset. I've tried, but I always get the same error: the object that I use must be an AffyBatch. I believe there are some special packages that can be used on custom data, and I would like to know which ones.

Thank you again and I look forward to your answer,
Oana
Dear Oana,

the basic container in Bioconductor for expression data (where you would also put your custom data) is the ExpressionSet. This is essentially a matrix of expression values, plus a table with annotations for the rows (features), which can e.g. be target names, and a table with annotations for the columns (samples), which can e.g. be patient IDs. The documentation mentioned below explains how to do data analysis steps such as finding differentially expressed genes or clustering on data in an ExpressionSet.

AffyBatch is a container for raw Affymetrix data. It happens to internally share the same structure as an ExpressionSet, but except for some special cases (e.g. technical quality assessment), you don't do high-level data analysis on it. The function "rma" in the affy package is the most popular way of turning an AffyBatch into an ExpressionSet.

To load a dataset from GEO, you can use

library("GEOquery")
x = getGEO("GSE5479")

This gave me a list of 4 ExpressionSet objects, each with 3072 rows and 255 columns. I didn't find it immediately obvious how to reconcile those with the experiment description on http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5479, which mentions that 404 samples were measured in duplicate.

Ideally, such a call to getGEO (or, similarly, to the "ArrayExpress" function in the eponymous package) will directly produce an ExpressionSet with which you can continue your analysis. In reality, however, some further data reshuffling and/or normalisation is often necessary, and the details depend on the dataset. The curators of GEO, or the submitters of the data, are the best people to ask for clarification here.

Perhaps downloading the data from http://www.mdl.dk/Publications_sup7.htm and parsing it yourself into an ExpressionSet is another approach; I haven't tried that. The vignette "Biobase - An introduction to Biobase and ExpressionSets" explains how to "Build an ExpressionSet From Scratch", given the various components.

Hope this helps.

Best wishes
Wolfgang

--
Wolfgang Huber EMBL http://www.embl.de/research/units/genome_biology/huber
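Editor's note: the "from scratch" route described above could look roughly like the following sketch. It assumes the Biobase package is installed. The matrix and annotation tables are simulated stand-ins, and all object names (exprs_mat, pdata, fdata, eset) are hypothetical. With real data, the matrix would instead come from the downloaded file, for example via read.delim() after exporting the spreadsheet to tab-delimited text.

```r
library(Biobase)

set.seed(1)

# Hypothetical stand-in for expression values parsed from a file:
# 20 features (rows) by 6 samples (columns).
exprs_mat <- matrix(rnorm(20 * 6, mean = 8), nrow = 20,
                    dimnames = list(paste0("probe", 1:20),
                                    paste0("patient", 1:6)))

# Sample annotation: its row names must match the column names of the matrix.
pdata <- data.frame(status = rep(c("tumor", "normal"), each = 3),
                    row.names = colnames(exprs_mat))

# Feature annotation: its row names must match the row names of the matrix.
fdata <- data.frame(target = paste0("target", 1:20),
                    row.names = rownames(exprs_mat))

# Assemble the three components into an ExpressionSet.
eset <- ExpressionSet(assayData = exprs_mat,
                      phenoData = AnnotatedDataFrame(pdata),
                      featureData = AnnotatedDataFrame(fdata))

# Accessors for the expression matrix and the sample annotation.
dim(exprs(eset))
pData(eset)$status
```

Once the data are in an ExpressionSet, the downstream examples in the Biobase vignette and the book mentioned earlier should apply without requiring an AffyBatch.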