comparing HG-U219 data to HG-U133 data from public databases


Andreas Heider ▴ 340

@andreas-heider-4538

Last seen 10.9 years ago

Hi Bioconductor users,

I am a PhD student working on stem cells from human umbilical cord blood. We have data from our sorted cells on the Affymetrix HG-U219 platform. I want to compare our data with data from public databases such as Gene Expression Omnibus or ArrayExpress, but unfortunately those data are all based on the HG-U133 platform.

So my question is: what would be the simplest approach to achieve this, and what would be the best?

I'm thinking of two scenarios.

First scenario:
1. Get an expression table for both datasets (U219 and U133).
2. Label both datasets with "comparable" identifiers, e.g. UniGene ID or GenBank accession number.
3. Build new expression tables with only the entries present in both datasets.

Second scenario:
1. Get the raw data of both datasets.
2. Import the CEL files from U219 and convert them to U133 format.
3. Combine both datasets into one.
4. Normalize all the data together.

Please tell me whether this is possible, and if so, how to do it. I'm pretty sure it is, but I'm an R and BioC novice and don't know every function or package.

Thanks in advance,
Andreas

PS: Is it problematic that the HG-U219 has only perfect match (PM) probes and no mismatch (MM) probes?

Normalization convert ArrayExpress • 3.0k views

updated 14.9 years ago by Sole Acha, Xavi ▴ 110 • written 14.9 years ago by Andreas Heider ▴ 340


Sole Acha, Xavi ▴ 110

@sole-acha-xavi-4144

Last seen 11.4 years ago

Dear Andreas,

a possible pipeline (though not necessarily the best one; suggestions welcome) to compare HG-U133 Plus2 and HG-U219 data is:

1) Download the CEL files and normalize both datasets separately using RMA. RMA uses only the perfect match probes, so Plus2's MM probes are not used. I believe it is not straightforward to convert Plus2 CEL files to U219 or vice versa.

2) For both array types, keep only the probesets with the best match (the most comparable between the two array types). You can find this information on Affymetrix's website:

http://www.affymetrix.com/support/downloads/comparisons/U133PlusVsU219_BestMatch.zip

This file gives the correspondence between Plus2 and U219 probesets. You can then build a complete matrix with all the Plus2 and U219 hybridizations and only the common probesets, for which you will have to create a new ID, since probeset IDs differ between Plus2 and U219 arrays.

3) Once you have a common set of probesets for both datasets, you can re-normalize all the arrays together by applying quantile normalization (see the limma package).

Although this approach may work for you, please note that even after quantile normalization in step 3 you may still have a strong batch effect in your data, which you must be aware of.

Hope this helps,

Xavi.

------
Xavier Solé Acha
Unitat de Biomarcadors i Susceptibilitat / Unit of Biomarkers and Susceptibility
Institut Català d'Oncologia // Catalan Institute of Oncology
Gran Via de L'Hospitalet 199-203
08908 L'Hospitalet de Llobregat, Barcelona, Spain.
Phone: +34 93 260 71 22 / +34 93 260 71 86 (ext. 7122)
Fax: +34 93 260 71 88
E-mail: x.sole (at) iconcologia.net
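Step 2 of the pipeline above (keeping only best-match probesets and building one matrix under a new ID) can be sketched in base R as follows. This is only an illustration under stated assumptions: the function name `combine_platforms`, the column names `plus2_id` and `u219_id`, and all probeset pairings and values are made up for the example. The real best-match file from Affymetrix may use different column names and should be inspected first.

```r
# Toy RMA-style (log2) expression matrices. In practice each would come from
# running RMA separately on one platform's CEL files.
expr_plus2 <- matrix(
  c(7.1, 8.3, 5.2,
    7.4, 8.0, 5.6),
  nrow = 3,
  dimnames = list(c("1007_s_at", "1053_at", "117_at"),
                  c("plus2_a", "plus2_b"))
)
expr_u219 <- matrix(
  c(6.9, 8.8,
    7.2, 8.5),
  nrow = 2,
  dimnames = list(c("11715100_at", "11715101_s_at"),
                  c("u219_a", "u219_b"))
)

# Hypothetical best-match table; the pairings here are NOT real annotations.
best_match <- data.frame(
  plus2_id = c("1007_s_at", "1053_at", "117_at"),
  u219_id  = c("11715100_at", "11715101_s_at", "99999999_at"),
  stringsAsFactors = FALSE
)

combine_platforms <- function(expr_a, expr_b, map, col_a, col_b) {
  # Keep only pairs for which both probesets were actually measured.
  keep <- map[[col_a]] %in% rownames(expr_a) &
    map[[col_b]] %in% rownames(expr_b)
  map <- map[keep, , drop = FALSE]
  # If an ID occurs in several pairs, keep only its first occurrence.
  map <- map[!duplicated(map[[col_a]]) & !duplicated(map[[col_b]]), ,
             drop = FALSE]
  # Reorder both matrices to the same pair order, then bind the arrays.
  combined <- cbind(expr_a[map[[col_a]], , drop = FALSE],
                    expr_b[map[[col_b]], , drop = FALSE])
  # New shared ID, since the platforms name their probesets differently.
  rownames(combined) <- paste(map[[col_a]], map[[col_b]], sep = "|")
  combined
}

combined <- combine_platforms(expr_plus2, expr_u219, best_match,
                              "plus2_id", "u219_id")
```

The result is an ordinary numeric matrix with one row per matched probeset pair and one column per array. The third pair is dropped because its U219 probeset is absent from the toy U219 matrix.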

14.9 years ago Sole Acha, Xavi ▴ 110


Dear Xavi,

thank you for your fast answer! I agree with your proposed procedure. However, I don't know how to build a complete matrix from both datasets (I mean with R commands). Is it necessary to have a complete AffyBatch object for this combined matrix, or is a plain matrix sufficient?

On the other hand, couldn't I take the raw CEL data from both chips, keep only the "best match" probesets, combine them into one matrix and then do the normalization?

Thanks,
Andreas
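On the matrix question: quantile normalization itself only needs a plain numeric matrix (probesets in rows, arrays in columns). limma's `normalizeBetweenArrays(x, method = "quantile")` accepts such a matrix, whereas an AffyBatch is tied to a single chip type. The sketch below is a base-R illustration of the idea, not limma's implementation (tie handling in particular differs). The function name `quantile_normalize` and all values are made up for the example. It ends with a crude look at the batch effect mentioned in the answer.

```r
# Toy combined matrix: rows are shared probeset pairs, columns are arrays
# from both platforms (made-up log2 values).
combined <- matrix(
  c(5.0, 7.0, 9.0, 11.0,
    5.5, 7.5, 9.5, 11.5,
    10.0, 6.0, 8.5, 12.5,
    10.4, 6.2, 8.1, 12.9),
  nrow = 4,
  dimnames = list(paste0("pair", 1:4),
                  c("plus2_a", "plus2_b", "u219_a", "u219_b"))
)

quantile_normalize <- function(x) {
  # This sketch assumes a complete matrix with at least two rows.
  stopifnot(is.matrix(x), nrow(x) > 1, !anyNA(x))
  # Reference distribution: mean of the k-th smallest value across arrays.
  target <- unname(rowMeans(apply(x, 2, sort)))
  # Replace each value by the reference value at its within-array rank.
  ranks <- apply(x, 2, rank, ties.method = "first")
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

normalized <- quantile_normalize(combined)

# Crude batch check: per-probeset difference between the platform means.
# Quantile normalization equalizes the overall distributions, but
# probeset-specific platform shifts can remain.
platform <- factor(c("plus2", "plus2", "u219", "u219"))
platform_shift <- rowMeans(normalized[, platform == "u219"]) -
  rowMeans(normalized[, platform == "plus2"])
```

After this step every array has the same distribution of values, yet `platform_shift` can still be far from zero for individual probesets, which is the batch effect to watch for before comparing samples across platforms.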

14.9 years ago Andreas Heider ▴ 340
