Load and normalize arrays from different platforms


Wendy Qiao ▴ 360

@wendy-qiao-4501

Last seen 10.6 years ago

Hi all,

I need to load and normalize CEL files from two different platforms: one is *U133AAofAv2 (22944 affyids)* and the other is *HG-U133A_2 (22277 affyids)*. I believe that these two platforms have very similar annotations.

When I read all the files together using ReadAffy, I got the following error:

> es.affy<-ReadAffy(filenames=celfile, celfile.path=celpath, phenoData=NULL)
Error in read.affybatch(filenames = l$filenames, phenoData = l$phenoData, :
  Cel file XX does not seem to have the correct dimensions

I figure that is because the two platforms have different CDFs. So I tried to change the CDF name for *U133AAofAv2* using library("affxparser"). Then I got the following error:

> convertCel(celfile, celfile.output, newChipType="HG-U133A_2")
Error in .unwrapDatHeaderString(header$DatHeader) :
  Internal error: Failed to extract 'pixelRange' and 'sampleName' from DAT header. They became identical: HG-U133A_2.1sq

I am not sure how to get around this problem. Could anybody help? Or what would be the best way to normalize two datasets like mine? Any suggestion is appreciated.

Thank you very much,
Wendy

cdf cdf • 1.7k views

ADD COMMENT • link updated 14.1 years ago by James W. MacDonald 68k • written 14.1 years ago by Wendy Qiao ▴ 360


James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 17 hours ago

United States

Hi Wendy,

On 2/22/2011 3:13 PM, Wendy Qiao wrote:
> I need to load and normalize CEL files from two different platforms, one
> platform is *U133AAofAv2 (22944 affyids)* and the other is *HG-U133A_2
> (22277 affyids)*. I believe that these two platforms have very similar
> annotations.

They may have similar annotations, but you won't be able to load and normalize them together. You are better off normalizing separately, and then, if you need to analyze them together, you can attempt to subset to the intersecting probesets and then do the analysis.

Best,

Jim
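The "subset to the intersecting probesets" step could look roughly like the sketch below. It is only an illustration: `exprs_a` and `exprs_b` are hypothetical toy matrices standing in for the expression matrices (probesets in rows, samples in columns) obtained by processing each platform on its own, and the row names `only_on_A` and `only_on_B` are made-up placeholders for probesets present on just one array.

```r
# Toy stand-ins for two separately processed expression matrices
# (rows = probesets, columns = samples). Values are invented.
exprs_a <- matrix(c(7.1, 8.2, 5.5, 6.3,
                    7.0, 8.4, 5.7, 6.1),
                  nrow = 4,
                  dimnames = list(c("1007_s_at", "1053_at", "117_at", "only_on_A"),
                                  c("A1", "A2")))
exprs_b <- matrix(c(8.0, 7.3, 5.1, 9.9,
                    8.1, 7.2, 5.4, 9.7,
                    7.9, 7.5, 5.2, 9.8),
                  nrow = 4,
                  dimnames = list(c("1053_at", "1007_s_at", "117_at", "only_on_B"),
                                  c("B1", "B2", "B3")))

# Probesets present on both platforms (order follows the first matrix).
common <- intersect(rownames(exprs_a), rownames(exprs_b))

# Index both matrices by name so rows line up even if the original order differs.
combined <- cbind(exprs_a[common, , drop = FALSE],
                  exprs_b[common, , drop = FALSE])

# Keep track of which platform each sample came from for the later analysis.
platform <- factor(rep(c("U133AAofAv2", "HG-U133A_2"),
                       times = c(ncol(exprs_a), ncol(exprs_b))))
```

Indexing by row name rather than by position matters here, since the two platforms need not list their probesets in the same order.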

ADD COMMENT • link 14.1 years ago James W. MacDonald 68k


If you wish to normalize them together, you can do what Jim suggested but skip the normalization step: get the two expression matrices, combine them (using common genes only), and then use the normalization functions from limma to normalize the two sets together.

Regards,
Moshe.
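As a rough sketch of normalizing the combined matrix in one go: the thread does not say which limma function is meant, but `normalizeBetweenArrays(combined, method = "quantile")` is one candidate. To keep the example self-contained, the code below uses a small base-R stand-in for quantile normalization instead (the helper `quantile_normalize` is hypothetical, assumes no missing values, and handles ties more crudely than limma does). The toy matrix `combined` is invented and represents un-normalized values for the common probesets.

```r
# Minimal base-R quantile normalization (illustrative stand-in, no NA handling).
quantile_normalize <- function(x) {
  # Rank of each value within its own column; ties broken by order of appearance.
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Row means of the column-wise sorted values give the shared reference distribution.
  reference <- rowMeans(apply(x, 2, sort))
  # Replace each value with the reference value at its rank.
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

# Toy un-normalized values for common probesets; platform B is shifted upwards.
combined <- cbind(
  A1 = c(5.0, 7.0, 9.0),
  A2 = c(5.5, 9.5, 7.5),
  B1 = c(8.0, 6.0, 10.0),
  B2 = c(6.5, 8.5, 10.5)
)
rownames(combined) <- c("1007_s_at", "1053_at", "117_at")

normalized <- quantile_normalize(combined)

# After quantile normalization every sample has the same set of values;
# only the assignment of values to probesets differs between samples.
sorted <- apply(normalized, 2, sort)
```

If the per-platform matrices come from `rma()` in the affy package, its `normalize = FALSE` argument is one way to obtain summarized but un-normalized values, though whether that fits your data is for you to judge.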

ADD REPLY • link 14.1 years ago Moshe Olshansky ▴ 260


Hi Moshe and James,

Thank you very much for your suggestions. They make sense. I will do that.

Regards,
Wendy

ADD REPLY • link 14.1 years ago Wendy Qiao ▴ 360


Hi Wendy,

Just one more comment: since you are using two different platforms, you should expect a batch effect, so it is important to normalize the two batches together (and there are several ways of doing this). Even then, check your normalized expression values to see whether you still have a batch effect.

Moshe.
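One possible way to check for a remaining batch effect is to look at whether samples separate by platform in a principal component analysis. The sketch below uses simulated data only: `expr` is a made-up matrix with a deliberate shift added to one platform, and the labels "A" and "B" are placeholders for the two arrays. It illustrates the check, not a recommended threshold.

```r
set.seed(1)

n_probes <- 50
platform <- factor(rep(c("A", "B"), each = 3))

# Simulated expression values: probesets in rows, six samples in columns.
expr <- matrix(rnorm(n_probes * 6, mean = 8, sd = 0.5),
               nrow = n_probes,
               dimnames = list(paste0("probe", seq_len(n_probes)),
                               paste0(platform, 1:3)))

# Add an artificial platform-wide shift to mimic a batch effect.
expr[, platform == "B"] <- expr[, platform == "B"] + 2

# PCA on samples (prcomp expects observations in rows, hence the transpose).
pca <- prcomp(t(expr))
pc1 <- pca$x[, "PC1"]

# Share of total variance carried by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# If PC1 scores split cleanly by platform, a batch effect is still present.
pc1_by_platform <- split(pc1, platform)
```

With real data, a plot of the first two components coloured by platform gives the same information at a glance.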

ADD REPLY • link 14.1 years ago Moshe Olshansky ▴ 260


Just a follow-up: the 'HG-U133A_2' chip type is a *physically different* array from the 'HT_HG-U133A' chip type (aliased 'U133AAofAv2' during its early-access stage). For instance, the former has 732x732 probes whereas the latter has 744x744 probes, cf. http://www.aroma-project.org/chipTypes/

In other words, the problem is *not* just about different chip type *aliases* (as it would have been if you had CEL files labelled 'HT_HG-U133A' and 'U133AAofAv2', in which case you could, in theory, treat them all as the same chip type). You need to proceed as others have already suggested in this thread.

/Henrik
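To make the distinction between aliases and physically different arrays concrete, here is a small sketch that groups CEL files by array dimensions rather than by chip type label. The `headers` table and its file names are hypothetical; in practice the chip type, rows and columns would be read from each CEL file header (for example with `readCelHeader()` from affxparser, which I believe reports these fields), and the column names used here are my own choice.

```r
# Hypothetical header table; file names are invented for illustration.
headers <- data.frame(
  file     = c("s1.CEL", "s2.CEL", "s3.CEL", "s4.CEL"),
  chiptype = c("HG-U133A_2", "U133AAofAv2", "HG-U133A_2", "HT_HG-U133A"),
  rows     = c(732L, 744L, 732L, 744L),
  cols     = c(732L, 744L, 732L, 744L),
  stringsAsFactors = FALSE
)

# Group by physical layout, so different labels for the same array stay together
# while physically different arrays are kept apart.
layout <- paste0(headers$rows, "x", headers$cols)
files_by_layout <- split(headers$file, layout)

# Chip type labels seen within each layout group.
labels_by_layout <- lapply(split(headers$chiptype, layout), unique)
```

Each group in `files_by_layout` can then be read and processed on its own, which is the separate-processing route suggested earlier in the thread.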

ADD REPLY • link 14.1 years ago Henrik Bengtsson ★ 2.4k
