Combining HGU133A & HGU133B data

Adaikalavan RAMASAMY ▴ 80

@adaikalavan-ramasamy-437

Dear all,

I have been asked to analyze data where samples were hybridized on both HGU133A and HGU133B Affymetrix chips. One option is to analyze the A and B chips separately, but this is not desirable.

The other option is to combine the two data sets (using something akin to "rbind"). I think it is better to combine the results after rma, as different background corrections need to be applied.

This method, however, has problems with the genes that are redundant between the A and B chips (there are 2000+ genes that overlap both chips).

Can anyone suggest the best way to deal with this problem? Does anyone have experience with, or has anyone seen publications on, combining data from two different array formats?

Thank you.

--
Adaikalavan Ramasamy ramasamyA@gis.a-star.edu.sg
Research Assistant http://giscompute.gis.a-star.edu.sg/~adai
Microarray & Expression Genomics Tel: 65-6478 8043
Information & Mathematical Sciences Fax: 65 6478 9058
Genome Institute of Singapore http://www.gis.a-star.edu.sg/

hgu133a hgu133b

ADD COMMENT • link updated 22.3 years ago by Crispin Miller ★ 1.1k • written 22.3 years ago by Adaikalavan RAMASAMY ▴ 80

Laurent Gautier ★ 2.3k

@laurent-gautier-29

On Mon, Sep 15, 2003 at 06:26:15PM +0800, Adaikalavan RAMASAMY wrote:
> Can anyone suggest what is the best way to deal with this problem ? Does
> anyone have any experience or seen publications combining data from two
> different array formats.

Wolfgang Huber and Robert Gentleman certainly have something to say about that. Did you check the function 'combine' in the package 'matchprobes' (section 'devel')?

L.

ADD COMMENT • link 22.3 years ago Laurent Gautier ★ 2.3k

Hi,

On Mon, 15 Sep 2003, Laurent Gautier wrote:
> Wolfgang Huber and Robert Gentleman have certainly a word to say about
> that. Did you check the function 'combine' in the package 'matchprobes'
> (section 'devel') ?

The combine function in the matchprobes package is useful for combining data from different chip types. The combination is done at the probe level, before normalization, and it requires an appreciable overlap in probe sequences (as, for example, with hu6800/hgu95av2 or mgu74a/mgu74av2). The combination is based on the INTERSECTION of probes that have the same sequence, and from the point of view of the expression matrix it corresponds, loosely speaking, to a CBIND.

What Adaikalavan is looking for is much simpler: something that works on the UNION of all probes/genes on HGU133A and HGU133B, and that, from the point of view of the expression matrix, corresponds to an RBIND.

I am not aware of a simpler method for doing this than calling new("exprSet", ....) with the arguments patched together from the two individual HGU133A and HGU133B exprSets.

Best regards
Wolfgang

-------------------------------------
Wolfgang Huber
Division of Molecular Genome Analysis
German Cancer Research Center
Heidelberg, Germany
Phone: +49 6221 424709
Fax: +49 6221 42524709
Http: www.dkfz.de/mga/whuber
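To make the UNION/RBIND idea concrete, here is a minimal sketch in base R. It uses plain toy matrices rather than real exprSet objects, and every probe set name, sample name, and value below is invented for illustration. It only assumes that the same samples were run on both chip types and that probe set identifiers do not clash.

```r
# Toy expression matrices: rows are probe sets, columns are samples.
# All names and values are invented for illustration.
exprs_a <- matrix(c(7.1, 8.3, 6.0, 7.4, 8.1, 6.2),
                  nrow = 3,
                  dimnames = list(c("a1_at", "a2_at", "a3_at"), c("s1", "s2")))
exprs_b <- matrix(c(5.5, 9.0, 5.7, 8.8),
                  nrow = 2,
                  dimnames = list(c("b1_at", "b2_at"), c("s2", "s1")))

# The same samples must be present on both chip types.
stopifnot(setequal(colnames(exprs_a), colnames(exprs_b)))

# A plain rbind needs probe set identifiers that are unique across chips.
stopifnot(length(intersect(rownames(exprs_a), rownames(exprs_b))) == 0)

# Reorder the B columns to match A, then stack the rows (union of probe sets).
exprs_ab <- rbind(exprs_a, exprs_b[, colnames(exprs_a), drop = FALSE])
```

The column reordering matters: rbind matches columns by position, not by name, so stacking without it would silently pair sample s1 on chip A with sample s2 on chip B.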

ADD REPLY • link 22.3 years ago Wolfgang Huber ★ 13k

On Mon, Sep 15, 2003 at 02:19:01PM +0200, w.huber@dkfz-heidelberg.de wrote:
> What Adaikalavan is looking for is much simpler: something that works on
> the UNION of all probes/genes on HGU133A and HGU133B, and from the point
> of view of the expression matrix corresponds to an RBIND.

Oops... sorry for the confusion (I have never used combine... yet).

In this case, the union of expression values is a straightforward 'rbind', as Wolfgang suggests. The probe business is slightly more tricky because of the cdfenvs.
The following scheme should do it, more or less (I did not test it):

    ## abatch.a and abatch.b are the AffyBatch objects
    ## (both must hold the same samples, in the same column order)
    abatch.ab <- new("AffyBatch",
                     exprs=rbind(exprs(abatch.a), exprs(abatch.b)),
                     cdfName="cdfenv.ab")

    ## make a cdfenv for the union-combined-chips
    cdfenv.ab <- new.env(hash=TRUE)

    ## copy the probe indices of chip A unchanged
    cdfenv.a <- getCdfInfo(abatch.a)
    for (i in ls(cdfenv.a)) {
      assign(i, get(i, envir=cdfenv.a), envir=cdfenv.ab)
    }

    ## the rows of chip B follow the rows of chip A in the rbind-ed matrix,
    ## so shift the probe indices of chip B by the number of rows of A
    offset <- nrow(exprs(abatch.a))
    cdfenv.b <- getCdfInfo(abatch.b)
    for (i in ls(cdfenv.b)) {
      ## inherits=FALSE: only look in cdfenv.a itself, not its parents
      if (exists(i, envir=cdfenv.a, inherits=FALSE))
        stop(paste(i, ": id already in use !"))
      assign(i, get(i, envir=cdfenv.b)+offset, envir=cdfenv.ab)
    }

    ## from now, this should be like a regular AffyBatch
    ## (expect quirks with some methods/functions
    ## dealing with spatial features of the probes, ex: image)

Hopin' it helps,

L.

--
--------------------------------------------------------------
Laurent Gautier CBS, Building 208, DTU
PhD. Student DK-2800 Lyngby,Denmark
tel: +45 45 25 24 89 http://www.cbs.dtu.dk/laurent
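The offset step in the scheme can be tried out without any Bioconductor objects. The sketch below is a toy version in base R: ordinary environments stand in for the cdfenvs, plain index vectors stand in for whatever index structure a real cdfenv stores, and all identifiers and intensities are invented. It is an illustration of the indexing idea only, not a tested replacement for the AffyBatch code.

```r
# Toy stand-ins for the two CDF environments: each probe set id maps to
# row indices into that chip's intensity matrix. All names are invented.
cdf_a <- new.env(hash = TRUE)
assign("a1_at", c(1L, 2L), envir = cdf_a)
assign("a2_at", c(3L, 4L), envir = cdf_a)

cdf_b <- new.env(hash = TRUE)
assign("b1_at", c(1L, 2L, 3L), envir = cdf_b)

# Toy probe-level intensities: 4 probes on A, 3 probes on B, one sample.
int_a <- matrix(c(10, 11, 12, 13), ncol = 1, dimnames = list(NULL, "s1"))
int_b <- matrix(c(20, 21, 22), ncol = 1, dimnames = list(NULL, "s1"))
int_ab <- rbind(int_a, int_b)

# Build the combined lookup: A indices unchanged, B indices shifted by nrow(A).
offset <- nrow(int_a)
cdf_ab <- new.env(hash = TRUE)
for (id in ls(cdf_a)) {
  assign(id, get(id, envir = cdf_a), envir = cdf_ab)
}
for (id in ls(cdf_b)) {
  if (exists(id, envir = cdf_ab, inherits = FALSE)) {
    stop(id, ": id already in use")
  }
  assign(id, get(id, envir = cdf_b) + offset, envir = cdf_ab)
}

# The probes of the B probe set now point at rows 5 to 7 of the stacked matrix.
b1_rows <- get("b1_at", envir = cdf_ab)
b1_values <- int_ab[b1_rows, "s1"]
```

Without the offset, "b1_at" would point at rows 1 to 3, which belong to chip A after stacking.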

ADD REPLY • link 22.3 years ago Laurent Gautier ★ 2.3k

> In this case, the union of expression values is a straightforward 'rbind'
> as Wolfgang suggests. The probe business is slightly more tricky because
> of the cdfenvs. The following scheme should make it (more or less I
> did not test it): ....(some code)...

But while this is technically possible in software, it may not be the best idea to patch the data together at the probe (AffyBatch) level: the data from the individual arrays will very likely need to be normalized by themselves.

Best regards
Wolfgang
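One way to read "normalized by themselves" is: normalize the A arrays as one group, the B arrays as another, and only then stack the rows. The sketch below does that in base R with a simple quantile normalization, used here purely as a stand-in for whatever per-chip-type procedure (rma, for instance) is actually applied. The helper quantile_normalise and all data are hypothetical.

```r
# Hypothetical helper: basic quantile normalisation across the columns of x
# (assumes no missing values and at least two rows).
quantile_normalise <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Target distribution: mean of the sorted columns, row by row.
  target <- unname(rowMeans(apply(x, 2, sort)))
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

# Toy summarised values per chip type; names and numbers are invented.
raw_a <- matrix(c(2, 4, 6, 10, 3, 5), nrow = 3,
                dimnames = list(c("a1_at", "a2_at", "a3_at"), c("s1", "s2")))
raw_b <- matrix(c(1, 3, 4, 2), nrow = 2,
                dimnames = list(c("b1_at", "b2_at"), c("s1", "s2")))

# Normalise each chip type on its own, then stack the rows.
norm_a <- quantile_normalise(raw_a)
norm_b <- quantile_normalise(raw_b)
combined <- rbind(norm_a, norm_b)
```

Each chip type gets its own target distribution here, so nothing forces the A and B intensity distributions to agree with each other.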

ADD REPLY • link 22.3 years ago Wolfgang Huber ★ 13k

Crispin Miller ★ 1.1k

@crispin-miller-264

Hi,

A more serious issue is that normalisation (almost certainly) assumes that the average expression level on each chip is the same. This is clearly not the case between A and B chips, and combining each pair of A's and B's for every sample before normalisation is almost certainly a bad idea...

Normalising the A's and B's separately is probably much more sensible. This then allows you to use the 2000+ shared probes to see how well your normalisation has worked: their signals come from the same hyb. cocktail, so they should produce the same expression levels. If you think about it this way, the repeated probes are a Good Thing(TM) :-)

Crispin
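As a rough illustration of using the shared probes as a check, the sketch below compares separately normalised A and B values for matched pairs. It is base R on toy data: the matrices, the probe set names, and the mapping table `shared` are all invented, and a real mapping between the two chips would have to come from the annotation.

```r
# Toy normalised expression values (log2 scale); names and values are invented.
norm_a <- matrix(c(7.0, 9.1, 5.2, 7.3, 9.0, 5.5), nrow = 3,
                 dimnames = list(c("a1_at", "a2_at", "a3_at"), c("s1", "s2")))
norm_b <- matrix(c(7.4, 9.3, 7.5, 9.6), nrow = 2,
                 dimnames = list(c("b1_at", "b2_at"), c("s1", "s2")))

# Hypothetical mapping between probe sets that measure the same target
# on the A chip and on the B chip.
shared <- data.frame(a_id = c("a1_at", "a2_at"),
                     b_id = c("b1_at", "b2_at"),
                     stringsAsFactors = FALSE)

# Per-sample difference (B minus A) for each shared pair.
diff_ba <- norm_b[shared$b_id, , drop = FALSE] -
  norm_a[shared$a_id, , drop = FALSE]

# Average shift per sample; values near zero suggest the two chip types
# ended up on a comparable scale after normalisation.
mean_shift <- colMeans(diff_ba)
```

A consistent non-zero shift across samples would point to a scale difference between the chip types, while large differences for individual pairs are more likely probe-specific.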

ADD COMMENT • link 22.3 years ago Crispin Miller ★ 1.1k
