MGU74A and MGU74Av2

Matthew Hobbs ▴ 80

@matthew-hobbs-298

Hi,

I have some CEL files from experiments using the Affy MGU74A chip and some from experiments using the MGU74Av2 chip. Am I right in thinking that MGU74A and MGU74Av2 are actually the same chip described differently (because some probesets were originally designed wrongly)?

I wish to treat all this data together. Can I use the MGU74Av2 CDF to make a single AffyBatch object containing both sorts of data? If not, how should I proceed?

Thanks for any help.

--
Matthew Hobbs
Garvan Institute of Medical Research
384 Victoria St, Darlinghurst
Ph: (02) 9295 8327
http://www.garvan.org.au
email: m.hobbs@garvan.org.au

mgu74a mgu74av2 cdf affy • 1.6k views

ADD COMMENT • link updated 20.8 years ago by Susan G. Hilsenbeck ▴ 10 • written 20.8 years ago by Matthew Hobbs ▴ 80

Rafael A. Irizarry ★ 2.3k

@rafael-a-irizarry-205

On Fri, 20 Jun 2003, Matthew Hobbs wrote:
> I have some CEL files from experiments using the Affy MGU74A chip and some
> from experiments using the MGU74Av2 chip. Am I right in thinking that
> MGU74A and MGU74Av2 are actually the same chip described differently
> (because some probesets were originally designed wrongly)?

My understanding is that MGU74A had probes that were designed wrong. v2 is the fix.

> I wish to treat all this data together. Can I use the MGU74Av2 CDF to make a
> single AffyBatch object containing both sorts of data? If not, how should I
> proceed?

For sure, you don't want to use the probes whose sequences were wrong. Affymetrix can tell you which these are. I'm cc-ing Tom Cappola and Leslie Cope, who have worked closely with data that has this problem.

ADD COMMENT • link 20.8 years ago Rafael A. Irizarry ★ 2.3k

Laurent Gautier ★ 2.3k

@laurent-gautier-29

On Fri, Jun 20, 2003 at 02:21:51PM +1000, Matthew Hobbs wrote:
> I have some CEL files from experiments using the Affy MGU74A chip and some
> from experiments using the MGU74Av2 chip. Am I right in thinking that
> MGU74A and MGU74Av2 are actually the same chip described differently
> (because some probesets were originally designed wrongly)?

I believe that v2 stands for "version 2", which means some improvement was thrown in.

> I wish to treat all this data together. Can I use the MGU74Av2 CDF to make a
> single AffyBatch object containing both sorts of data? If not, how should I
> proceed?

If the differences between those two chips are comparable to the ones between U95A and U95Av2, a simple way should be to have an AffyBatch which is the "intersection" of the probe sets on both chips. To achieve this, you will need the probe sequence files (should be available at www.affymetrix.com) and the .CDF files (or the corresponding 'cdfenvs'). Using these four elements (two sequence files and two CDFs), you will craft a cdfenv that is the intersection of the two others. The functions xy2indices and indices2xy should be handy to shuttle between the x/y coordinates in the sequence files and the indices.

NOTE: add 1 to the x/y in the sequence file. The indexing starts at 0 in Affymetrix files!

If you are lucky, you won't have to shuffle the values in the 'exprs' matrix, but you'd better be prepared to do it...

Hopin' it helps,

L.
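To make the coordinate bookkeeping in this reply concrete, here is a minimal base-R sketch. It is an illustration under assumptions, not the affy implementation: it assumes a row-major layout in which x runs fastest, a chip 640 cells wide, and toy probe-set lists standing in for the two cdfenvs. The helpers `xy0_to_index` and `index_to_xy0` and all `ps*` names are hypothetical; check the affy documentation for the conventions that `xy2indices` and `indices2xy` actually use.

```r
# Hypothetical helpers (NOT the affy functions): convert between the 0-based
# x/y coordinates found in Affymetrix text files and 1-based R indices.
# Assumption: row-major layout, x varies fastest, chip is ncol_chip cells wide.
xy0_to_index <- function(x0, y0, ncol_chip) {
  y0 * ncol_chip + x0 + 1L
}

index_to_xy0 <- function(i, ncol_chip) {
  i0 <- i - 1L
  cbind(x = i0 %% ncol_chip, y = i0 %/% ncol_chip)
}

ncol_chip <- 640L  # assumed chip width, for illustration only

# Toy stand-ins for the two cdfenvs: probe-set name -> cell indices.
cdf_a  <- list(ps1 = c(1L, 5L, 9L), ps2 = c(2L, 6L), ps3 = c(3L, 7L))
cdf_v2 <- list(ps1 = c(1L, 5L, 9L), ps2 = c(4L, 8L), ps4 = c(3L, 7L))

# Keep a probe set only if it has the same name AND the same cells on both chips.
shared <- intersect(names(cdf_a), names(cdf_v2))
same_layout <- vapply(
  shared,
  function(id) identical(cdf_a[[id]], cdf_v2[[id]]),
  logical(1)
)
cdf_common <- cdf_a[shared[same_layout]]

print(names(cdf_common))
```

Here `ps2` is dropped even though the name is shared, because its cells moved between the two toy layouts. That is the case in which the values in the 'exprs' matrix would otherwise have to be shuffled.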

ADD COMMENT • link 20.8 years ago Laurent Gautier ★ 2.3k

Our experience with MGU74A and Av2 chips:

As I recall, the MGU74A and Av2 have a different number of probe sets! Also, while there is substantial overlap in Affy IDs:

1. there are probe set names in A that are not in Av2 (expected), and
2. there are probe set names in Av2 that are not in A (NOT expected).

We preprocessed the A and Av2 chips separately, because there was some confusion in RMA about which CDF environment to use for the A and Av2 chips.

(I don't recall the details now, and it was never clear whether some biologist had edited the CEL file to change the name of the CDF file or not.)

In any event, BE CAREFUL when combining v1 and v2 chips!

dave

--
David O. Nelson <daven@llnl.gov>
LLNL

ADD REPLY • link 20.8 years ago David O. Nelson ▴ 60

David:

Can you tell us how you dealt with the masked genes in MGU74A with RMA?

Thanks,
Yongde Bao

--
Yongde Bao, Ph.D
Biomolecular Research Facility
Department of Microbiology
University of Virginia School of Medicine
Charlottesville, VA 22908
E-mail: yb8d@virginia.edu
Voice mail: 434-982-2551, 434-924-2553
FAX: 434-982-2514

ADD REPLY • link 20.8 years ago Yongde Bao ▴ 170

On Fri, 2003-06-20 at 11:24, Yongde Bao wrote:
> David:
>
> Can you tell us how you dealt with the masked genes in MG 74A with rma?

I didn't.

As I understand RMA (please correct me if I'm wrong here), the default approach is to use Ben Bolstad's quantile normalization followed by a robust fit of an additive model (y ~ array + probe) to each probe set separately.

So, if you confine your attention to one chip type for your arrays, the normalization should go through OK, and the fact that a particular probe set is spectacularly bad at predicting expression levels doesn't affect the expression estimates for other probe sets.

Is that line of reasoning bogus?

dave

--
David O. Nelson
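The "robust fit of an additive model to each probe set separately" can be illustrated with median polish from base R (`stats::medpolish`). This is only a sketch on invented log2 intensities for a single probe set, assuming rows are probes and columns are arrays; it leaves out background correction and quantile normalization and is not the RMA code itself.

```r
# Toy log2 intensities for ONE probe set: rows are probes, columns are arrays.
# The values are invented and exactly additive (probe effect + array effect).
probe_effect <- c(p1 = -0.5, p2 = 0.0, p3 = 0.25, p4 = 1.0)
array_effect <- c(a1 = 7.0, a2 = 8.5, a3 = 6.0)
y <- outer(probe_effect, array_effect, "+")

# Robust additive fit y ~ array + probe by median polish (base R).
fit <- medpolish(y, trace.iter = FALSE)

# One expression value per array: overall level plus the array (column) effect.
expr_estimate <- fit$overall + fit$col
print(expr_estimate)
```

Because the fit uses only the rows of this one probe set, a badly behaved probe set elsewhere on the chip does not enter this calculation; it can only influence the result through the normalization step that precedes it.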

ADD REPLY • link 20.8 years ago David O. Nelson ▴ 60

You have several options.

1. Treat the MGU74A and MGU74Av2 separately, then try to figure out a way to combine them.
2. Look at only the common probesets, but in this case you will be throwing away about 2500-3000 probesets (out of about 12600 total probesets).

Option 1 is doable right now.

I will have an approach for option 2 in a couple of hours (a local colleague has been bothering me about this for a while).

Thanks,

Ben

--
Ben Bolstad <bolstad@stat.berkeley.edu>

ADD REPLY • link 20.8 years ago Ben Bolstad ★ 1.1k

Thanks to everyone for the helpful responses.

On Sat, 21 Jun 2003 06:10 am, Ben Bolstad wrote:
> You have several options.

I will probably look into both.

> 1. treat the MGU74a and MGU74av2 separately, then try to figure out a
> way to combine them.
> 2. Look at only the common probesets but in this case you will be
> throwing away about 2500-3000 probesets (out of about 12600 total
> probesets)
>
> Option 1 is doable right now.

OK then, I'd really appreciate some guidance with option 1. (I am still fairly new to Bioconductor and a pretty wobbly user of R.)

> I will have an approach for option 2 in a couple of hours (a local
> colleague has been bothering me about this for a while).

And I look forward to seeing this too!

Thanks.

--
Matthew Hobbs

ADD REPLY • link 20.8 years ago Matthew Hobbs ▴ 80

On Mon, Jun 23, 2003 at 10:17:37AM +1000, Matthew Hobbs wrote:
> > Option 1 is doable right now.
>
> OK then - I'd really appreciate some guidance with option 1. (I am still
> fairly new to BioConductor and a pretty wobbly user of R.)

here it goes:

With abatch.74a and abatch.74av2 being your AffyBatches, you merge them into one AffyBatch to do the normalization, then split this AffyBatch in two again (this will work because the chips are of the same size and because most of the probes are in common). (hint: check split.AffyBatch and merge.AffyBatch in the doc)

then something like:

    eset.74a <- computeExprSet(abatch.74a, <insert your favorite methods here>)

...leaving you with eset.74a and eset.74av2

Now all that remains is to merge the two into one "exprSet". Here is a rough (non-tested) function to do it:

    merge.exprSet <- function(x, y) {
        g.x <- geneNames(x)
        g.y <- geneNames(y)
        g.xy <- unique(c(g.x, g.y))
        nc.x <- ncol(exprs(x))
        nc.y <- ncol(exprs(y))
        if (nc.x != nc.y)
            stop("For now only merging data in an identical set of experiments is allowed!")
        ## one row per probe set found in either exprSet; NA where it is missing
        m <- matrix(as.numeric(NA), length(g.xy), nc.x)
        rownames(m) <- g.xy
        ## copy the rows of x (index only with the non-NA matches)
        m.xy <- match(g.xy, g.x)
        m[!is.na(m.xy), ] <- exprs(x)[m.xy[!is.na(m.xy)], , drop = FALSE]
        ## copy the rows of y; for shared probe sets these overwrite the values from x
        m.xy <- match(g.xy, g.y)
        m[!is.na(m.xy), ] <- exprs(y)[m.xy[!is.na(m.xy)], , drop = FALSE]
        eset <- new("exprSet", exprs=matrix(0, 0, nc.x), phenoData=x@phenoData)
        eset@exprs <- m
        return(eset)
    }

hopin' it helps,

L.

--
currently at the National Yang-Ming University in Taipei, Taiwan
Laurent Gautier, PhD. Student
CBS, Building 208, DTU
DK-2800 Lyngby, Denmark
tel: +45 45 25 24 89
http://www.cbs.dtu.dk/laurent
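When the two chip types carry different samples, a column-wise merge keyed on probe-set names may be closer to what is wanted. The base-R sketch below is a variant under assumptions, not the function posted in this reply: it works on plain matrices with probe-set IDs as row names (standing in for `exprs()` of each exprSet), and `combine_by_probeset`, the `ps*` IDs, and the sample names are all hypothetical.

```r
# Hypothetical helper: column-bind two expression matrices by probe-set ID.
# keep = "union" pads missing probe sets with NA;
# keep = "intersection" keeps only the probe sets present on both chips.
combine_by_probeset <- function(x, y, keep = c("union", "intersection")) {
  keep <- match.arg(keep)
  if (keep == "union") {
    ids <- union(rownames(x), rownames(y))
  } else {
    ids <- intersect(rownames(x), rownames(y))
  }
  # Re-index one matrix onto the chosen set of IDs, NA where the ID is absent.
  fill <- function(m) {
    out <- matrix(NA_real_, nrow = length(ids), ncol = ncol(m),
                  dimnames = list(ids, colnames(m)))
    hit <- intersect(ids, rownames(m))
    out[hit, ] <- m[hit, , drop = FALSE]
    out
  }
  cbind(fill(x), fill(y))
}

# Invented values: two MGU74A samples and one MGU74Av2 sample.
eset_a <- matrix(c(7.1, 8.2, 6.3, 7.0, 8.0, 6.5), nrow = 3,
                 dimnames = list(c("ps1", "ps2", "ps3"), c("a_chip1", "a_chip2")))
eset_v2 <- matrix(c(7.3, 8.1, 9.0), nrow = 3,
                  dimnames = list(c("ps1", "ps2", "ps4"), "v2_chip1"))

merged_all    <- combine_by_probeset(eset_a, eset_v2, "union")
merged_common <- combine_by_probeset(eset_a, eset_v2, "intersection")
print(merged_all)
```

The "intersection" mode corresponds to looking only at the common probe sets (Ben's option 2), at the cost of discarding everything that is not shared; the "union" mode keeps every probe set and leaves the decision about NA rows to the downstream analysis.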

ADD REPLY • link 20.8 years ago Laurent Gautier ★ 2.3k

Susan G. Hilsenbeck ▴ 10

@susan-g-hilsenbeck-351

A fairly large number of sequences targeted on the MGU74A chip were actually the antisense rather than the sense, and therefore the probes were sense rather than antisense. When the new version was designed, the features (spots) used by each of the bad probesets were reshuffled, so that the locations of the probe pairs changed completely. In any case, the affected probesets on version 1 cannot be used.

A strategy that was used several years ago, when this was initially discovered and many people had mixed datasets with both GeneChips, was to use the MGU74Av2 *.cdf but to mask out the affected genes for analyses involving both chip types. A file showing the original and replacement probeset IDs is available from the NetAffx website.
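The masking strategy can be sketched in base R as follows. Everything here is an assumption made for illustration: the mapping table, its column names, and the probe-set IDs are invented and do not reproduce the NetAffx file, and `mask_probesets` is a hypothetical helper working on a plain expression matrix.

```r
# Invented stand-in for the table of original and replacement probe-set IDs.
affected <- data.frame(
  original    = c("ps2_at", "ps5_at"),
  replacement = c("ps2_r_at", "ps5_r_at"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: set every row whose ID is on the mask list to NA,
# so the affected probe sets drop out of any analysis that skips NA rows.
mask_probesets <- function(expr, ids) {
  expr[rownames(expr) %in% ids, ] <- NA
  expr
}

# Toy expression matrix computed with the v2 probe-set names, mixing both chip types.
expr <- matrix(seq(6, 8.75, by = 0.25), nrow = 4,
               dimnames = list(c("ps1_at", "ps2_r_at", "ps3_at", "ps5_r_at"),
                               c("a_chip1", "a_chip2", "v2_chip1")))

# Mask under either naming scheme, whichever one the rows happen to carry.
mask_ids <- unique(c(affected$original, affected$replacement))
expr_masked <- mask_probesets(expr, mask_ids)
print(expr_masked)
```

Setting rows to NA rather than deleting them keeps the matrix dimensions aligned with the CDF, which can make it easier to trace results back to the chip; dropping the rows outright is an equally reasonable choice.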

ADD COMMENT • link 20.8 years ago Susan G. Hilsenbeck ▴ 10
