marray and gathering additional information from GenePix GPR file


Daniel Brewer ★ 1.9k


Hi,

I am new to Bioconductor, so bear with me. I am using GenePix as my image-processing software for cDNA arrays. As it is set up, an automated procedure determines whether a spot should be considered for further use, and it records the result in the Normalize column (1 = accept, 0 = drop). I would like to filter out all the "bad" spots after normalisation.

I use the following to read in the GPR files:

    > mraw <- read.GenePix(targets=TargetInfo, name.Gb = NULL, name.Rb = NULL)

mraw@maW successfully gives the flag values, but the Normalize column does not appear to be in the object. Is there any way to input this information?

Many thanks

Daniel Brewer, Ph.D.
Institute of Cancer Research


ADD COMMENT • link updated 19.6 years ago by J.delasHeras@ed.ac.uk ★ 1.9k • written 19.6 years ago by Daniel Brewer ★ 1.9k


J.delasHeras@ed.ac.uk ★ 1.9k


Quoting Daniel Brewer <daniel.brewer at icr.ac.uk>:

> [...]
> mraw@maW successfully gives the flag values, but the Normalize column
> does not appear to be in the object. Is there any way to input this
> information?

Hi Daniel,

You could try reading the GPR file with 'read.table' as a data frame, taking the column you want from it, adding it to your marray object, and deleting the data frame.

I don't use marray, I use limma, and I use the same procedure to import other columns I am interested in (in my case the SNR columns at the end of the GPR file, and perhaps the flags).

I hope it helps.

Jose

--
Dr. Jose I. de las Heras
The Wellcome Trust Centre for Cell Biology
Institute for Cell & Molecular Biology
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
Email: J.delasHeras at ed.ac.uk
Phone: +44 (0)131 6513374
Fax: +44 (0)131 6507360

ADD COMMENT • link 19.6 years ago J.delasHeras@ed.ac.uk ★ 1.9k
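
Jose's read.table suggestion can be sketched roughly as follows. This is a minimal base R sketch, not the reader used by marray or limma. It assumes the GPR file follows the usual ATF layout (a first line "ATF 1.0", a second line giving the number of optional header records, then a tab-separated table). The helper name read_gpr_column and the miniature file written here are made up for illustration.

```r
# Hypothetical helper: pull one named column out of a GenePix GPR (ATF) file.
read_gpr_column <- function(path, column) {
  # Line 2 of an ATF file holds "<n header records> <n data columns>".
  counts <- scan(path, what = integer(), skip = 1, nlines = 1, quiet = TRUE)
  # Skip the two ATF lines plus the optional header records, then read the table.
  gpr <- read.table(path, skip = 2 + counts[1], header = TRUE, sep = "\t",
                    quote = "\"", check.names = FALSE,
                    stringsAsFactors = FALSE, comment.char = "")
  gpr[[column]]
}

# Made-up miniature GPR file so the sketch runs on its own.
demo_gpr <- tempfile(fileext = ".gpr")
writeLines(c(
  "ATF\t1.0",
  "2\t5",
  "\"Type=GenePix Results 3\"",
  "\"Wavelengths=635\t532\"",
  "\"Block\"\t\"Name\"\t\"F635 Median\"\t\"Flags\"\t\"Normalize\"",
  "1\t\"geneA\"\t1500\t0\t1",
  "1\t\"geneB\"\t80\t-50\t0",
  "1\t\"geneC\"\t900\t0\t1"
), demo_gpr)

normalize <- read_gpr_column(demo_gpr, "Normalize")
print(normalize)
```

The resulting vector has one entry per spot, in file order, so it can be compared against or attached to whatever object holds the intensities, provided the spot order is the same.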


J.delasHeras at ed.ac.uk wrote:

> You could try reading the gpr file with 'read.table' as a data frame,
> take the column you want from it, add it to your marray object and
> delete the data frame.
> [...]

Thanks for the suggestion, I am sure that would work. I did find another way round it by defining weights, i.e.

    > mraw2 <- read.GenePix(targets=TargetInfo, name.Gb = NULL, name.Rb = NULL, name.W="Normalize")

And that seems to work fine. I am sorry if this is a simplistic question, but I am trying to filter the marrayRaw object by this column and I am running into problems. This is the approach:

    > mraw2[mraw2@maW == 1,]
    Error in mraw2[mraw2@maW == 1, ] : (subscript) logical subscript too long

I think this is because maW is a long vector across all microarrays, whereas the first index is just across one array. Is there any way round this? What I am aiming to do is create an marray object containing only those spots with Normalize = 1. Maybe genefilter is more suited to this.

Thanks

Daniel Brewer

ADD REPLY • link 19.6 years ago Daniel Brewer ★ 1.9k
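
The "logical subscript too long" error can be reproduced with plain matrices. The sketch below uses base R only, with made-up toy data standing in for the intensity and weight slots; it does not call marray, and the marrayRaw "[" method may differ in detail. It shows why a spots-by-arrays weight matrix cannot serve as a row index, and one possible per-spot rule (accepted on every array) that can.

```r
# Toy stand-ins: 4 spots (rows) x 2 arrays (columns). Values are made up.
intensity <- matrix(c(10, 20, 30, 40,
                      11, 21, 31, 41), nrow = 4,
                    dimnames = list(paste0("spot", 1:4), c("array1", "array2")))
weights <- matrix(c(1, 0, 1, 1,
                    1, 1, 0, 1), nrow = 4, dimnames = dimnames(intensity))

# A spots-by-arrays logical matrix has 8 elements, but there are only 4 rows,
# so using it as a row index fails.
attempt <- tryCatch(intensity[weights == 1, ], error = function(e) e)
print(inherits(attempt, "error"))

# A row index needs one TRUE/FALSE per spot, e.g. "accepted on every array".
keep_spot <- rowSums(weights == 1) == ncol(weights)
filtered <- intensity[keep_spot, , drop = FALSE]
print(rownames(filtered))
```

Requiring acceptance on every array is only one choice; it discards a whole row even when a single array rejects the spot.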


Hi Daniel,

>I did find another way round it by defining weights [...]
> > mraw2[mraw2@maW == 1,]
>Error in mraw2[mraw2@maW == 1, ] : (subscript) logical subscript too long
>[...]
>What I am aiming to do is create an marray object with only those spots
>with the normalize column = 1. Maybe genefilter is more suited to this.

First, I would caution you about discarding all spots called 'bad' by your GenePix setup. Do you really know what it's doing? Often, spots are labeled 'bad' if the spot foreground is not detectable above some level of the background. Unless there is a defect on the chip, these types of spots are actually 'real' data, not 'bad' data: more like a value of 0. The number associated with such a spot is not entirely accurate, but it is relatively accurate compared to spots with detectable values. In an extreme example, by throwing out these spots, you may fail to detect a gene that is only turned on in one of your treatment groups!

However, you do need to discard spot values that result from obvious defects on the array, so you still need an answer to your question. You can't really 'remove' all the spots that have Normalize = 0 from the marray object, because it has to be a full matrix. One possibility is to set them all to NA:

    mraw2[mraw2@maW==0] <- NA   # note there is no row,col in the subsetting because you want to replace individual values, not rows or columns

However, this may cause problems further down the line if some functions can't handle NA values.

I am not really familiar with marray objects, but in limma RGList or MAList objects, a spot weighted zero is automatically left out by most functions. You should check how the preprocessing and analysis functions you want to use work: they may already make use of your maW component, or they may have an argument to set weights in the call to the function.

Cheers,
Jenny

>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu

ADD REPLY • link 19.6 years ago Jenny Drnevich ★ 2.2k
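
The two options Jenny describes, blanking individual values versus keeping them and using weights, can be compared on plain matrices. This is a base R sketch with made-up numbers, not marray or limma code. The names M and W are just stand-ins for a log-ratio matrix and a 0/1 weight matrix of the same shape.

```r
# Made-up log-ratios for 3 spots (rows) on 3 arrays (columns).
M <- matrix(c(0.5, 1.0, -0.2,
              0.7, 9.9, -0.1,
              0.6, 1.2, -7.0), nrow = 3)
# Weights in the style of the Normalize column: 1 = accept, 0 = drop.
W <- matrix(c(1, 1, 1,
              1, 0, 1,
              1, 1, 0), nrow = 3)

# Option 1: blank out individual values; the matrix keeps its full shape.
M_masked <- M
M_masked[W == 0] <- NA
mean_masked <- rowMeans(M_masked, na.rm = TRUE)

# Option 2: leave the data alone and let the weights do the work.
mean_weighted <- rowSums(M * W) / rowSums(W)

print(mean_masked)
print(mean_weighted)
```

With 0/1 weights both routes give the same per-spot averages here. The difference is that the masked matrix forces every later function to cope with NA, whereas the weighted version leaves that decision to the functions that accept weights.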


Jenny Drnevich wrote:

> First, I would caution you about discarding all spots called 'bad' by
> your GenePix setup - do you really know what it's doing?
> [...]

Thanks for that, that makes a lot of sense. There are "spots" that are empty by design (there is no spot there) and are indicated as such in maControls. Is it OK, and sensible, to remove these from the object? i.e.

    > mraw2 <- mraw2[maControls(mraw2) == 'probes']

Thanks

Daniel

ADD REPLY • link 19.6 years ago Daniel Brewer ★ 1.9k


Hi Daniel,

At 10:35 AM 7/7/2006, Daniel Brewer wrote:
>There are "spots" that are
>empty by design (there is no spot there) and are indicated as such in
>maControls. Is it ok, and sensible to remove these from the object?
>i.e.
> > mraw2 <- mraw2[maControls(mraw2) == 'probes']

Yes, you can remove empty 'spots' because in this case you are removing the same spot on every array, not just on particular arrays. I'm assuming maControls(mraw2) is a vector with length equal to the number of spots per array; if so, the proper way to subset all the rows corresponding to 'probes' should be:

    mraw2 <- mraw2[maControls(mraw2) == 'probes' , ]

I also want to clarify (for the record) what I meant about spots with '0' values: I was referring to the individual channel intensities, not the log ratio. If both channels' intensities were 0, then the log ratio would be undefined, but in practical terms, the log ratio should be 1 (not changed) for undetectable spots. If you are using log ratios for the statistical analysis, you might want to check that your undetectable spots are close to 1 after pre-processing, and if they are not, you might consider changing them.

Jenny

ADD REPLY • link 19.6 years ago Jenny Drnevich ★ 2.2k
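
The role of the trailing comma in Jenny's corrected subsetting can be seen on an ordinary matrix. This is a base R sketch with made-up data. The controls vector here is a stand-in for what maControls() is assumed to return (one label per spot), and the behaviour shown is that of a plain matrix, not necessarily of the marrayRaw "[" method.

```r
# Made-up example: 5 spots on 2 arrays, plus one control label per spot.
intensity <- matrix(1:10, nrow = 5,
                    dimnames = list(NULL, c("array1", "array2")))
controls <- c("probes", "Empty", "probes", "Empty", "probes")

# The label vector has one entry per row, so it is a valid row index.
# The trailing comma means "all columns"; drop = FALSE keeps a matrix
# even if only one row survives.
probes_only <- intensity[controls == "probes", , drop = FALSE]
print(dim(probes_only))

# Without the comma, the same vector indexes single elements instead of
# rows, and the array structure is lost.
flattened <- intensity[controls == "probes"]
print(is.matrix(flattened))
```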


Quoting Jenny Drnevich <drnevich at uiuc.edu>:

[...]
> If you are using log ratios for the statistical
> analysis, you might want to check that your undetectable spots are close to
> 1 after pre-processing, and if they are not, you might consider
> changing them.

Hi Jenny,

I never heard of anybody doing that before... but I guess it makes sense. Unless the model applied already takes into account the intensities somehow... hmmmm, something to think about. I suppose that's what spot weights are for :-)

OK, end of pointless rambling...

Thanks, Jenny, for those comments!

Jose

ADD REPLY • link 19.6 years ago J.delasHeras@ed.ac.uk ★ 1.9k


Just to add a further comment to this discussion about empty spots in GenePix.

Empty spots could be due to the physical printing not taking place at that spot position, or to the printing quill depositing buffer that contains no cDNA/oligo probe at that position.

In GPR files I've looked at, GenePix image analysis assigns a non-zero value (but close to zero) to these intensities. On an MA scale, you should see them distributed around 0 on the M scale, with Abundance values slightly lower than other non-expressing spots.

They are usually labelled "" or "Empty" in the annotation.

Agilent intensity files I have looked at don't contain these empty spots.

Marcus

On 7/9/06 4:33 AM, "J.delasHeras at ed.ac.uk" <j.delasheras at ed.ac.uk> wrote:
> [...]

ADD REPLY • link 19.6 years ago Marcus Davy ▴ 680


Quoting Marcus Davy <mdavy at hortresearch.co.nz>:

> In gpr files I've looked at, GenePix image analysis provides a non zero
> value (but close to zero) to these intensities.

Close to "background", rather.

> On an MA scale, you should
> see them distributed around 0 on the M scale and have Abundance values
> slightly lower than other non expressing spots.

By Abundance you're referring to the A values, right?

What I tend to see is a spread for those spots. It can be quite broad, but as you say, they always cluster distinctly at lower A values.

If you subtract background, then you do get zero values (or 0.5, actually, if you use the method "half" to make sure you don't get negative intensities, which can easily happen for those "empty" spots).

> They are usually labelled "" or "Empty" in the annotation.

One word of warning: check that empty spots are really empty.

On my latest arrays, I noticed quite a few spots labelled as empty where I could clearly see a spot. I got worried I was using the wrong annotation, so I contacted the people who made the arrays. It turned out that after printing they found they couldn't trust the identity of some cDNAs, and they simply marked those spots as "empty" in the annotation. That can be very bad if you're using those spots to get an estimate of background, as some people do.

Jose

ADD REPLY • link 19.6 years ago J.delasHeras@ed.ac.uk ★ 1.9k
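
Jose's remark about subtraction and the "half" method can be illustrated with a few made-up numbers. This base R sketch assumes that "half" means resetting any background-subtracted intensity below 0.5 to 0.5, as his description suggests. It reimplements that idea with pmax() rather than calling limma.

```r
# Made-up foreground and background intensities for four spots;
# the last two behave like "empty" spots.
foreground <- c(1200, 400, 95, 60)
background <- c(100, 90, 100, 60)

# Plain subtraction can give zero or negative intensities for empty spots.
subtracted <- foreground - background

# "half"-style correction as described above: anything below 0.5 after
# subtraction is reset to 0.5, so the log stays finite.
half_corrected <- pmax(subtracted, 0.5)

print(subtracted)
print(log2(half_corrected))
```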


Yes, the spot intensities (labeled "Empty") recorded in a GenePix GPR file should be distributed similarly to the background intensities on a 16-bit GenePix scanner (between 0 and several hundred). Essentially they are measures of background in the foreground spot area.

Yes, I mean A values, and I should have mentioned that the grouping on an MA scale I described is without background correction. If you background correct, I would expect the MA plot distribution to shift to the left on the A scale. Depending on the method chosen, a lot of NAs will result when foreground is less than or equal to background.

I think that, because of the distribution of background intensities, there is some probability that for some empty or buffer-only spots the foreground measure will be higher than the background in both the Red and Green channels simultaneously, so that these small positive numbers do not get converted to NAs when calculating M. In the case of buffer-only spots this could be due to the physical contact between the metal print tip and the glass slide, the buffer deposited, or even high PMT scanner settings adding noise during scanning.

Marcus

On 7/19/06 10:06 PM, "J.delasHeras at ed.ac.uk" <j.delasheras at ed.ac.uk> wrote:
> [...]

ADD REPLY • link 19.6 years ago Marcus Davy ▴ 680


Hi Marcus,

In my attempt to clarify things, I had a brain lapse and messed it up further! Thanks for pointing it out. I meant to say exactly what you said: the M values (log2(R/G), or log2(R) - log2(G)) should be 0 for empty spots. It's the absolute ratio R/G that should be 1.

Jenny

At 06:24 PM 7/18/2006, Marcus Davy wrote:
>[...]

ADD REPLY • link 19.6 years ago Jenny Drnevich ★ 2.2k
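
The distinction between a ratio of 1 and an M value of 0 is easy to check numerically. This is a base R sketch with made-up channel intensities. R and G stand for the red and green intensities, M follows the formula given above, and A is taken as the average of the two log2 intensities, which is the usual definition on an MA plot.

```r
# Made-up red and green channel intensities. Spot 3 is an "empty" spot
# with equal, background-level signal in both channels.
R <- c(4000, 500, 80)
G <- c(1000, 500, 80)

M <- log2(R) - log2(G)          # log-ratio
A <- (log2(R) + log2(G)) / 2    # average log-intensity

print(R / G)
print(M)
print(A)
```

Spots with equal signal in both channels have a ratio of 1 and an M value of 0, and the low-intensity spot sits at the lower end of the A scale, which matches where Marcus and Jose report seeing empty spots.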
