2 color data...


milesg@bu.edu ▴ 20

@milesgbuedu-1803

Hi, my name is Gregory Miles. I'm at Boston University and was given this address by Dr. Carey at the Harvard medical school (I went to a seminar of his last week), who told me that I could ask you my question about 2-color data.

On the mouse microarray dataset we have, there are two colors, and therefore two values per spot that can be below background (zero_barcode on our chip). When both values are above background, we keep the data, and when both are below, we eliminate the data (they become NA). I imagine this is a correct approach, but what should be done with data that has one intensity below background and one above? Would it be best to keep the good value? Do we eliminate the entire gene from entry into Bioconductor? Perhaps there is a way to tell Bioconductor that this is the case (by entering a background value) and allow it to handle the data abstractly? Or is it best to let Bioconductor look at them as NAs?

Any help would be greatly appreciated. Thanks!

-Greg Miles

Microarray • 1.1k views

ADD COMMENT • link updated 17.8 years ago by Naomi Altman ★ 6.0k • written 17.8 years ago by milesg@bu.edu ▴ 20


Naomi Altman ★ 6.0k

@naomi-altman-380

United States

I would not delete data that is below background, even in both channels, if it is above background on at least one array.

It seems to me that it is important information to know that a gene does not express under some condition in your experiment. Of course, the unfortunate side-effect of our liking to use ratios is that "zero" is not handled well. But a gene that expresses in some conditions of interest but not in others surely is of primary interest to your study.

--Naomi

At 11:48 AM 7/18/2006, milesg at bu.edu wrote:
>[...] what should be done regarding the data that has one intensity
>below background and one above. Would it be best to keep the good
>value? [...]

Naomi S. Altman                  814-865-3791 (voice)
Associate Professor
Dept. of Statistics              814-863-7114 (fax)
Penn State University            814-865-1348 (Statistics)
University Park, PA 16802-2111

ADD COMMENT • link 17.8 years ago Naomi Altman ★ 6.0k


To those who responded to my last e-mail, thanks for the help. I had another question. I got my 2-color time course data into limma. I have a targets file with 2 replicates per time point for time points 1 day, 2 day, 4 day, 7 day, and 14 day. I have told LIMMA that these are not ALL replicates of one another. Please note that any semicolons coming up are NOT part of the code.

There is supposed to be a nicely sized shift in differential expression from 4 days to 7 days, so I used those points for my comparison. As the LIMMA manual describes, I assigned my levels variable lev, my factors variable f, and my design. I set the column names: colnames(design)=lev ; and made my fit variable: fit=lmFit(MA, design); where MA is the normalized RG. I continued to follow the manual (the variable names it gave me were X1day, X2day, etc.): cont=makeContrasts("X7day-X4day", levels=design); I then did fit2=contrasts.fit(fit, cont) ; then fit2=eBayes(fit2); then selected=p.adjust(fit2$F.p.value, method="BH")<0.05 to get the genes that change from 4 days to 7 days with strong p-values.

Unfortunately, the results yield only about 30 genes (there should be several hundred), none of which (by eye) undergoes any significant change in differential expression from the 4 day point to the 7 day point. Can someone please help me with what I may be doing wrong? Any help would be greatly appreciated. Thanks!

-greg

ADD REPLY • link 17.8 years ago milesg@bu.edu ▴ 20


For this type of problem, it usually helps if you paste your code to the end of the message.

--Naomi

At 02:16 PM 7/19/2006, milesg at bu.edu wrote:
>[...] Can someone please help me with what I may be doing wrong? [...]

ADD REPLY • link 17.8 years ago Naomi Altman ★ 6.0k
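
For reference, the steps described in the question can be pieced together roughly as follows. This is only a sketch under stated assumptions: the matrix M is simulated and stands in for the normalized log-ratios, the design assumes every array compares one time point against a common reference, and the real targets file and design are not shown in the thread. It uses topTable() with coef=1 to list the single 7-day versus 4-day contrast with BH-adjusted p-values.

```r
library(limma)

set.seed(1)
# Hypothetical stand-in for the normalized log-ratios (MA$M in the question):
# 500 genes x 10 arrays, two replicate arrays per time point.
lev <- c("X1day", "X2day", "X4day", "X7day", "X14day")
f   <- factor(rep(lev, each = 2), levels = lev)
M   <- matrix(rnorm(500 * 10, sd = 0.3), nrow = 500,
              dimnames = list(paste0("gene", 1:500), NULL))
# Simulate 25 genes that shift between day 4 and day 7
M[1:25, f == "X7day"] <- M[1:25, f == "X7day"] + 2

# One coefficient per time point (assumes a common-reference layout)
design <- model.matrix(~ 0 + f)
colnames(design) <- lev

fit  <- lmFit(M, design)
cont <- makeContrasts(X7day - X4day, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont))

# Moderated t-test for the single contrast, BH-adjusted
tab      <- topTable(fit2, coef = 1, number = Inf, adjust.method = "BH")
selected <- tab[tab$adj.P.Val < 0.05, ]
```

If the real experiment uses direct comparisons between time points or dye swaps, the design matrix would have to be built differently (for example with modelMatrix() from the targets file), and that alone could change which genes come out.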


James W. MacDonald 65k

@james-w-macdonald-5106

United States

Hi Miles,

milesg at bu.edu wrote:
> [...] what should be done regarding the data that has one intensity
> below background and one above. Would it be best to keep the good value? [...]

Probably the easiest way to handle such things is to use the limma package and, when you do background correction, use the 'normexp' method, which ensures that none of the background-corrected values will be below zero. This is probably not critical for those spots where both channels are below zero (since you probably want to ignore those anyway), but you certainly wouldn't want to ignore a gene where one sample is below zero and the other is (possibly) a large value.

If you want to use limma, I would strongly suggest perusing the user's guide (load limma, then type limmaUsersGuide() at the R prompt). The learning curve can be steep, especially if you don't have a statistical background.

HTH,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

ADD COMMENT • link 17.8 years ago James W. MacDonald 65k
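
A minimal sketch of the 'normexp' suggestion, assuming simulated intensities rather than the poster's data. The matrices R, G, Rb and Gb are hypothetical foreground and background values for one array, and the offset of 50 is an arbitrary illustrative choice, not a value taken from the thread. The sketch compares plain subtraction with normexp correction in limma's backgroundCorrect().

```r
library(limma)

set.seed(2)
n <- 1000
# Hypothetical raw intensities for a single array (one column per array)
Rb <- matrix(rnorm(n, mean = 100, sd = 10), ncol = 1)   # red background
Gb <- matrix(rnorm(n, mean = 100, sd = 10), ncol = 1)   # green background
R  <- Rb + matrix(rexp(n, rate = 1 / 500) + rnorm(n, sd = 10), ncol = 1)
G  <- Gb + matrix(rexp(n, rate = 1 / 500) + rnorm(n, sd = 10), ncol = 1)
R[1:20, 1] <- Rb[1:20, 1] - 5   # force 20 red spots below local background

RG <- new("RGList", list(R = R, G = G, Rb = Rb, Gb = Gb))

# Plain subtraction leaves zero or negative values for below-background spots
RG.sub  <- backgroundCorrect(RG, method = "subtract")
# normexp returns strictly positive corrected intensities
RG.nexp <- backgroundCorrect(RG, method = "normexp", offset = 50)

n.nonpositive <- sum(RG.sub$R <= 0)   # spots that cannot be log-transformed
M <- log2(RG.nexp$R / RG.nexp$G)      # log-ratios are defined for every spot
```

With subtraction, the below-background spots would become NaN or -Inf on the log scale; after normexp every spot keeps a finite log-ratio, so nothing needs to be set to NA.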


Naomi Altman ★ 6.0k

@naomi-altman-380

United States

I would not do this. Use less background correction (e.g. don't background correct, or subtract 1/2 of the background), or set the channels that are below background to some low value (e.g. 1) so that logs can be used.

--Naomi

At 09:48 AM 7/19/2006, you wrote:
>Thanks for your quick response. I will not delete the gene completely (if you
>delete genes then LIMMA doesn't know how to handle gene lists with different
>orders), but although it is helpful to keep genes that may have information in
>one array, I do think it may be necessary to "NA" the below-background values
>and keep the above-background ones. Thus you still have the good values but
>have eliminated possible bad ones. What do you think of this?
>-greg

ADD COMMENT • link 17.8 years ago Naomi Altman ★ 6.0k
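
The two alternatives in this reply can be illustrated in base R. This is a sketch with made-up numbers: fg and bg are hypothetical foreground and background intensities for five spots in one channel, and the floor of 1 is the example value from the reply.

```r
# Hypothetical foreground (fg) and background (bg) intensities for five spots
fg <- c(1200, 300,  95,  80, 5000)
bg <- c( 100, 110, 100, 120,   90)

full    <- fg - bg            # ordinary subtraction: can be zero or negative
half    <- fg - bg / 2        # "subtract 1/2 of the background"
floored <- pmax(fg - bg, 1)   # set below-background values to a low value (1)

# Every spot now has a finite log intensity, so none has to become NA
log.floored <- log2(floored)
```

Subtracting only half of the background does not guarantee positive values in general; it just makes negative values less likely, whereas the floor always leaves something that can be logged.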


Just to add a bit here: with many image analysis options, there are other measures of the "quality" of a spot besides intensity. Since limma will allow you to incorporate this information into an analysis, you might think about whether there is some quantity reported by your image analysis software that might be useful in this regard. I agree with Naomi that excluding genes based on low-intensity spots in some subset of the arrays may be discarding some of the most interesting data.

Sean

On 7/19/06 8:10, "Naomi Altman" <naomi at stat.psu.edu> wrote:
> I would not do this. Use less background correction [...]

ADD REPLY • link 17.8 years ago Sean Davis 21k
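
One way such quality information could enter the analysis is through spot weights, which lmFit() accepts via its weights argument. The sketch below is illustrative only: the flag values, the cutoff of -50 and the down-weight of 0.1 are assumptions, not settings from any particular image analysis program.

```r
library(limma)

set.seed(3)
# Hypothetical log-ratios for 200 spots on 4 replicate arrays
M <- matrix(rnorm(200 * 4), nrow = 200, ncol = 4)

# Hypothetical quality flags: 0 = fine, -100 = flagged as poor by the software
flags <- matrix(0, nrow = 200, ncol = 4)
flags[sample(800, 40)] <- -100

# Down-weight flagged spots instead of deleting them
w <- ifelse(flags < -50, 0.1, 1)

design <- matrix(1, nrow = 4, ncol = 1, dimnames = list(NULL, "logFC"))
fit <- eBayes(lmFit(M, design, weights = w))
```

A flagged spot still contributes to the fit, just with less influence, which keeps the gene in the analysis without treating a doubtful measurement as fully reliable.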


Hi Miles,

We are all trying to say: why do you think a channel with a near- or below-background intensity is a "bad" value that needs to be removed? If the gene is not expressed in one treatment, then it should have a 'zero' value, which due to array technology will be some positive number near background fluorescence. By removing the value completely, you are saying in the analysis "there is no information for that sample on that array", but you do have information on that sample: it was below the detectable level. Even though the number that comes out of the background correction, either 0.5 or 1 as suggested, is not entirely accurate, it is accurate relative to numbers a good deal higher. Conversely, you should not throw away saturated values either, because even though you don't know exactly how large they were, you do know they were large.

If both channels of a spot are near/below background on every single array in your experiment, then you can remove the entire gene/spot from the analysis.

Cheers,
Jenny

At 09:10 AM 7/19/2006, Naomi Altman wrote:
>I would not do this. Use less background correction [...]

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801 USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu

ADD REPLY • link 17.8 years ago Jenny Drnevich ★ 2.2k
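
The filtering rule in this reply (drop a spot only if both channels are at or below background on every array) can be written in a few lines of base R. The two logical matrices are hypothetical; in practice they would come from comparing each channel's foreground with its background.

```r
# Rows = spots, columns = arrays; TRUE means that channel is above background.
# Hypothetical values for four spots on three arrays.
R.above <- rbind(c(TRUE,  TRUE,  TRUE),
                 c(FALSE, TRUE,  FALSE),
                 c(FALSE, FALSE, FALSE),
                 c(FALSE, FALSE, FALSE))
G.above <- rbind(c(TRUE,  TRUE,  TRUE),
                 c(FALSE, FALSE, FALSE),
                 c(FALSE, FALSE, TRUE),
                 c(FALSE, FALSE, FALSE))

# Keep a spot if either channel is above background on at least one array
keep <- rowSums(R.above | G.above) > 0
```

Here only the fourth spot is dropped; the second and third are kept even though they are above background in a single channel on a single array.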


J.delasHeras@ed.ac.uk ★ 1.9k

@jdelasherasedacuk-1189

United Kingdom

Quoting milesg at bu.edu:
> [...] Do we eliminate the entire gene from entry into bioconductor?

If you eliminate a gene because it has no signal in one channel, you may be eliminating some of the most interesting genes! It depends on the biology of your experiments. You should always think about the experiment underneath, not just about numbers :-)

A gene with no signal in one channel but OK signal in the other may be a gene that becomes silenced after your treatment, or switched on. In my particular case *those* are the genes that I am after, so I keep them and cherish them ;-)

However, when there's no signal in both channels, on all your experiments, it sounds reasonable to eliminate them.

Jose

--
Dr. Jose I. de las Heras                 Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology   Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology       Fax: +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK

ADD COMMENT • link 17.8 years ago J.delasHeras@ed.ac.uk ★ 1.9k
