Intra variance Vs inter group variance: scared!

Emmanuel Levy ▴ 270

@emmanuel-levy-1240

Dear All,

I've got two conditions and three replicates per condition: A1 A2 A3 B1 B2 B3

To test the INTRA VS INTER group variance, I compared the fold changes within group and between groups:

length(which(A1/A2 > 5)) = 686
length(which(A1/B1 > 5)) = 708

The fact that these counts are similar is quite scary! What do you think?

Do you know of a package that would show somehow that the noise found above should not prevent me from getting meaningful results with these data?

Many thanks in advance for your help,

Emmanuel
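The comparison in the question uses a single within-group pair (A1/A2) and a single between-group pair (A1/B1). A minimal sketch of the same idea extended to all 15 sample pairs is given below. It assumes log2-scale data, counts fold changes in either direction, and runs on simulated values with no group effect; the matrix `log_expr` and the helper `count_fc` are hypothetical names, not part of the original analysis.

```r
# Simulated stand-in for the real data (hypothetical values, not from the thread).
set.seed(42)
n_genes <- 5000
samples <- c("A1", "A2", "A3", "B1", "B2", "B3")

# log2 expression: a per-gene baseline plus independent noise in every sample.
# No group effect is simulated here.
baseline <- rnorm(n_genes, mean = 8, sd = 1.5)
log_expr <- matrix(baseline + rnorm(n_genes * length(samples), sd = 0.8),
                   nrow = n_genes, dimnames = list(NULL, samples))

# Number of genes whose fold change between two samples exceeds the cutoff
# in either direction (a difference on the log2 scale is a log fold change).
count_fc <- function(x, y, cutoff = 5) sum(abs(x - y) > log2(cutoff))

pairs  <- combn(samples, 2)   # 2 x 15 matrix of sample names
counts <- apply(pairs, 2, function(p) count_fc(log_expr[, p[1]], log_expr[, p[2]]))
type   <- ifelse(substr(pairs[1, ], 1, 1) == substr(pairs[2, ], 1, 1),
                 "within", "between")

result <- data.frame(pair = paste(pairs[1, ], pairs[2, ], sep = " vs "),
                     type = type, count = counts)
print(result)
print(tapply(result$count, result$type, mean))  # average count per pair type
```

Looking at all six within-group and nine between-group pairs gives a less pair-dependent picture than a single ratio, though it remains a rough diagnostic rather than a test.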

updated 17.2 years ago by Naomi Altman ★ 6.0k • written 17.2 years ago by Emmanuel Levy ▴ 270

James W. MacDonald 65k

@james-w-macdonald-5106

United States

Hi Emmanuel,

Emmanuel Levy wrote:
> To test the INTRA VS INTER group variance, I compared the fold changes
> within group and between groups:
>
> length(which(A1/A2 > 5))=686
> length(which(A1/B1 > 5))=708
>
> The fact that this is similar is quite scary! What do you think?

Scary? No. I do think it illustrates why statisticians recommend doing replicates rather than relying on fold change for assessing differential expression. It is quite possible that the 686 genes that have a fold change greater than 5 are just noisy in general, so if you use a statistic that accounts for the intra-group variance, these genes will likely not end up being significant.

It is also possible that these data are Just Not Good(tm). I presume you have done some other exploratory data analysis designed to tell if the data are bad or not?

> Do you know of a package that would show somehow that the noise found above
> should not prevent me from getting meaningful results with these data?

It depends on the platform. However, there are many packages on this page designed for assessing the quality of your data:

http://www.bioconductor.org/packages/1.9/Visualization.html

Best,

Jim
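The point about noisy genes can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated log2 values with no true group difference, the first 200 genes are made deliberately noisier, and an ordinary per-gene Welch t-test stands in for whatever moderated statistic one would actually use. All object names are hypothetical.

```r
set.seed(1)
n_genes <- 2000
group <- factor(rep(c("A", "B"), each = 3))

# Hypothetical log2 data with NO true group difference.
# The first 200 genes are simply noisier than the rest.
gene_sd  <- c(rep(2, 200), rep(0.2, n_genes - 200))
log_expr <- matrix(rnorm(n_genes * 6, mean = 8, sd = gene_sd), nrow = n_genes)

# Fold change on the log2 scale: difference of group means.
log_fc <- rowMeans(log_expr[, group == "B"]) - rowMeans(log_expr[, group == "A"])

# Per-gene t-test, which scales the difference by the within-group variance.
p_val <- apply(log_expr, 1,
               function(x) t.test(x[group == "B"], x[group == "A"])$p.value)
p_adj <- p.adjust(p_val, method = "BH")

big_fc <- abs(log_fc) > 1   # more than 2-fold in either direction
print(table(large_fold_change = big_fc, noisy_gene = gene_sd > 1))
print(sum(big_fc & p_adj < 0.05))  # large fold change AND significant
```

In a simulation like this the large fold changes should come almost entirely from the noisy genes, and few if any of them should survive the multiple-testing adjustment, which is the behaviour described in the reply above.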

written 17.2 years ago by James W. MacDonald 65k

Naomi Altman ★ 6.0k

@naomi-altman-380

United States

CyberT compares the experimental noise to the biological signal. Statistically significant genes are those that have signal higher than noise.

I think that you are asking about the false detection and nondetection rates. False nondetection will be high if the noise is high. A rough estimate of the false nondetection rate (but not the contributing genes) can be made using the qvalue package.

qvalue uses the p-values from CyberT to estimate FDR. En route, it estimates pi0, the proportion of genes that are NOT differentially expressed.

(1 - pi0) x Ngenes = estimated number of genes that are differentially expressed.

Subtract from this the estimated number of truly differentially expressed genes you have detected, (1 - FDR) x Nsignificant. You now have a rough estimate of how many you missed. But realistically, the more noise in the data, the rougher this estimate is, too.

--Naomi
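The arithmetic described above can be written out as a short function. This is a sketch with made-up inputs: the function name `estimate_missed` and all four numbers are hypothetical. In practice pi0 would come from the qvalue package (for example the `pi0` element of the object returned by `qvalue::qvalue()`, assuming that package is installed), and the FDR and number of significant genes from the chosen significance cutoff.

```r
# Rough estimate of the number of truly differentially expressed genes
# that were NOT detected, following the reasoning in the reply above.
estimate_missed <- function(pi0, n_genes, fdr, n_significant) {
  n_true_de  <- (1 - pi0) * n_genes        # estimated truly DE genes overall
  n_detected <- (1 - fdr) * n_significant  # estimated true positives found
  n_true_de - n_detected
}

# Hypothetical inputs: 10,000 genes, pi0 = 0.9,
# 400 genes called significant at an FDR of 5%.
missed <- estimate_missed(pi0 = 0.9, n_genes = 10000,
                          fdr = 0.05, n_significant = 400)
print(round(missed))  # about 620 genes missed under these made-up numbers
```

With these inputs roughly 1000 genes are estimated to be differentially expressed and about 380 of the 400 calls are estimated to be true, leaving around 620 undetected. As the reply notes, the estimate gets rougher as the data get noisier.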

written 17.2 years ago by Naomi Altman ★ 6.0k

Dear Naomi,

Thanks a lot, I guess this is what I was looking for :) Thanks for the summary too.

Typically, do you know how common it is that INTER and INTRA variations are comparable?

Best,

Emmanuel

written 17.2 years ago by Emmanuel Levy ▴ 270

Naomi Altman ★ 6.0k

@naomi-altman-380

United States

I mostly work on problems with very high rates of differential expression. If the rate is low, then the INTER and INTRA variation will be comparable.

Clarification: qvalue uses p-values from any differential expression package, not just CyberT.

--Naomi

written 17.2 years ago by Naomi Altman ★ 6.0k

Emmanuel Levy ▴ 270

@emmanuel-levy-1240

Dear James and Naomi,

Thanks for your suggestions.

Quality control is not exactly what I am looking for: I would like to compare the experimental noise with the "biological signal".

I agree that fold change is not a great measure, and of course I use a statistically robust method for comparing the INTER variance (cyber-T). So I am confident about the DEGs I find. What I am more concerned about are the true DEGs that I do _not_ find because of the experimental noise. And if the experimental noise is of the same order of magnitude as my biological signal, I guess my conclusions would not be very meaningful (am I right?).

So, to compare INTRA vs. INTER, I looked at the number of genes found above different fold-change thresholds, between samples in the same or in different groups. (I used fold change because I have only three replicates, so I can only do pairwise comparisons.) Obviously this method has important limits, but it is to get an idea.

I was wondering if there is an established standard procedure to check this.

I hope I made my thoughts clearer and that you can point me to something.

Best wishes,

Emmanuel

> You should look at some quality control measures for your arrays. If
> all is well, then you should use a statistical measure of
> differential expression. There are several available in
> Bioconductor. I usually use Limma. Others like multtest, samr or siggenes.
>
> --Naomi
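Three replicates per group are enough to compute a per-gene within-group variance directly, so the intra versus inter comparison does not have to rely on pairwise fold changes. The sketch below shows one way to do it, a one-way ANOVA decomposition per gene. It is an illustration on simulated log2 data in which the first 100 genes are given a true shift, not an established procedure from the thread, and all names are hypothetical.

```r
set.seed(7)
n_genes <- 1000
group <- factor(rep(c("A", "B"), each = 3))

# Hypothetical log2 data: the first 100 genes get a true shift of 2 in group B.
shift    <- c(rep(2, 100), rep(0, n_genes - 100))
log_expr <- matrix(rnorm(n_genes * 6, mean = 8, sd = 0.5), nrow = n_genes)
log_expr[, group == "B"] <- log_expr[, group == "B"] + shift

mean_a <- rowMeans(log_expr[, group == "A"])
mean_b <- rowMeans(log_expr[, group == "B"])
grand  <- rowMeans(log_expr)

# Intra-group variance: squared deviations from each group's own mean,
# pooled over both groups (6 samples - 2 group means = 4 degrees of freedom).
ss_within <- rowSums((log_expr[, group == "A"] - mean_a)^2) +
             rowSums((log_expr[, group == "B"] - mean_b)^2)
ms_within <- ss_within / 4

# Inter-group mean square: 3 replicates per group, 1 degree of freedom.
ms_between <- 3 * (mean_a - grand)^2 + 3 * (mean_b - grand)^2

# Ratio of inter- to intra-group variance (the one-way ANOVA F statistic).
f_stat <- ms_between / ms_within
print(median(f_stat[shift > 0]))   # genes with a simulated true difference
print(median(f_stat[shift == 0]))  # genes with noise only
```

For two groups this ratio is the square of the equal-variance t statistic, so it carries the same information as a standard test. It simply makes the two variance components explicit for each gene.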

written 17.2 years ago by Emmanuel Levy ▴ 270