Question: LIMMA : design (1, 2, 3, 3 ) , I gotEXCITINGresults, what could be the logic, since i h
0
14.5 years ago by
Marcus Davy680
Marcus Davy680 wrote:
microarray go affy limma ebarrays • 688 views
modified 14.5 years ago by Marcus150 • written 14.5 years ago by Marcus Davy680
Answer: LIMMA : design (1, 2, 3, 3 ) , I gotEXCITINGresults, what could be the logic, si
0
14.5 years ago by
Marcus150
Marcus150 wrote:
Hello all users. I have a question regarding the mean calculations of the M-values in LIMMA. I guess that the fit$coeff is the mean of the M-values used for the linear model. The fit$coeff has the mean value of the data derived from a specific RNA source (as defined in the design matrix), and the value in fit$coeff[1] is the same as mean(MS[1,1:2]) (if I for example had Sample 1 on 2 arrays in my matrix containing the data. So...if you take the mean of two values (in the log 2 scale), for example M = 8 and M = 1, the mean (and hence the fit$coef ?) will be 4,5. If you want to look at the foldchange I guess that 2^fit$coeff is correctly calculated, so for the example it will be 2^4,5 = 22,6 times upregulated. But if you look at the values independently, M=8 will give 2^8 = 256 times, and 2^1 = 2 times upregulation. The mean of these values are (256 + 2) / 2 = 129 times. I know that the question is a bit naive, but how should one do when you take the mean of logarithms since the numbers are not related to each other as normal numbers are. E.g. the number 8 is not twice the size of 4 on a logarithmic scale, it is 10000 times more (on a log10 scale). So....how should one do, when I want to take the average of log values? Shouldn't I calculate the ratios back (not in log2 scale) and calculate the mean, and transform the data back, If I would like to have an average M value? Regards Marcus Marcus Gry Bj?rklund Royal Institute of Technology AlbaNova University Center Department of Molecular Biotechnology 106 91 Stockholm, Sweden Phone (office): +46 8 553 783 44 Fax: + 46 8 553 784 81 Visiting address: Roslagstullsbacken 21, Floor 3 Delivery address: Roslagsv?gen 30B Web: http://www.biotech.kth.se/molbio/microarray/index.html -----Original Message----- From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Marcus Davy Sent: Wednesday, April 27, 2005 15:41 To: ramasamy@cancer.org.uk; naomi@stat.psu.edu; saurin_jani@yahoo.com Cc: bioconductor@stat.math.ethz.ch Subject: Re: [BioC] LIMMA : design (1, 2, 3, 3 ) , I gotEXCITINGresults, what could be the logic, since i h Im another in agreement that you probably need to increase your biological replication of microarray slides. Any microarray facility has a certain experimental sensitivity. If the amount of information you have is low then the actual distribution of the observed proportion of false discoveries under simulation can become highly variable as pi_0, the proportion of truely equilalently expressing genes tends to 1 (depending on your cutoff alpha level), remembering that FDR control is an expectation for any realization of an experiment. So your list of differentially expressing genes could contain many (or few) false discoveries but in the long run should maintain control at ~pi_0*alpha under nonadaptive FDR control. Saurin, you mention that you have a large list of differentially expressing genes using a nonadaptive FDR cutoff of alpha=0.01. As an alternative approach you could analyze some of the chip comparisons of interest using EBarrays (after checking distribution assumptions), which models the entire exprSet using hierarchal Gamma Gamma Bernoulli, or Log Normal Normal Bernoulli models, modelling the null and alternate distributions as a mixture with unknown mixing proportion p. The advantage of EBarrays is that it can analyse a single cDNA two channel array or two affymetrix single channel arrays, providing reasonable estimates of pi_0. The same problem applies though with a lack of sensitivity without biological replication (buts it better than fold change at least). You could see *if* EBarrays predicts a value of pi_0 which is moderately less than 1 for comparisons of interest. Marcus >>> Naomi Altman <naomi@stat.psu.edu> 27/04/2005 4:55:11 a.m. >>> Significance should be based on biological replication. If the 2 chips for group 3 are technical replicates, then the variance estimate for the test is probably too small. In theory, statistical tests need only 2 replicate in a single condition, as the null distribution accounts for the number of replicates. However, for this theory to hold, the normality of the samples must be pretty good. When the data are exactly normally distributed (and the assumptions for limma for the distribution of variance hold) then the FDR values should be pretty good, but the FNR will be poor (as you have no power). However, I don't think anyone believes that microarray data are normally distributed. So, I would not really trust these results, even if you have a biological replicate. Of course the 2-fold rule is even worse, so really you should do more biological replication. --Naomi At 09:51 PM 4/26/2005, Saurin Jani wrote: >Hi Adai, > >Yes, you are right. I have 4 samples : > >Group1 = Growth Effect for Day 1 : 1 Affy GeneChip. >Group2 = Growth Effect for Day 2 : 1 Affy GeneChip. >Group3 = Growth Effect for Day 3 : 2 Affy GeneChips. > >so, my design matrix is: >design <- model.matrix(~ -1+factor(c(1,2,3,3))); > >LIMMA did not give any error or waring even it has 1 >sample per group...! ( I thought similar thing, since >it needs technical replicates per group to make a >decision). The results are very interesting. I have >many genes for 0.01 FDR, which is very good. > >Somehow,I don't understand the logic. Do you think is >this a valid design? Or You think I should go by Fold >Change Logic. Please, let me know. > >Thank you very much, >Saurin > > > > > >--- Adaikalavan Ramasamy <ramasamy@cancer.org.uk> >wrote: > > PLEASE correct me if I am wrong. > > > > You have a total of 4 samples that could be > > classified into one of 3 > > groups ? How do you plan on distinguishing > > biological from technical > > variation ? Shouldn't limma come with some sort of > > warning or error if > > there are only one sample per group ? > > > > Regards, Adai > > > > > > > > On Tue, 2005-04-26 at 10:01 -0700, Saurin Jani > > wrote: > > > Hi BioC, > > > > > > I have 3 groups but I have only 2 replicates for > > last > > > group. so, group 1 and 2 has only one Affy CEL > > file. I > > > Did..LIMMA as below and I got some Exciting > > results: > > > > > > #---------------------------------- > > > design <- model.matrix(~ -1+factor(c(1,2,3,3))); > > > colnames(design) <- c("g1","g2","g3"); > > > fit <- lmFit(myRMA,design); > > > > > > contrast.matrix <- > > > makeContrasts(g1-g2,g1-g3,g2-g3,levels = design); > > > > > > fit2 <- contrasts.fit(fit,contrast.matrix); > > > fit2 <- eBayes(fit2); > > > > > > results <- > > > decideTests(fit2,adjust="fdr",p.value=0.01); > > > > > > myGenes <- geneNames(myRMA); > > > i <- apply(results,c(1,2),all); > > > > > > a <- i[,1]; > > > b <- i[,2]; > > > c <- i[,3]; > > > tempgenes1 <- myGenes[a]; > > > tempgenes2 <- myGenes[b]; > > > tempgenes3 <- myGenes[c]; > > > > > > tempall <- c(tempgenes1,tempgenes2,tempgenes3); > > > myDEGenes <- tempall; > > > > > > esetSub2X <- MatrixRMA[myDEGenes,]; > > > esetSub2 <- new("exprSet",exprs = esetSub2X); > > > pData(esetSub2) <- pData(myRMA); > > > heatmap(esetSub2X); > > > #---------------------------------- > > > > > > I got EXCITING results, what could be the > > logic,since > > > i have 2 replicates for 3rd group only ? > > > > > > Could anyone point me out ? > > > > > > I highly appreciate your help , Thank you in > > advance. > > > > > > Thank you, > > > Saurin > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor@stat.math.ethz.ch > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > >_______________________________________________ >Bioconductor mailing list >Bioconductor@stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/bioconductor Naomi S. Altman 814-865-3791 (voice) Associate Professor Bioinformatics Consulting Center Dept. of Statistics 814-863-7114 (fax) Penn State University 814-865-1348 (Statistics) University Park, PA 16802-2111 _______________________________________________ Bioconductor mailing list Bioconductor@stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor ______________________________________________________ The contents of this e-mail are privileged and/or confidenti...{{dropped}} _______________________________________________ Bioconductor mailing list Bioconductor@stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor ADD COMMENTlink written 14.5 years ago by Marcus150 Hi all. I have encountered the same problem. In LIMMA it is possible to handle two levels of replicates. You can use duplicateCorrelation for one level (technical replicates or duplicate spots) and use the rest as biological replicates to fit your model. But say that I have another level of replicates. I have replicate spots, technical replicates and biological replicates. I guess the right thing to do is to average over the replicate spots and use duplicate correlation for the technical replicates. Here I started wondering since limma, when calculating a contrast between two samples uses the arithmetic mean on the M-values which is the same as taking the geometric mean on the fold-changes and then taking the logarithm of that value, or ?!? Recall laws of logarithms: log(xy) = log(x) + log(y) log(x^n) = n*log(x) This means that if I take (log(M1)+log(M2)+log(M3))/3 this is the same as taking log((M1*M2*M3)^(1/3)) which is the same as taking the geometric mean on the fold changes and then taking the logarithm of that value. I wonder, can one motivate using geometric mean on expression data instead of arithmetic? See http://www.math.toronto.edu/mathnet/questionCorner/geomean.html for a nice tip of when to use what mean... For me is seem like one should, if you want to take a mean of M-values in an expression experiment, remove the logarithm, calculate the average fold change and them use the logarithm of desire on that value. Comments appreciated to a guy with limited math-skills being out on deep water.... // Johan L -----Original Message----- From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of marcus Sent: Wednesday, April 27, 2005 5:02 PM To: bioconductor@stat.math.ethz.ch Subject: [BioC] A question regarding the mean of M-values. Hello all users. I have a question regarding the mean calculations of the M-values in LIMMA. I guess that the fit$coeff is the mean of the M-values used for the linear model. The fit$coeff has the mean value of the data derived from a specific RNA source (as defined in the design matrix), and the value in fit$coeff[1] is the same as mean(MS[1,1:2]) (if I for example had Sample 1 on 2 arrays in my matrix containing the data. So...if you take the mean of two values (in the log 2 scale), for example M = 8 and M = 1, the mean (and hence the fit$coef ?) will be 4,5. If you want to look at the foldchange I guess that 2^fit$coeff is correctly calculated, so for the example it will be 2^4,5 = 22,6 times upregulated. But if you look at the values independently, M=8 will give 2^8 = 256 times, and 2^1 = 2 times upregulation. The mean of these values are (256 + 2) / 2 = 129 times. I know that the question is a bit naive, but how should one do when you take the mean of logarithms since the numbers are not related to each other as normal numbers are. E.g. the number 8 is not twice the size of 4 on a logarithmic scale, it is 10000 times more (on a log10 scale). So....how should one do, when I want to take the average of log values? Shouldn't I calculate the ratios back (not in log2 scale) and calculate the mean, and transform the data back, If I would like to have an average M value? Regards Marcus Marcus Gry Bj?rklund Royal Institute of Technology AlbaNova University Center Department of Molecular Biotechnology 106 91 Stockholm, Sweden Phone (office): +46 8 553 783 44 Fax: + 46 8 553 784 81 Visiting address: Roslagstullsbacken 21, Floor 3 Delivery address: Roslagsv?gen 30B Web: http://www.biotech.kth.se/molbio/microarray/index.html -----Original Message----- From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Marcus Davy Sent: Wednesday, April 27, 2005 15:41 To: ramasamy@cancer.org.uk; naomi@stat.psu.edu; saurin_jani@yahoo.com Cc: bioconductor@stat.math.ethz.ch Subject: Re: [BioC] LIMMA : design (1, 2, 3, 3 ) , I gotEXCITINGresults, what could be the logic, since i h Im another in agreement that you probably need to increase your biological replication of microarray slides. Any microarray facility has a certain experimental sensitivity. If the amount of information you have is low then the actual distribution of the observed proportion of false discoveries under simulation can become highly variable as pi_0, the proportion of truely equilalently expressing genes tends to 1 (depending on your cutoff alpha level), remembering that FDR control is an expectation for any realization of an experiment. So your list of differentially expressing genes could contain many (or few) false discoveries but in the long run should maintain control at ~pi_0*alpha under nonadaptive FDR control. Saurin, you mention that you have a large list of differentially expressing genes using a nonadaptive FDR cutoff of alpha=0.01. As an alternative approach you could analyze some of the chip comparisons of interest using EBarrays (after checking distribution assumptions), which models the entire exprSet using hierarchal Gamma Gamma Bernoulli, or Log Normal Normal Bernoulli models, modelling the null and alternate distributions as a mixture with unknown mixing proportion p. The advantage of EBarrays is that it can analyse a single cDNA two channel array or two affymetrix single channel arrays, providing reasonable estimates of pi_0. The same problem applies though with a lack of sensitivity without biological replication (buts it better than fold change at least). You could see *if* EBarrays predicts a value of pi_0 which is moderately less than 1 for comparisons of interest. Marcus >>> Naomi Altman <naomi@stat.psu.edu> 27/04/2005 4:55:11 a.m. >>> Significance should be based on biological replication. If the 2 chips for group 3 are technical replicates, then the variance estimate for the test is probably too small. In theory, statistical tests need only 2 replicate in a single condition, as the null distribution accounts for the number of replicates. However, for this theory to hold, the normality of the samples must be pretty good. When the data are exactly normally distributed (and the assumptions for limma for the distribution of variance hold) then the FDR values should be pretty good, but the FNR will be poor (as you have no power). However, I don't think anyone believes that microarray data are normally distributed. So, I would not really trust these results, even if you have a biological replicate. Of course the 2-fold rule is even worse, so really you should do more biological replication. --Naomi At 09:51 PM 4/26/2005, Saurin Jani wrote: >Hi Adai, > >Yes, you are right. I have 4 samples : > >Group1 = Growth Effect for Day 1 : 1 Affy GeneChip. >Group2 = Growth Effect for Day 2 : 1 Affy GeneChip. >Group3 = Growth Effect for Day 3 : 2 Affy GeneChips. > >so, my design matrix is: >design <- model.matrix(~ -1+factor(c(1,2,3,3))); > >LIMMA did not give any error or waring even it has 1 >sample per group...! ( I thought similar thing, since >it needs technical replicates per group to make a >decision). The results are very interesting. I have >many genes for 0.01 FDR, which is very good. > >Somehow,I don't understand the logic. Do you think is >this a valid design? Or You think I should go by Fold >Change Logic. Please, let me know. > >Thank you very much, >Saurin > > > > > >--- Adaikalavan Ramasamy <ramasamy@cancer.org.uk> >wrote: > > PLEASE correct me if I am wrong. > > > > You have a total of 4 samples that could be > > classified into one of 3 > > groups ? How do you plan on distinguishing > > biological from technical > > variation ? Shouldn't limma come with some sort of > > warning or error if > > there are only one sample per group ? > > > > Regards, Adai > > > > > > > > On Tue, 2005-04-26 at 10:01 -0700, Saurin Jani > > wrote: > > > Hi BioC, > > > > > > I have 3 groups but I have only 2 replicates for > > last > > > group. so, group 1 and 2 has only one Affy CEL > > file. I > > > Did..LIMMA as below and I got some Exciting > > results: > > > > > > #---------------------------------- > > > design <- model.matrix(~ -1+factor(c(1,2,3,3))); > > > colnames(design) <- c("g1","g2","g3"); > > > fit <- lmFit(myRMA,design); > > > > > > contrast.matrix <- > > > makeContrasts(g1-g2,g1-g3,g2-g3,levels = design); > > > > > > fit2 <- contrasts.fit(fit,contrast.matrix); > > > fit2 <- eBayes(fit2); > > > > > > results <- > > > decideTests(fit2,adjust="fdr",p.value=0.01); > > > > > > myGenes <- geneNames(myRMA); > > > i <- apply(results,c(1,2),all); > > > > > > a <- i[,1]; > > > b <- i[,2]; > > > c <- i[,3]; > > > tempgenes1 <- myGenes[a]; > > > tempgenes2 <- myGenes[b]; > > > tempgenes3 <- myGenes[c]; > > > > > > tempall <- c(tempgenes1,tempgenes2,tempgenes3); > > > myDEGenes <- tempall; > > > > > > esetSub2X <- MatrixRMA[myDEGenes,]; > > > esetSub2 <- new("exprSet",exprs = esetSub2X); > > > pData(esetSub2) <- pData(myRMA); > > > heatmap(esetSub2X); > > > #---------------------------------- > > > > > > I got EXCITING results, what could be the > > logic,since > > > i have 2 replicates for 3rd group only ? > > > > > > Could anyone point me out ? > > > > > > I highly appreciate your help , Thank you in > > advance. > > > > > > Thank you, > > > Saurin > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor@stat.math.ethz.ch > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > >_______________________________________________ >Bioconductor mailing list >Bioconductor@stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/bioconductor Naomi S. Altman 814-865-3791 (voice) Associate Professor Bioinformatics Consulting Center Dept. of Statistics 814-863-7114 (fax) Penn State University 814-865-1348 (Statistics) University Park, PA 16802-2111 _______________________________________________ Bioconductor mailing list Bioconductor@stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor ______________________________________________________ The contents of this e-mail are privileged and/or\ confident...{{dropped}}