Question

LIMMA, FDR and B-statistic

Jordi Altirriba Gutiérrez ▴ 350

Hello to everyone!!

I've been using RMA to normalize my data and LIMMA to obtain a list of significant genes. My design was a 2x2 factorial design with 4 groups: diabetic treated, diabetic untreated, healthy treated and healthy untreated, with 3 biological replicates in each group.

I've got the list of significant genes with these commands (thanks again to Gordon!):

> design <- model.matrix(~DIABETES*TREATMENT, data=pData(eset))
> fit <- lmFit(eset, design)
> contrast.matrix <- makeContrasts(DIABETESTRUE, TREATMENTTRUE, DIABETESTRUE.TREATMENTTRUE, levels=design)
> fit2 <- contrasts.fit(fit, contrast.matrix)
> fit2 <- eBayes(fit2)
> topTable(fit2, number=100, genelist=geneNames(eset), coef="DIABETESTRUE", adjust="fdr")

Now I have more questions (sorry to bother you all again).

1.- Is it possible to know what false discovery rate we are working at with these 100 genes? (Something similar to the median and the 90th percentile of FDR that we obtain with SAM.) If so, how can I find it?

2.- When I look at my gene list for TREATMENT, the first gene in the list has a negative B value (-2.83). However, when I obtain the gene list for TREATMENT.DIABETES, the top gene has a B value of 14. Is it correct to interpret this as meaning that the drug only acts in the diabetic animals and does not induce any difference in gene expression in the healthy ones?

3.- When we work with the p-value, there is an agreement (more or less) that a value <0.05 is significant. Is there a similar agreement for the B statistic? (I've read "Replicated microarray data" by Lönnstedt and Speed and I think that it depends on your data and experiment, but is there any way to determine the cutoff?)

Thanks again for your suggestions and patience!

Yours sincerely,

Jordi Altirriba, PhD student
IDIBAPS - Hospital Clinic (Barcelona, Spain)

Microarray limma

updated 20.1 years ago by Gordon Smyth • written 20.1 years ago by Jordi Altirriba Gutiérrez

score 0 · Answer 1 · 2004-03-23

At 04:23 AM 24/03/2004, Jordi Altirriba Gutiérrez wrote:

> Hello to everyone!!
> I've been using RMA to normalize my data and LIMMA to obtain a list of
> significant genes. My design was a 2x2 factorial design with 4 groups:
> diabetic treated, diabetic untreated, healthy treated and healthy untreated,
> with 3 biological replicates in each group.
> I've got the list of significant genes with these commands (thanks again
> to Gordon!):
>> design <- model.matrix(~DIABETES*TREATMENT, data=pData(eset))
>> fit <- lmFit(eset, design)
>> contrast.matrix <- makeContrasts(DIABETESTRUE, TREATMENTTRUE, DIABETESTRUE.TREATMENTTRUE, levels=design)
>> fit2 <- contrasts.fit(fit, contrast.matrix)
>> fit2 <- eBayes(fit2)
>> topTable(fit2, number=100, genelist=geneNames(eset), coef="DIABETESTRUE", adjust="fdr")
> Now I have more questions (sorry to bother you all again).
> 1.- Is it possible to know what false discovery rate we are working at
> with these 100 genes? (Something similar to the median and the 90th
> percentile of FDR that we obtain with SAM.) If so, how can I find it?

No, it's not possible. Theoretically, the 'fdr' method used by limma means that the adjusted p-value for each gene gives an upper bound on the expected false discovery rate if you were to cut on that value. (The theory assumes independence between genes, which is certainly not true.) The true false discovery rate is unknowable. My personal feeling is that you really need a set of special-purpose control probes, such as spike-ins, even to get a decent unbiased estimator of the false discovery rate.

> 2.- When I look at my gene list for TREATMENT, the first gene in the list
> has a negative B value (-2.83). However, when I obtain the gene list for
> TREATMENT.DIABETES, the top gene has a B value of 14. Is it correct to
> interpret this as meaning that the drug only acts in the diabetic animals
> and does not induce any difference in gene expression in the healthy ones?

Yes, that seems to be what the data is saying.

> 3.- When we work with the p-value, there is an agreement (more or less)
> that a value <0.05 is significant. Is there a similar agreement for the B
> statistic? (I've read "Replicated microarray data" by Lönnstedt and Speed
> and I think that it depends on your data and experiment, but is there any
> way to determine the cutoff?)

I don't think there is any such agreement in the microarray literature, even for p-values. Given all the assumptions, Terry and I are comfortable only with using B-statistics and modified p-values to rank genes rather than as absolute cutoffs. I often use B>0, but only in an informal way.

See comments in:

Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3, No. 1, Article 3.

Smyth, G. K., Yang, Y.-H., Speed, T. P. (2003). Statistical issues in microarray data analysis. In: Functional Genomics: Methods and Protocols, M. J. Brownstein and A. B. Khodursky (eds.), Methods in Molecular Biology Volume 224, Humana Press, Totowa, NJ, pages 111-136.

Gordon

> Thanks again for your suggestions and patience!
> Yours sincerely,
>
> Jordi Altirriba, PhD student
> IDIBAPS - Hospital Clinic (Barcelona, Spain)
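
Two points from the answer can be illustrated with a small base-R sketch. It assumes that the 'fdr' adjustment is the Benjamini-Hochberg method as implemented by `p.adjust()`, and that B can be read at face value as the natural-log posterior odds of differential expression. The p-values are made up for illustration; only the B values -2.83 and 14 come from the question.

```r
# Illustrative p-values only (invented for this sketch, not from the experiment).
p <- c(0.0001, 0.0004, 0.002, 0.01, 0.03, 0.2, 0.5, 0.9)

# Benjamini-Hochberg adjustment; "fdr" is an alias for "BH" in p.adjust().
adj <- p.adjust(p, method = "BH")

# The same values by hand: p * m / rank, then made monotone by taking the
# running minimum from the largest p-value downwards.
m <- length(p)
o <- order(p, decreasing = TRUE)
by_hand <- numeric(m)
by_hand[o] <- pmin(1, cummin(p[o] * m / (m:1)))
stopifnot(isTRUE(all.equal(adj, by_hand)))

# Cutting the ranked list at adjusted p <= q gives a nominal upper bound q on
# the expected FDR of the selected set. The bound relies on the method's
# assumptions (such as independence between genes); it is not an estimate of
# the true FDR.
q <- 0.05
selected <- which(adj <= q)

# Assumption: B is the log posterior odds of differential expression, so the
# logistic function maps it to a posterior probability. B = 0 corresponds to
# even odds, which is why B > 0 is a natural informal threshold.
B <- c(-2.83, 0, 14)
post_prob <- plogis(B)
print(data.frame(B = B, post_prob = post_prob))
```

The probabilities depend on the prior proportion of differentially expressed genes assumed when B is computed, so they are best used for ranking, as the answer says, rather than as calibrated error rates.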