Filtering before differential expression analysis of microarrays - New paper out (James W. MacDonald)
Jenny Drnevich ★ 2.0k
@jenny-drnevich-2812
Last seen 6 weeks ago
United States
Hi Sherosha,

In general, you can filter by subsetting a MArrayLM object the exact same way as you would an ExpressionSet object. If you have any trouble, please post the code that you are trying to use.

Cheers,
Jenny

At 10:47 AM 1/13/2009, Sherosha Raj wrote:
> Hello all
>
> I'm sorry if this is a simple question, but how does one go about
> filtering after the eBayes step since the resulting object is of the
> class MArrayLM?
> I am used to filtering expression sets directly.
>
> Thank you very much!
> Sherosha
>
> > ---------- Forwarded message ----------
> > From: "James W. MacDonald" <jmacdon at med.umich.edu>
> > To: Daniel Brewer <daniel.brewer at icr.ac.uk>
> > Date: Mon, 12 Jan 2009 09:25:02 -0500
> > Subject: Re: [BioC] Filtering before differential expression
> > analysis of microarrays - New paper out
> >
> > Hi Dan,
> >
> > Daniel Brewer wrote:
> >> Hi,
> >>
> >> There is a new paper out at BMC bioinformatics that seems to justify the
> >> use of filtering before differential expression analysis is performed
> >> (Hackstadt & Hess BMC Bioinformatics 2009, 10:11 -
> >> http://www.biomedcentral.com/1471-2105/10/11/abstract). Specifically
> >> filtering by variance and detection call. I have got the impression
> >> from this list that the general opinion is that one should only filter
> >> out the control genes before testing. I was wondering if anyone had any
> >> opinions on this paper and the topic in general.
> >
> > I'm sure people do have opinions about this topic ;-D
> >
> > The reason people have so many opinions is because it isn't a
> > simple question, and it depends on what you consider important.
> >
> > If you are just trying to limit the number of multiple
> > comparisons to increase power, then filtering first is probably
> > the way to go.
> >
> > If you are concerned with the accuracy of the FDR estimates, then
> > filtering first may not be ideal.
> >
> > If you are using limma (Hackstadt and Hess used multtest), then
> > you should filter after the eBayes step but before the FDR step, as
> > an assumption of the eBayes step is that all of the data from the
> > chip are available.
> >
> > Unless of course you are concerned about the accuracy of the FDR
> > estimates, in which case... well you see the point.
> >
> > With microarray data analysis the arguments for and against a
> > particular way of doing things can shed more heat than light, as
> > nobody really knows the underlying truth, and the measures we use
> > are really far removed from the actual phenomenon we are testing.
> >
> > Best,
> >
> > Jim
> >
> >> Many thanks
> >>
> >> Dan
> >
> > --
> > James W. MacDonald, M.S.
> > Biostatistician
> > Hildebrandt Lab
> > 8220D MSRB III
> > 1150 W. Medical Center Drive
> > Ann Arbor MI 48109-5646
> > 734-936-8662
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801 USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu
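Jim's point about ordering (compute the per-gene statistics on everything, filter, and only then apply the FDR correction) can be sketched in base R. This is an illustration under stated assumptions, not the limma workflow itself: the data are simulated, ordinary row-wise t-tests stand in for limma's moderated statistics, and names such as expr, keep, adj_all and adj_kept are made up for the example. With limma, the p-values would instead come from an eBayes() fit computed on all probes.

```r
# Illustration only: simulated data, and plain t-tests as a stand-in for
# limma's moderated t-statistics.
set.seed(1)
n_genes <- 2000
n_per_group <- 5
expr <- matrix(rnorm(n_genes * 2 * n_per_group, mean = 7, sd = 0.3),
               nrow = n_genes)
group <- factor(rep(c("A", "B"), each = n_per_group))

# Make the first 100 genes truly different between the two groups.
expr[1:100, group == "B"] <- expr[1:100, group == "B"] + 1

# Step 1: test every gene (this is where eBayes() would see all of the data).
pvals <- apply(expr, 1, function(x) {
  t.test(x[group == "A"], x[group == "B"])$p.value
})

# Step 2: a non-specific filter computed from the expression values only.
keep <- apply(expr, 1, IQR) > 0.5

# Step 3: apply the BH correction with and without the filter.
adj_all  <- p.adjust(pvals, method = "BH")
adj_kept <- p.adjust(pvals[keep], method = "BH")

cat("genes tested:", length(adj_all), " after filter:", length(adj_kept), "\n")
cat("BH < 0.05 without filter:", sum(adj_all < 0.05), "\n")
cat("BH < 0.05 with filter:   ", sum(adj_kept < 0.05), "\n")
```

The only thing the filter changes here is how many p-values enter p.adjust(), which is the "limit the number of multiple comparisons" effect described above. Whether the resulting FDR estimates remain accurate is the open question Jim raises, and this sketch does not settle it.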
Microarray GO limma • 997 views
Jenny Drnevich ★ 2.0k
@jenny-drnevich-2812
Last seen 6 weeks ago
United States
Hi Sherosha,

The description of genefilter() says:

  genefilter filters genes in the array expr using the filter functions in flist. It returns an array of logical values (suitable for subscripting) of the same length as there are rows in expr. For each row of expr the returned value is TRUE if the row passed all the filter functions. Otherwise it is set to FALSE.

Your output object "selected" is just a vector of TRUEs and FALSEs, so I'm assuming you used it this way if you were going to filter BEFORE the statistical analysis:

> selected=genefilter(all.esetsub[,61:102],ff)
> fit=lmFit(all.esetsub[selected,61:102],design)

If you want to use the same filters, which select genes based on the normalized data, but not filter the genes out until after the analysis, you would do:

> selected=genefilter(all.esetsub[,61:102],ff) #same as above
> fit=lmFit(all.esetsub[,61:102],design)
> fit.filtered <- fit[selected,]

HTH,
Jenny
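The idea in Jenny's reply, that the filter yields one TRUE/FALSE per row and that this vector can subset any object whose rows are in the same order, can be tried without Bioconductor. The sketch below uses base-R stand-ins: p_over_a is a hypothetical re-implementation that assumes genefilter's pOverA(p, A) keeps a row when the proportion of values above A is at least p, the expression matrix is simulated, and the stats table is a made-up placeholder for per-gene results.

```r
# Base-R stand-ins for the two filters in the thread (assumption: pOverA(p, A)
# passes a row when the proportion of its values above A is at least p).
p_over_a   <- function(p, A) function(x) mean(x > A) >= p
iqr_filter <- function(x) IQR(x) > 0.5

set.seed(42)
# Hypothetical log2 expression matrix: 6 probesets x 8 arrays.
expr <- matrix(rnorm(48, mean = log2(100), sd = 1), nrow = 6,
               dimnames = list(paste0("probe", 1:6), paste0("array", 1:8)))

f1 <- p_over_a(0.25, log2(100))

# One logical value per row: TRUE only if the row passes both filters.
selected <- apply(expr, 1, function(x) f1(x) && iqr_filter(x))

# Any per-gene result with rows in the same order as expr can be indexed
# with the same vector. 'stats' is a made-up stand-in for a results table.
stats <- data.frame(probe = rownames(expr), mean_expr = rowMeans(expr))
stats_filtered <- stats[selected, ]

print(selected)
print(stats_filtered)
```

The filters are computed from the expression values, not from the fitted model, which is why the logical vector is built once from the data and then reused for subsetting.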
Hi Jenny

Thank you very much for your help. It works now. These analyses are so interesting; I'm still learning :-).

Regards,
Sherosha
Sherosha Raj ▴ 90
@sherosha-raj-3225
Last seen 9.7 years ago
Hello Jenny

This is how I set up the filters:

#setup filters
> f1=pOverA(0.25,log2(100))
> f2=function(x)(IQR(x)>0.5)
> ff=filterfun(f1,f2)

Around here I sub-select the probesets that come through the filter from my expression set, then proceed to

#LIMMA
> targets=readTargets("targets.txt",sep="")
> WD=paste(targets$.....)
> WD=factor(WD,levels=c("........."))
> design=model.matrix(~0+WD)
> colnames(design)=levels(WD)
> fit=lmFit(all.esetsub[,61:102],design) #from a large eset normalised over 102 chips so subsetting the relevant cel files

#Contrast matrix
> contmatrix=makeContrasts(.........,levels=design)
> fit2=contrasts.fit(fit,contmatrix)

If I were to filter here using the two filters above......

> selected=genefilter(fit2,ff)
> sum(selected)
[1] 0
> class(fit2q)
[1] "MArrayLM"
attr(,"package")
[1] "limma"

#When I filter before starting limma, I get 11504 probesets coming through.
#I am confused how to proceed with the next steps....(i.e subset the fit2 object and apply the eBayes)..:-(

#previously proceeded as follows after the "contrasts.fit" step:
> fit2=eBayes(fit2)
> changinggenes.05=decideTests(fit2,adjust.method="BH",p.value=0.05)

etc etc

I have previously been using filters before limma, but I've been following the discussions on this board and would like to see how the data look if I filter prior to the eBayes step.

Any help is greatly appreciated!!
Thank you very much!
Regards,
Sherosha
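For readers unfamiliar with the design step in the post above, here is a small base-R sketch of what model.matrix(~0+WD) followed by colnames(design)=levels(WD) produces, and what a contrast on that design means. The group labels ctrl and treat, the six-sample layout, and the simulated gene y are all hypothetical, since the thread elides the real targets and contrasts. The contrast is written by hand as a weight vector; limma's makeContrasts() is the usual way to build the matrix form.

```r
# Hypothetical group labels; the real ones are elided in the thread.
WD <- factor(c("ctrl", "ctrl", "ctrl", "treat", "treat", "treat"),
             levels = c("ctrl", "treat"))

# ~0 + WD drops the intercept, so each column is an indicator for one group
# and each fitted coefficient is that group's mean.
design <- model.matrix(~0 + WD)
colnames(design) <- levels(WD)
print(design)

# A contrast is a set of weights on the design columns; this one is
# treat minus ctrl.
contrast <- c(ctrl = -1, treat = 1)

# Check on one simulated gene: the contrast of the fitted group means equals
# the difference between the two sample means.
set.seed(7)
y <- c(rnorm(3, mean = 7), rnorm(3, mean = 9))
coefs <- lm.fit(design, y)$coefficients   # least-squares group means
diff_est <- sum(contrast * coefs)
print(diff_est)
```

Because the design has one column per group, the fit object carries one coefficient per group and per probeset, and the contrast step turns those into the comparisons of interest without changing the number of rows. That is why the same row-wise logical vector can still be used to subset the fit afterwards.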