Gene filtering before or after normalization?

Abhilash Venu ▴ 340

@abhilash-venu-2680

Last seen 9.7 years ago

Hi list,

I am working with single-color data from the Agilent platform. After the limma analysis, the adjusted p-values were all above the 5% FDR threshold, so I am thinking of filtering the genes using genefilter. My data set contains only the raw intensities of the normal and test samples; I log-transform the data and then normalize with 'normalizeBetweenArrays'.

I am confused about whether I should apply the filter functions before normalization, or after normalization but before fitting the linear model. Also, as my data is not an ExpressionSet, I cannot use the nsFilter command. Are there any suggestions for other filtering methods in this case?

I appreciate any suggestions.

--
Regards,
Abhilash

Normalization genefilter • 2.1k views

ADD COMMENT • link updated 15.8 years ago by Jenny Drnevich ★ 2.0k • written 15.8 years ago by Abhilash Venu ▴ 340

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 25 days ago

EMBL European Molecular Biology Laborat…

Hi Abhilash,

Such filtering is performed after normalisation, but it is essential that the filter criterion does *not use the sample annotations*. E.g. you can use for each gene the overall variance or IQR across the experiment.

If x is a matrix with rows=genes and columns=samples, then this can be as simple as:

    rs = rowSds(x)
    fx = fx[ rs > quantile(rs, lambda), ]

where rowSds is in the genefilter package, and lambda is a parameter between 0 and 1 that encodes your belief about what fraction of probes on the array correspond to target molecules that are never expressed in the conditions you study.

Also note that after such filtering, strictly speaking, the nominal p-values from the subsequent testing could be too small. However, one can show that in typical microarray applications the bias is negligible (compared to the impact of other effects), and in any case the p-values can be used for ranking.

Best wishes
Wolfgang

--
Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber

ADD COMMENT • link 15.8 years ago Wolfgang Huber ★ 13k
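
A minimal sketch of the non-specific filter described in the reply above, for readers who want something to run. The matrix `x` is simulated, the value of `lambda` is a made-up choice, and base R's `sd` is used so that no extra package is needed (`rowSds` from genefilter should give the same per-row values). Note that the filtered matrix is obtained by subsetting `x` itself.

```r
# Simulated stand-in for a normalized log2 expression matrix
# (rows = genes, columns = samples); replace with real data.
set.seed(42)
x <- matrix(rnorm(2000 * 8, mean = 10, sd = 1), nrow = 2000,
            dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:8)))

# Hypothetical value: assumed fraction of probes that are never expressed.
lambda <- 0.25

# Per-gene standard deviation across ALL samples; no group labels are used,
# so the filter does not depend on the sample annotations.
rs <- apply(x, 1, sd)

# Keep genes whose SD exceeds the lambda-quantile of all SDs.
keep <- rs > quantile(rs, lambda)
fx <- x[keep, , drop = FALSE]

dim(fx)
```

Because the criterion ignores which samples are "normal" and which are "test", it can be applied before the linear model is fitted without looking at the outcome of the test.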

Dear Dr. Huber,

Thank you for the advice. I tried the script you suggested. As you mentioned, I ran it after the normalization, but it gave the following error, and I do not understand whether I am using it the right way.

    MA <- normalizeBetweenArrays(log2(Rgene$G), method="quantile")  # normalization
    rs = rowSds(MA)
    fx = fx[ rs > quantile(rs, 0.05), ]
    Error: object "fx" not found

Can you advise me on this? Thanks in advance.

Abhilash

ADD REPLY • link 15.8 years ago Abhilash Venu ▴ 340

On Fri, Jul 11, 2008 at 5:32 AM, Abhilash Venu wrote:

> fx = fx[ rs > quantile(rs, 0.05), ]
> Error: object "fx" not found

Hi, Abhilash. I think that line should read:

    fx = x[rs > quantile(rs,0.05),]

Wolfgang was simply suggesting subsetting x by the results of sd filtering.

Sean

ADD REPLY • link 15.8 years ago Sean Davis 21k
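
In the poster's script the normalized matrix is called `MA` rather than `x`, so the corrected line would presumably subset `MA`. The sketch below runs the whole sequence (log2, quantile normalization, SD filter) on simulated intensities. The object `raw` stands in for `Rgene$G`, and the helper `quantile_normalize` is a hypothetical base-R fallback that is only used when limma is not installed; it is not limma's implementation.

```r
# Simulated raw single-channel intensities (stand-in for Rgene$G).
set.seed(1)
raw <- matrix(2^rnorm(1000 * 6, mean = 8, sd = 2), nrow = 1000,
              dimnames = list(paste0("probe", 1:1000), paste0("array", 1:6)))

# Hypothetical fallback: simple quantile normalization in base R.
quantile_normalize <- function(m) {
  ranks  <- apply(m, 2, rank, ties.method = "first")  # rank within each array
  target <- rowMeans(apply(m, 2, sort))               # mean of sorted columns
  out    <- apply(ranks, 2, function(r) target[r])    # map ranks to the target
  dimnames(out) <- dimnames(m)
  out
}

# Normalize after log-transforming, as in the original script.
MA <- if (requireNamespace("limma", quietly = TRUE)) {
  limma::normalizeBetweenArrays(log2(raw), method = "quantile")
} else {
  quantile_normalize(log2(raw))
}

# Filter AFTER normalization: subset MA (which exists), not fx (which does not).
rs <- apply(MA, 1, sd)
fx <- MA[rs > quantile(rs, 0.05), ]

c(before = nrow(MA), after = nrow(fx))
```

The original error arose because `fx` appeared on the right-hand side of the assignment before it had been created.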

Hi Sean,

Yes, thank you.

However, this has not solved my problem. I have tried different filtering methods, including gapfilter and a combination of IQR with pOverA or cv. But my adjusted p-values are still above the FDR limit of 0.05 after the limma analysis, and the B values are generally around -3. As Gordon mentioned in a previous mail, this indicates little evidence for differential expression.

What could be the reason for this? Does it really indicate an absence of differential expression?

Thank you in advance,
Abhilash

ADD REPLY • link 15.8 years ago Abhilash Venu ▴ 340
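
For readers unfamiliar with the B values mentioned above: in limma the B-statistic is the log-odds that a gene is differentially expressed, so it can be converted to a probability with the logistic function. The sketch below does that conversion; the value -3 comes from the post, the other values are illustrative only, and the resulting probabilities depend on the prior proportion of differentially expressed genes assumed when the model was fitted.

```r
# B = log-odds of differential expression; -3 is from the post above,
# the remaining values are illustrative.
B <- c(-3, 0, 1.5, 4.8)

odds <- exp(B)      # odds of differential expression
prob <- plogis(B)   # exp(B) / (1 + exp(B))

round(data.frame(B = B, odds = odds, probability = prob), 3)
```

A B value of -3 corresponds to a probability of roughly 0.05 that the gene is differentially expressed, which is why it is read as little evidence.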

On Sat, Jul 12, 2008 at 11:26 AM, Abhilash Venu wrote:

> What could be the reason for this? Does it really indicate an absence
> of differential expression?

It sounds like it. Though people think of filtering as a way to reduce the number of genes and improve the strength of signal after multiple-testing correction, I don't think that is the correct mindset. Filtering is useful to remove probes from analysis that are not measuring anything interesting (no change across experiments) or are not well-measured. So, the thought process should not be to do hypothesis testing and then, if negative, to do filtering to try to improve the situation, but to do filtering based on rational thresholds for removing uninteresting or less-than-credible values as part of a series of preprocessing steps.

Sean

ADD REPLY • link 15.8 years ago Sean Davis 21k
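
One way to read "rational thresholds" in the reply above is to fix the criterion before any testing is done, for example by keeping only probes that are measured above a background-like level on a minimum number of arrays. The sketch below illustrates that idea on simulated data. The threshold `A` and the count `k` are hypothetical values chosen for illustration; the approach is similar in spirit to the `kOverA` and `pOverA` filters in genefilter.

```r
# Simulated normalized log2 expression matrix (rows = probes, columns = arrays).
set.seed(7)
x <- matrix(rnorm(1000 * 6, mean = 8, sd = 2), nrow = 1000)

# Thresholds chosen BEFORE looking at any test results (hypothetical values).
A <- 6   # log2 intensity regarded as reliably above background
k <- 3   # minimum number of arrays above A, e.g. the smallest group size

# Keep probes that exceed A on at least k arrays; group labels are not used.
expressed <- rowSums(x > A) >= k
fx <- x[expressed, , drop = FALSE]

c(before = nrow(x), after = nrow(fx))
```

The point of fixing `A` and `k` in advance is that the filter becomes part of preprocessing rather than something tuned until the gene list looks better.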

Hi Sean,

Thank you for sharing your thoughts.

I have done the filtering, using the same code, prior to the normalization, and the results started to change. I am providing the topTable results below. The log-odds (B) are now positive, but adj.P.Val is still a little high. In this scenario, should I apply more stringent filtering before the analysis?

    GeneName      logFC       AveExpr   t          P.Value       adj.P.Val  B
    NUDT16L1      2.7559164   14.32567  10.098560  1.520399e-07  0.0065018  4.829862
    MGC4268       1.5820444   12.06414   7.695917  3.280927e-06  0.061246   3.208160
    AR            1.7511488   10.19825   7.506490  4.296601e-06  0.0612466  3.048297
    LOC124220     0.9476445   15.51240   6.697382  1.431390e-05  0.1530298  2.302016
    A_24_P289130  1.7622555   11.07025   6.401121  2.272696e-05  0.156454   2.001432
    ZNF501        1.804305    10.69845   6.345654  2.481481e-05  0.156454   1.943447
    ADAM22       -1.650837    11.89608  -6.187425  3.195991e-05  0.156454   1.77502
    THC2351317    1.0793141   12.34347   6.179724  3.235878e-05  0.156454   1.766717
    AW276332      1.8253290   10.55792   6.147119  3.410664e-05  0.1564544  1.731409
    THC2323609    2.0122396   10.82117   6.076649  3.823291e-05  0.15645    1.654439

Regards
Abhilash

ADD REPLY • link 15.8 years ago Abhilash Venu ▴ 340

Jenny Drnevich ★ 2.0k

@jenny-drnevich-2812

Last seen 29 days ago

United States

At 09:14 AM 7/16/2008, Abhilash Venu wrote: >Hi Sean, >Thank you for sharing the thoughts. >I have done the filtering, using the same code prior to the normalization, >and it started to show some changes. I am providing the topTable results, >the odds ratios started to show the positive change but still adj.P.Val is >showing little higher, So in this scenario, whether I should do more >stringent filtering before the analysis? Hi Abhilash, As Sean said before, the goal of data pre-processing and filtering should not be to *get* the results you want, but rather to arrive at the most _correct_ results given the type of data that is generated. It's a big statistical no-no to try several different analysis methods and then pick the one that gives you the results you like best. I'm not sure why you tried filtering before doing normalization when you were already told that it's supposed to be done after normalization. I know it's frustrating to not have any "significant" genes, especially when you know there are expression changes due to the treatment. Remember that a FDR level of 0.05 is not a magical threshold of significance, rather the amount of false positives YOU are willing to tolerate in your gene list. I've seen papers where they've used gene lists with 0.1 or even 0.2 FDR thresholds. Another route is to just use the top 50 or 100 genes, as these have the most evidence for DE, even if they don't surpass any reasonable FDR adjustment. Finally, remember that Affy arrays, and many other methods of expression measurement, are only measuring a tiny portion of the expected transcript. There are many known cases in which "expression" differences won't be reflected in that portion of the transcript. In these cases, the microarray data are "correct", even if they aren't telling you the entire story... 
Best, Jenny >GeneName logFC AveExpr t P.Value > adj.P.Val B >NUDT16L1 2.7559164 14.32567 10.098560 1.520399e-07 >0.0065018 4.829862 >MGC4268 1.5820444 12.06414 7.695917 3.280927e-06 >0.061246 3.208160 >AR 1.7511488 10.19825 7.506490 4.296601e-06 >0.0612466 3.048297 >LOC124220 0.9476445 15.51240 6.697382 1.431390e-05 >0.1530298 2.302016 >A_24_P289130 1.7622555 11.07025 6.401121 2.272696e-05 >0.156454 2.001432 >ZNF501 1.804305 10.69845 6.345654 2.481481e-05 >0.156454 1.943447 >ADAM22 -1.650837 11.89608 -6.187425 3.195991e-05 >0.156454 1.77502 THC2351317 1.0793141 12.34347 >6.179724 3.235878e-05 0.156454 1.766717 >AW276332 1.8253290 10.55792 6.147119 3.410664e-05 >0.1564544 1.731409 >THC2323609 2.0122396 10.82117 6.076649 3.823291e-05 >0.15645 1.654439 > > >Regards >Abhilash > >On Sat, Jul 12, 2008 at 10:32 PM, Sean Davis <sdavis2 at="" mail.nih.gov=""> wrote: > > > On Sat, Jul 12, 2008 at 11:26 AM, Abhilash Venu <abhivenu at="" gmail.com=""> > > wrote: > > > Hi Sean, > > > > > > Yes, thank you. > > > > > > Yet my problem of the data did not get sorted out. I have tried different > > > filtering methods including gapfilter and a combination of IQR with > > pOverA > > > or cv etc. But my adj p values are above the FDR limit of 0.05 after the > > > limma analysis. Also B values are generally -3. As Gorden has mentioned > > in > > > one of the previous mails, this is a indication of little evidance for > > > differential expression. > > > > > > What could be the reason for this. Is this really an indicative of > > absence > > > of differential expression? > > > > It sounds like it. Though people think of filtering as a way to > > reduce the number of genes and improve the strength of signal after > > multiple-testing correction, I don't think that is the correct > > mindset. Filtering is useful to remove probes from analysis that are > > not measuring anything interesting (no change across experiments) or > > are not well-measured. 
So, the thought process should not be to do hypothesis testing and then, if negative, to do filtering to try to improve the situation, but to do filtering based on rational thresholds for removing uninteresting or less-than-credible values as part of a series of preprocessing steps.
> >
> > Sean
> >
> > > On Fri, Jul 11, 2008 at 4:17 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> > >> On Fri, Jul 11, 2008 at 5:32 AM, Abhilash Venu <abhivenu at gmail.com> wrote:
> > >> > Dear Dr. Huber,
> > >> >
> > >> > Thank you for the advice. I have tried the script that you have advised to use. As you mentioned I have used the script after the normalization, but that has shown the following error, which I do not understand, whether I am using in the right way.
> > >> >
> > >> > MA<-normalizeBetweenArrays(log2(Rgene$G), method="quantile") # normalization
> > >> > rs = rowSds(MA)
> > >> > fx = fx[ rs > quantile(rs, 0.05), ]
> > >> > Error: object "fx" not found
> > >>
> > >> Hi, Abhilash. I think that line should read:
> > >>
> > >> fx = x[rs > quantile(rs,0.05),]
> > >>
> > >> Wolfgang was simply suggesting subsetting x by the results of sd filtering.
> > >>
> > >> Sean
> > >>
> > >> > Can you advise me on the same.
> > >> > Thanks in advance.
> > >> >
> > >> > Abhilash
>
> --
> Regards,
> Abhilash

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML, 1201 W. Gregory Dr., Urbana, IL 61801 USA
ph: 217-244-7355 fax: 217-265-5066 e-mail: drnevich at illinois.edu
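For readers following the filtering step discussed in this thread, here is a minimal sketch of the non-specific variance filter applied after normalization. It is not from the original posts: the simulated matrix `x` is a hypothetical stand-in for the normalized matrix (`MA` in the thread), `lambda = 0.05` is just the value tried above, and base R's `apply(x, 1, sd)` is used in place of `rowSds` from genefilter so the example runs without Bioconductor installed.

```r
set.seed(1)

# Hypothetical stand-in for a normalized log2 intensity matrix
# (rows = probes, columns = arrays). In the thread this would be the
# output of normalizeBetweenArrays().
x <- matrix(rnorm(1000 * 8, mean = 10), nrow = 1000,
            dimnames = list(paste0("probe", 1:1000), paste0("array", 1:8)))

lambda <- 0.05                     # assumed fraction of probes to discard

rs   <- apply(x, 1, sd)            # per-probe SD across ALL arrays; no group labels used
keep <- rs > quantile(rs, lambda)  # flag probes above the lambda quantile of SD
fx   <- x[keep, , drop = FALSE]    # subset x itself (the original error came from indexing fx)

dim(fx)
```

The filter only looks at overall variability, never at which arrays are "normal" or "test", which is the condition Wolfgang stressed for keeping the downstream p-values usable.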

written 15.8 years ago by Jenny Drnevich ★ 2.0k

Hi Jenny,

Thank you for your explanation. As a biologist who has started doing analysis in R, my experimental plan is to select a few top genes based on the p-value and to evaluate them using PCR. I completely agree with you that arrays may not represent the true biology, but I feel that I should not go wrong in the statistical analysis.

Regards,
Abhilash

On Wed, Jul 16, 2008 at 8:31 PM, Jenny Drnevich <drnevich@illinois.edu> wrote:

> At 09:14 AM 7/16/2008, Abhilash Venu wrote:
>
>> Hi Sean,
>> Thank you for sharing the thoughts.
>> I have done the filtering, using the same code prior to the normalization, and it started to show some changes. I am providing the topTable results; the odds ratios started to show the positive change, but adj.P.Val is still a little high. So in this scenario, should I do more stringent filtering before the analysis?
>
> Hi Abhilash,
>
> As Sean said before, the goal of data pre-processing and filtering should not be to *get* the results you want, but rather to arrive at the most _correct_ results given the type of data that is generated. It's a big statistical no-no to try several different analysis methods and then pick the one that gives you the results you like best. I'm not sure why you tried filtering before doing normalization when you were already told that it's supposed to be done after normalization. I know it's frustrating to not have any "significant" genes, especially when you know there are expression changes due to the treatment. Remember that a FDR level of 0.05 is not a magical threshold of significance, rather the amount of false positives YOU are willing to tolerate in your gene list. I've seen papers where they've used gene lists with 0.1 or even 0.2 FDR thresholds. Another route is to just use the top 50 or 100 genes, as these have the most evidence for DE, even if they don't surpass any reasonable FDR adjustment.
>
> Finally, remember that Affy arrays, and many other methods of expression measurement, are only measuring a tiny portion of the expected transcript. There are many known cases in which "expression" differences won't be reflected in that portion of the transcript. In these cases, the microarray data are "correct", even if they aren't telling you the entire story...
>
> Best,
> Jenny
>
>> GeneName      logFC      AveExpr   t          P.Value       adj.P.Val  B
>> NUDT16L1      2.7559164  14.32567  10.098560  1.520399e-07  0.0065018  4.829862
>> MGC4268       1.5820444  12.06414   7.695917  3.280927e-06  0.061246   3.208160
>> AR            1.7511488  10.19825   7.506490  4.296601e-06  0.0612466  3.048297
>> LOC124220     0.9476445  15.51240   6.697382  1.431390e-05  0.1530298  2.302016
>> A_24_P289130  1.7622555  11.07025   6.401121  2.272696e-05  0.156454   2.001432
>> ZNF501        1.804305   10.69845   6.345654  2.481481e-05  0.156454   1.943447
>> ADAM22       -1.650837   11.89608  -6.187425  3.195991e-05  0.156454   1.77502
>> THC2351317    1.0793141  12.34347   6.179724  3.235878e-05  0.156454   1.766717
>> AW276332      1.8253290  10.55792   6.147119  3.410664e-05  0.1564544  1.731409
>> THC2323609    2.0122396  10.82117   6.076649  3.823291e-05  0.15645    1.654439
>>
>> Regards
>> Abhilash
>>
>> On Sat, Jul 12, 2008 at 10:32 PM, Sean Davis <sdavis2@mail.nih.gov> wrote:
>>
>> > On Sat, Jul 12, 2008 at 11:26 AM, Abhilash Venu <abhivenu@gmail.com> wrote:
>> > > Hi Sean,
>> > >
>> > > Yes, thank you.
>> > >
>> > > Yet my problem with the data did not get sorted out. I have tried different filtering methods including gapFilter and a combination of IQR with pOverA or cv etc. But my adj p values are above the FDR limit of 0.05 after the limma analysis. Also B values are generally -3. As Gordon has mentioned in one of the previous mails, this is an indication of little evidence for differential expression.
>> > >
>> > > What could be the reason for this? Is this really indicative of an absence of differential expression?
>> >
>> > It sounds like it. Though people think of filtering as a way to reduce the number of genes and improve the strength of signal after multiple-testing correction, I don't think that is the correct mindset. Filtering is useful to remove probes from analysis that are not measuring anything interesting (no change across experiments) or are not well-measured. So, the thought process should not be to do hypothesis testing and then, if negative, to do filtering to try to improve the situation, but to do filtering based on rational thresholds for removing uninteresting or less-than-credible values as part of a series of preprocessing steps.
>> >
>> > Sean
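Jenny's quoted advice (an FDR cutoff is a tolerance you choose, and ranking the top genes is a valid alternative) can be sketched as follows. This is an illustrative example only: the p-values are simulated and hypothetical, not the thread's data, and Benjamini-Hochberg adjustment via base R's `p.adjust` is used because it is the default adjustment behind `adj.P.Val` in limma's `topTable`.

```r
set.seed(42)

# Hypothetical p-values: 9900 probes with no effect plus 100 probes with some signal.
p <- c(runif(9900), rbeta(100, 0.1, 10))
names(p) <- paste0("probe", seq_along(p))

# Benjamini-Hochberg adjusted p-values
adj <- p.adjust(p, method = "BH")

# Number of probes called at several FDR tolerances
n_called <- sapply(c(0.05, 0.10, 0.20), function(fdr) sum(adj <= fdr))
names(n_called) <- c("FDR 0.05", "FDR 0.10", "FDR 0.20")
print(n_called)

# Alternative route: rank by raw p-value and carry the top 50 forward
# (for example to PCR validation), whatever their adjusted values are.
top50 <- names(sort(p))[1:50]
head(top50)
```

Loosening the tolerance can only lengthen the gene list, while the ranking itself does not depend on the cutoff, which is why selecting a few top genes for PCR follow-up remains reasonable even when nothing passes 5% FDR.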

written 15.8 years ago by Abhilash Venu ▴ 340
