Question

siggenes parameters

0

Entering edit mode

Assa Yeroslaviz ★ 1.5k

@assa-yeroslaviz-1597

Last seen 12 weeks ago

Germany

Hi Holger, Thanks for the response (I am sorry to answer so late as I was on a business trip). I will look into the description to try and understand it better. I already thought that the number is too low. I am still not sure though about the first two runs. What about the R.fold parameter? Does it make sense to set it before I run the SAM analysis, or is it better to first run it with all the genes and than exclude those genes with a fold induction under a certain threshold? In the second run I get almost the same amount of called genes with a "better" FDR value. Is there a way to say which FDR value can be accepted? Thanks Assa On Tue, Jun 28, 2011 at 13:11, Holger Schwender <holger.schw@gmx.de> wrote: > Hi Assa, > > siggenes is described in > > http://cran.r-project.org/doc/Rnews/Rnews_2006-5.pdf > > on pages 45ff, and with a bit more technical details in > > https://eldorado.tu- dortmund.de/bitstream/2003/23306/1/diss_schwender.pdf > > Section 6.3.2. Another possibility is to look into the code for sam, in > particular into the functions d.stat and siggenes:::stats.cal. > > The reason why the FDR is so low in the third example is the very, very low > estimate for p0. Take a look at the number of falsely called genes (False) > for about the same number of called genes. These numbers are close to each > other. The FDR is estimated by > > p0 * False/Called > > so the FDR gets very low if p0 is very small. > > I would thus not trust the results of the third analysis, as the p0 > estimation has not worked properly here. Not really sure why, but the reason > might be that too many null genes are removed when use.dm=FALSE. > > Best, > Holger > > -------- Original-Nachricht -------- > > Datum: Tue, 28 Jun 2011 09:41:53 +0200 > > Von: Assa Yeroslaviz <frymor@gmail.com> > > An: bioconductor <bioconductor@stat.math.ethz.ch>, holger.schw@gmx.de > > Betreff: siggenes parameters > > > Hi everybody, > > > > I have a question about the behavior of siggenes in R (2.12) with > > different > > parameters. > > Maybe I need to understand better how siggenes calculated the > > differentially > > regulated genes, but I would like to ask you for your help in > > understanding > > it. > > > > Maybe you can also direct me to where I can find the answers for this > > problem(s) > > > > I am running a sam analysis of wild type vs. mutant in drosophila miRNA > > microarrays. > > > > I first read the files into an affybatch and than normalized them (RMA). > > After normalizing the arrays I extracted all non-drosophila miRNA from my > > expressionSet and ran the sam analysis only with the 186 dme-miRNA > probes. > > > > cl <- c(0,0,0,1,1,1) # "wt1", "wt2", "wt3", "mt1", "mt2", "mt3" > > sam.out <- sam(dme_rma,cl, var.equal=FALSE, B=100, include.zero=FALSE, > > gene.names=Probe_names, na.replace=FALSE, rand=123) > > > > I run the sam analysis with different parameters and got *very* different > > results. > > > > the first run was with the default parameters. Here I got these results: > > SAM Analysis for the Two-Class Unpaired Case Assuming Unequal Variances > > > > s0 = 0.3176 (The 55 % quantile of the s values.) > > > > Delta p0 False Called FDR cutlow cutup j2 j1 > > 1 0.10 0.882 74.85 92 0.717 -0.130 2.135 91 186 > > 2 0.11 0.882 74.85 92 0.717 -0.130 2.135 91 186 > > 3 0.12 0.882 28.25 39 0.639 -0.575 2.135 38 186 > > 4 0.13 0.882 24.35 38 0.565 -0.633 2.135 37 186 > > ... > > 10 0.19 0.882 21.65 37 0.516 -0.672 2.135 36 186 > > 11 0.20 0.882 12.4 24 0.456 -0.860 2.135 23 186 > > 12 0.21 0.882 12.4 24 0.456 -0.860 2.135 23 186 > > 13 0.22 0.882 12.4 24 0.456 -0.860 2.135 23 186 > > 14 0.23 0.882 10.9 22 0.437 -0.905 2.135 21 186 > > 15 0.24 0.882 0.4 1 0.353 -Inf 2.135 0 186 > > ... > > 20 0.29 0.882 0.4 1 0.353 -Inf 2.135 0 186 > > 21 0.30 0.882 0 0 0 -Inf Inf 0 187 > > 22 0.31 0.882 0 0 0 -Inf Inf 0 187 > > > > I get here a very high FDR value for very few miRNAs. > > > > For my second run I added the parameter R.fold = 1.2, to exclude all > > miRNAs > > under this value from the analysis. > > Here I get much better results with lower FDR values: > > > > Number of variables having a fold change >= 1.2 or <= 0.8333 : 90 > > > > s0 = 0.0477 (The 0 % quantile of the s values.) > > > > Delta p0 False Called FDR cutlow cutup j2 j1 > > 1 0.1 0.378 55.5 82 0.2557 -0.708 0.425 45 54 > > 2 0.2 0.378 52.25 80 0.2467 -0.708 0.562 45 56 > > 3 0.3 0.378 48.4 79 0.2314 -0.708 0.676 45 57 > > 4 0.5 0.378 24.9 48 0.1960 -0.708 2.504 45 88 > > 5 0.6 0.378 23.7 45 0.1990 -0.708 Inf 45 91 > > 6 0.7 0.378 10.65 31 0.1298 -1.170 Inf 31 91 > > 7 0.8 0.378 6.75 25 0.1020 -1.513 Inf 25 91 > > 8 1.0 0.378 0.85 4 0.0803 -2.712 Inf 4 91 > > 9 1.1 0.378 0.25 1 0.0944 -4.810 Inf 1 91 > > 10 1.2 0.378 0 0 0 -Inf Inf 0 91 > > > > I than ran a third sam with the parameter use.dm=FALSE. This gave me > again > > different results with much lower FDR values: > > > > Number of variables having a fold change >= 1.2 or <= 0.8333 : 97 > > > > s0 = 0.1488 (The 5 % quantile of the s values.) > > > > Delta p0 False Called FDR cutlow cutup j2 j1 > > 1 0.1 0.009 53.5 81 0.00612 -0.541 0.545 48 65 > > 2 0.2 0.009 50.65 80 0.00587 -0.541 0.616 48 66 > > 3 0.3 0.009 26.85 48 0.00519 -0.541 Inf 48 98 > > 4 0.4 0.009 26.85 48 0.00519 -0.541 Inf 48 98 > > 5 0.5 0.009 26.85 48 0.00519 -0.541 Inf 48 98 > > 6 0.6 0.009 0 0 0 -Inf Inf 0 98 > > 7 0.7 0.009 0 0 0 -Inf Inf 0 98 > > 8 0.8 0.009 0 0 0 -Inf Inf 0 98 > > 9 0.9 0.009 0 0 0 -Inf Inf 0 98 > > 10 1.0 0.009 0 0 0 -Inf Inf 0 98 > > > > > > As you can see my obvious problem. Which is the better (right???) way of > > running this analysis. > > I think, to understand that, one needs to know how SAM works. I don't > > understand why I get such different results in the FDR between the three > > possibilities. > > > > As far as I understand it, the lower the FDR value, the better. But how > > can > > it be, that I have in the third run over 80 miRNAs with such a low FDR, > > while I have only few DE miRNAs in the first run with a much higher FDR > > value? > > > > I would appreciate any help, > > > > thanks > > Assa > > -- > Empfehlen Sie GMX DSL Ihren Freunden und Bekannten und wir > belohnen Sie mit bis zu 50,- Euro! https://freundschaftswerbung.gmx.de > [[alternative HTML version deleted]]

siggenes siggenes • 1.5k views

ADD COMMENT • link 14.5 years ago Assa Yeroslaviz ★ 1.5k

score 0 · Answer 1 · 2011-07-05

Hi Holger, Does that mean, that all genes which are found by SAM are significantly deregulated (under the given FDR value)? If I don't do the fold-change prefiltering, I get also genes with a very small change of expression. Is SAM reliable enough to detect such small changes? I know you can't set a fixed FDR values for all analyses, but is there a general guideline as to which FDR is too high to accept? Thanks Assa On Mon, Jul 4, 2011 at 10:09, Holger Schwender <holger.schw@gmx.de> wrote: > Hi Assa, > > the FDR is not an individual measure, but a measure for a set of genes. So > if you remove null genes beforehand (what you do by using the fold change), > then also false positives will be removed, and so the FDR gets lower. > > In the original analysis, Tusher et al. used the fold change additionally > to the test statistic. So I guess they would suggest to specify R.fold. And > that's why this option is available. > > However, I personally wouldn't use R.fold for a preselection of genes. I > would rather use it afterwards in a volcano plot (more exactly, the log2 of > it, so the numerator of the test statistic). If I would do some > prefiltering, I would use something that is not directly related to testing > for differential expression (see, e.g., the genefilter package. > > Best, > Holger > > > -------- Original-Nachricht -------- > > Datum: Sun, 3 Jul 2011 17:34:18 +0200 > > Von: Assa Yeroslaviz <frymor@gmail.com> > > An: Holger Schwender <holger.schw@gmx.de> > > CC: bioconductor@stat.math.ethz.ch > > Betreff: Re: siggenes parameters > > > Hi Holger, > > > > Thanks for the response (I am sorry to answer so late as I was on a > > business > > trip). > > > > I will look into the description to try and understand it better. > > > > I already thought that the number is too low. I am still not sure though > > about the first two runs. > > > > What about the R.fold parameter? > > Does it make sense to set it before I run the SAM analysis, or is it > > better > > to first run it with all the genes and than exclude those genes with a > > fold > > induction under a certain threshold? > > > > In the second run I get almost the same amount of called genes with a > > "better" FDR value. > > Is there a way to say which FDR value can be accepted? > > > > Thanks > > > > Assa > > > > > > On Tue, Jun 28, 2011 at 13:11, Holger Schwender <holger.schw@gmx.de> > > wrote: > > > > > Hi Assa, > > > > > > siggenes is described in > > > > > > http://cran.r-project.org/doc/Rnews/Rnews_2006-5.pdf > > > > > > on pages 45ff, and with a bit more technical details in > > > > > > > > > https://eldorado.tu- dortmund.de/bitstream/2003/23306/1/diss_schwender.pdf > > > > > > Section 6.3.2. Another possibility is to look into the code for sam, in > > > particular into the functions d.stat and siggenes:::stats.cal. > > > > > > The reason why the FDR is so low in the third example is the very, very > > low > > > estimate for p0. Take a look at the number of falsely called genes > > (False) > > > for about the same number of called genes. These numbers are close to > > each > > > other. The FDR is estimated by > > > > > > p0 * False/Called > > > > > > so the FDR gets very low if p0 is very small. > > > > > > I would thus not trust the results of the third analysis, as the p0 > > > estimation has not worked properly here. Not really sure why, but the > > reason > > > might be that too many null genes are removed when use.dm=FALSE. > > > > > > Best, > > > Holger > > > > > > -------- Original-Nachricht -------- > > > > Datum: Tue, 28 Jun 2011 09:41:53 +0200 > > > > Von: Assa Yeroslaviz <frymor@gmail.com> > > > > An: bioconductor <bioconductor@stat.math.ethz.ch>, > holger.schw@gmx.de > > > > Betreff: siggenes parameters > > > > > > > Hi everybody, > > > > > > > > I have a question about the behavior of siggenes in R (2.12) with > > > > different > > > > parameters. > > > > Maybe I need to understand better how siggenes calculated the > > > > differentially > > > > regulated genes, but I would like to ask you for your help in > > > > understanding > > > > it. > > > > > > > > Maybe you can also direct me to where I can find the answers for this > > > > problem(s) > > > > > > > > I am running a sam analysis of wild type vs. mutant in drosophila > > miRNA > > > > microarrays. > > > > > > > > I first read the files into an affybatch and than normalized them > > (RMA). > > > > After normalizing the arrays I extracted all non-drosophila miRNA > from > > my > > > > expressionSet and ran the sam analysis only with the 186 dme- miRNA > > > probes. > > > > > > > > cl <- c(0,0,0,1,1,1) # "wt1", "wt2", "wt3", "mt1", "mt2", "mt3" > > > > sam.out <- sam(dme_rma,cl, var.equal=FALSE, B=100, > include.zero=FALSE, > > > > gene.names=Probe_names, na.replace=FALSE, rand=123) > > > > > > > > I run the sam analysis with different parameters and got *very* > > different > > > > results. > > > > > > > > the first run was with the default parameters. Here I got these > > results: > > > > SAM Analysis for the Two-Class Unpaired Case Assuming Unequal > > Variances > > > > > > > > s0 = 0.3176 (The 55 % quantile of the s values.) > > > > > > > > Delta p0 False Called FDR cutlow cutup j2 j1 > > > > 1 0.10 0.882 74.85 92 0.717 -0.130 2.135 91 186 > > > > 2 0.11 0.882 74.85 92 0.717 -0.130 2.135 91 186 > > > > 3 0.12 0.882 28.25 39 0.639 -0.575 2.135 38 186 > > > > 4 0.13 0.882 24.35 38 0.565 -0.633 2.135 37 186 > > > > ... > > > > 10 0.19 0.882 21.65 37 0.516 -0.672 2.135 36 186 > > > > 11 0.20 0.882 12.4 24 0.456 -0.860 2.135 23 186 > > > > 12 0.21 0.882 12.4 24 0.456 -0.860 2.135 23 186 > > > > 13 0.22 0.882 12.4 24 0.456 -0.860 2.135 23 186 > > > > 14 0.23 0.882 10.9 22 0.437 -0.905 2.135 21 186 > > > > 15 0.24 0.882 0.4 1 0.353 -Inf 2.135 0 186 > > > > ... > > > > 20 0.29 0.882 0.4 1 0.353 -Inf 2.135 0 186 > > > > 21 0.30 0.882 0 0 0 -Inf Inf 0 187 > > > > 22 0.31 0.882 0 0 0 -Inf Inf 0 187 > > > > > > > > I get here a very high FDR value for very few miRNAs. > > > > > > > > For my second run I added the parameter R.fold = 1.2, to exclude all > > > > miRNAs > > > > under this value from the analysis. > > > > Here I get much better results with lower FDR values: > > > > > > > > Number of variables having a fold change >= 1.2 or <= 0.8333 : 90 > > > > > > > > s0 = 0.0477 (The 0 % quantile of the s values.) > > > > > > > > Delta p0 False Called FDR cutlow cutup j2 j1 > > > > 1 0.1 0.378 55.5 82 0.2557 -0.708 0.425 45 54 > > > > 2 0.2 0.378 52.25 80 0.2467 -0.708 0.562 45 56 > > > > 3 0.3 0.378 48.4 79 0.2314 -0.708 0.676 45 57 > > > > 4 0.5 0.378 24.9 48 0.1960 -0.708 2.504 45 88 > > > > 5 0.6 0.378 23.7 45 0.1990 -0.708 Inf 45 91 > > > > 6 0.7 0.378 10.65 31 0.1298 -1.170 Inf 31 91 > > > > 7 0.8 0.378 6.75 25 0.1020 -1.513 Inf 25 91 > > > > 8 1.0 0.378 0.85 4 0.0803 -2.712 Inf 4 91 > > > > 9 1.1 0.378 0.25 1 0.0944 -4.810 Inf 1 91 > > > > 10 1.2 0.378 0 0 0 -Inf Inf 0 91 > > > > > > > > I than ran a third sam with the parameter use.dm=FALSE. This gave me > > > again > > > > different results with much lower FDR values: > > > > > > > > Number of variables having a fold change >= 1.2 or <= 0.8333 : 97 > > > > > > > > s0 = 0.1488 (The 5 % quantile of the s values.) > > > > > > > > Delta p0 False Called FDR cutlow cutup j2 j1 > > > > 1 0.1 0.009 53.5 81 0.00612 -0.541 0.545 48 65 > > > > 2 0.2 0.009 50.65 80 0.00587 -0.541 0.616 48 66 > > > > 3 0.3 0.009 26.85 48 0.00519 -0.541 Inf 48 98 > > > > 4 0.4 0.009 26.85 48 0.00519 -0.541 Inf 48 98 > > > > 5 0.5 0.009 26.85 48 0.00519 -0.541 Inf 48 98 > > > > 6 0.6 0.009 0 0 0 -Inf Inf 0 98 > > > > 7 0.7 0.009 0 0 0 -Inf Inf 0 98 > > > > 8 0.8 0.009 0 0 0 -Inf Inf 0 98 > > > > 9 0.9 0.009 0 0 0 -Inf Inf 0 98 > > > > 10 1.0 0.009 0 0 0 -Inf Inf 0 98 > > > > > > > > > > > > As you can see my obvious problem. Which is the better (right???) > way > > of > > > > running this analysis. > > > > I think, to understand that, one needs to know how SAM works. I don't > > > > understand why I get such different results in the FDR between the > > three > > > > possibilities. > > > > > > > > As far as I understand it, the lower the FDR value, the better. But > > how > > > > can > > > > it be, that I have in the third run over 80 miRNAs with such a low > > FDR, > > > > while I have only few DE miRNAs in the first run with a much higher > > FDR > > > > value? > > > > > > > > I would appreciate any help, > > > > > > > > thanks > > > > Assa > > > > > > -- > > > Empfehlen Sie GMX DSL Ihren Freunden und Bekannten und wir > > > belohnen Sie mit bis zu 50,- Euro! https://freundschaftswerbung.gmx.de > > > > > -- > NEU: FreePhone - kostenlos mobil telefonieren! > Jetzt informieren: http://www.gmx.net/de/go/freephone > [[alternative HTML version deleted]]