siggenes parameters
1
0
Entering edit mode
Assa Yeroslaviz ★ 1.5k
@assa-yeroslaviz-1597
Last seen 4 months ago
Germany
Hi Holger, Thanks for the response (I am sorry to answer so late as I was on a business trip). I will look into the description to try and understand it better. I already thought that the number is too low. I am still not sure though about the first two runs. What about the R.fold parameter? Does it make sense to set it before I run the SAM analysis, or is it better to first run it with all the genes and than exclude those genes with a fold induction under a certain threshold? In the second run I get almost the same amount of called genes with a "better" FDR value. Is there a way to say which FDR value can be accepted? Thanks Assa On Tue, Jun 28, 2011 at 13:11, Holger Schwender <holger.schw@gmx.de> wrote: > Hi Assa, > > siggenes is described in > > http://cran.r-project.org/doc/Rnews/Rnews_2006-5.pdf > > on pages 45ff, and with a bit more technical details in > > https://eldorado.tu- dortmund.de/bitstream/2003/23306/1/diss_schwender.pdf > > Section 6.3.2. Another possibility is to look into the code for sam, in > particular into the functions d.stat and siggenes:::stats.cal. > > The reason why the FDR is so low in the third example is the very, very low > estimate for p0. Take a look at the number of falsely called genes (False) > for about the same number of called genes. These numbers are close to each > other. The FDR is estimated by > > p0 * False/Called > > so the FDR gets very low if p0 is very small. > > I would thus not trust the results of the third analysis, as the p0 > estimation has not worked properly here. Not really sure why, but the reason > might be that too many null genes are removed when use.dm=FALSE. > > Best, > Holger > > -------- Original-Nachricht -------- > > Datum: Tue, 28 Jun 2011 09:41:53 +0200 > > Von: Assa Yeroslaviz <frymor@gmail.com> > > An: bioconductor <bioconductor@stat.math.ethz.ch>, holger.schw@gmx.de > > Betreff: siggenes parameters > > > Hi everybody, > > > > I have a question about the behavior of siggenes in R (2.12) with > > different > > parameters. > > Maybe I need to understand better how siggenes calculated the > > differentially > > regulated genes, but I would like to ask you for your help in > > understanding > > it. > > > > Maybe you can also direct me to where I can find the answers for this > > problem(s) > > > > I am running a sam analysis of wild type vs. mutant in drosophila miRNA > > microarrays. > > > > I first read the files into an affybatch and than normalized them (RMA). > > After normalizing the arrays I extracted all non-drosophila miRNA from my > > expressionSet and ran the sam analysis only with the 186 dme-miRNA > probes. > > > > cl <- c(0,0,0,1,1,1) # "wt1", "wt2", "wt3", "mt1", "mt2", "mt3" > > sam.out <- sam(dme_rma,cl, var.equal=FALSE, B=100, include.zero=FALSE, > > gene.names=Probe_names, na.replace=FALSE, rand=123) > > > > I run the sam analysis with different parameters and got *very* different > > results. > > > > the first run was with the default parameters. Here I got these results: > > SAM Analysis for the Two-Class Unpaired Case Assuming Unequal Variances > > > > s0 = 0.3176 (The 55 % quantile of the s values.) > > > > Delta p0 False Called FDR cutlow cutup j2 j1 > > 1 0.10 0.882 74.85 92 0.717 -0.130 2.135 91 186 > > 2 0.11 0.882 74.85 92 0.717 -0.130 2.135 91 186 > > 3 0.12 0.882 28.25 39 0.639 -0.575 2.135 38 186 > > 4 0.13 0.882 24.35 38 0.565 -0.633 2.135 37 186 > > ... > > 10 0.19 0.882 21.65 37 0.516 -0.672 2.135 36 186 > > 11 0.20 0.882 12.4 24 0.456 -0.860 2.135 23 186 > > 12 0.21 0.882 12.4 24 0.456 -0.860 2.135 23 186 > > 13 0.22 0.882 12.4 24 0.456 -0.860 2.135 23 186 > > 14 0.23 0.882 10.9 22 0.437 -0.905 2.135 21 186 > > 15 0.24 0.882 0.4 1 0.353 -Inf 2.135 0 186 > > ... > > 20 0.29 0.882 0.4 1 0.353 -Inf 2.135 0 186 > > 21 0.30 0.882 0 0 0 -Inf Inf 0 187 > > 22 0.31 0.882 0 0 0 -Inf Inf 0 187 > > > > I get here a very high FDR value for very few miRNAs. > > > > For my second run I added the parameter R.fold = 1.2, to exclude all > > miRNAs > > under this value from the analysis. > > Here I get much better results with lower FDR values: > > > > Number of variables having a fold change >= 1.2 or <= 0.8333 : 90 > > > > s0 = 0.0477 (The 0 % quantile of the s values.) > > > > Delta p0 False Called FDR cutlow cutup j2 j1 > > 1 0.1 0.378 55.5 82 0.2557 -0.708 0.425 45 54 > > 2 0.2 0.378 52.25 80 0.2467 -0.708 0.562 45 56 > > 3 0.3 0.378 48.4 79 0.2314 -0.708 0.676 45 57 > > 4 0.5 0.378 24.9 48 0.1960 -0.708 2.504 45 88 > > 5 0.6 0.378 23.7 45 0.1990 -0.708 Inf 45 91 > > 6 0.7 0.378 10.65 31 0.1298 -1.170 Inf 31 91 > > 7 0.8 0.378 6.75 25 0.1020 -1.513 Inf 25 91 > > 8 1.0 0.378 0.85 4 0.0803 -2.712 Inf 4 91 > > 9 1.1 0.378 0.25 1 0.0944 -4.810 Inf 1 91 > > 10 1.2 0.378 0 0 0 -Inf Inf 0 91 > > > > I than ran a third sam with the parameter use.dm=FALSE. This gave me > again > > different results with much lower FDR values: > > > > Number of variables having a fold change >= 1.2 or <= 0.8333 : 97 > > > > s0 = 0.1488 (The 5 % quantile of the s values.) > > > > Delta p0 False Called FDR cutlow cutup j2 j1 > > 1 0.1 0.009 53.5 81 0.00612 -0.541 0.545 48 65 > > 2 0.2 0.009 50.65 80 0.00587 -0.541 0.616 48 66 > > 3 0.3 0.009 26.85 48 0.00519 -0.541 Inf 48 98 > > 4 0.4 0.009 26.85 48 0.00519 -0.541 Inf 48 98 > > 5 0.5 0.009 26.85 48 0.00519 -0.541 Inf 48 98 > > 6 0.6 0.009 0 0 0 -Inf Inf 0 98 > > 7 0.7 0.009 0 0 0 -Inf Inf 0 98 > > 8 0.8 0.009 0 0 0 -Inf Inf 0 98 > > 9 0.9 0.009 0 0 0 -Inf Inf 0 98 > > 10 1.0 0.009 0 0 0 -Inf Inf 0 98 > > > > > > As you can see my obvious problem. Which is the better (right???) way of > > running this analysis. > > I think, to understand that, one needs to know how SAM works. I don't > > understand why I get such different results in the FDR between the three > > possibilities. > > > > As far as I understand it, the lower the FDR value, the better. But how > > can > > it be, that I have in the third run over 80 miRNAs with such a low FDR, > > while I have only few DE miRNAs in the first run with a much higher FDR > > value? > > > > I would appreciate any help, > > > > thanks > > Assa > > -- > Empfehlen Sie GMX DSL Ihren Freunden und Bekannten und wir > belohnen Sie mit bis zu 50,- Euro! https://freundschaftswerbung.gmx.de > [[alternative HTML version deleted]]
siggenes siggenes • 1.2k views
ADD COMMENT
0
Entering edit mode
Assa Yeroslaviz ★ 1.5k
@assa-yeroslaviz-1597
Last seen 4 months ago
Germany
Hi Holger, Does that mean, that all genes which are found by SAM are significantly deregulated (under the given FDR value)? If I don't do the fold-change prefiltering, I get also genes with a very small change of expression. Is SAM reliable enough to detect such small changes? I know you can't set a fixed FDR values for all analyses, but is there a general guideline as to which FDR is too high to accept? Thanks Assa On Mon, Jul 4, 2011 at 10:09, Holger Schwender <holger.schw@gmx.de> wrote: > Hi Assa, > > the FDR is not an individual measure, but a measure for a set of genes. So > if you remove null genes beforehand (what you do by using the fold change), > then also false positives will be removed, and so the FDR gets lower. > > In the original analysis, Tusher et al. used the fold change additionally > to the test statistic. So I guess they would suggest to specify R.fold. And > that's why this option is available. > > However, I personally wouldn't use R.fold for a preselection of genes. I > would rather use it afterwards in a volcano plot (more exactly, the log2 of > it, so the numerator of the test statistic). If I would do some > prefiltering, I would use something that is not directly related to testing > for differential expression (see, e.g., the genefilter package. > > Best, > Holger > > > -------- Original-Nachricht -------- > > Datum: Sun, 3 Jul 2011 17:34:18 +0200 > > Von: Assa Yeroslaviz <frymor@gmail.com> > > An: Holger Schwender <holger.schw@gmx.de> > > CC: bioconductor@stat.math.ethz.ch > > Betreff: Re: siggenes parameters > > > Hi Holger, > > > > Thanks for the response (I am sorry to answer so late as I was on a > > business > > trip). > > > > I will look into the description to try and understand it better. > > > > I already thought that the number is too low. I am still not sure though > > about the first two runs. > > > > What about the R.fold parameter? > > Does it make sense to set it before I run the SAM analysis, or is it > > better > > to first run it with all the genes and than exclude those genes with a > > fold > > induction under a certain threshold? > > > > In the second run I get almost the same amount of called genes with a > > "better" FDR value. > > Is there a way to say which FDR value can be accepted? > > > > Thanks > > > > Assa > > > > > > On Tue, Jun 28, 2011 at 13:11, Holger Schwender <holger.schw@gmx.de> > > wrote: > > > > > Hi Assa, > > > > > > siggenes is described in > > > > > > http://cran.r-project.org/doc/Rnews/Rnews_2006-5.pdf > > > > > > on pages 45ff, and with a bit more technical details in > > > > > > > > > https://eldorado.tu- dortmund.de/bitstream/2003/23306/1/diss_schwender.pdf > > > > > > Section 6.3.2. Another possibility is to look into the code for sam, in > > > particular into the functions d.stat and siggenes:::stats.cal. > > > > > > The reason why the FDR is so low in the third example is the very, very > > low > > > estimate for p0. Take a look at the number of falsely called genes > > (False) > > > for about the same number of called genes. These numbers are close to > > each > > > other. The FDR is estimated by > > > > > > p0 * False/Called > > > > > > so the FDR gets very low if p0 is very small. > > > > > > I would thus not trust the results of the third analysis, as the p0 > > > estimation has not worked properly here. Not really sure why, but the > > reason > > > might be that too many null genes are removed when use.dm=FALSE. > > > > > > Best, > > > Holger > > > > > > -------- Original-Nachricht -------- > > > > Datum: Tue, 28 Jun 2011 09:41:53 +0200 > > > > Von: Assa Yeroslaviz <frymor@gmail.com> > > > > An: bioconductor <bioconductor@stat.math.ethz.ch>, > holger.schw@gmx.de > > > > Betreff: siggenes parameters > > > > > > > Hi everybody, > > > > > > > > I have a question about the behavior of siggenes in R (2.12) with > > > > different > > > > parameters. > > > > Maybe I need to understand better how siggenes calculated the > > > > differentially > > > > regulated genes, but I would like to ask you for your help in > > > > understanding > > > > it. > > > > > > > > Maybe you can also direct me to where I can find the answers for this > > > > problem(s) > > > > > > > > I am running a sam analysis of wild type vs. mutant in drosophila > > miRNA > > > > microarrays. > > > > > > > > I first read the files into an affybatch and than normalized them > > (RMA). > > > > After normalizing the arrays I extracted all non-drosophila miRNA > from > > my > > > > expressionSet and ran the sam analysis only with the 186 dme- miRNA > > > probes. > > > > > > > > cl <- c(0,0,0,1,1,1) # "wt1", "wt2", "wt3", "mt1", "mt2", "mt3" > > > > sam.out <- sam(dme_rma,cl, var.equal=FALSE, B=100, > include.zero=FALSE, > > > > gene.names=Probe_names, na.replace=FALSE, rand=123) > > > > > > > > I run the sam analysis with different parameters and got *very* > > different > > > > results. > > > > > > > > the first run was with the default parameters. Here I got these > > results: > > > > SAM Analysis for the Two-Class Unpaired Case Assuming Unequal > > Variances > > > > > > > > s0 = 0.3176 (The 55 % quantile of the s values.) > > > > > > > > Delta p0 False Called FDR cutlow cutup j2 j1 > > > > 1 0.10 0.882 74.85 92 0.717 -0.130 2.135 91 186 > > > > 2 0.11 0.882 74.85 92 0.717 -0.130 2.135 91 186 > > > > 3 0.12 0.882 28.25 39 0.639 -0.575 2.135 38 186 > > > > 4 0.13 0.882 24.35 38 0.565 -0.633 2.135 37 186 > > > > ... > > > > 10 0.19 0.882 21.65 37 0.516 -0.672 2.135 36 186 > > > > 11 0.20 0.882 12.4 24 0.456 -0.860 2.135 23 186 > > > > 12 0.21 0.882 12.4 24 0.456 -0.860 2.135 23 186 > > > > 13 0.22 0.882 12.4 24 0.456 -0.860 2.135 23 186 > > > > 14 0.23 0.882 10.9 22 0.437 -0.905 2.135 21 186 > > > > 15 0.24 0.882 0.4 1 0.353 -Inf 2.135 0 186 > > > > ... > > > > 20 0.29 0.882 0.4 1 0.353 -Inf 2.135 0 186 > > > > 21 0.30 0.882 0 0 0 -Inf Inf 0 187 > > > > 22 0.31 0.882 0 0 0 -Inf Inf 0 187 > > > > > > > > I get here a very high FDR value for very few miRNAs. > > > > > > > > For my second run I added the parameter R.fold = 1.2, to exclude all > > > > miRNAs > > > > under this value from the analysis. > > > > Here I get much better results with lower FDR values: > > > > > > > > Number of variables having a fold change >= 1.2 or <= 0.8333 : 90 > > > > > > > > s0 = 0.0477 (The 0 % quantile of the s values.) > > > > > > > > Delta p0 False Called FDR cutlow cutup j2 j1 > > > > 1 0.1 0.378 55.5 82 0.2557 -0.708 0.425 45 54 > > > > 2 0.2 0.378 52.25 80 0.2467 -0.708 0.562 45 56 > > > > 3 0.3 0.378 48.4 79 0.2314 -0.708 0.676 45 57 > > > > 4 0.5 0.378 24.9 48 0.1960 -0.708 2.504 45 88 > > > > 5 0.6 0.378 23.7 45 0.1990 -0.708 Inf 45 91 > > > > 6 0.7 0.378 10.65 31 0.1298 -1.170 Inf 31 91 > > > > 7 0.8 0.378 6.75 25 0.1020 -1.513 Inf 25 91 > > > > 8 1.0 0.378 0.85 4 0.0803 -2.712 Inf 4 91 > > > > 9 1.1 0.378 0.25 1 0.0944 -4.810 Inf 1 91 > > > > 10 1.2 0.378 0 0 0 -Inf Inf 0 91 > > > > > > > > I than ran a third sam with the parameter use.dm=FALSE. This gave me > > > again > > > > different results with much lower FDR values: > > > > > > > > Number of variables having a fold change >= 1.2 or <= 0.8333 : 97 > > > > > > > > s0 = 0.1488 (The 5 % quantile of the s values.) > > > > > > > > Delta p0 False Called FDR cutlow cutup j2 j1 > > > > 1 0.1 0.009 53.5 81 0.00612 -0.541 0.545 48 65 > > > > 2 0.2 0.009 50.65 80 0.00587 -0.541 0.616 48 66 > > > > 3 0.3 0.009 26.85 48 0.00519 -0.541 Inf 48 98 > > > > 4 0.4 0.009 26.85 48 0.00519 -0.541 Inf 48 98 > > > > 5 0.5 0.009 26.85 48 0.00519 -0.541 Inf 48 98 > > > > 6 0.6 0.009 0 0 0 -Inf Inf 0 98 > > > > 7 0.7 0.009 0 0 0 -Inf Inf 0 98 > > > > 8 0.8 0.009 0 0 0 -Inf Inf 0 98 > > > > 9 0.9 0.009 0 0 0 -Inf Inf 0 98 > > > > 10 1.0 0.009 0 0 0 -Inf Inf 0 98 > > > > > > > > > > > > As you can see my obvious problem. Which is the better (right???) > way > > of > > > > running this analysis. > > > > I think, to understand that, one needs to know how SAM works. I don't > > > > understand why I get such different results in the FDR between the > > three > > > > possibilities. > > > > > > > > As far as I understand it, the lower the FDR value, the better. But > > how > > > > can > > > > it be, that I have in the third run over 80 miRNAs with such a low > > FDR, > > > > while I have only few DE miRNAs in the first run with a much higher > > FDR > > > > value? > > > > > > > > I would appreciate any help, > > > > > > > > thanks > > > > Assa > > > > > > -- > > > Empfehlen Sie GMX DSL Ihren Freunden und Bekannten und wir > > > belohnen Sie mit bis zu 50,- Euro! https://freundschaftswerbung.gmx.de > > > > > -- > NEU: FreePhone - kostenlos mobil telefonieren! > Jetzt informieren: http://www.gmx.net/de/go/freephone > [[alternative HTML version deleted]]
ADD COMMENT

Login before adding your answer.

Traffic: 431 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6