get full variance per gene from DESeq
1
0
Entering edit mode
@wolfgang-huber-3550
Last seen 4 months ago
EMBL European Molecular Biology Laborat…
Hi Graham an alternative, and maybe easier and slightly more consistent approach might be to use the p-values (and the sign of the foldChange) to transform to z-scores, perhaps like this: with(BNevBTG, qnorm(pval/2) * sign(log2FoldChange) ) PS: "more consistent" because the residuals distribution (of the counts is skewed), while the p-values are uniform under the null. Best wishes Wolfgang On Jul/30/10 11:47 AM, Graham Thomas wrote: > Hi All, > > I am wondering whether there is an easy way of obtaining the full variance > for a given gene and condition from my DESeq analysis? > > When I take a look at my results I end up with a table like so): > > > head ( BNevBTG ) > id baseMean baseMeanA baseMeanB foldChange log2FoldChange > 1 ENSMUSG00000001627 162.62034 119.50785 205.73284 1.7215006 0.78366671 > 2 ENSMUSG00000001630 45.51063 41.94099 49.08027 1.1702220 0.22678230 > 3 ENSMUSG00000001632 1626.19532 1328.32256 1924.06807 1.4484946 0.53455431 > 4 ENSMUSG00000001642 54.09075 54.53378 53.64772 0.9837521 -0.02363326 > 5 ENSMUSG00000001655 0.00000 0.00000 0.00000 NaN NaN > 6 ENSMUSG00000001656 0.00000 0.00000 0.00000 NaN NaN > pval padj resVarA resVarB > 1 9.930922e-05 0.0005666467 0.4043083 2.7960349 > 2 4.628048e-01 0.6573258560 0.1815569 0.5295679 > 3 4.216269e-04 0.0021251136 0.3436054 0.3589968 > 4 9.559678e-01 1.0000000000 1.0424044 0.8391154 > 5 NA NA NA NA > 6 NA NA NA NA > > > I would like to transform my results into z-scores for GSEA purposes. As > far as I understand in order to do this I require my baseMean value > (which I have), my baseMeanA(and B) values, and the full variance (which > I want). > > It may be noted that my practical statistics knowledge is HEAVILY > limited so any helpp at all here is greatly appreiciated! > > > Regards, > Graham > > -- Wolfgang Huber EMBL http://www.embl.de/research/units/genome_biology/huber
DESeq DESeq • 1.3k views
ADD COMMENT
0
Entering edit mode
@graham-thomas-4192
Last seen 10.4 years ago
Hi Wolfgang, Thank you for your reply to my previous message. Due to the low number of biological conditions I am comparing, and the algorithm I would ideally like to use (that proposed in Jiang and Gentleman (2007)) I am inclined to think that I will obtain greater power if I can obtain a single z-score for each gene and each replicate (6 measurements) rather than per replicate (which is 2). As far as I'm aware 3 measurements per group will improve my power in permutation testing during GSEA, if this is not the case then I may test using the z-scores obtained as suggested with the following: with(BNevBTG, qnorm(pval/2) * sign(log2FoldChange) ) Is this correct? Otherwise, I require a method to obtain an appropriate variance estimate for each gene per condition. I can outline the approach I am considering (and the problem I have) with the following argument:- geneA geneB 1) 50 200 2) 55 150 3) 45 175 My thinking is that in order to obtain appropriate z-scores for each measurement geneA[1,] -> geneA[3,] (and, of course, as appropriate for geneB) I may use the following: vf <- rawVarFunc( cds, "geneA)" ) And then add the variance due to the Poisson process (this is what I am unclear about how to obtain). With this I can use baseMeanA to generate z-scores for each measurement in the group. As a sanity check at this point it may not be absurd to do some kind of non-specific filtering based on the residuals I get? Is this a reasonable approach to calculating my z-scores and, if so, how do I get the variance due to the counting process? If not, any comments or suggestions on how to proceed would be greatly appreciated! Apologies for the verbose explanation! Regards, Graham On 02/08/10 11:00, bioconductor-request at stat.math.ethz.ch wrote: > Message: 2 Date: Sun, 01 Aug 2010 22:50:57 +0200 From: Wolfgang Huber > <whuber at="" embl.de=""> To: bioconductor at stat.math.ethz.ch Subject: Re: > [BioC] get full variance per gene from DESeq Message-ID: > <4C55DE31.1010700 at embl.de> Content-Type: text/plain; > charset=ISO-8859-1; format=flowed Hi Graham an alternative, and maybe > easier and slightly more consistent approach might be to use the > p-values (and the sign of the foldChange) to transform to z-scores, > perhaps like this: with(BNevBTG, qnorm(pval/2) * sign(log2FoldChange) > ) PS: "more consistent" because the residuals distribution (of the > counts is skewed), while the p-values are uniform under the null. Best > wishes Wolfgang On Jul/30/10 11:47 AM, Graham Thomas wrote: >> > Hi All, >> > >> > I am wondering whether there is an easy way of obtaining the full variance >> > for a given gene and condition from my DESeq analysis? >> > >> > When I take a look at my results I end up with a table like so): >> > >> > > head ( BNevBTG ) >> > id baseMean baseMeanA baseMeanB foldChange log2FoldChange >> > 1 ENSMUSG00000001627 162.62034 119.50785 205.73284 1.7215006 0.78366671 >> > 2 ENSMUSG00000001630 45.51063 41.94099 49.08027 1.1702220 0.22678230 >> > 3 ENSMUSG00000001632 1626.19532 1328.32256 1924.06807 1.4484946 0.53455431 >> > 4 ENSMUSG00000001642 54.09075 54.53378 53.64772 0.9837521 -0.02363326 >> > 5 ENSMUSG00000001655 0.00000 0.00000 0.00000 NaN NaN >> > 6 ENSMUSG00000001656 0.00000 0.00000 0.00000 NaN NaN >> > pval padj resVarA resVarB >> > 1 9.930922e-05 0.0005666467 0.4043083 2.7960349 >> > 2 4.628048e-01 0.6573258560 0.1815569 0.5295679 >> > 3 4.216269e-04 0.0021251136 0.3436054 0.3589968 >> > 4 9.559678e-01 1.0000000000 1.0424044 0.8391154 >> > 5 NA NA NA NA >> > 6 NA NA NA NA >> > >> > >> > I would like to transform my results into z-scores for GSEA purposes. As >> > far as I understand in order to do this I require my baseMean value >> > (which I have), my baseMeanA(and B) values, and the full variance (which >> > I want). >> > >> > It may be noted that my practical statistics knowledge is HEAVILY >> > limited so any helpp at all here is greatly appreiciated! >> > >> > >> > Regards, >> > Graham >> > >> > >> > -- Wolfgang Huber EMBL > http://www.embl.de/research/units/genome_biology/huber > ------------------------------ -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
ADD COMMENT
0
Entering edit mode
Graham the approach you describe below seems reasonable. I think that an essentially equivalent, and perhaps operationally easier way to get there is to use the data-dependent variance stabilising transformation provided by DESeq (see the vignette). The result of that should be a reasonable input for gene set enrichment computations. A sample size of 6 is still pretty small for permutation testing, perhaps you will need to do something parametric. Wolfgang On 05/08/10 12:55, Graham Thomas wrote: > Hi Wolfgang, > > Thank you for your reply to my previous message. Due to the low number > of biological conditions I am comparing, and the algorithm I would > ideally like to use (that proposed in Jiang and Gentleman (2007)) I am > inclined to think that I will obtain greater power if I can obtain a > single z-score for each gene and each replicate (6 measurements) rather > than per replicate (which is 2). > > As far as I'm aware 3 measurements per group will improve my power in > permutation testing during GSEA, if this is not the case then I may test > using the z-scores obtained as suggested with the following: > > with(BNevBTG, > qnorm(pval/2) * sign(log2FoldChange) > ) > > Is this correct? > > Otherwise, I require a method to obtain an appropriate variance estimate > for each gene per condition. I can outline the approach I am considering > (and the problem I have) with the following argument:- > > geneA geneB > 1) 50 200 > 2) 55 150 > 3) 45 175 > > My thinking is that in order to obtain appropriate z-scores for each > measurement geneA[1,] -> geneA[3,] (and, of course, as appropriate for > geneB) I may use the following: > > vf <- rawVarFunc( cds, "geneA)" ) > > And then add the variance due to the Poisson process (this is what I am > unclear about how to obtain). With this I can use baseMeanA to generate > z-scores for each measurement in the group. As a sanity check at this > point it may not be absurd to do some kind of non-specific filtering > based on the residuals I get? > > Is this a reasonable approach to calculating my z-scores and, if so, how > do I get the variance due to the counting process? If not, any comments > or suggestions on how to proceed would be greatly appreciated! Apologies > for the verbose explanation! > > Regards, > Graham > > > > On 02/08/10 11:00, bioconductor-request at stat.math.ethz.ch wrote: >> Message: 2 Date: Sun, 01 Aug 2010 22:50:57 +0200 From: Wolfgang Huber >> <whuber at="" embl.de=""> To: bioconductor at stat.math.ethz.ch Subject: Re: >> [BioC] get full variance per gene from DESeq Message-ID: >> <4C55DE31.1010700 at embl.de> Content-Type: text/plain; >> charset=ISO-8859-1; format=flowed Hi Graham an alternative, and maybe >> easier and slightly more consistent approach might be to use the >> p-values (and the sign of the foldChange) to transform to z-scores, >> perhaps like this: with(BNevBTG, qnorm(pval/2) * sign(log2FoldChange) >> ) PS: "more consistent" because the residuals distribution (of the >> counts is skewed), while the p-values are uniform under the null. Best >> wishes Wolfgang On Jul/30/10 11:47 AM, Graham Thomas wrote: >>> > Hi All, >>> > >>> > I am wondering whether there is an easy way of obtaining the full >>> variance >>> > for a given gene and condition from my DESeq analysis? >>> > >>> > When I take a look at my results I end up with a table like so): >>> > >>> > > head ( BNevBTG ) >>> > id baseMean baseMeanA baseMeanB foldChange log2FoldChange >>> > 1 ENSMUSG00000001627 162.62034 119.50785 205.73284 1.7215006 >>> 0.78366671 >>> > 2 ENSMUSG00000001630 45.51063 41.94099 49.08027 1.1702220 0.22678230 >>> > 3 ENSMUSG00000001632 1626.19532 1328.32256 1924.06807 1.4484946 >>> 0.53455431 >>> > 4 ENSMUSG00000001642 54.09075 54.53378 53.64772 0.9837521 -0.02363326 >>> > 5 ENSMUSG00000001655 0.00000 0.00000 0.00000 NaN NaN >>> > 6 ENSMUSG00000001656 0.00000 0.00000 0.00000 NaN NaN >>> > pval padj resVarA resVarB >>> > 1 9.930922e-05 0.0005666467 0.4043083 2.7960349 >>> > 2 4.628048e-01 0.6573258560 0.1815569 0.5295679 >>> > 3 4.216269e-04 0.0021251136 0.3436054 0.3589968 >>> > 4 9.559678e-01 1.0000000000 1.0424044 0.8391154 >>> > 5 NA NA NA NA >>> > 6 NA NA NA NA >>> > >>> > >>> > I would like to transform my results into z-scores for GSEA >>> purposes. As >>> > far as I understand in order to do this I require my baseMean value >>> > (which I have), my baseMeanA(and B) values, and the full variance >>> (which >>> > I want). >>> > >>> > It may be noted that my practical statistics knowledge is HEAVILY >>> > limited so any helpp at all here is greatly appreiciated! >>> > >>> > >>> > Regards, >>> > Graham >>> > >>> > >> -- Wolfgang Huber EMBL >> http://www.embl.de/research/units/genome_biology/huber >> ------------------------------ > > -- Wolfgang Huber EMBL http://www.embl.de/research/units/genome_biology/huber
ADD REPLY

Login before adding your answer.

Traffic: 593 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6