working with two factors in DESeq to analyze RNA-seq
2
0
Entering edit mode
@biase-fernando-4475
Last seen 9.7 years ago
Dear Dr Anders, Based on my readings I am willing to use DESeq to analyze my RNA-seq data. I found it very easy to follow your instructions written on the package vignette, and to work with it for one factor comparison. However, I am not sure if I am using it correctly to account for a second factor in the model. So, here are my two questions: How do I compare two groups, accounting for a second factor in the model? (it is not a factorial model that I am looking for) Can I simply write: The condition names as "Ax", "Ax", "Ax", "Ax", "Ay", "Ay", "Ay", "Bx", "Bx", "Bx", "By", "By", "By" nbinomTest ( cds, “A”, “B”) being A/B may main factor and x/y my secondary factor? That is what I understood from the page 3 of the vignette “> conds <- c( "T", "T", "T", "Tb", "N", "N" ) where \T" stands for a sample derived from a certain tumour type and \N" for a sample derived from non-pathological tissue. The rst three samples had a very similar histopathological phe- notype, while the fourth sample was atypical, and hence, we assign it another condition (\Tb"). ” Can I test for interaction effect between two factors? Thanks so much, Fernando Research Associate - PhD ---------------------------------------------------------------------- - University of Illinois at Urbana-Champaign Institute for Genomic Biology Principal Investigator: Prof Dr Harris Lewin 210 Edward R. Madigan Laboratory 1201 W. Gregory Drive Urbana, IL 61801 Phone: (217) 244-6745 E-mail: biase@illinois.edu ---------------------------------------------------------------------- - [[alternative HTML version deleted]]
DESeq ASSIGN DESeq ASSIGN • 1.3k views
ADD COMMENT
0
Entering edit mode
Simon Anders ★ 3.7k
@simon-anders-3855
Last seen 3.8 years ago
Zentrum für Molekularbiologie, Universi…
Dear Fernando On 02/08/2011 06:02 PM, Biase, Fernando wrote: > Dear Dr Anders, > > Based on my readings I am willing to use DESeq to analyze my RNA-seq > data. I found it very easy to follow your instructions written on the > package vignette, and to work with it for one factor comparison. > However, I am not sure if I am using it correctly to account for a > second factor in the model. So, here are my two questions: > > How do I compare two groups, accounting for a second factor in the > model? (it is not a factorial model that I am looking for) > > Can I simply write: The condition names as "Ax", "Ax", "Ax", "Ax", > "Ay", "Ay", "Ay", "Bx", "Bx", "Bx", "By", "By", "By" > > nbinomTest ( cds, ?A?, ?B?) > > being A/B may main factor and x/y my secondary factor? [...] Not quite. You need to use the new GLM functionality that I added to DESeq in the newest release (current version: 1.2.1). First, when calling 'newCountDataSet', pass for 'condition' not a vector but a data frame with the design, i.e., something like this: design <- data.frame( treatment = c( "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B" ), block = c( "x", "x", "x", "x", "y", "y", "y", "x", "x", "x", "y", "y", "y" ) ) cds <- newCountDataSet( countTable, design ) Then, estimate the size factors and the variance functions, the later using the new "pooled" mode: cds <- estimateSizeFactors( cds) cds <- estimateVarianceFunctions( cds, method="pooled" ) Now, you can fit two models, a null model, which only account for your blocking factor, and the full model, which also takes into account your factor of interest: fit0 <- nbinomFitGLM( cds, count ~ block ) fit1 <- nbinomFitGLM( cds, count ~ block + treatment ) The fitting might take a few minutes. Once it is finished, you get your p values with a chi-squared test: pvals <- nbinomGLMTest( fit1, fit0 ) These p values inform you whether the treatment has a significant effect on the counts, after the effect of the blocking factor has been taken into account. Remember to adjust the p values for multiple testing before you look for hits: padj <- p.adjust( pvals, method="BH" ) If your count table had gene symbols in the row names, you can now list your hits: rownames(counts(cds))[ padj < .1 ] This new functionality is not yet properly documented in the vignette because I have not yet found a public data set that would be suitable to demonstrate it. Hence, as it has not been used that widely, I'd be interested to hear how well it works for you. Best regards Simon
ADD COMMENT
0
Entering edit mode
Thanks very much Dr. Anders, If you had sequences from may samples (RNA-seq), would you consider including the library batch, flow cell and lane for the GLM? Any input will be very much appreciated. Thanks, Fernando -----Original Message----- From: Simon Anders [mailto:anders@embl.de] Sent: Wednesday, February 09, 2011 4:54 AM To: Biase, Fernando; bioconductor@r-project.org Subject: Re: [BioC] working with two factors in DESeq to analyze RNA- seq Dear Fernando On 02/08/2011 06:02 PM, Biase, Fernando wrote: > Dear Dr Anders, > > Based on my readings I am willing to use DESeq to analyze my RNA-seq > data. I found it very easy to follow your instructions written on the > package vignette, and to work with it for one factor comparison. > However, I am not sure if I am using it correctly to account for a > second factor in the model. So, here are my two questions: > > How do I compare two groups, accounting for a second factor in the > model? (it is not a factorial model that I am looking for) > > Can I simply write: The condition names as "Ax", "Ax", "Ax", "Ax", > "Ay", "Ay", "Ay", "Bx", "Bx", "Bx", "By", "By", "By" > > nbinomTest ( cds, "A", "B") > > being A/B may main factor and x/y my secondary factor? [...] Not quite. You need to use the new GLM functionality that I added to DESeq in the newest release (current version: 1.2.1). First, when calling 'newCountDataSet', pass for 'condition' not a vector but a data frame with the design, i.e., something like this: design <- data.frame( treatment = c( "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B" ), block = c( "x", "x", "x", "x", "y", "y", "y", "x", "x", "x", "y", "y", "y" ) ) cds <- newCountDataSet( countTable, design ) Then, estimate the size factors and the variance functions, the later using the new "pooled" mode: cds <- estimateSizeFactors( cds) cds <- estimateVarianceFunctions( cds, method="pooled" ) Now, you can fit two models, a null model, which only account for your blocking factor, and the full model, which also takes into account your factor of interest: fit0 <- nbinomFitGLM( cds, count ~ block ) fit1 <- nbinomFitGLM( cds, count ~ block + treatment ) The fitting might take a few minutes. Once it is finished, you get your p values with a chi-squared test: pvals <- nbinomGLMTest( fit1, fit0 ) These p values inform you whether the treatment has a significant effect on the counts, after the effect of the blocking factor has been taken into account. Remember to adjust the p values for multiple testing before you look for hits: padj <- p.adjust( pvals, method="BH" ) If your count table had gene symbols in the row names, you can now list your hits: rownames(counts(cds))[ padj < .1 ] This new functionality is not yet properly documented in the vignette because I have not yet found a public data set that would be suitable to demonstrate it. Hence, as it has not been used that widely, I'd be interested to hear how well it works for you. Best regards Simon [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 1 minute ago
WEHI, Melbourne, Australia
Dear Fernando, If you're willing read yet another vignette, Section 11 of the edgeR vignette http://bioconductor.org/packages/2.7/bioc/vignettes/edgeR/inst/doc/edg eR.pdf gives a case study explaining how to do exactly what you ask, comparing two groups while accounting for a second factor. You can include any number of factors, test for interactions, and so on. Best wishes Gordon > Date: Tue, 8 Feb 2011 11:02:35 -0600 > From: "Biase, Fernando" <biase at="" illinois.edu=""> > To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > Subject: [BioC] working with two factors in DESeq to analyze RNA-seq > > Dear Dr Anders, > > Based on my readings I am willing to use DESeq to analyze my RNA-seq > data. I found it very easy to follow your instructions written on the > package vignette, and to work with it for one factor comparison. > However, I am not sure if I am using it correctly to account for a > second factor in the model. So, here are my two questions: > > How do I compare two groups, accounting for a second factor in the > model? (it is not a factorial model that I am looking for) > > Can I simply write: > The condition names as > "Ax", "Ax", "Ax", "Ax", "Ay", "Ay", "Ay", "Bx", "Bx", "Bx", "By", "By", "By" > > nbinomTest ( cds, "A", "B") > > being A/B may main factor and x/y my secondary factor? > > That is what I understood from the page 3 of the vignette > > "> conds <- c( "T", "T", "T", "Tb", "N", "N" ) > where \T" stands for a sample derived from a certain tumour type and \N" for a sample derived > from non-pathological tissue. The > rst three samples had a very similar histopathological phe- > notype, while the fourth sample was atypical, and hence, we assign it another condition (\Tb"). > " > > Can I test for interaction effect between two factors? > > Thanks so much, > Fernando > > Research Associate - PhD > -------------------------------------------------------------------- --- > University of Illinois at Urbana-Champaign > Institute for Genomic Biology > Principal Investigator: Prof Dr Harris Lewin > > 210 Edward R. Madigan Laboratory > 1201 W. Gregory Drive > Urbana, IL 61801 > Phone: (217) 244-6745 E-mail: biase at illinois.edu > -------------------------------------------------------------------- --- ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT

Login before adding your answer.

Traffic: 780 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6