DESeq2 get coefficients for each variable in the design formula
1
0
Entering edit mode
@tran-nhu-quynh-t-6628
Last seen 9.6 years ago
Hi, I'm working on a RNA-seq data set and would like to control for age or bmi in the model together with the disease status, which I am able to do. My question is how can I get the coefficients for both age and the disease together? what I did was #Adjust for age: cds.acromegaly.age = DESeqDataSetFromMatrix(countData=acromegaly.protein.coding, colData=acromegaly.mapping, design=~age+group) cds.acromegaly.age$age <- relevel(cds.acromegaly.age$age, "(0,40]") acromegaly.cds.age <- DESeq(cds.acromegaly.age) acromegaly.results.age <- results(acromegaly.cds.age) #Get the results for group (or disease): acromegaly.results.age <- results(acromegaly.cds.age) #Get the results for age: acro.age.effect <- results(acromegaly.cds.age, contrast=c("age","(40.60]", "(0.40]")) sum(acro.age.effect$padj<0.05, na.rm=TRUE) Then I merge those two results to get the genes that are affected by age or affected by group, and affected by both. Is there another way? Is there coefficients for each of the independent variables. I saw only log2FCs and p-values. acro.age.combined <- merge(acromegaly.results.age, acro.age.effect, by.x="row.names", by.y="row.names") Thanks, Quynh
• 3.1k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 8 hours ago
United States
hi Quynh, The default pipeline is for testing individual coefficients, and the code you have is fine, defining two sets based on the two tests, and look at the union of these. It is also possible to perform a statistical test of both at once, using a likelihood ratio test. There is a section in the vignette explaining the LRT. This would look like (using dds to shorten your object name): dds <- DESeq(dds, test="LRT", reduced=~1) res <- results(dds) You should then add the two log fold change columns from the results tables you have above as columns to this new results table. (I'm still working in the devel branch on making this part simpler, specifying contrast LFCs to add to LRT results tables). Mike On Tue, Jul 1, 2014 at 2:03 PM, Tran, Nhu Quynh T <qtran1 at="" uthsc.edu=""> wrote: > Hi, > > I'm working on a RNA-seq data set and would like to control for age or bmi in the model together with the disease status, which I am able to do. My question is how can I get the coefficients for both age and the disease together? what I did was > > #Adjust for age: > cds.acromegaly.age = DESeqDataSetFromMatrix(countData=acromegaly.protein.coding, colData=acromegaly.mapping, design=~age+group) > cds.acromegaly.age$age <- relevel(cds.acromegaly.age$age, "(0,40]") > acromegaly.cds.age <- DESeq(cds.acromegaly.age) > acromegaly.results.age <- results(acromegaly.cds.age) > > #Get the results for group (or disease): > acromegaly.results.age <- results(acromegaly.cds.age) > > #Get the results for age: > acro.age.effect <- results(acromegaly.cds.age, contrast=c("age","(40.60]", "(0.40]")) > sum(acro.age.effect$padj<0.05, na.rm=TRUE) > > Then I merge those two results to get the genes that are affected by age or affected by group, and affected by both. Is there another way? Is there coefficients for each of the independent variables. I saw only log2FCs and p-values. > acro.age.combined <- merge(acromegaly.results.age, acro.age.effect, by.x="row.names", by.y="row.names") > > Thanks, > Quynh > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT
0
Entering edit mode
Hi Mike, What do you mean by "testing individual coefficients"? I thought that when adding age in the design formula, the model will test the coefficient of GROUP in the presence of AGE, to check if age plays a confounding role in the changes in gene expression (if any) between the two groups. Quynh On Jul 1, 2014, at 1:18 PM, Michael Love wrote: > hi Quynh, > > The default pipeline is for testing individual coefficients, and the > code you have is fine, defining two sets based on the two tests, and > look at the union of these. > > It is also possible to perform a statistical test of both at once, > using a likelihood ratio test. There is a section in the vignette > explaining the LRT. > > This would look like (using dds to shorten your object name): > > dds <- DESeq(dds, test="LRT", reduced=~1) > res <- results(dds) > > You should then add the two log fold change columns from the results > tables you have above as columns to this new results table. (I'm still > working in the devel branch on making this part simpler, specifying > contrast LFCs to add to LRT results tables). > > Mike > > On Tue, Jul 1, 2014 at 2:03 PM, Tran, Nhu Quynh T <qtran1 at="" uthsc.edu=""> wrote: >> Hi, >> >> I'm working on a RNA-seq data set and would like to control for age or bmi in the model together with the disease status, which I am able to do. My question is how can I get the coefficients for both age and the disease together? what I did was >> >> #Adjust for age: >> cds.acromegaly.age = DESeqDataSetFromMatrix(countData=acromegaly.protein.coding, colData=acromegaly.mapping, design=~age+group) >> cds.acromegaly.age$age <- relevel(cds.acromegaly.age$age, "(0,40]") >> acromegaly.cds.age <- DESeq(cds.acromegaly.age) >> acromegaly.results.age <- results(acromegaly.cds.age) >> >> #Get the results for group (or disease): >> acromegaly.results.age <- results(acromegaly.cds.age) >> >> #Get the results for age: >> acro.age.effect <- results(acromegaly.cds.age, contrast=c("age","(40.60]", "(0.40]")) >> sum(acro.age.effect$padj<0.05, na.rm=TRUE) >> >> Then I merge those two results to get the genes that are affected by age or affected by group, and affected by both. Is there another way? Is there coefficients for each of the independent variables. I saw only log2FCs and p-values. >> acro.age.combined <- merge(acromegaly.results.age, acro.age.effect, by.x="row.names", by.y="row.names") >> >> Thanks, >> Quynh >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
hi Quynh, On Tue, Jul 1, 2014 at 2:38 PM, Tran, Nhu Quynh T <qtran1 at="" uthsc.edu=""> wrote: > Hi Mike, > > What do you mean by "testing individual coefficients"? I thought that when adding age in the design formula, the model will test the coefficient of GROUP in the presence of AGE, to check if age plays a confounding role in the changes in gene expression (if any) between the two groups. Yes, when you have both terms in the design, the model fit by DESeq() uses both terms, and the LFCs are different than the ones you would get with ~ age or ~ group alone. By "testing individual coefficients", I meant that, when you call results() after fitting this model with both terms, and you specify the group term, you are pulling out the group term and testing if it is equal to 0. > > Quynh > > On Jul 1, 2014, at 1:18 PM, Michael Love wrote: > >> hi Quynh, >> >> The default pipeline is for testing individual coefficients, and the >> code you have is fine, defining two sets based on the two tests, and >> look at the union of these. >> >> It is also possible to perform a statistical test of both at once, >> using a likelihood ratio test. There is a section in the vignette >> explaining the LRT. >> >> This would look like (using dds to shorten your object name): >> >> dds <- DESeq(dds, test="LRT", reduced=~1) >> res <- results(dds) >> >> You should then add the two log fold change columns from the results >> tables you have above as columns to this new results table. (I'm still >> working in the devel branch on making this part simpler, specifying >> contrast LFCs to add to LRT results tables). >> >> Mike >> >> On Tue, Jul 1, 2014 at 2:03 PM, Tran, Nhu Quynh T <qtran1 at="" uthsc.edu=""> wrote: >>> Hi, >>> >>> I'm working on a RNA-seq data set and would like to control for age or bmi in the model together with the disease status, which I am able to do. My question is how can I get the coefficients for both age and the disease together? what I did was >>> >>> #Adjust for age: >>> cds.acromegaly.age = DESeqDataSetFromMatrix(countData=acromegaly.protein.coding, colData=acromegaly.mapping, design=~age+group) >>> cds.acromegaly.age$age <- relevel(cds.acromegaly.age$age, "(0,40]") >>> acromegaly.cds.age <- DESeq(cds.acromegaly.age) >>> acromegaly.results.age <- results(acromegaly.cds.age) >>> >>> #Get the results for group (or disease): >>> acromegaly.results.age <- results(acromegaly.cds.age) >>> >>> #Get the results for age: >>> acro.age.effect <- results(acromegaly.cds.age, contrast=c("age","(40.60]", "(0.40]")) >>> sum(acro.age.effect$padj<0.05, na.rm=TRUE) >>> >>> Then I merge those two results to get the genes that are affected by age or affected by group, and affected by both. Is there another way? Is there coefficients for each of the independent variables. I saw only log2FCs and p-values. >>> acro.age.combined <- merge(acromegaly.results.age, acro.age.effect, by.x="row.names", by.y="row.names") >>> >>> Thanks, >>> Quynh >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD REPLY
0
Entering edit mode
Thank you, Mike. I'll try LRT next. QT On Jul 1, 2014, at 1:57 PM, Michael Love wrote: hi Quynh, On Tue, Jul 1, 2014 at 2:38 PM, Tran, Nhu Quynh T <qtran1@uthsc.edu<mailto:qtran1@uthsc.edu>> wrote: Hi Mike, What do you mean by "testing individual coefficients"? I thought that when adding age in the design formula, the model will test the coefficient of GROUP in the presence of AGE, to check if age plays a confounding role in the changes in gene expression (if any) between the two groups. Yes, when you have both terms in the design, the model fit by DESeq() uses both terms, and the LFCs are different than the ones you would get with ~ age or ~ group alone. By "testing individual coefficients", I meant that, when you call results() after fitting this model with both terms, and you specify the group term, you are pulling out the group term and testing if it is equal to 0. Quynh On Jul 1, 2014, at 1:18 PM, Michael Love wrote: hi Quynh, The default pipeline is for testing individual coefficients, and the code you have is fine, defining two sets based on the two tests, and look at the union of these. It is also possible to perform a statistical test of both at once, using a likelihood ratio test. There is a section in the vignette explaining the LRT. This would look like (using dds to shorten your object name): dds <- DESeq(dds, test="LRT", reduced=~1) res <- results(dds) You should then add the two log fold change columns from the results tables you have above as columns to this new results table. (I'm still working in the devel branch on making this part simpler, specifying contrast LFCs to add to LRT results tables). Mike On Tue, Jul 1, 2014 at 2:03 PM, Tran, Nhu Quynh T <qtran1@uthsc.edu<mailto:qtran1@uthsc.edu>> wrote: Hi, I'm working on a RNA-seq data set and would like to control for age or bmi in the model together with the disease status, which I am able to do. My question is how can I get the coefficients for both age and the disease together? what I did was #Adjust for age: cds.acromegaly.age = DESeqDataSetFromMatrix(countData=acromegaly.protein.coding, colData=acromegaly.mapping, design=~age+group) cds.acromegaly.age$age <- relevel(cds.acromegaly.age$age, "(0,40]") acromegaly.cds.age <- DESeq(cds.acromegaly.age) acromegaly.results.age <- results(acromegaly.cds.age) #Get the results for group (or disease): acromegaly.results.age <- results(acromegaly.cds.age) #Get the results for age: acro.age.effect <- results(acromegaly.cds.age, contrast=c("age","(40.60]", "(0.40]")) sum(acro.age.effect$padj<0.05, na.rm=TRUE) Then I merge those two results to get the genes that are affected by age or affected by group, and affected by both. Is there another way? Is there coefficients for each of the independent variables. I saw only log2FCs and p-values. acro.age.combined <- merge(acromegaly.results.age, acro.age.effect, by.x="row.names", by.y="row.names") Thanks, Quynh _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 684 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6