Search
Question: DESeq2 in R_devel: a bug?
0
gravatar for Sylvain Foisy
3.7 years ago by
Sylvain Foisy80 wrote:
Hi, I used DEseq2 v. 1.3 under R-devel because I have many different values for my design (celltype) and it worked ok. I upgraded my libraries this morning (bad,bad,bad?) and now it is not working. Might this be a bug? > exp.data<-read.table("./count_table_by_gene.txt",header=T,row.names=1) > exp.ann<-read.table("./desc_colonnes.txt",header=T) > genex<-DESeqDataSetFromMatrix( + countData=exp.data, + colData=exp.ann, + design=~ celltype) > genex class: DESeqDataSet dim: 25192 103 exptData(0): assays(1): counts rownames(25192): A1BG A1BG-AS1 ... ZZEF1 ZZZ3 rowData metadata column names(0): colnames(103): 169377_LYMPHO_B_accepted_hits.bam 169377_celltype1_accepted_hits.bam ... 999046_celltype_m_accepted_hits.bam 999046_celltype_n_accepted_hits.bam colData names(2): sizeFactor celltype > genex<-DESeq(genex) using pre-existing size factors estimating dispersions gene-wise dispersion estimates Error in relevel.factor(coldata[[v]], as.character(coldata[[v]][idx])) : 'ref' must be of length one My count files were created using easyRNASeq-stable. Thanks in advance Sylvain
ADD COMMENTlink modified 3.7 years ago by Michael Love15k • written 3.7 years ago by Sylvain Foisy80
0
gravatar for Michael Love
3.7 years ago by
Michael Love15k
United States
Michael Love15k wrote:
Hi Sylvain, Thanks for posting this bug. I should be able to commit a simple change today, which fixes this. Mike On Wed, Mar 5, 2014 at 10:42 AM, Sylvain Foisy Ph. D. < sylvain.foisy@diploide.net> wrote: > Hi, > > I used DEseq2 v. 1.3 under R-devel because I have many different values > for my design (celltype) and it worked ok. I upgraded my libraries this > morning (bad,bad,bad…) and now it is not working. Might this be a bug? > > > exp.data<-read.table("./count_table_by_gene.txt",header=T,row.names=1) > > exp.ann<-read.table("./desc_colonnes.txt",header=T) > > genex<-DESeqDataSetFromMatrix( > + countData=exp.data, > + colData=exp.ann, > + design=~ celltype) > > genex > class: DESeqDataSet > dim: 25192 103 > exptData(0): > assays(1): counts > rownames(25192): A1BG A1BG-AS1 ... ZZEF1 ZZZ3 > rowData metadata column names(0): > colnames(103): 169377_LYMPHO_B_accepted_hits.bam > 169377_celltype1_accepted_hits.bam ... > 999046_celltype_m_accepted_hits.bam > 999046_celltype_n_accepted_hits.bam > colData names(2): sizeFactor celltype > > genex<-DESeq(genex) > using pre-existing size factors > estimating dispersions > gene-wise dispersion estimates > Error in relevel.factor(coldata[[v]], as.character(coldata[[v]][idx])) : > 'ref' must be of length one > > My count files were created using easyRNASeq-stable. > > Thanks in advance > > Sylvain > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENTlink written 3.7 years ago by Michael Love15k
Hi Michael, Cool ;-) Thanks for the excellent work!! Best regards S On Mar 5, 2014, at 12:34 PM, Michael Love <michaelisaiahlove@gmail.com> wrote: > Hi Sylvain, > > Thanks for posting this bug. I should be able to commit a simple change today, which fixes this. > > Mike > > > On Wed, Mar 5, 2014 at 10:42 AM, Sylvain Foisy Ph. D. <sylvain.foisy@diploide.net> wrote: > Hi, > > I used DEseq2 v. 1.3 under R-devel because I have many different values for my design (celltype) and it worked ok. I upgraded my libraries this morning (bad,bad,bad ) and now it is not working. Might this be a bug? > > > exp.data<-read.table("./count_table_by_gene.txt",header=T,row.names=1) > > exp.ann<-read.table("./desc_colonnes.txt",header=T) > > genex<-DESeqDataSetFromMatrix( > + countData=exp.data, > + colData=exp.ann, > + design=~ celltype) > > genex > class: DESeqDataSet > dim: 25192 103 > exptData(0): > assays(1): counts > rownames(25192): A1BG A1BG-AS1 ... ZZEF1 ZZZ3 > rowData metadata column names(0): > colnames(103): 169377_LYMPHO_B_accepted_hits.bam > 169377_celltype1_accepted_hits.bam ... 999046_celltype_m_accepted_hits.bam > 999046_celltype_n_accepted_hits.bam > colData names(2): sizeFactor celltype > > genex<-DESeq(genex) > using pre-existing size factors > estimating dispersions > gene-wise dispersion estimates > Error in relevel.factor(coldata[[v]], as.character(coldata[[v]][idx])) : > 'ref' must be of length one > > My count files were created using easyRNASeq-stable. > > Thanks in advance > > Sylvain > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLYlink written 3.7 years ago by Sylvain Foisy80
hi Sylvain, I committed version 1.3.45 today, which should be available in 1 day. Please let me know if this clears up the error. thanks, Mike On Wed, Mar 5, 2014 at 1:13 PM, Sylvain Foisy Ph. D. < sylvain.foisy@diploide.net> wrote: > Hi Michael, > > Cool ;-) Thanks for the excellent work!! > > Best regards > > S > > On Mar 5, 2014, at 12:34 PM, Michael Love <michaelisaiahlove@gmail.com> > wrote: > > Hi Sylvain, > > Thanks for posting this bug. I should be able to commit a simple change > today, which fixes this. > > Mike > > > On Wed, Mar 5, 2014 at 10:42 AM, Sylvain Foisy Ph. D. < > sylvain.foisy@diploide.net> wrote: > >> Hi, >> >> I used DEseq2 v. 1.3 under R-devel because I have many different values >> for my design (celltype) and it worked ok. I upgraded my libraries this >> morning (bad,bad,bad…) and now it is not working. Might this be a bug? >> >> > exp.data<-read.table("./count_table_by_gene.txt",header=T,row.names=1) >> > exp.ann<-read.table("./desc_colonnes.txt",header=T) >> > genex<-DESeqDataSetFromMatrix( >> + countData=exp.data, >> + colData=exp.ann, >> + design=~ celltype) >> > genex >> class: DESeqDataSet >> dim: 25192 103 >> exptData(0): >> assays(1): counts >> rownames(25192): A1BG A1BG-AS1 ... ZZEF1 ZZZ3 >> rowData metadata column names(0): >> colnames(103): 169377_LYMPHO_B_accepted_hits.bam >> 169377_celltype1_accepted_hits.bam ... >> 999046_celltype_m_accepted_hits.bam >> 999046_celltype_n_accepted_hits.bam >> colData names(2): sizeFactor celltype >> > genex<-DESeq(genex) >> using pre-existing size factors >> estimating dispersions >> gene-wise dispersion estimates >> Error in relevel.factor(coldata[[v]], as.character(coldata[[v]][idx])) : >> 'ref' must be of length one >> >> My count files were created using easyRNASeq-stable. >> >> Thanks in advance >> >> Sylvain >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > [[alternative HTML version deleted]]
ADD REPLYlink written 3.7 years ago by Michael Love15k
0
gravatar for Michael Love
3.7 years ago by
Michael Love15k
United States
Michael Love15k wrote:
hi Sylvain, fyi I'm cc-ing the list. It appears that exp.ann might already have a column called sizeFactors. Can you show me this column, if it exists? thanks Mike On Fri, Mar 7, 2014 at 10:34 AM, Sylvain Foisy Ph. D. < sylvain.foisy@diploide.net> wrote: > Hi, > > Just updated my R-devel install and it works a-ok. One question however: > when I create my DESeq object, I get this: > > > genex<-DESeqDataSetFromMatrix( > + countData=exp.data, > + colData=exp.ann, > + design=~ celltype) > > genex<-DESeq(genex) > using pre-existing size factors > estimating dispersions > gene-wise dispersion estimates > Error in .local(object, ...) : > first calculate size factors, add normalizationFactors, or set > normalized=FALSE > > genex<-estimateSizeFactors(genex) > > genex<-DESeq(genex) > using pre-existing size factors > estimating dispersions > gene-wise dispersion estimates > mean-dispersion relationship > final dispersion estimates > fitting model and testing > blah…blah…blah… > > Why do I need to use estimateSizeFactors first? According to the vignette, > the DESeq function should do this step; btw, the version I used before did > not need me to do this prior to run the DESeq() function. > > Best regards > > S > > On Mar 5, 2014, at 9:44 PM, Michael Love <michaelisaiahlove@gmail.com> > wrote: > > > hi Sylvain, > > > > I committed version 1.3.45 today, which should be available in 1 day. > Please let me know if this clears up the error. > > > > thanks, > > > > Mike > > > > > > On Wed, Mar 5, 2014 at 1:13 PM, Sylvain Foisy Ph. D. < > sylvain.foisy@diploide.net> wrote: > > Hi Michael, > > > > Cool ;-) Thanks for the excellent work!! > > > > Best regards > > > > S > > > > On Mar 5, 2014, at 12:34 PM, Michael Love <michaelisaiahlove@gmail.com> > wrote: > > > >> Hi Sylvain, > >> > >> Thanks for posting this bug. I should be able to commit a simple > change today, which fixes this. > >> > >> Mike > >> > >> > >> On Wed, Mar 5, 2014 at 10:42 AM, Sylvain Foisy Ph. D. < > sylvain.foisy@diploide.net> wrote: > >> Hi, > >> > >> I used DEseq2 v. 1.3 under R-devel because I have many different values > for my design (celltype) and it worked ok. I upgraded my libraries this > morning (bad,bad,bad…) and now it is not working. Might this be a bug? > >> > >> > exp.data<-read.table("./count_table_by_gene.txt",header=T,row.names=1) > >> > exp.ann<-read.table("./desc_colonnes.txt",header=T) > >> > genex<-DESeqDataSetFromMatrix( > >> + countData=exp.data, > >> + colData=exp.ann, > >> + design=~ celltype) > >> > genex > >> class: DESeqDataSet > >> dim: 25192 103 > >> exptData(0): > >> assays(1): counts > >> rownames(25192): A1BG A1BG-AS1 ... ZZEF1 ZZZ3 > >> rowData metadata column names(0): > >> colnames(103): 169377_LYMPHO_B_accepted_hits.bam > >> 169377_celltype1_accepted_hits.bam ... > 999046_celltype_m_accepted_hits.bam > >> 999046_celltype_n_accepted_hits.bam > >> colData names(2): sizeFactor celltype > >> > genex<-DESeq(genex) > >> using pre-existing size factors > >> estimating dispersions > >> gene-wise dispersion estimates > >> Error in relevel.factor(coldata[[v]], as.character(coldata[[v]][idx])) : > >> 'ref' must be of length one > >> > >> My count files were created using easyRNASeq-stable. > >> > >> Thanks in advance > >> > >> Sylvain > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > > > > > [[alternative HTML version deleted]]
ADD COMMENTlink written 3.7 years ago by Michael Love15k
Hi, Mike and All: I am testing DESeq2 for multi-factor contrast setting for my own data with more complex meta data but currently use the simpler dataset from the user guide for testing purpose, and run into some issues that need your input and advice. Here are the commands (only show some more relevant outputs): library("DESeq2") library("Biobase") library("pasilla") data("pasillaGenes") countData <- counts(pasillaGenes) colData <- pData(pasillaGenes)[,c("condition","type")] colData<-data.frame(colData,paste(colData$condition,colData$type,sep=" .")) colnames(colData)[3]<-"condition_type"; > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ condition_type) > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.783745 >dds <- DESeq(dds) estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model > resultsNames(dds) [1] "Intercept" [2] "condition_type_treated.single.read_vs_treated.paired.end" [3] "condition_type_untreated.paired.end_vs_treated.paired.end" [4] "condition_type_untreated.single.read_vs_treated.paired.end" > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.7837458 Then I tried a slight diiffermry setting of the design: > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~0+ condition_type) > dds <- DESeq(dds) estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model 580 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > resultsNames(dds) [1] "condition_typetreated.paired.end" [2] "condition_type_treated.single.read_vs_treated.paired.end" [3] "condition_type_untreated.paired.end_vs_treated.paired.end" [4] "condition_type_untreated.single.read_vs_treated.paired.end" > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.7837458 Then supposedly, I can use results(dds, "condition_type_treated.single.read_vs_treated.paired.end") to get DEGs for each contrast shown in resultsNames(dds). here are my questions: 1. I used design = ~0+ condition_type instead of design = ~ condition_type in 2nd case, try to skip the intercept so that I can easily get all possible contrasts, but seem not working the way I want. 2. I tried to get all possible contrasts: but besides the contrasts shown in resultsNames(dds) in both cases, the contrasts like untreated.single.read vs treated.single.read, untreated.paired.end vs untreated.single.read not even exists in the resultsNames(dds). also I like the contrast generally like: treated (including both treated.single.read and treated.paired-end) vs untreated (including both untreated.single-read and untreated paired-end). I know for this case, we can just to design = ~condition, but I wish to do this in the same roof of one single design model although I can do a separate design. In limma and edgeR, there is a function like: con.matrix<-makeContrasts() where one can set up any contrasts under the design at will. Is there anythign like that in DESeq2? I understand we can do design(dds) <- formula(~ condition_type), but no contrast setting can be made at will. Anything in DESeq2 can get around that? 3. Also for simple contrast, I understand one can use relevel(colData(dds)$condition,"control") kind of command to set base level, but for multiple-factors contrasts as I am after, I almost need some kind of makeContrasts() mechanism to set up contrasts at will or have to do that individually one by one, which obvioulsy would be tedious and also these contrasts won't be in a single model roof. Anything can get around like that as well? if question 2 is addressed, this one shall be no problem. Thanks in advance for your help! Appreciated very much! Best Ming ATRF/NCI-Frederick, Maryland, USA. [[alternative HTML version deleted]]
ADD REPLYlink written 3.7 years ago by Ming Yi340
hi Ming, I'm confused why you are not following the instructions in the vignette section 1.5 Multifactor designs? You should not and we do not recommend pasting together columns like this, nor inserting + 0 into the design. Please have a look at what we do recommend in this section. Extracting contrasts is covered in vignette section 3.2 Contrasts. First take a look at the entire vignette, as we've spent a lot of time writing the documentation to try to answer user questions. we are happy to discuss the best approach for your experiment, but first we need to hear more about your aims and experiment e.g. what hypotheses you wish to test, what kind of genes you are looking to find. It's hard for us to reverse engineer a recommendation rather than to go at it from basic aims. Mike On Fri, Mar 7, 2014 at 12:52 PM, Ming Yi <yi02@hotmail.com> wrote: > > Hi, Mike and All: > > I am testing DESeq2 for multi-factor contrast setting for my own data with > more complex meta data but currently use the simpler dataset from the user > guide for testing purpose, and run into some issues that need your input > and advice. Here are the commands (only show some more relevant outputs): > > library("DESeq2") > library("Biobase") > library("pasilla") > data("pasillaGenes") > countData <- counts(pasillaGenes) > colData <- pData(pasillaGenes)[,c("condition","type")] > colData<-data.frame(colData,paste(colData$condition,colData$type,sep =".")) > colnames(colData)[3]<-"condition_type"; > > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, > design = ~ condition_type) > > colData(dds) > DataFrame with 7 rows and 4 columns > condition type condition_type sizeFactor > <factor> <factor> <factor> <numeric> > treated1fb treated single-read treated.single-read 1.5116926 > treated2fb treated paired-end treated.paired-end 0.7843521 > treated3fb treated paired-end treated.paired-end 0.8958321 > untreated1fb untreated single-read untreated.single-read 1.0499961 > untreated2fb untreated single-read untreated.single-read 1.6585559 > untreated3fb untreated paired-end untreated.paired-end 0.7117763 > untreated4fb untreated paired-end untreated.paired-end 0.783745 > >dds <- DESeq(dds) > estimating size factors > estimating dispersions > gene-wise dispersion estimates > mean-dispersion relationship > final dispersion estimates > fitting generalized linear model > > resultsNames(dds) > [1] "Intercept" > [2] "condition_type_treated.single.read_vs_treated.paired.end" > [3] "condition_type_untreated.paired.end_vs_treated.paired.end" > [4] "condition_type_untreated.single.read_vs_treated.paired.end" > > colData(dds) > DataFrame with 7 rows and 4 columns > condition type condition_type sizeFactor > <factor> <factor> <factor> <numeric> > treated1fb treated single-read treated.single-read 1.5116926 > treated2fb treated paired-end treated.paired-end 0.7843521 > treated3fb treated paired-end treated.paired-end 0.8958321 > untreated1fb untreated single-read untreated.single-read 1.0499961 > untreated2fb untreated single-read untreated.single-read 1.6585559 > untreated3fb untreated paired-end untreated.paired-end 0.7117763 > untreated4fb untreated paired-end untreated.paired-end 0.7837458 > > Then I tried a slight diiffermry setting of the design: > > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, > design = ~0+ condition_type) > > dds <- DESeq(dds) > estimating size factors > estimating dispersions > gene-wise dispersion estimates > mean-dispersion relationship > final dispersion estimates > fitting generalized linear model > 580 rows did not converge in beta, labelled in mcols(object)$betaConv. Use > larger maxit argument with nbinomWaldTest > > resultsNames(dds) > [1] "condition_typetreated.paired.end" > [2] "condition_type_treated.single.read_vs_treated.paired.end" > [3] "condition_type_untreated.paired.end_vs_treated.paired.end" > [4] "condition_type_untreated.single.read_vs_treated.paired.end" > > colData(dds) > DataFrame with 7 rows and 4 columns > condition type condition_type sizeFactor > <factor> <factor> <factor> <numeric> > treated1fb treated single-read treated.single-read 1.5116926 > treated2fb treated paired-end treated.paired-end 0.7843521 > treated3fb treated paired-end treated.paired-end 0.8958321 > untreated1fb untreated single-read untreated.single-read 1.0499961 > untreated2fb untreated single-read untreated.single-read 1.6585559 > untreated3fb untreated paired-end untreated.paired-end 0.7117763 > untreated4fb untreated paired-end untreated.paired-end 0.7837458 > > Then supposedly, I can use results(dds, > "condition_type_treated.single.read_vs_treated.paired.end") to get DEGs for > each contrast shown in resultsNames(dds). > > here are my questions: > 1. I used design = ~0+ condition_type instead of design = ~ condition_type > in 2nd case, try to skip the intercept so that I can easily get > all possible contrasts, but seem not working the way I want. > 2. I tried to get all possible contrasts: but besides the contrasts shown > in resultsNames(dds) in both cases, the contrasts > like untreated.single.read vs treated.single.read, untreated.paired.end vs > untreated.single.read not even exists in the resultsNames(dds). also I like > the contrast generally like: treated (including both treated.single.read > and treated.paired-end) vs untreated (including both untreated .single-read > and untreated paired-end). I know for this case, we can just to design = > ~condition, but I wish to do this in the same roof of one single design > model although I can do a separate design. In limma and edgeR, there is a > function like: con.matrix<-makeContrasts() where one can set up > any contrasts under the design at will. Is there anythign like that in > DESeq2? I understand we can do design(dds) <- formula(~ condition_type), > but no contrast setting can be made at will. Anything in DESeq2 can get > around that? > 3. Also for simple contrast, I understand one can use > relevel(colData(dds)$condition,"control") kind of command to set base > level, but for multiple-factors contrasts as I am after, I almost need some > kind of makeContrasts() mechanism to set up contrasts at will or have to do > that individually one by one, which obvioulsy would be tedious and also > these contrasts won't be in a single model roof. Anything can get around > like that as well? if question 2 is addressed, this one shall be no problem. > > Thanks in advance for your help! Appreciated very much! > > Best > > Ming > ATRF/NCI-Frederick, > Maryland, USA. > > > > > > > [[alternative HTML version deleted]]
ADD REPLYlink written 3.7 years ago by Michael Love15k
Hi, Mike: Thx for the info, indeed I did try the following before with the following for the data in user guide: > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ condition+type) > dds <- DESeq(dds) > resultsNames(dds) [1] "Intercept" "condition_untreated_vs_treated" [3] "type_single.read_vs_paired.end" As you can see, the contrast I can get here is overall untreated_vs_treated and overall single.read_vs_paired.end, there is no subtype contrast such as treated.single.read vs treated.paired-end etc. I would love to discuss briefly what I need here. I have a dataset which has tumors and matched normal samples from many patients, and there are subtypes of the tumors, say RasOnly, RasP, RasPNot types of tumors, of course, corresponding matched would be also with subtypes of RasOnly, RasP,RasP1Hit, RasPNot, and the metadata like this: Subject SampleName Type RasType RasTum T6745_01A 49_6745 T6745_01A Tumor RasP RasP.Tumor N6745_11A 49_6745 N6745_11A Normal RasP RasP.Normal T6761_01A 49_6761 T6761_01A Tumor RasPNot RasPNot.Tumor N6761_11A 49_6761 N6761_11A Normal RasPNot RasPNot.Normal T5930_01A 50_5930 T5930_01A Tumor RasP1Hit RasP1Hit.Tumor N5930_11A 50_5930 N5930_11A Normal RasP1Hit RasP1Hit.Normal T5932_01A 50_5932 T5932_01A Tumor RasOnly RasOnly.Tumor N5932_11A 50_5932 N5932_11A Normal RasOnly RasOnly.Normal ........ Here are the contrasts I am interested to get DEGs: RasOnly.Tumor vs RasOnly.Normal RasP.Tumor vs RasP.Normal RasP.Tumor + RasP1Hit.Tumor vs RasP.Normal+RasP1Hit.Normal RasPNot.Tumor vs RasPNot.Normal RasP1Hit.Tumor vs RasP1Hit.Normal RasP.Tumor vs RasPNot.Tumor RasP.Tumor+RasP1Hit.Tumor vs RasPNot.Tumor RasOnly.Tumor vs RasPNot.Tumor RasOnly.Normal vs RasPNot.Normal RasP.Normal vs RasPNot.Normal RasPRasP1Hit.Normal vs RasPNot.Normal, Tumor vs Normal The last item Tumor vs Normal certainly can easily use design = ~ type to deal with. But many of the contrasts listed above not easy unless use the RasTum of the metadata shown above. I did try to use design=~Type+RasType here are the commands: > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~Type+RasType); > show(resultsNames(dds)) character(0) > dds <- DESeq(dds); > resultsNames(dds) [1] "Intercept" "Type_Tumor_vs_Normal" [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" [5] "RasType_RasPNot_vs_RasOnly" as you can see, I can only derive overall subtype contrasts from this way but not something like RasP.Tumor vs RasOnly.Tumor, the overall subtype contrasts for example, RasType_RasP_vs_RasOnly, consider both tumor and normal of RasP compared with those of RasOnly, which is certainly not what we want here. user guide section 3.2 did show resCtrst <- results(ddsCtrst, contrast=c("treatment","OHT","DPN")) resCtrst <- results(ddsCtrst, contrast=c(0,0,0,0,-1,1)) So besides RasTum, if you have a better way using just design = ~Type+RasType, that would be great. I did try the following: > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~RasTum); > dds <- DESeq(dds); estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model 172 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > resultsNames(dds) [1] "Intercept" [2] "RasTum_RasOnly.Tumor_vs_RasOnly.Normal" [3] "RasTum_RasP.Normal_vs_RasOnly.Normal" [4] "RasTum_RasP.Tumor_vs_RasOnly.Normal" [5] "RasTum_RasP1Hit.Normal_vs_RasOnly.Normal" [6] "RasTum_RasP1Hit.Tumor_vs_RasOnly.Normal" [7] "RasTum_RasPNot.Normal_vs_RasOnly.Normal" [8] "RasTum_RasPNot.Tumor_vs_RasOnly.Normal" I did get many contrasts as I desired, but the contrasts RasTum_RasP.Tumor_vs_RasOnly.Normal does not make sense here to me, but it shown up there in the resultsNames(dds) . based on section 3.2, I seem be able to derived more from the above contrasts: say: for contrast RasP.Tumor vs RasP.Normal, I can do: resCtrst<-result(dds, contrast=c(0,0,-1,1,0,0,0,0); But is there any better way to do the above contrasts listed above that I desire? Thx again for your advice! best Ming From: michaelisaiahlove@gmail.com Date: Fri, 7 Mar 2014 13:23:56 -0500 Subject: Re: Questions about multi-factor contrast setting in DESeq2 To: yi02@hotmail.com CC: bioconductor@r-project.org hi Ming, I'm confused why you are not following the instructions in the vignette section 1.5 Multifactor designs? You should not and we do not recommend pasting together columns like this, nor inserting + 0 into the design. Please have a look at what we do recommend in this section. Extracting contrasts is covered in vignette section 3.2 Contrasts. First take a look at the entire vignette, as we've spent a lot of time writing the documentation to try to answer user questions. we are happy to discuss the best approach for your experiment, but first we need to hear more about your aims and experiment e.g. what hypotheses you wish to test, what kind of genes you are looking to find. It's hard for us to reverse engineer a recommendation rather than to go at it from basic aims. Mike On Fri, Mar 7, 2014 at 12:52 PM, Ming Yi <yi02@hotmail.com> wrote: Hi, Mike and All: I am testing DESeq2 for multi-factor contrast setting for my own data with more complex meta data but currently use the simpler dataset from the user guide for testing purpose, and run into some issues that need your input and advice. Here are the commands (only show some more relevant outputs): library("DESeq2") library("Biobase") library("pasilla") data("pasillaGenes") countData <- counts(pasillaGenes) colData <- pData(pasillaGenes)[,c("condition","type")] colData<-data.frame(colData,paste(colData$condition,colData$type,sep=" .")) colnames(colData)[3]<-"condition_type"; > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ condition_type) > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.783745 >dds <- DESeq(dds) estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model > resultsNames(dds) [1] "Intercept" [2] "condition_type_treated.single.read_vs_treated.paired.end" [3] "condition_type_untreated.paired.end_vs_treated.paired.end" [4] "condition_type_untreated.single.read_vs_treated.paired.end" > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.7837458 Then I tried a slight diiffermry setting of the design: > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~0+ condition_type) > dds <- DESeq(dds) estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model 580 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > resultsNames(dds) [1] "condition_typetreated.paired.end" [2] "condition_type_treated.single.read_vs_treated.paired.end" [3] "condition_type_untreated.paired.end_vs_treated.paired.end" [4] "condition_type_untreated.single.read_vs_treated.paired.end" > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.7837458 Then supposedly, I can use results(dds, "condition_type_treated.single.read_vs_treated.paired.end") to get DEGs for each contrast shown in resultsNames(dds). here are my questions: 1. I used design = ~0+ condition_type instead of design = ~ condition_type in 2nd case, try to skip the intercept so that I can easily get all possible contrasts, but seem not working the way I want. 2. I tried to get all possible contrasts: but besides the contrasts shown in resultsNames(dds) in both cases, the contrasts like untreated.single.read vs treated.single.read, untreated.paired.end vs untreated.single.read not even exists in the resultsNames(dds). also I like the contrast generally like: treated (including both treated.single.read and treated.paired-end) vs untreated (including both untreated.single-read and untreated paired-end). I know for this case, we can just to design = ~condition, but I wish to do this in the same roof of one single design model although I can do a separate design. In limma and edgeR, there is a function like: con.matrix<-makeContrasts() where one can set up any contrasts under the design at will. Is there anythign like that in DESeq2? I understand we can do design(dds) <- formula(~ condition_type), but no contrast setting can be made at will. Anything in DESeq2 can get around that? 3. Also for simple contrast, I understand one can use relevel(colData(dds)$condition,"control") kind of command to set base level, but for multiple-factors contrasts as I am after, I almost need some kind of makeContrasts() mechanism to set up contrasts at will or have to do that individually one by one, which obvioulsy would be tedious and also these contrasts won't be in a single model roof. Anything can get around like that as well? if question 2 is addressed, this one shall be no problem. Thanks in advance for your help! Appreciated very much! Best Ming ATRF/NCI-Frederick, Maryland, USA. [[alternative HTML version deleted]]
ADD REPLYlink written 3.7 years ago by Ming Yi340
Hi Ming, Exploratory data analysis might be a more fruitful approach here rather than brute force combinatorics and testing. Copying from Wolfgang's recommendation in a similar situation: "my advice here would be to put less emphasis on the testing and move straight to clustering, using one of the transformations described in the DESeq2 vignette to bring the data to a 'well-behaved' (log-like) scale." "To filter out the genes that vary not much, use the range (max-min) or IQR and a subjective cutoff (e.g. retain the top 20% of genes), then use standard clustering functions (e.g. pam from the cluster package), and other exploratory data analyses (e.g. PCA) to see the types of behaviours." You might also try constructing a heatmap, as shown in the vignette, using a subset of genes which vary the most, and then explore the grouping of samples in the hierarchical clustering on the columns. For ease of visualization, this subset should probably be in the 100s. Mike On Mar 7, 2014 2:59 PM, "Ming Yi" <yi02@hotmail.com> wrote: > Hi, Mike: > > Thx for the info, indeed I did try the following before with the > following for the data in user guide: > > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, > design = ~ condition+type) > > dds <- DESeq(dds) > > resultsNames(dds) > [1] "Intercept" "condition_untreated_vs_treated" > [3] "type_single.read_vs_paired.end" > > As you can see, the contrast I can get here is overall > untreated_vs_treated and overall single.read_vs_paired.end, there is no > subtype contrast such as treated.single.read vs treated.paired-end etc. > > I would love to discuss briefly what I need here. I have a dataset which > has tumors and matched normal samples from many patients, and there are > subtypes of the tumors, say RasOnly, RasP, RasPNot types of tumors, of > course, corresponding matched would be also with subtypes of RasOnly, > RasP,RasP1Hit, RasPNot, and the metadata like this: > > Subject SampleName Type RasType RasTum > T6745_01A 49_6745 T6745_01A Tumor RasP RasP.Tumor > N6745_11A 49_6745 N6745_11A Normal RasP RasP.Normal > T6761_01A 49_6761 T6761_01A Tumor RasPNot RasPNot.Tumor > N6761_11A 49_6761 N6761_11A Normal RasPNot RasPNot.Normal > T5930_01A 50_5930 T5930_01A Tumor RasP1Hit RasP1Hit.Tumor > N5930_11A 50_5930 N5930_11A Normal RasP1Hit RasP1Hit.Normal > T5932_01A 50_5932 T5932_01A Tumor RasOnly RasOnly.Tumor > N5932_11A 50_5932 N5932_11A Normal RasOnly RasOnly.Normal > ........ > > Here are the contrasts I am interested to get DEGs: > RasOnly.Tumor vs RasOnly.Normal > RasP.Tumor vs RasP.Normal > RasP.Tumor + RasP1Hit.Tumor vs RasP.Normal+RasP1Hit.Normal > RasPNot.Tumor vs RasPNot.Normal > RasP1Hit.Tumor vs RasP1Hit.Normal > RasP.Tumor vs RasPNot.Tumor > RasP.Tumor+RasP1Hit.Tumor vs RasPNot.Tumor > RasOnly.Tumor vs RasPNot.Tumor > RasOnly.Normal vs RasPNot.Normal > RasP.Normal vs RasPNot.Normal > RasPRasP1Hit.Normal vs RasPNot.Normal, > Tumor vs Normal > > The last item Tumor vs Normal certainly can easily use design = ~ type to > deal with. But many of the contrasts listed above not easy unless use the > RasTum of the metadata shown above. I did try to use design=~Type+RasType > here are the commands: > > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = > ~Type+RasType); > > show(resultsNames(dds)) > character(0) > > dds <- DESeq(dds); > > resultsNames(dds) > [1] "Intercept" "Type_Tumor_vs_Normal" > [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" > [5] "RasType_RasPNot_vs_RasOnly" > > as you can see, I can only derive overall subtype contrasts from this way > but not something like RasP.Tumor vs RasOnly.Tumor, the overall subtype > contrasts for example, RasType_RasP_vs_RasOnly, consider both tumor and > normal of RasP compared with those of RasOnly, which is certainly not what > we want here. > user guide section 3.2 did show > resCtrst <- results(ddsCtrst, contrast=c("treatment","OHT","DPN")) > resCtrst <- results(ddsCtrst, contrast=c(0,0,0,0,-1,1)) > > So besides RasTum, if you have a better way using just design = > ~Type+RasType, that would be great. > I did try the following: > > > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = > ~RasTum); > > dds <- DESeq(dds); > estimating size factors > estimating dispersions > gene-wise dispersion estimates > mean-dispersion relationship > final dispersion estimates > fitting generalized linear model > 172 rows did not converge in beta, labelled in mcols(object)$betaConv. Use > larger maxit argument with nbinomWaldTest > > resultsNames(dds) > [1] "Intercept" > [2] "RasTum_RasOnly.Tumor_vs_RasOnly.Normal" > [3] "RasTum_RasP.Normal_vs_RasOnly.Normal" > [4] "RasTum_RasP.Tumor_vs_RasOnly.Normal" > [5] "RasTum_RasP1Hit.Normal_vs_RasOnly.Normal" > [6] "RasTum_RasP1Hit.Tumor_vs_RasOnly.Normal" > [7] "RasTum_RasPNot.Normal_vs_RasOnly.Normal" > [8] "RasTum_RasPNot.Tumor_vs_RasOnly.Normal" > > I did get many contrasts as I desired, but the contrasts > RasTum_RasP.Tumor_vs_RasOnly.Normal does not make sense here to me, but it > shown up there in the resultsNames(dds) . > > based on section 3.2, I seem be able to derived more from the above > contrasts: > say: for contrast RasP.Tumor vs RasP.Normal, I can do: > resCtrst<-result(dds, contrast=c(0,0,-1,1,0,0,0,0); > > But is there any better way to do the above contrasts listed above that I > desire? > > Thx again for your advice! > best > > Ming > > > > > > > > > ------------------------------ > From: michaelisaiahlove@gmail.com > Date: Fri, 7 Mar 2014 13:23:56 -0500 > Subject: Re: Questions about multi-factor contrast setting in DESeq2 > To: yi02@hotmail.com > CC: bioconductor@r-project.org > > hi Ming, > > I'm confused why you are not following the instructions in the vignette > section 1.5 Multifactor designs? You should not and we do not recommend > pasting together columns like this, nor inserting + 0 into the design. > Please have a look at what we do recommend in this section. > > Extracting contrasts is covered in vignette section 3.2 Contrasts. First > take a look at the entire vignette, as we've spent a lot of time writing > the documentation to try to answer user questions. > > we are happy to discuss the best approach for your experiment, but first > we need to hear more about your aims and experiment e.g. what hypotheses > you wish to test, what kind of genes you are looking to find. It's hard for > us to reverse engineer a recommendation rather than to go at it from basic > aims. > > Mike > > > On Fri, Mar 7, 2014 at 12:52 PM, Ming Yi <yi02@hotmail.com> wrote: > > > Hi, Mike and All: > > I am testing DESeq2 for multi-factor contrast setting for my own data with > more complex meta data but currently use the simpler dataset from the user > guide for testing purpose, and run into some issues that need your input > and advice. Here are the commands (only show some more relevant outputs): > > library("DESeq2") > library("Biobase") > library("pasilla") > data("pasillaGenes") > countData <- counts(pasillaGenes) > colData <- pData(pasillaGenes)[,c("condition","type")] > colData<-data.frame(colData,paste(colData$condition,colData$type,sep =".")) > colnames(colData)[3]<-"condition_type"; > > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, > design = ~ condition_type) > > colData(dds) > DataFrame with 7 rows and 4 columns > condition type condition_type sizeFactor > <factor> <factor> <factor> <numeric> > treated1fb treated single-read treated.single-read 1.5116926 > treated2fb treated paired-end treated.paired-end 0.7843521 > treated3fb treated paired-end treated.paired-end 0.8958321 > untreated1fb untreated single-read untreated.single-read 1.0499961 > untreated2fb untreated single-read untreated.single-read 1.6585559 > untreated3fb untreated paired-end untreated.paired-end 0.7117763 > untreated4fb untreated paired-end untreated.paired-end 0.783745 > >dds <- DESeq(dds) > estimating size factors > estimating dispersions > gene-wise dispersion estimates > mean-dispersion relationship > final dispersion estimates > fitting generalized linear model > > resultsNames(dds) > [1] "Intercept" > [2] "condition_type_treated.single.read_vs_treated.paired.end" > [3] "condition_type_untreated.paired.end_vs_treated.paired.end" > [4] "condition_type_untreated.single.read_vs_treated.paired.end" > > colData(dds) > DataFrame with 7 rows and 4 columns > condition type condition_type sizeFactor > <factor> <factor> <factor> <numeric> > treated1fb treated single-read treated.single-read 1.5116926 > treated2fb treated paired-end treated.paired-end 0.7843521 > treated3fb treated paired-end treated.paired-end 0.8958321 > untreated1fb untreated single-read untreated.single-read 1.0499961 > untreated2fb untreated single-read untreated.single-read 1.6585559 > untreated3fb untreated paired-end untreated.paired-end 0.7117763 > untreated4fb untreated paired-end untreated.paired-end 0.7837458 > > Then I tried a slight diiffermry setting of the design: > > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, > design = ~0+ condition_type) > > dds <- DESeq(dds) > estimating size factors > estimating dispersions > gene-wise dispersion estimates > mean-dispersion relationship > final dispersion estimates > fitting generalized linear model > 580 rows did not converge in beta, labelled in mcols(object)$betaConv. Use > larger maxit argument with nbinomWaldTest > > resultsNames(dds) > [1] "condition_typetreated.paired.end" > [2] "condition_type_treated.single.read_vs_treated.paired.end" > [3] "condition_type_untreated.paired.end_vs_treated.paired.end" > [4] "condition_type_untreated.single.read_vs_treated.paired.end" > > colData(dds) > DataFrame with 7 rows and 4 columns > condition type condition_type sizeFactor > <factor> <factor> <factor> <numeric> > treated1fb treated single-read treated.single-read 1.5116926 > treated2fb treated paired-end treated.paired-end 0.7843521 > treated3fb treated paired-end treated.paired-end 0.8958321 > untreated1fb untreated single-read untreated.single-read 1.0499961 > untreated2fb untreated single-read untreated.single-read 1.6585559 > untreated3fb untreated paired-end untreated.paired-end 0.7117763 > untreated4fb untreated paired-end untreated.paired-end 0.7837458 > > Then supposedly, I can use results(dds, > "condition_type_treated.single.read_vs_treated.paired.end") to get DEGs for > each contrast shown in resultsNames(dds). > > here are my questions: > 1. I used design = ~0+ condition_type instead of design = ~ condition_type > in 2nd case, try to skip the intercept so that I can easily get > all possible contrasts, but seem not working the way I want. > 2. I tried to get all possible contrasts: but besides the contrasts shown > in resultsNames(dds) in both cases, the contrasts > like untreated.single.read vs treated.single.read, untreated.paired.end vs > untreated.single.read not even exists in the resultsNames(dds). also I like > the contrast generally like: treated (including both treated.single.read > and treated.paired-end) vs untreated (including both untreated .single-read > and untreated paired-end). I know for this case, we can just to design = > ~condition, but I wish to do this in the same roof of one single design > model although I can do a separate design. In limma and edgeR, there is a > function like: con.matrix<-makeContrasts() where one can set up > any contrasts under the design at will. Is there anythign like that in > DESeq2? I understand we can do design(dds) <- formula(~ condition_type), > but no contrast setting can be made at will. Anything in DESeq2 can get > around that? > 3. Also for simple contrast, I understand one can use > relevel(colData(dds)$condition,"control") kind of command to set base > level, but for multiple-factors contrasts as I am after, I almost need some > kind of makeContrasts() mechanism to set up contrasts at will or have to do > that individually one by one, which obvioulsy would be tedious and also > these contrasts won't be in a single model roof. Anything can get around > like that as well? if question 2 is addressed, this one shall be no problem. > > Thanks in advance for your help! Appreciated very much! > > Best > > Ming > ATRF/NCI-Frederick, > Maryland, USA. > > > > > > > > > [[alternative HTML version deleted]]
ADD REPLYlink written 3.7 years ago by Michael Love15k
hi Ming, To follow up on the question about contrasts, the way to perform comparisons like "RasOnly.Tumor vs RasOnly.Normal", would be a design: Type + RasType + Type:RasType where: results(dds, contrast=c("Type","Tumor","Normal")) tests for the general effect, and then the results for the interactions -- which are present in resultsNames(dds) and can be extracted using the 'name' argument to results() -- tests for an effect in a specific RasType which is different than the general effect. Mike On Fri, Mar 7, 2014 at 5:31 PM, Michael Love <michaelisaiahlove@gmail.com>wrote: > Hi Ming, > > Exploratory data analysis might be a more fruitful approach here rather > than brute force combinatorics and testing. > > Copying from Wolfgang's recommendation in a similar situation: > > "my advice here would be to put less emphasis on the testing and move > straight to clustering, using one of the transformations described in the > DESeq2 vignette to bring the data to a 'well-behaved' (log-like) scale." > > "To filter out the genes that vary not much, use the range (max-min) or > IQR and a subjective cutoff (e.g. retain the top 20% of genes), then use > standard clustering functions (e.g. pam from the cluster package), and > other exploratory data analyses (e.g. PCA) to see the types of behaviours." > > You might also try constructing a heatmap, as shown in the vignette, using > a subset of genes which vary the most, and then explore the grouping of > samples in the hierarchical clustering on the columns. For ease of > visualization, this subset should probably be in the 100s. > > Mike > On Mar 7, 2014 2:59 PM, "Ming Yi" <yi02@hotmail.com> wrote: > >> Hi, Mike: >> >> Thx for the info, indeed I did try the following before with the >> following for the data in user guide: >> > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, >> design = ~ condition+type) >> > dds <- DESeq(dds) >> > resultsNames(dds) >> [1] "Intercept" "condition_untreated_vs_treated" >> [3] "type_single.read_vs_paired.end" >> >> As you can see, the contrast I can get here is overall >> untreated_vs_treated and overall single.read_vs_paired.end, there is no >> subtype contrast such as treated.single.read vs treated.paired-end etc. >> >> I would love to discuss briefly what I need here. I have a dataset which >> has tumors and matched normal samples from many patients, and there are >> subtypes of the tumors, say RasOnly, RasP, RasPNot types of tumors, of >> course, corresponding matched would be also with subtypes of RasOnly, >> RasP,RasP1Hit, RasPNot, and the metadata like this: >> >> Subject SampleName Type RasType RasTum >> T6745_01A 49_6745 T6745_01A Tumor RasP RasP.Tumor >> N6745_11A 49_6745 N6745_11A Normal RasP RasP.Normal >> T6761_01A 49_6761 T6761_01A Tumor RasPNot RasPNot.Tumor >> N6761_11A 49_6761 N6761_11A Normal RasPNot RasPNot.Normal >> T5930_01A 50_5930 T5930_01A Tumor RasP1Hit RasP1Hit.Tumor >> N5930_11A 50_5930 N5930_11A Normal RasP1Hit RasP1Hit.Normal >> T5932_01A 50_5932 T5932_01A Tumor RasOnly RasOnly.Tumor >> N5932_11A 50_5932 N5932_11A Normal RasOnly RasOnly.Normal >> ........ >> >> Here are the contrasts I am interested to get DEGs: >> RasOnly.Tumor vs RasOnly.Normal >> RasP.Tumor vs RasP.Normal >> RasP.Tumor + RasP1Hit.Tumor vs RasP.Normal+RasP1Hit.Normal >> RasPNot.Tumor vs RasPNot.Normal >> RasP1Hit.Tumor vs RasP1Hit.Normal >> RasP.Tumor vs RasPNot.Tumor >> RasP.Tumor+RasP1Hit.Tumor vs RasPNot.Tumor >> RasOnly.Tumor vs RasPNot.Tumor >> RasOnly.Normal vs RasPNot.Normal >> RasP.Normal vs RasPNot.Normal >> RasPRasP1Hit.Normal vs RasPNot.Normal, >> Tumor vs Normal >> >> The last item Tumor vs Normal certainly can easily use design = ~ type to >> deal with. But many of the contrasts listed above not easy unless use the >> RasTum of the metadata shown above. I did try to use design=~Type+RasType >> here are the commands: >> > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design >> = ~Type+RasType); >> > show(resultsNames(dds)) >> character(0) >> > dds <- DESeq(dds); >> > resultsNames(dds) >> [1] "Intercept" "Type_Tumor_vs_Normal" >> [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" >> [5] "RasType_RasPNot_vs_RasOnly" >> >> as you can see, I can only derive overall subtype contrasts from this way >> but not something like RasP.Tumor vs RasOnly.Tumor, the overall subtype >> contrasts for example, RasType_RasP_vs_RasOnly, consider both tumor and >> normal of RasP compared with those of RasOnly, which is certainly not what >> we want here. >> user guide section 3.2 did show >> resCtrst <- results(ddsCtrst, contrast=c("treatment","OHT","DPN")) >> resCtrst <- results(ddsCtrst, contrast=c(0,0,0,0,-1,1)) >> >> So besides RasTum, if you have a better way using just design = >> ~Type+RasType, that would be great. >> I did try the following: >> >> > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design >> = ~RasTum); >> > dds <- DESeq(dds); >> estimating size factors >> estimating dispersions >> gene-wise dispersion estimates >> mean-dispersion relationship >> final dispersion estimates >> fitting generalized linear model >> 172 rows did not converge in beta, labelled in mcols(object)$betaConv. >> Use larger maxit argument with nbinomWaldTest >> > resultsNames(dds) >> [1] "Intercept" >> [2] "RasTum_RasOnly.Tumor_vs_RasOnly.Normal" >> [3] "RasTum_RasP.Normal_vs_RasOnly.Normal" >> [4] "RasTum_RasP.Tumor_vs_RasOnly.Normal" >> [5] "RasTum_RasP1Hit.Normal_vs_RasOnly.Normal" >> [6] "RasTum_RasP1Hit.Tumor_vs_RasOnly.Normal" >> [7] "RasTum_RasPNot.Normal_vs_RasOnly.Normal" >> [8] "RasTum_RasPNot.Tumor_vs_RasOnly.Normal" >> >> I did get many contrasts as I desired, but the contrasts >> RasTum_RasP.Tumor_vs_RasOnly.Normal does not make sense here to me, but it >> shown up there in the resultsNames(dds) . >> >> based on section 3.2, I seem be able to derived more from the above >> contrasts: >> say: for contrast RasP.Tumor vs RasP.Normal, I can do: >> resCtrst<-result(dds, contrast=c(0,0,-1,1,0,0,0,0); >> >> But is there any better way to do the above contrasts listed above that I >> desire? >> >> Thx again for your advice! >> best >> >> Ming >> >> >> >> >> >> >> >> >> ------------------------------ >> From: michaelisaiahlove@gmail.com >> Date: Fri, 7 Mar 2014 13:23:56 -0500 >> Subject: Re: Questions about multi-factor contrast setting in DESeq2 >> To: yi02@hotmail.com >> CC: bioconductor@r-project.org >> >> hi Ming, >> >> I'm confused why you are not following the instructions in the vignette >> section 1.5 Multifactor designs? You should not and we do not recommend >> pasting together columns like this, nor inserting + 0 into the design. >> Please have a look at what we do recommend in this section. >> >> Extracting contrasts is covered in vignette section 3.2 Contrasts. First >> take a look at the entire vignette, as we've spent a lot of time writing >> the documentation to try to answer user questions. >> >> we are happy to discuss the best approach for your experiment, but first >> we need to hear more about your aims and experiment e.g. what hypotheses >> you wish to test, what kind of genes you are looking to find. It's hard for >> us to reverse engineer a recommendation rather than to go at it from basic >> aims. >> >> Mike >> >> >> On Fri, Mar 7, 2014 at 12:52 PM, Ming Yi <yi02@hotmail.com> wrote: >> >> >> Hi, Mike and All: >> >> I am testing DESeq2 for multi-factor contrast setting for my own data >> with more complex meta data but currently use the simpler dataset from the >> user guide for testing purpose, and run into some issues that need your >> input and advice. Here are the commands (only show some more relevant >> outputs): >> >> library("DESeq2") >> library("Biobase") >> library("pasilla") >> data("pasillaGenes") >> countData <- counts(pasillaGenes) >> colData <- pData(pasillaGenes)[,c("condition","type")] >> colData<-data.frame(colData,paste(colData$condition,colData$type,se p=".")) >> colnames(colData)[3]<-"condition_type"; >> > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, >> design = ~ condition_type) >> > colData(dds) >> DataFrame with 7 rows and 4 columns >> condition type condition_type sizeFactor >> <factor> <factor> <factor> <numeric> >> treated1fb treated single-read treated.single-read 1.5116926 >> treated2fb treated paired-end treated.paired-end 0.7843521 >> treated3fb treated paired-end treated.paired-end 0.8958321 >> untreated1fb untreated single-read untreated.single-read 1.0499961 >> untreated2fb untreated single-read untreated.single-read 1.6585559 >> untreated3fb untreated paired-end untreated.paired-end 0.7117763 >> untreated4fb untreated paired-end untreated.paired-end 0.783745 >> >dds <- DESeq(dds) >> estimating size factors >> estimating dispersions >> gene-wise dispersion estimates >> mean-dispersion relationship >> final dispersion estimates >> fitting generalized linear model >> > resultsNames(dds) >> [1] "Intercept" >> [2] "condition_type_treated.single.read_vs_treated.paired.end" >> [3] "condition_type_untreated.paired.end_vs_treated.paired.end" >> [4] "condition_type_untreated.single.read_vs_treated.paired.end" >> > colData(dds) >> DataFrame with 7 rows and 4 columns >> condition type condition_type sizeFactor >> <factor> <factor> <factor> <numeric> >> treated1fb treated single-read treated.single-read 1.5116926 >> treated2fb treated paired-end treated.paired-end 0.7843521 >> treated3fb treated paired-end treated.paired-end 0.8958321 >> untreated1fb untreated single-read untreated.single-read 1.0499961 >> untreated2fb untreated single-read untreated.single-read 1.6585559 >> untreated3fb untreated paired-end untreated.paired-end 0.7117763 >> untreated4fb untreated paired-end untreated.paired-end 0.7837458 >> >> Then I tried a slight diiffermry setting of the design: >> > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, >> design = ~0+ condition_type) >> > dds <- DESeq(dds) >> estimating size factors >> estimating dispersions >> gene-wise dispersion estimates >> mean-dispersion relationship >> final dispersion estimates >> fitting generalized linear model >> 580 rows did not converge in beta, labelled in mcols(object)$betaConv. >> Use larger maxit argument with nbinomWaldTest >> > resultsNames(dds) >> [1] "condition_typetreated.paired.end" >> [2] "condition_type_treated.single.read_vs_treated.paired.end" >> [3] "condition_type_untreated.paired.end_vs_treated.paired.end" >> [4] "condition_type_untreated.single.read_vs_treated.paired.end" >> > colData(dds) >> DataFrame with 7 rows and 4 columns >> condition type condition_type sizeFactor >> <factor> <factor> <factor> <numeric> >> treated1fb treated single-read treated.single-read 1.5116926 >> treated2fb treated paired-end treated.paired-end 0.7843521 >> treated3fb treated paired-end treated.paired-end 0.8958321 >> untreated1fb untreated single-read untreated.single-read 1.0499961 >> untreated2fb untreated single-read untreated.single-read 1.6585559 >> untreated3fb untreated paired-end untreated.paired-end 0.7117763 >> untreated4fb untreated paired-end untreated.paired-end 0.7837458 >> >> Then supposedly, I can use results(dds, >> "condition_type_treated.single.read_vs_treated.paired.end") to get DEGs for >> each contrast shown in resultsNames(dds). >> >> here are my questions: >> 1. I used design = ~0+ condition_type instead of design = ~ >> condition_type in 2nd case, try to skip the intercept so that I can easily >> get all possible contrasts, but seem not working the way I want. >> 2. I tried to get all possible contrasts: but besides the contrasts >> shown in resultsNames(dds) in both cases, the contrasts >> like untreated.single.read vs treated.single.read, untreated.paired.end vs >> untreated.single.read not even exists in the resultsNames(dds). also I like >> the contrast generally like: treated (including both treated.single.read >> and treated.paired-end) vs untreated (including both untreated .single-read >> and untreated paired-end). I know for this case, we can just to design = >> ~condition, but I wish to do this in the same roof of one single design >> model although I can do a separate design. In limma and edgeR, there is a >> function like: con.matrix<-makeContrasts() where one can set up >> any contrasts under the design at will. Is there anythign like that in >> DESeq2? I understand we can do design(dds) <- formula(~ condition_type), >> but no contrast setting can be made at will. Anything in DESeq2 can get >> around that? >> 3. Also for simple contrast, I understand one can use >> relevel(colData(dds)$condition,"control") kind of command to set base >> level, but for multiple-factors contrasts as I am after, I almost need some >> kind of makeContrasts() mechanism to set up contrasts at will or have to do >> that individually one by one, which obvioulsy would be tedious and also >> these contrasts won't be in a single model roof. Anything can get around >> like that as well? if question 2 is addressed, this one shall be no problem. >> >> Thanks in advance for your help! Appreciated very much! >> >> Best >> >> Ming >> ATRF/NCI-Frederick, >> Maryland, USA. >> >> >> >> >> >> >> >> >> [[alternative HTML version deleted]]
ADD REPLYlink written 3.7 years ago by Michael Love15k
Hi, Mike: Thx for your advice again and I did try what you suggested as below: > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~Type + RasType + Type:RasType); > dds <- DESeq(dds); estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model 105 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > show(resultsNames(dds)); [1] "Intercept" "Type_Tumor_vs_Normal" [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" [5] "RasType_RasPNot_vs_RasOnly" "TypeTumor.RasTypeRasP" [7] "TypeTumor.RasTypeRasP1Hit" "TypeTumor.RasTypeRasPNot" certainly we can use > res_RasP_vs_RasOnly <- results(dds,"RasType_RasP_vs_RasOnly") > res_RasP_vs_RasOnly <- results(dds,name="RasType_RasP_vs_RasOnly") > res_Tumor_vs_Normal<-results(dds,"Type_Tumor_vs_Normal") > res_Tumor_vs_Normal<-results(dds,name="Type_Tumor_vs_Normal") but I can not do the following with contrast as suggested in section 3.2: > res_Tumor_vs_Normal<-results(dds,contrast=c("Type","Tumor","Normal")) Error in results(dds, contrast = c("Type", "Tumor", "Normal")) : unused argument (contrast = c("Type", "Tumor", "Normal")) > res_RasP_vs_RasOnly <- results(dds,contrast=c("RasType","RasP","RasOnly")) Error in results(dds, contrast = c("RasType", "RasP", "RasOnly")) : unused argument (contrast = c("RasType", "RasP", "RasOnly")) also I can not get contrast like Tumor.RasP_Tumor.RasPNot: > res_Tumor.RasP_Tumor.RasPNot<-results(dds,contrast=c(0,0,0,0,0,1,0,- 1)) Error in results(dds, contrast = c(0, 0, 0, 0, 0, 1, 0, -1)) : unused argument (contrast = c(0, 0, 0, 0, 0, 1, 0, -1)) it seems the interaction terms in the design (design = ~Type + RasType + Type:RasType) changed the behavior of results()? Any idea or advice? Thanks again for your time and help! Ming From: michaelisaiahlove@gmail.com Date: Fri, 7 Mar 2014 19:12:23 -0500 Subject: Re: Questions about multi-factor contrast setting in DESeq2 To: yi02@hotmail.com CC: bioconductor@r-project.org hi Ming, To follow up on the question about contrasts, the way to perform comparisons like "RasOnly.Tumor vs RasOnly.Normal", would be a design: Type + RasType + Type:RasType where: results(dds, contrast=c("Type","Tumor","Normal")) tests for the general effect, and then the results for the interactions -- which are present in resultsNames(dds) and can be extracted using the 'name' argument to results() -- tests for an effect in a specific RasType which is different than the general effect. Mike On Fri, Mar 7, 2014 at 5:31 PM, Michael Love <michaelisaiahlove@gmail.com> wrote: Hi Ming, Exploratory data analysis might be a more fruitful approach here rather than brute force combinatorics and testing. Copying from Wolfgang's recommendation in a similar situation: "my advice here would be to put less emphasis on the testing and move straight to clustering, using one of the transformations described in the DESeq2 vignette to bring the data to a 'well-behaved' (log-like) scale." "To filter out the genes that vary not much, use the range (max-min) or IQR and a subjective cutoff (e.g. retain the top 20% of genes), then use standard clustering functions (e.g. pam from the cluster package), and other exploratory data analyses (e.g. PCA) to see the types of behaviours." You might also try constructing a heatmap, as shown in the vignette, using a subset of genes which vary the most, and then explore the grouping of samples in the hierarchical clustering on the columns. For ease of visualization, this subset should probably be in the 100s. Mike On Mar 7, 2014 2:59 PM, "Ming Yi" <yi02@hotmail.com> wrote: Hi, Mike: Thx for the info, indeed I did try the following before with the following for the data in user guide: > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ condition+type) > dds <- DESeq(dds) > resultsNames(dds) [1] "Intercept" "condition_untreated_vs_treated" [3] "type_single.read_vs_paired.end" As you can see, the contrast I can get here is overall untreated_vs_treated and overall single.read_vs_paired.end, there is no subtype contrast such as treated.single.read vs treated.paired-end etc. I would love to discuss briefly what I need here. I have a dataset which has tumors and matched normal samples from many patients, and there are subtypes of the tumors, say RasOnly, RasP, RasPNot types of tumors, of course, corresponding matched would be also with subtypes of RasOnly, RasP,RasP1Hit, RasPNot, and the metadata like this: Subject SampleName Type RasType RasTum T6745_01A 49_6745 T6745_01A Tumor RasP RasP.Tumor N6745_11A 49_6745 N6745_11A Normal RasP RasP.Normal T6761_01A 49_6761 T6761_01A Tumor RasPNot RasPNot.Tumor N6761_11A 49_6761 N6761_11A Normal RasPNot RasPNot.Normal T5930_01A 50_5930 T5930_01A Tumor RasP1Hit RasP1Hit.Tumor N5930_11A 50_5930 N5930_11A Normal RasP1Hit RasP1Hit.Normal T5932_01A 50_5932 T5932_01A Tumor RasOnly RasOnly.Tumor N5932_11A 50_5932 N5932_11A Normal RasOnly RasOnly.Normal ........ Here are the contrasts I am interested to get DEGs: RasOnly.Tumor vs RasOnly.Normal RasP.Tumor vs RasP.Normal RasP.Tumor + RasP1Hit.Tumor vs RasP.Normal+RasP1Hit.Normal RasPNot.Tumor vs RasPNot.Normal RasP1Hit.Tumor vs RasP1Hit.Normal RasP.Tumor vs RasPNot.Tumor RasP.Tumor+RasP1Hit.Tumor vs RasPNot.Tumor RasOnly.Tumor vs RasPNot.Tumor RasOnly.Normal vs RasPNot.Normal RasP.Normal vs RasPNot.Normal RasPRasP1Hit.Normal vs RasPNot.Normal, Tumor vs Normal The last item Tumor vs Normal certainly can easily use design = ~ type to deal with. But many of the contrasts listed above not easy unless use the RasTum of the metadata shown above. I did try to use design=~Type+RasType here are the commands: > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~Type+RasType); > show(resultsNames(dds)) character(0) > dds <- DESeq(dds); > resultsNames(dds) [1] "Intercept" "Type_Tumor_vs_Normal" [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" [5] "RasType_RasPNot_vs_RasOnly" as you can see, I can only derive overall subtype contrasts from this way but not something like RasP.Tumor vs RasOnly.Tumor, the overall subtype contrasts for example, RasType_RasP_vs_RasOnly, consider both tumor and normal of RasP compared with those of RasOnly, which is certainly not what we want here. user guide section 3.2 did show resCtrst <- results(ddsCtrst, contrast=c("treatment","OHT","DPN")) resCtrst <- results(ddsCtrst, contrast=c(0,0,0,0,-1,1)) So besides RasTum, if you have a better way using just design = ~Type+RasType, that would be great. I did try the following: > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~RasTum); > dds <- DESeq(dds); estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model 172 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > resultsNames(dds) [1] "Intercept" [2] "RasTum_RasOnly.Tumor_vs_RasOnly.Normal" [3] "RasTum_RasP.Normal_vs_RasOnly.Normal" [4] "RasTum_RasP.Tumor_vs_RasOnly.Normal" [5] "RasTum_RasP1Hit.Normal_vs_RasOnly.Normal" [6] "RasTum_RasP1Hit.Tumor_vs_RasOnly.Normal" [7] "RasTum_RasPNot.Normal_vs_RasOnly.Normal" [8] "RasTum_RasPNot.Tumor_vs_RasOnly.Normal" I did get many contrasts as I desired, but the contrasts RasTum_RasP.Tumor_vs_RasOnly.Normal does not make sense here to me, but it shown up there in the resultsNames(dds) . based on section 3.2, I seem be able to derived more from the above contrasts: say: for contrast RasP.Tumor vs RasP.Normal, I can do: resCtrst<-result(dds, contrast=c(0,0,-1,1,0,0,0,0); But is there any better way to do the above contrasts listed above that I desire? Thx again for your advice! best Ming From: michaelisaiahlove@gmail.com Date: Fri, 7 Mar 2014 13:23:56 -0500 Subject: Re: Questions about multi-factor contrast setting in DESeq2 To: yi02@hotmail.com CC: bioconductor@r-project.org hi Ming, I'm confused why you are not following the instructions in the vignette section 1.5 Multifactor designs? You should not and we do not recommend pasting together columns like this, nor inserting + 0 into the design. Please have a look at what we do recommend in this section. Extracting contrasts is covered in vignette section 3.2 Contrasts. First take a look at the entire vignette, as we've spent a lot of time writing the documentation to try to answer user questions. we are happy to discuss the best approach for your experiment, but first we need to hear more about your aims and experiment e.g. what hypotheses you wish to test, what kind of genes you are looking to find. It's hard for us to reverse engineer a recommendation rather than to go at it from basic aims. Mike On Fri, Mar 7, 2014 at 12:52 PM, Ming Yi <yi02@hotmail.com> wrote: Hi, Mike and All: I am testing DESeq2 for multi-factor contrast setting for my own data with more complex meta data but currently use the simpler dataset from the user guide for testing purpose, and run into some issues that need your input and advice. Here are the commands (only show some more relevant outputs): library("DESeq2") library("Biobase") library("pasilla") data("pasillaGenes") countData <- counts(pasillaGenes) colData <- pData(pasillaGenes)[,c("condition","type")] colData<-data.frame(colData,paste(colData$condition,colData$type,sep=" .")) colnames(colData)[3]<-"condition_type"; > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ condition_type) > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.783745 >dds <- DESeq(dds) estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model > resultsNames(dds) [1] "Intercept" [2] "condition_type_treated.single.read_vs_treated.paired.end" [3] "condition_type_untreated.paired.end_vs_treated.paired.end" [4] "condition_type_untreated.single.read_vs_treated.paired.end" > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.7837458 Then I tried a slight diiffermry setting of the design: > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~0+ condition_type) > dds <- DESeq(dds) estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model 580 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > resultsNames(dds) [1] "condition_typetreated.paired.end" [2] "condition_type_treated.single.read_vs_treated.paired.end" [3] "condition_type_untreated.paired.end_vs_treated.paired.end" [4] "condition_type_untreated.single.read_vs_treated.paired.end" > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.7837458 Then supposedly, I can use results(dds, "condition_type_treated.single.read_vs_treated.paired.end") to get DEGs for each contrast shown in resultsNames(dds). here are my questions: 1. I used design = ~0+ condition_type instead of design = ~ condition_type in 2nd case, try to skip the intercept so that I can easily get all possible contrasts, but seem not working the way I want. 2. I tried to get all possible contrasts: but besides the contrasts shown in resultsNames(dds) in both cases, the contrasts like untreated.single.read vs treated.single.read, untreated.paired.end vs untreated.single.read not even exists in the resultsNames(dds). also I like the contrast generally like: treated (including both treated.single.read and treated.paired-end) vs untreated (including both untreated.single-read and untreated paired-end). I know for this case, we can just to design = ~condition, but I wish to do this in the same roof of one single design model although I can do a separate design. In limma and edgeR, there is a function like: con.matrix<-makeContrasts() where one can set up any contrasts under the design at will. Is there anythign like that in DESeq2? I understand we can do design(dds) <- formula(~ condition_type), but no contrast setting can be made at will. Anything in DESeq2 can get around that? 3. Also for simple contrast, I understand one can use relevel(colData(dds)$condition,"control") kind of command to set base level, but for multiple-factors contrasts as I am after, I almost need some kind of makeContrasts() mechanism to set up contrasts at will or have to do that individually one by one, which obvioulsy would be tedious and also these contrasts won't be in a single model roof. Anything can get around like that as well? if question 2 is addressed, this one shall be no problem. Thanks in advance for your help! Appreciated very much! Best Ming ATRF/NCI-Frederick, Maryland, USA. [[alternative HTML version deleted]]
ADD REPLYlink written 3.7 years ago by Ming Yi340
hi Ming, Are you using the latest release of DESeq2, version 1.2.x? The contrast functionality was implemented in this release. You can check the help for ?results to debug, i.e. to see if there is a 'contrast' argument in your installed version. You can check your versions with sessionInfo(). You can install latest versions with biocLite("DESeq2"), but you might need to upgrade to the latest release version of R, see the installation help on the Bioc website. best Mike On Sun, Mar 9, 2014 at 1:11 PM, Ming Yi <yi02@hotmail.com> wrote: > Hi, Mike: > > Thx for your advice again and I did try what you suggested as below: > > > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = > ~Type + RasType + Type:RasType); > > > dds <- DESeq(dds); > estimating size factors > estimating dispersions > gene-wise dispersion estimates > mean-dispersion relationship > final dispersion estimates > fitting generalized linear model > 105 rows did not converge in beta, labelled in mcols(object)$betaConv. Use > larger maxit argument with nbinomWaldTest > > show(resultsNames(dds)); > > [1] "Intercept" "Type_Tumor_vs_Normal" > [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" > [5] "RasType_RasPNot_vs_RasOnly" "TypeTumor.RasTypeRasP" > [7] "TypeTumor.RasTypeRasP1Hit" "TypeTumor.RasTypeRasPNot" > > certainly we can use > > res_RasP_vs_RasOnly <- results(dds,"RasType_RasP_vs_RasOnly") > > res_RasP_vs_RasOnly <- results(dds,name="RasType_RasP_vs_RasOnly") > > res_Tumor_vs_Normal<-results(dds,"Type_Tumor_vs_Normal") > > res_Tumor_vs_Normal<-results(dds,name="Type_Tumor_vs_Normal") > > but I can not do the following with contrast as suggested in section 3.2: > > res_Tumor_vs_Normal<-results(dds,contrast=c("Type","Tumor","Normal")) > Error in results(dds, contrast = c("Type", "Tumor", "Normal")) : > unused argument (contrast = c("Type", "Tumor", "Normal")) > > res_RasP_vs_RasOnly <- > results(dds,contrast=c("RasType","RasP","RasOnly")) > Error in results(dds, contrast = c("RasType", "RasP", "RasOnly")) : > unused argument (contrast = c("RasType", "RasP", "RasOnly")) > > also I can not get contrast like Tumor.RasP_Tumor.RasPNot: > > res_Tumor.RasP_Tumor.RasPNot<-results(dds,contrast=c(0,0,0,0,0,1,0 ,-1)) > Error in results(dds, contrast = c(0, 0, 0, 0, 0, 1, 0, -1)) : > unused argument (contrast = c(0, 0, 0, 0, 0, 1, 0, -1)) > > it seems the interaction terms in the design (design = ~Type + RasType + > Type:RasType) changed the behavior of results()? > > Any idea or advice? > Thanks again for your time and help! > > Ming > > ------------------------------ > From: michaelisaiahlove@gmail.com > Date: Fri, 7 Mar 2014 19:12:23 -0500 > > Subject: Re: Questions about multi-factor contrast setting in DESeq2 > To: yi02@hotmail.com > CC: bioconductor@r-project.org > > hi Ming, > > To follow up on the question about contrasts, the way to perform > comparisons like "RasOnly.Tumor vs RasOnly.Normal", would be a design: > > Type + RasType + Type:RasType > > where: > > results(dds, contrast=c("Type","Tumor","Normal")) > > tests for the general effect, > > and then the results for the interactions -- which are present in > resultsNames(dds) and can be extracted using the 'name' argument to > results() -- tests for an effect in a specific RasType which is different > than the general effect. > > Mike > > > On Fri, Mar 7, 2014 at 5:31 PM, Michael Love <michaelisaiahlove@gmail.com>wrote: > > Hi Ming, > > Exploratory data analysis might be a more fruitful approach here rather > than brute force combinatorics and testing. > > Copying from Wolfgang's recommendation in a similar situation: > > "my advice here would be to put less emphasis on the testing and move > straight to clustering, using one of the transformations described in the > DESeq2 vignette to bring the data to a 'well-behaved' (log-like) scale." > > "To filter out the genes that vary not much, use the range (max-min) or > IQR and a subjective cutoff (e.g. retain the top 20% of genes), then use > standard clustering functions (e.g. pam from the cluster package), and > other exploratory data analyses (e.g. PCA) to see the types of behaviours." > > You might also try constructing a heatmap, as shown in the vignette, using > a subset of genes which vary the most, and then explore the grouping of > samples in the hierarchical clustering on the columns. For ease of > visualization, this subset should probably be in the 100s. > > Mike > On Mar 7, 2014 2:59 PM, "Ming Yi" <yi02@hotmail.com> wrote: > > Hi, Mike: > > Thx for the info, indeed I did try the following before with the > following for the data in user guide: > > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, > design = ~ condition+type) > > dds <- DESeq(dds) > > resultsNames(dds) > [1] "Intercept" "condition_untreated_vs_treated" > [3] "type_single.read_vs_paired.end" > > As you can see, the contrast I can get here is overall > untreated_vs_treated and overall single.read_vs_paired.end, there is no > subtype contrast such as treated.single.read vs treated.paired-end etc. > > I would love to discuss briefly what I need here. I have a dataset which > has tumors and matched normal samples from many patients, and there are > subtypes of the tumors, say RasOnly, RasP, RasPNot types of tumors, of > course, corresponding matched would be also with subtypes of RasOnly, > RasP,RasP1Hit, RasPNot, and the metadata like this: > > Subject SampleName Type RasType RasTum > T6745_01A 49_6745 T6745_01A Tumor RasP RasP.Tumor > N6745_11A 49_6745 N6745_11A Normal RasP RasP.Normal > T6761_01A 49_6761 T6761_01A Tumor RasPNot RasPNot.Tumor > N6761_11A 49_6761 N6761_11A Normal RasPNot RasPNot.Normal > T5930_01A 50_5930 T5930_01A Tumor RasP1Hit RasP1Hit.Tumor > N5930_11A 50_5930 N5930_11A Normal RasP1Hit RasP1Hit.Normal > T5932_01A 50_5932 T5932_01A Tumor RasOnly RasOnly.Tumor > N5932_11A 50_5932 N5932_11A Normal RasOnly RasOnly.Normal > ........ > > Here are the contrasts I am interested to get DEGs: > RasOnly.Tumor vs RasOnly.Normal > RasP.Tumor vs RasP.Normal > RasP.Tumor + RasP1Hit.Tumor vs RasP.Normal+RasP1Hit.Normal > RasPNot.Tumor vs RasPNot.Normal > RasP1Hit.Tumor vs RasP1Hit.Normal > RasP.Tumor vs RasPNot.Tumor > RasP.Tumor+RasP1Hit.Tumor vs RasPNot.Tumor > RasOnly.Tumor vs RasPNot.Tumor > RasOnly.Normal vs RasPNot.Normal > RasP.Normal vs RasPNot.Normal > RasPRasP1Hit.Normal vs RasPNot.Normal, > Tumor vs Normal > > The last item Tumor vs Normal certainly can easily use design = ~ type to > deal with. But many of the contrasts listed above not easy unless use the > RasTum of the metadata shown above. I did try to use design=~Type+RasType > here are the commands: > > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = > ~Type+RasType); > > show(resultsNames(dds)) > character(0) > > dds <- DESeq(dds); > > resultsNames(dds) > [1] "Intercept" "Type_Tumor_vs_Normal" > [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" > [5] "RasType_RasPNot_vs_RasOnly" > > as you can see, I can only derive overall subtype contrasts from this way > but not something like RasP.Tumor vs RasOnly.Tumor, the overall subtype > contrasts for example, RasType_RasP_vs_RasOnly, consider both tumor and > normal of RasP compared with those of RasOnly, which is certainly not what > we want here. > user guide section 3.2 did show > resCtrst <- results(ddsCtrst, contrast=c("treatment","OHT","DPN")) > resCtrst <- results(ddsCtrst, contrast=c(0,0,0,0,-1,1)) > > So besides RasTum, if you have a better way using just design = > ~Type+RasType, that would be great. > I did try the following: > > > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = > ~RasTum); > > dds <- DESeq(dds); > estimating size factors > estimating dispersions > gene-wise dispersion estimates > mean-dispersion relationship > final dispersion estimates > fitting generalized linear model > 172 rows did not converge in beta, labelled in mcols(object)$betaConv. Use > larger maxit argument with nbinomWaldTest > > resultsNames(dds) > [1] "Intercept" > [2] "RasTum_RasOnly.Tumor_vs_RasOnly.Normal" > [3] "RasTum_RasP.Normal_vs_RasOnly.Normal" > [4] "RasTum_RasP.Tumor_vs_RasOnly.Normal" > [5] "RasTum_RasP1Hit.Normal_vs_RasOnly.Normal" > [6] "RasTum_RasP1Hit.Tumor_vs_RasOnly.Normal" > [7] "RasTum_RasPNot.Normal_vs_RasOnly.Normal" > [8] "RasTum_RasPNot.Tumor_vs_RasOnly.Normal" > > I did get many contrasts as I desired, but the contrasts > RasTum_RasP.Tumor_vs_RasOnly.Normal does not make sense here to me, but it > shown up there in the resultsNames(dds) . > > based on section 3.2, I seem be able to derived more from the above > contrasts: > say: for contrast RasP.Tumor vs RasP.Normal, I can do: > resCtrst<-result(dds, contrast=c(0,0,-1,1,0,0,0,0); > > But is there any better way to do the above contrasts listed above that I > desire? > > Thx again for your advice! > best > > Ming > > > > > > > > > ------------------------------ > From: michaelisaiahlove@gmail.com > Date: Fri, 7 Mar 2014 13:23:56 -0500 > Subject: Re: Questions about multi-factor contrast setting in DESeq2 > To: yi02@hotmail.com > CC: bioconductor@r-project.org > > hi Ming, > > I'm confused why you are not following the instructions in the vignette > section 1.5 Multifactor designs? You should not and we do not recommend > pasting together columns like this, nor inserting + 0 into the design. > Please have a look at what we do recommend in this section. > > Extracting contrasts is covered in vignette section 3.2 Contrasts. First > take a look at the entire vignette, as we've spent a lot of time writing > the documentation to try to answer user questions. > > we are happy to discuss the best approach for your experiment, but first > we need to hear more about your aims and experiment e.g. what hypotheses > you wish to test, what kind of genes you are looking to find. It's hard for > us to reverse engineer a recommendation rather than to go at it from basic > aims. > > Mike > > > On Fri, Mar 7, 2014 at 12:52 PM, Ming Yi <yi02@hotmail.com> wrote: > > > Hi, Mike and All: > > I am testing DESeq2 for multi-factor contrast setting for my own data with > more complex meta data but currently use the simpler dataset from the user > guide for testing purpose, and run into some issues that need your input > and advice. Here are the commands (only show some more relevant outputs): > > library("DESeq2") > library("Biobase") > library("pasilla") > data("pasillaGenes") > countData <- counts(pasillaGenes) > colData <- pData(pasillaGenes)[,c("condition","type")] > colData<-data.frame(colData,paste(colData$condition,colData$type,sep =".")) > colnames(colData)[3]<-"condition_type"; > > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, > design = ~ condition_type) > > colData(dds) > DataFrame with 7 rows and 4 columns > condition type condition_type sizeFactor > <factor> <factor> <factor> <numeric> > treated1fb treated single-read treated.single-read 1.5116926 > treated2fb treated paired-end treated.paired-end 0.7843521 > treated3fb treated paired-end treated.paired-end 0.8958321 > untreated1fb untreated single-read untreated.single-read 1.0499961 > untreated2fb untreated single-read untreated.single-read 1.6585559 > untreated3fb untreated paired-end untreated.paired-end 0.7117763 > untreated4fb untreated paired-end untreated.paired-end 0.783745 > >dds <- DESeq(dds) > estimating size factors > estimating dispersions > gene-wise dispersion estimates > mean-dispersion relationship > final dispersion estimates > fitting generalized linear model > > resultsNames(dds) > [1] "Intercept" > [2] "condition_type_treated.single.read_vs_treated.paired.end" > [3] "condition_type_untreated.paired.end_vs_treated.paired.end" > [4] "condition_type_untreated.single.read_vs_treated.paired.end" > > colData(dds) > DataFrame with 7 rows and 4 columns > condition type condition_type sizeFactor > <factor> <factor> <factor> <numeric> > treated1fb treated single-read treated.single-read 1.5116926 > treated2fb treated paired-end treated.paired-end 0.7843521 > treated3fb treated paired-end treated.paired-end 0.8958321 > untreated1fb untreated single-read untreated.single-read 1.0499961 > untreated2fb untreated single-read untreated.single-read 1.6585559 > untreated3fb untreated paired-end untreated.paired-end 0.7117763 > untreated4fb untreated paired-end untreated.paired-end 0.7837458 > > Then I tried a slight diiffermry setting of the design: > > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, > design = ~0+ condition_type) > > dds <- DESeq(dds) > estimating size factors > estimating dispersions > gene-wise dispersion estimates > mean-dispersion relationship > final dispersion estimates > fitting generalized linear model > 580 rows did not converge in beta, labelled in mcols(object)$betaConv. Use > larger maxit argument with nbinomWaldTest > > resultsNames(dds) > [1] "condition_typetreated.paired.end" > [2] "condition_type_treated.single.read_vs_treated.paired.end" > [3] "condition_type_untreated.paired.end_vs_treated.paired.end" > [4] "condition_type_untreated.single.read_vs_treated.paired.end" > > colData(dds) > DataFrame with 7 rows and 4 columns > condition type condition_type sizeFactor > <factor> <factor> <factor> <numeric> > treated1fb treated single-read treated.single-read 1.5116926 > treated2fb treated paired-end treated.paired-end 0.7843521 > treated3fb treated paired-end treated.paired-end 0.8958321 > untreated1fb untreated single-read untreated.single-read 1.0499961 > untreated2fb untreated single-read untreated.single-read 1.6585559 > untreated3fb untreated paired-end untreated.paired-end 0.7117763 > untreated4fb untreated paired-end untreated.paired-end 0.7837458 > > Then supposedly, I can use results(dds, > "condition_type_treated.single.read_vs_treated.paired.end") to get DEGs for > each contrast shown in resultsNames(dds). > > here are my questions: > 1. I used design = ~0+ condition_type instead of design = ~ condition_type > in 2nd case, try to skip the intercept so that I can easily get > all possible contrasts, but seem not working the way I want. > 2. I tried to get all possible contrasts: but besides the contrasts shown > in resultsNames(dds) in both cases, the contrasts > like untreated.single.read vs treated.single.read, untreated.paired.end vs > untreated.single.read not even exists in the resultsNames(dds). also I like > the contrast generally like: treated (including both treated.single.read > and treated.paired-end) vs untreated (including both untreated .single-read > and untreated paired-end). I know for this case, we can just to design = > ~condition, but I wish to do this in the same roof of one single design > model although I can do a separate design. In limma and edgeR, there is a > function like: con.matrix<-makeContrasts() where one can set up > any contrasts under the design at will. Is there anythign like that in > DESeq2? I understand we can do design(dds) <- formula(~ condition_type), > but no contrast setting can be made at will. Anything in DESeq2 can get > around that? > 3. Also for simple contrast, I understand one can use > relevel(colData(dds)$condition,"control") kind of command to set base > level, but for multiple-factors contrasts as I am after, I almost need some > kind of makeContrasts() mechanism to set up contrasts at will or have to do > that individually one by one, which obvioulsy would be tedious and also > these contrasts won't be in a single model roof. Anything can get around > like that as well? if question 2 is addressed, this one shall be no problem. > > Thanks in advance for your help! Appreciated very much! > > Best > > Ming > ATRF/NCI-Frederick, > Maryland, USA. > > > > > > > > > > [[alternative HTML version deleted]]
ADD REPLYlink written 3.7 years ago by Michael Love15k
Hi, Mike: Thanks for the info! Yes, your suggestion, since I did check the function, which seems not having the contrast parameter for function results, and the sessioninfo() did show our version in our server may be a bit old. I will get the new version and test again. Thx again and best Ming > sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] DESeq2_1.0.19 RcppArmadillo_0.4.000.4 Rcpp_0.11.0 [4] lattice_0.20-24 Biobase_2.22.0 GenomicRanges_1.12.5 [7] IRanges_1.18.4 BiocGenerics_0.8.0 loaded via a namespace (and not attached): [1] annotate_1.38.0 AnnotationDbi_1.22.6 DBI_0.2-7 [4] genefilter_1.42.0 grid_3.0.1 locfit_1.5-9.1 [7] RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.0.1 [10] stats4_3.0.1 survival_2.37-7 tools_3.0.1 [13] XML_3.98-1.1 xtable_1.7-1 From: michaelisaiahlove@gmail.com Date: Sun, 9 Mar 2014 13:35:34 -0400 Subject: Re: Questions about multi-factor contrast setting in DESeq2 To: yi02@hotmail.com CC: bioconductor@r-project.org hi Ming, Are you using the latest release of DESeq2, version 1.2.x? The contrast functionality was implemented in this release. You can check the help for ?results to debug, i.e. to see if there is a 'contrast' argument in your installed version. You can check your versions with sessionInfo(). You can install latest versions with biocLite("DESeq2"), but you might need to upgrade to the latest release version of R, see the installation help on the Bioc website. best Mike On Sun, Mar 9, 2014 at 1:11 PM, Ming Yi <yi02@hotmail.com> wrote: Hi, Mike: Thx for your advice again and I did try what you suggested as below: > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~Type + RasType + Type:RasType); > dds <- DESeq(dds); estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model 105 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > show(resultsNames(dds)); [1] "Intercept" "Type_Tumor_vs_Normal" [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" [5] "RasType_RasPNot_vs_RasOnly" "TypeTumor.RasTypeRasP" [7] "TypeTumor.RasTypeRasP1Hit" "TypeTumor.RasTypeRasPNot" certainly we can use > res_RasP_vs_RasOnly <- results(dds,"RasType_RasP_vs_RasOnly") > res_RasP_vs_RasOnly <- results(dds,name="RasType_RasP_vs_RasOnly") > res_Tumor_vs_Normal<-results(dds,"Type_Tumor_vs_Normal") > res_Tumor_vs_Normal<-results(dds,name="Type_Tumor_vs_Normal") but I can not do the following with contrast as suggested in section 3.2: > res_Tumor_vs_Normal<-results(dds,contrast=c("Type","Tumor","Normal")) Error in results(dds, contrast = c("Type", "Tumor", "Normal")) : unused argument (contrast = c("Type", "Tumor", "Normal")) > res_RasP_vs_RasOnly <- results(dds,contrast=c("RasType","RasP","RasOnly")) Error in results(dds, contrast = c("RasType", "RasP", "RasOnly")) : unused argument (contrast = c("RasType", "RasP", "RasOnly")) also I can not get contrast like Tumor.RasP_Tumor.RasPNot: > res_Tumor.RasP_Tumor.RasPNot<-results(dds,contrast=c(0,0,0,0,0,1,0,- 1)) Error in results(dds, contrast = c(0, 0, 0, 0, 0, 1, 0, -1)) : unused argument (contrast = c(0, 0, 0, 0, 0, 1, 0, -1)) it seems the interaction terms in the design (design = ~Type + RasType + Type:RasType) changed the behavior of results()? Any idea or advice? Thanks again for your time and help! Ming From: michaelisaiahlove@gmail.com Date: Fri, 7 Mar 2014 19:12:23 -0500 Subject: Re: Questions about multi-factor contrast setting in DESeq2 To: yi02@hotmail.com CC: bioconductor@r-project.org hi Ming, To follow up on the question about contrasts, the way to perform comparisons like "RasOnly.Tumor vs RasOnly.Normal", would be a design: Type + RasType + Type:RasType where: results(dds, contrast=c("Type","Tumor","Normal")) tests for the general effect, and then the results for the interactions -- which are present in resultsNames(dds) and can be extracted using the 'name' argument to results() -- tests for an effect in a specific RasType which is different than the general effect. Mike On Fri, Mar 7, 2014 at 5:31 PM, Michael Love <michaelisaiahlove@gmail.com> wrote: Hi Ming, Exploratory data analysis might be a more fruitful approach here rather than brute force combinatorics and testing. Copying from Wolfgang's recommendation in a similar situation: "my advice here would be to put less emphasis on the testing and move straight to clustering, using one of the transformations described in the DESeq2 vignette to bring the data to a 'well-behaved' (log-like) scale." "To filter out the genes that vary not much, use the range (max-min) or IQR and a subjective cutoff (e.g. retain the top 20% of genes), then use standard clustering functions (e.g. pam from the cluster package), and other exploratory data analyses (e.g. PCA) to see the types of behaviours." You might also try constructing a heatmap, as shown in the vignette, using a subset of genes which vary the most, and then explore the grouping of samples in the hierarchical clustering on the columns. For ease of visualization, this subset should probably be in the 100s. Mike On Mar 7, 2014 2:59 PM, "Ming Yi" <yi02@hotmail.com> wrote: Hi, Mike: Thx for the info, indeed I did try the following before with the following for the data in user guide: > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ condition+type) > dds <- DESeq(dds) > resultsNames(dds) [1] "Intercept" "condition_untreated_vs_treated" [3] "type_single.read_vs_paired.end" As you can see, the contrast I can get here is overall untreated_vs_treated and overall single.read_vs_paired.end, there is no subtype contrast such as treated.single.read vs treated.paired-end etc. I would love to discuss briefly what I need here. I have a dataset which has tumors and matched normal samples from many patients, and there are subtypes of the tumors, say RasOnly, RasP, RasPNot types of tumors, of course, corresponding matched would be also with subtypes of RasOnly, RasP,RasP1Hit, RasPNot, and the metadata like this: Subject SampleName Type RasType RasTum T6745_01A 49_6745 T6745_01A Tumor RasP RasP.Tumor N6745_11A 49_6745 N6745_11A Normal RasP RasP.Normal T6761_01A 49_6761 T6761_01A Tumor RasPNot RasPNot.Tumor N6761_11A 49_6761 N6761_11A Normal RasPNot RasPNot.Normal T5930_01A 50_5930 T5930_01A Tumor RasP1Hit RasP1Hit.Tumor N5930_11A 50_5930 N5930_11A Normal RasP1Hit RasP1Hit.Normal T5932_01A 50_5932 T5932_01A Tumor RasOnly RasOnly.Tumor N5932_11A 50_5932 N5932_11A Normal RasOnly RasOnly.Normal ........ Here are the contrasts I am interested to get DEGs: RasOnly.Tumor vs RasOnly.Normal RasP.Tumor vs RasP.Normal RasP.Tumor + RasP1Hit.Tumor vs RasP.Normal+RasP1Hit.Normal RasPNot.Tumor vs RasPNot.Normal RasP1Hit.Tumor vs RasP1Hit.Normal RasP.Tumor vs RasPNot.Tumor RasP.Tumor+RasP1Hit.Tumor vs RasPNot.Tumor RasOnly.Tumor vs RasPNot.Tumor RasOnly.Normal vs RasPNot.Normal RasP.Normal vs RasPNot.Normal RasPRasP1Hit.Normal vs RasPNot.Normal, Tumor vs Normal The last item Tumor vs Normal certainly can easily use design = ~ type to deal with. But many of the contrasts listed above not easy unless use the RasTum of the metadata shown above. I did try to use design=~Type+RasType here are the commands: > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~Type+RasType); > show(resultsNames(dds)) character(0) > dds <- DESeq(dds); > resultsNames(dds) [1] "Intercept" "Type_Tumor_vs_Normal" [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" [5] "RasType_RasPNot_vs_RasOnly" as you can see, I can only derive overall subtype contrasts from this way but not something like RasP.Tumor vs RasOnly.Tumor, the overall subtype contrasts for example, RasType_RasP_vs_RasOnly, consider both tumor and normal of RasP compared with those of RasOnly, which is certainly not what we want here. user guide section 3.2 did show resCtrst <- results(ddsCtrst, contrast=c("treatment","OHT","DPN")) resCtrst <- results(ddsCtrst, contrast=c(0,0,0,0,-1,1)) So besides RasTum, if you have a better way using just design = ~Type+RasType, that would be great. I did try the following: > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~RasTum); > dds <- DESeq(dds); estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model 172 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > resultsNames(dds) [1] "Intercept" [2] "RasTum_RasOnly.Tumor_vs_RasOnly.Normal" [3] "RasTum_RasP.Normal_vs_RasOnly.Normal" [4] "RasTum_RasP.Tumor_vs_RasOnly.Normal" [5] "RasTum_RasP1Hit.Normal_vs_RasOnly.Normal" [6] "RasTum_RasP1Hit.Tumor_vs_RasOnly.Normal" [7] "RasTum_RasPNot.Normal_vs_RasOnly.Normal" [8] "RasTum_RasPNot.Tumor_vs_RasOnly.Normal" I did get many contrasts as I desired, but the contrasts RasTum_RasP.Tumor_vs_RasOnly.Normal does not make sense here to me, but it shown up there in the resultsNames(dds) . based on section 3.2, I seem be able to derived more from the above contrasts: say: for contrast RasP.Tumor vs RasP.Normal, I can do: resCtrst<-result(dds, contrast=c(0,0,-1,1,0,0,0,0); But is there any better way to do the above contrasts listed above that I desire? Thx again for your advice! best Ming From: michaelisaiahlove@gmail.com Date: Fri, 7 Mar 2014 13:23:56 -0500 Subject: Re: Questions about multi-factor contrast setting in DESeq2 To: yi02@hotmail.com CC: bioconductor@r-project.org hi Ming, I'm confused why you are not following the instructions in the vignette section 1.5 Multifactor designs? You should not and we do not recommend pasting together columns like this, nor inserting + 0 into the design. Please have a look at what we do recommend in this section. Extracting contrasts is covered in vignette section 3.2 Contrasts. First take a look at the entire vignette, as we've spent a lot of time writing the documentation to try to answer user questions. we are happy to discuss the best approach for your experiment, but first we need to hear more about your aims and experiment e.g. what hypotheses you wish to test, what kind of genes you are looking to find. It's hard for us to reverse engineer a recommendation rather than to go at it from basic aims. Mike On Fri, Mar 7, 2014 at 12:52 PM, Ming Yi <yi02@hotmail.com> wrote: Hi, Mike and All: I am testing DESeq2 for multi-factor contrast setting for my own data with more complex meta data but currently use the simpler dataset from the user guide for testing purpose, and run into some issues that need your input and advice. Here are the commands (only show some more relevant outputs): library("DESeq2") library("Biobase") library("pasilla") data("pasillaGenes") countData <- counts(pasillaGenes) colData <- pData(pasillaGenes)[,c("condition","type")] colData<-data.frame(colData,paste(colData$condition,colData$type,sep=" .")) colnames(colData)[3]<-"condition_type"; > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ condition_type) > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.783745 >dds <- DESeq(dds) estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model > resultsNames(dds) [1] "Intercept" [2] "condition_type_treated.single.read_vs_treated.paired.end" [3] "condition_type_untreated.paired.end_vs_treated.paired.end" [4] "condition_type_untreated.single.read_vs_treated.paired.end" > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.7837458 Then I tried a slight diiffermry setting of the design: > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~0+ condition_type) > dds <- DESeq(dds) estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model 580 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > resultsNames(dds) [1] "condition_typetreated.paired.end" [2] "condition_type_treated.single.read_vs_treated.paired.end" [3] "condition_type_untreated.paired.end_vs_treated.paired.end" [4] "condition_type_untreated.single.read_vs_treated.paired.end" > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.7837458 Then supposedly, I can use results(dds, "condition_type_treated.single.read_vs_treated.paired.end") to get DEGs for each contrast shown in resultsNames(dds). here are my questions: 1. I used design = ~0+ condition_type instead of design = ~ condition_type in 2nd case, try to skip the intercept so that I can easily get all possible contrasts, but seem not working the way I want. 2. I tried to get all possible contrasts: but besides the contrasts shown in resultsNames(dds) in both cases, the contrasts like untreated.single.read vs treated.single.read, untreated.paired.end vs untreated.single.read not even exists in the resultsNames(dds). also I like the contrast generally like: treated (including both treated.single.read and treated.paired-end) vs untreated (including both untreated.single-read and untreated paired-end). I know for this case, we can just to design = ~condition, but I wish to do this in the same roof of one single design model although I can do a separate design. In limma and edgeR, there is a function like: con.matrix<-makeContrasts() where one can set up any contrasts under the design at will. Is there anythign like that in DESeq2? I understand we can do design(dds) <- formula(~ condition_type), but no contrast setting can be made at will. Anything in DESeq2 can get around that? 3. Also for simple contrast, I understand one can use relevel(colData(dds)$condition,"control") kind of command to set base level, but for multiple-factors contrasts as I am after, I almost need some kind of makeContrasts() mechanism to set up contrasts at will or have to do that individually one by one, which obvioulsy would be tedious and also these contrasts won't be in a single model roof. Anything can get around like that as well? if question 2 is addressed, this one shall be no problem. Thanks in advance for your help! Appreciated very much! Best Ming ATRF/NCI-Frederick, Maryland, USA. [[alternative HTML version deleted]]
ADD REPLYlink written 3.7 years ago by Ming Yi340
Hi, Mike: Thanks for the help. Now we have updated the R and bioconductor version as well as the DESeq2. Here is sessionInfo() R version 3.0.3 (2014-03-06) Platform: x86_64-unknown-linux-gnu (64-bit) [1] DESeq2_1.2.10 All the function calls now seem working, But I still got some issues for the contrasts I desired; > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~Type + RasType + Type:RasType); Usage note: the following factors have 3 or more levels: RasType For DESeq2 versions < 1.3, if you plan on extracting results for these factors, we recommend using betaPrior=FALSE as an argument when calling DESeq(). ... > colData(dds)$RasType <- factor(colData(dds)$RasType,levels=c("RasOnl y","RasP","RasP1Hit","RasPNot")); > colData(dds)$Type <- factor(colData(dds)$Type,levels=c("Normal","Tumor")) > dds<-DESeq(dds, betaPrior=FALSE); estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting model and testing > show(resultsNames(dds)); [1] "Intercept" "Type_Tumor_vs_Normal" [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" [5] "RasType_RasPNot_vs_RasOnly" "TypeTumor.RasTypeRasP" [7] "TypeTumor.RasTypeRasP1Hit" "TypeTumor.RasTypeRasPNot" > res_RasP_vs_RasOnly <- results(dds,"RasType_RasP_vs_RasOnly") > res_RasP_vs_RasOnly <- results(dds,name="RasType_RasP_vs_RasOnly") > res_Tumor_vs_Normal<-results(dds,contrast=c("Type","Tumor","Normal")) > res_RasP_vs_RasOnly <- results(dds,contrast=c("RasType","RasP","RasOnly")) > res_Tumor.RasP_vs_Tumor.RasPNot<-results(dds,contrast=c(0,0,0,0,0,1, 0,-1)) I can get above contrast results working and got the DEGs lists. However, although I can get contrast result Tumor.RasP_vs_Tumor.RasPNot using contrast=c(0,0,0,0,0,1,0,-1) as guide indicated, some of desired contrasts I am not sure how to get: I have 4 levels of RasType: "RasOnly","RasP","RasP1Hit","RasPNot", it seems that RasOnly might be in intercept, since I did not see it in show(resultsNames(dds)). And I am interested in contrast: Tumor.RasOnly_Tumor.RasPNot, how do I get the results for this contrast? my best guess is if TypeTumor.RasTypeRasOnly is at intercept, I can potentially get using below: Res_Tumor.RasOnly_vs_Tumor.RasPNot <-results(dds, contrast=c(1,0,0,0,0,0,0,-1). However, I like to get the results of contrast: RasOnly.Normal_vs_RasPNot.Normal just to check normal contrast as background, then I am stuck, since no TypeNormal.RasTypeRasP contrast terms etc shown up in show(resultsNames(dds)). Any advice here? Thanks so much! Best Ming From: michaelisaiahlove@gmail.com Date: Sun, 9 Mar 2014 13:35:34 -0400 Subject: Re: Questions about multi-factor contrast setting in DESeq2 To: yi02@hotmail.com CC: bioconductor@r-project.org hi Ming, Are you using the latest release of DESeq2, version 1.2.x? The contrast functionality was implemented in this release. You can check the help for ?results to debug, i.e. to see if there is a 'contrast' argument in your installed version. You can check your versions with sessionInfo(). You can install latest versions with biocLite("DESeq2"), but you might need to upgrade to the latest release version of R, see the installation help on the Bioc website. best Mike On Sun, Mar 9, 2014 at 1:11 PM, Ming Yi <yi02@hotmail.com> wrote: Hi, Mike: Thx for your advice again and I did try what you suggested as below: > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~Type + RasType + Type:RasType); > dds <- DESeq(dds); estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model 105 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > show(resultsNames(dds)); [1] "Intercept" "Type_Tumor_vs_Normal" [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" [5] "RasType_RasPNot_vs_RasOnly" "TypeTumor.RasTypeRasP" [7] "TypeTumor.RasTypeRasP1Hit" "TypeTumor.RasTypeRasPNot" certainly we can use > res_RasP_vs_RasOnly <- results(dds,"RasType_RasP_vs_RasOnly") > res_RasP_vs_RasOnly <- results(dds,name="RasType_RasP_vs_RasOnly") > res_Tumor_vs_Normal<-results(dds,"Type_Tumor_vs_Normal") > res_Tumor_vs_Normal<-results(dds,name="Type_Tumor_vs_Normal") but I can not do the following with contrast as suggested in section 3.2: > res_Tumor_vs_Normal<-results(dds,contrast=c("Type","Tumor","Normal")) Error in results(dds, contrast = c("Type", "Tumor", "Normal")) : unused argument (contrast = c("Type", "Tumor", "Normal")) > res_RasP_vs_RasOnly <- results(dds,contrast=c("RasType","RasP","RasOnly")) Error in results(dds, contrast = c("RasType", "RasP", "RasOnly")) : unused argument (contrast = c("RasType", "RasP", "RasOnly")) also I can not get contrast like Tumor.RasP_Tumor.RasPNot: > res_Tumor.RasP_Tumor.RasPNot<-results(dds,contrast=c(0,0,0,0,0,1,0,- 1)) Error in results(dds, contrast = c(0, 0, 0, 0, 0, 1, 0, -1)) : unused argument (contrast = c(0, 0, 0, 0, 0, 1, 0, -1)) it seems the interaction terms in the design (design = ~Type + RasType + Type:RasType) changed the behavior of results()? Any idea or advice? Thanks again for your time and help! Ming From: michaelisaiahlove@gmail.com Date: Fri, 7 Mar 2014 19:12:23 -0500 Subject: Re: Questions about multi-factor contrast setting in DESeq2 To: yi02@hotmail.com CC: bioconductor@r-project.org hi Ming, To follow up on the question about contrasts, the way to perform comparisons like "RasOnly.Tumor vs RasOnly.Normal", would be a design: Type + RasType + Type:RasType where: results(dds, contrast=c("Type","Tumor","Normal")) tests for the general effect, and then the results for the interactions -- which are present in resultsNames(dds) and can be extracted using the 'name' argument to results() -- tests for an effect in a specific RasType which is different than the general effect. Mike On Fri, Mar 7, 2014 at 5:31 PM, Michael Love <michaelisaiahlove@gmail.com> wrote: Hi Ming, Exploratory data analysis might be a more fruitful approach here rather than brute force combinatorics and testing. Copying from Wolfgang's recommendation in a similar situation: "my advice here would be to put less emphasis on the testing and move straight to clustering, using one of the transformations described in the DESeq2 vignette to bring the data to a 'well-behaved' (log-like) scale." "To filter out the genes that vary not much, use the range (max-min) or IQR and a subjective cutoff (e.g. retain the top 20% of genes), then use standard clustering functions (e.g. pam from the cluster package), and other exploratory data analyses (e.g. PCA) to see the types of behaviours." You might also try constructing a heatmap, as shown in the vignette, using a subset of genes which vary the most, and then explore the grouping of samples in the hierarchical clustering on the columns. For ease of visualization, this subset should probably be in the 100s. Mike On Mar 7, 2014 2:59 PM, "Ming Yi" <yi02@hotmail.com> wrote: Hi, Mike: Thx for the info, indeed I did try the following before with the following for the data in user guide: > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ condition+type) > dds <- DESeq(dds) > resultsNames(dds) [1] "Intercept" "condition_untreated_vs_treated" [3] "type_single.read_vs_paired.end" As you can see, the contrast I can get here is overall untreated_vs_treated and overall single.read_vs_paired.end, there is no subtype contrast such as treated.single.read vs treated.paired-end etc. I would love to discuss briefly what I need here. I have a dataset which has tumors and matched normal samples from many patients, and there are subtypes of the tumors, say RasOnly, RasP, RasPNot types of tumors, of course, corresponding matched would be also with subtypes of RasOnly, RasP,RasP1Hit, RasPNot, and the metadata like this: Subject SampleName Type RasType RasTum T6745_01A 49_6745 T6745_01A Tumor RasP RasP.Tumor N6745_11A 49_6745 N6745_11A Normal RasP RasP.Normal T6761_01A 49_6761 T6761_01A Tumor RasPNot RasPNot.Tumor N6761_11A 49_6761 N6761_11A Normal RasPNot RasPNot.Normal T5930_01A 50_5930 T5930_01A Tumor RasP1Hit RasP1Hit.Tumor N5930_11A 50_5930 N5930_11A Normal RasP1Hit RasP1Hit.Normal T5932_01A 50_5932 T5932_01A Tumor RasOnly RasOnly.Tumor N5932_11A 50_5932 N5932_11A Normal RasOnly RasOnly.Normal ........ Here are the contrasts I am interested to get DEGs: RasOnly.Tumor vs RasOnly.Normal RasP.Tumor vs RasP.Normal RasP.Tumor + RasP1Hit.Tumor vs RasP.Normal+RasP1Hit.Normal RasPNot.Tumor vs RasPNot.Normal RasP1Hit.Tumor vs RasP1Hit.Normal RasP.Tumor vs RasPNot.Tumor RasP.Tumor+RasP1Hit.Tumor vs RasPNot.Tumor RasOnly.Tumor vs RasPNot.Tumor RasOnly.Normal vs RasPNot.Normal RasP.Normal vs RasPNot.Normal RasPRasP1Hit.Normal vs RasPNot.Normal, Tumor vs Normal The last item Tumor vs Normal certainly can easily use design = ~ type to deal with. But many of the contrasts listed above not easy unless use the RasTum of the metadata shown above. I did try to use design=~Type+RasType here are the commands: > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~Type+RasType); > show(resultsNames(dds)) character(0) > dds <- DESeq(dds); > resultsNames(dds) [1] "Intercept" "Type_Tumor_vs_Normal" [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" [5] "RasType_RasPNot_vs_RasOnly" as you can see, I can only derive overall subtype contrasts from this way but not something like RasP.Tumor vs RasOnly.Tumor, the overall subtype contrasts for example, RasType_RasP_vs_RasOnly, consider both tumor and normal of RasP compared with those of RasOnly, which is certainly not what we want here. user guide section 3.2 did show resCtrst <- results(ddsCtrst, contrast=c("treatment","OHT","DPN")) resCtrst <- results(ddsCtrst, contrast=c(0,0,0,0,-1,1)) So besides RasTum, if you have a better way using just design = ~Type+RasType, that would be great. I did try the following: > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~RasTum); > dds <- DESeq(dds); estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model 172 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > resultsNames(dds) [1] "Intercept" [2] "RasTum_RasOnly.Tumor_vs_RasOnly.Normal" [3] "RasTum_RasP.Normal_vs_RasOnly.Normal" [4] "RasTum_RasP.Tumor_vs_RasOnly.Normal" [5] "RasTum_RasP1Hit.Normal_vs_RasOnly.Normal" [6] "RasTum_RasP1Hit.Tumor_vs_RasOnly.Normal" [7] "RasTum_RasPNot.Normal_vs_RasOnly.Normal" [8] "RasTum_RasPNot.Tumor_vs_RasOnly.Normal" I did get many contrasts as I desired, but the contrasts RasTum_RasP.Tumor_vs_RasOnly.Normal does not make sense here to me, but it shown up there in the resultsNames(dds) . based on section 3.2, I seem be able to derived more from the above contrasts: say: for contrast RasP.Tumor vs RasP.Normal, I can do: resCtrst<-result(dds, contrast=c(0,0,-1,1,0,0,0,0); But is there any better way to do the above contrasts listed above that I desire? Thx again for your advice! best Ming From: michaelisaiahlove@gmail.com Date: Fri, 7 Mar 2014 13:23:56 -0500 Subject: Re: Questions about multi-factor contrast setting in DESeq2 To: yi02@hotmail.com CC: bioconductor@r-project.org hi Ming, I'm confused why you are not following the instructions in the vignette section 1.5 Multifactor designs? You should not and we do not recommend pasting together columns like this, nor inserting + 0 into the design. Please have a look at what we do recommend in this section. Extracting contrasts is covered in vignette section 3.2 Contrasts. First take a look at the entire vignette, as we've spent a lot of time writing the documentation to try to answer user questions. we are happy to discuss the best approach for your experiment, but first we need to hear more about your aims and experiment e.g. what hypotheses you wish to test, what kind of genes you are looking to find. It's hard for us to reverse engineer a recommendation rather than to go at it from basic aims. Mike On Fri, Mar 7, 2014 at 12:52 PM, Ming Yi <yi02@hotmail.com> wrote: Hi, Mike and All: I am testing DESeq2 for multi-factor contrast setting for my own data with more complex meta data but currently use the simpler dataset from the user guide for testing purpose, and run into some issues that need your input and advice. Here are the commands (only show some more relevant outputs): library("DESeq2") library("Biobase") library("pasilla") data("pasillaGenes") countData <- counts(pasillaGenes) colData <- pData(pasillaGenes)[,c("condition","type")] colData<-data.frame(colData,paste(colData$condition,colData$type,sep=" .")) colnames(colData)[3]<-"condition_type"; > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ condition_type) > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.783745 >dds <- DESeq(dds) estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model > resultsNames(dds) [1] "Intercept" [2] "condition_type_treated.single.read_vs_treated.paired.end" [3] "condition_type_untreated.paired.end_vs_treated.paired.end" [4] "condition_type_untreated.single.read_vs_treated.paired.end" > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.7837458 Then I tried a slight diiffermry setting of the design: > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~0+ condition_type) > dds <- DESeq(dds) estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model 580 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > resultsNames(dds) [1] "condition_typetreated.paired.end" [2] "condition_type_treated.single.read_vs_treated.paired.end" [3] "condition_type_untreated.paired.end_vs_treated.paired.end" [4] "condition_type_untreated.single.read_vs_treated.paired.end" > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.7837458 Then supposedly, I can use results(dds, "condition_type_treated.single.read_vs_treated.paired.end") to get DEGs for each contrast shown in resultsNames(dds). here are my questions: 1. I used design = ~0+ condition_type instead of design = ~ condition_type in 2nd case, try to skip the intercept so that I can easily get all possible contrasts, but seem not working the way I want. 2. I tried to get all possible contrasts: but besides the contrasts shown in resultsNames(dds) in both cases, the contrasts like untreated.single.read vs treated.single.read, untreated.paired.end vs untreated.single.read not even exists in the resultsNames(dds). also I like the contrast generally like: treated (including both treated.single.read and treated.paired-end) vs untreated (including both untreated.single-read and untreated paired-end). I know for this case, we can just to design = ~condition, but I wish to do this in the same roof of one single design model although I can do a separate design. In limma and edgeR, there is a function like: con.matrix<-makeContrasts() where one can set up any contrasts under the design at will. Is there anythign like that in DESeq2? I understand we can do design(dds) <- formula(~ condition_type), but no contrast setting can be made at will. Anything in DESeq2 can get around that? 3. Also for simple contrast, I understand one can use relevel(colData(dds)$condition,"control") kind of command to set base level, but for multiple-factors contrasts as I am after, I almost need some kind of makeContrasts() mechanism to set up contrasts at will or have to do that individually one by one, which obvioulsy would be tedious and also these contrasts won't be in a single model roof. Anything can get around like that as well? if question 2 is addressed, this one shall be no problem. Thanks in advance for your help! Appreciated very much! Best Ming ATRF/NCI-Frederick, Maryland, USA. [[alternative HTML version deleted]]
ADD REPLYlink written 3.7 years ago by Ming Yi340
hi Ming, On Mon, Mar 10, 2014 at 10:27 AM, Ming Yi <yi02 at="" hotmail.com=""> wrote: > > Hi, Mike: > > Thanks for the help. Now we have updated the R and bioconductor version as well as the DESeq2. Here is sessionInfo() > R version 3.0.3 (2014-03-06) > Platform: x86_64-unknown-linux-gnu (64-bit) > [1] DESeq2_1.2.10 > > All the function calls now seem working, But I still got some issues for the contrasts I desired; > > > > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~Type + RasType + Type:RasType); > Usage note: the following factors have 3 or more levels: > RasType > For DESeq2 versions < 1.3, if you plan on extracting results for > these factors, we recommend using betaPrior=FALSE as an argument > when calling DESeq(). > ... just FYI, here are mailing list threads addressing this note in version 1.2: http://permalink.gmane.org/gmane.science.biology.informatics.conductor /51749 http://permalink.gmane.org/gmane.science.biology.informatics.conductor /52331 > > > colData(dds)$RasType <- factor(colData(dds)$RasType,levels=c("RasO nly","RasP","RasP1Hit","RasPNot")); > > colData(dds)$Type <- factor(colData(dds)$Type,levels=c("Normal","Tumor")) > > dds<-DESeq(dds, betaPrior=FALSE); > > estimating size factors > estimating dispersions > gene-wise dispersion estimates > mean-dispersion relationship > final dispersion estimates > fitting model and testing > > > show(resultsNames(dds)); > [1] "Intercept" "Type_Tumor_vs_Normal" > [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" > [5] "RasType_RasPNot_vs_RasOnly" "TypeTumor.RasTypeRasP" > [7] "TypeTumor.RasTypeRasP1Hit" "TypeTumor.RasTypeRasPNot" > > > res_RasP_vs_RasOnly <- results(dds,"RasType_RasP_vs_RasOnly") > > res_RasP_vs_RasOnly <- results(dds,name="RasType_RasP_vs_RasOnly") > > res_Tumor_vs_Normal<-results(dds,contrast=c("Type","Tumor","Normal")) > > res_RasP_vs_RasOnly <- results(dds,contrast=c("RasType","RasP","RasOnly")) > > res_Tumor.RasP_vs_Tumor.RasPNot<-results(dds,contrast=c(0,0,0,0,0, 1,0,-1)) > I can get above contrast results working and got the DEGs lists. However, although I can get contrast result Tumor.RasP_vs_Tumor.RasPNot using contrast=c(0,0,0,0,0,1,0,-1) as guide indicated, some of desired contrasts I am not sure how to get: > I have 4 levels of RasType: "RasOnly","RasP","RasP1Hit","RasPNot", it seems that RasOnly might be in intercept, since I did not see it in show(resultsNames(dds)). And I am interested in contrast: Tumor.RasOnly_Tumor.RasPNot, how do I get the results for this contrast? To generate the contrast in tumor group between RasOnly and RasPNot you can consider what the specifications for these groups would be in the model matrix, and then subtract them to obtain your contrast. The model matrix columns are given by resultsNames, so: 1 "Intercept" 2 "Type_Tumor_vs_Normal" 3 "RasType_RasP_vs_RasOnly" 4 "RasType_RasP1Hit_vs_RasOnly" 5 "RasType_RasPNot_vs_RasOnly" 6 "TypeTumor.RasTypeRasP" 7 "TypeTumor.RasTypeRasP1Hit" 8 "TypeTumor.RasTypeRasPNot" the groups you want to compare would be then specified by the following model matrix rows: tumor and RasOnly = 1,1,0,0,0,0,0,0 tumor and RasPNot = 1,1,0,0,1,0,0,1 subtracting the first line from the second line (because you asked for "Tumor-RasOnly vs Tumor-RasPNot") gives: contrast=c(0,0,0,0,-1,0,0,-1) which can be supplied as a contrast to results() You can follow such steps to produce your contrasts of interest. Mike > my best guess is if TypeTumor.RasTypeRasOnly is at intercept, I can potentially get using below: > Res_Tumor.RasOnly_vs_Tumor.RasPNot <-results(dds, contrast=c(1,0,0,0,0,0,0,-1). > > However, I like to get the results of contrast: RasOnly.Normal_vs_RasPNot.Normal just to check normal contrast as background, then I am stuck, since no TypeNormal.RasTypeRasP contrast terms etc shown up in show(resultsNames(dds)). > > Any advice here? > > Thanks so much! > Best > Ming > > > > > > > > > > > > > > ________________________________ > From: michaelisaiahlove at gmail.com > > Date: Sun, 9 Mar 2014 13:35:34 -0400 > Subject: Re: Questions about multi-factor contrast setting in DESeq2 > To: yi02 at hotmail.com > CC: bioconductor at r-project.org > > hi Ming, > > Are you using the latest release of DESeq2, version 1.2.x? The contrast functionality was implemented in this release. > > You can check the help for ?results to debug, i.e. to see if there is a 'contrast' argument in your installed version. You can check your versions with sessionInfo(). You can install latest versions with biocLite("DESeq2"), but you might need to upgrade to the latest release version of R, see the installation help on the Bioc website. > > best > > Mike > > > On Sun, Mar 9, 2014 at 1:11 PM, Ming Yi <yi02 at="" hotmail.com=""> wrote: > > Hi, Mike: > > Thx for your advice again and I did try what you suggested as below: > > > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~Type + RasType + Type:RasType); > > > dds <- DESeq(dds); > estimating size factors > estimating dispersions > gene-wise dispersion estimates > mean-dispersion relationship > final dispersion estimates > fitting generalized linear model > 105 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > > show(resultsNames(dds)); > > [1] "Intercept" "Type_Tumor_vs_Normal" > [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" > [5] "RasType_RasPNot_vs_RasOnly" "TypeTumor.RasTypeRasP" > [7] "TypeTumor.RasTypeRasP1Hit" "TypeTumor.RasTypeRasPNot" > > certainly we can use > > res_RasP_vs_RasOnly <- results(dds,"RasType_RasP_vs_RasOnly") > > res_RasP_vs_RasOnly <- results(dds,name="RasType_RasP_vs_RasOnly") > > res_Tumor_vs_Normal<-results(dds,"Type_Tumor_vs_Normal") > > res_Tumor_vs_Normal<-results(dds,name="Type_Tumor_vs_Normal") > > but I can not do the following with contrast as suggested in section 3.2: > > res_Tumor_vs_Normal<-results(dds,contrast=c("Type","Tumor","Normal")) > Error in results(dds, contrast = c("Type", "Tumor", "Normal")) : > unused argument (contrast = c("Type", "Tumor", "Normal")) > > res_RasP_vs_RasOnly <- results(dds,contrast=c("RasType","RasP","RasOnly")) > Error in results(dds, contrast = c("RasType", "RasP", "RasOnly")) : > unused argument (contrast = c("RasType", "RasP", "RasOnly")) > > also I can not get contrast like Tumor.RasP_Tumor.RasPNot: > > res_Tumor.RasP_Tumor.RasPNot<-results(dds,contrast=c(0,0,0,0,0,1,0 ,-1)) > Error in results(dds, contrast = c(0, 0, 0, 0, 0, 1, 0, -1)) : > unused argument (contrast = c(0, 0, 0, 0, 0, 1, 0, -1)) > > it seems the interaction terms in the design (design = ~Type + RasType + Type:RasType) changed the behavior of results()? > > Any idea or advice? > Thanks again for your time and help! > > Ming > > ________________________________ > From: michaelisaiahlove at gmail.com > Date: Fri, 7 Mar 2014 19:12:23 -0500 > > Subject: Re: Questions about multi-factor contrast setting in DESeq2 > To: yi02 at hotmail.com > CC: bioconductor at r-project.org > > hi Ming, > > To follow up on the question about contrasts, the way to perform comparisons like "RasOnly.Tumor vs RasOnly.Normal", would be a design: > > Type + RasType + Type:RasType > > where: > > results(dds, contrast=c("Type","Tumor","Normal")) > > tests for the general effect, > > and then the results for the interactions -- which are present in resultsNames(dds) and can be extracted using the 'name' argument to results() -- tests for an effect in a specific RasType which is different than the general effect. > > Mike > > > On Fri, Mar 7, 2014 at 5:31 PM, Michael Love <michaelisaiahlove at="" gmail.com=""> wrote: > > Hi Ming, > > Exploratory data analysis might be a more fruitful approach here rather than brute force combinatorics and testing. > > Copying from Wolfgang's recommendation in a similar situation: > > "my advice here would be to put less emphasis on the testing and move straight to clustering, using one of the transformations described in the DESeq2 vignette to bring the data to a 'well-behaved' (log-like) scale." > > "To filter out the genes that vary not much, use the range (max-min) or IQR and a subjective cutoff (e.g. retain the top 20% of genes), then use standard clustering functions (e.g. pam from the cluster package), and other exploratory data analyses (e.g. PCA) to see the types of behaviours." > > You might also try constructing a heatmap, as shown in the vignette, using a subset of genes which vary the most, and then explore the grouping of samples in the hierarchical clustering on the columns. For ease of visualization, this subset should probably be in the 100s. > > Mike > > On Mar 7, 2014 2:59 PM, "Ming Yi" <yi02 at="" hotmail.com=""> wrote: > > Hi, Mike: > > Thx for the info, indeed I did try the following before with the following for the data in user guide: > > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ condition+type) > > dds <- DESeq(dds) > > resultsNames(dds) > [1] "Intercept" "condition_untreated_vs_treated" > [3] "type_single.read_vs_paired.end" > > As you can see, the contrast I can get here is overall untreated_vs_treated and overall single.read_vs_paired.end, there is no subtype contrast such as treated.single.read vs treated.paired-end etc. > > I would love to discuss briefly what I need here. I have a dataset which has tumors and matched normal samples from many patients, and there are subtypes of the tumors, say RasOnly, RasP, RasPNot types of tumors, of course, corresponding matched would be also with subtypes of RasOnly, RasP,RasP1Hit, RasPNot, and the metadata like this: > > Subject SampleName Type RasType RasTum > T6745_01A 49_6745 T6745_01A Tumor RasP RasP.Tumor > N6745_11A 49_6745 N6745_11A Normal RasP RasP.Normal > T6761_01A 49_6761 T6761_01A Tumor RasPNot RasPNot.Tumor > N6761_11A 49_6761 N6761_11A Normal RasPNot RasPNot.Normal > T5930_01A 50_5930 T5930_01A Tumor RasP1Hit RasP1Hit.Tumor > N5930_11A 50_5930 N5930_11A Normal RasP1Hit RasP1Hit.Normal > T5932_01A 50_5932 T5932_01A Tumor RasOnly RasOnly.Tumor > N5932_11A 50_5932 N5932_11A Normal RasOnly RasOnly.Normal > ........ > > Here are the contrasts I am interested to get DEGs: > RasOnly.Tumor vs RasOnly.Normal > RasP.Tumor vs RasP.Normal > RasP.Tumor + RasP1Hit.Tumor vs RasP.Normal+RasP1Hit.Normal > RasPNot.Tumor vs RasPNot.Normal > RasP1Hit.Tumor vs RasP1Hit.Normal > RasP.Tumor vs RasPNot.Tumor > RasP.Tumor+RasP1Hit.Tumor vs RasPNot.Tumor > RasOnly.Tumor vs RasPNot.Tumor > RasOnly.Normal vs RasPNot.Normal > RasP.Normal vs RasPNot.Normal > RasPRasP1Hit.Normal vs RasPNot.Normal, > Tumor vs Normal > > The last item Tumor vs Normal certainly can easily use design = ~ type to deal with. But many of the contrasts listed above not easy unless use the RasTum of the metadata shown above. I did try to use design=~Type+RasType > here are the commands: > > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~Type+RasType); > > show(resultsNames(dds)) > character(0) > > dds <- DESeq(dds); > > resultsNames(dds) > [1] "Intercept" "Type_Tumor_vs_Normal" > [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" > [5] "RasType_RasPNot_vs_RasOnly" > > as you can see, I can only derive overall subtype contrasts from this way but not something like RasP.Tumor vs RasOnly.Tumor, the overall subtype contrasts for example, RasType_RasP_vs_RasOnly, consider both tumor and normal of RasP compared with those of RasOnly, which is certainly not what we want here. > user guide section 3.2 did show > resCtrst <- results(ddsCtrst, contrast=c("treatment","OHT","DPN")) > resCtrst <- results(ddsCtrst, contrast=c(0,0,0,0,-1,1)) > > So besides RasTum, if you have a better way using just design = ~Type+RasType, that would be great. > I did try the following: > > > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~RasTum); > > dds <- DESeq(dds); > estimating size factors > estimating dispersions > gene-wise dispersion estimates > mean-dispersion relationship > final dispersion estimates > fitting generalized linear model > 172 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > > resultsNames(dds) > [1] "Intercept" > [2] "RasTum_RasOnly.Tumor_vs_RasOnly.Normal" > [3] "RasTum_RasP.Normal_vs_RasOnly.Normal" > [4] "RasTum_RasP.Tumor_vs_RasOnly.Normal" > [5] "RasTum_RasP1Hit.Normal_vs_RasOnly.Normal" > [6] "RasTum_RasP1Hit.Tumor_vs_RasOnly.Normal" > [7] "RasTum_RasPNot.Normal_vs_RasOnly.Normal" > [8] "RasTum_RasPNot.Tumor_vs_RasOnly.Normal" > > I did get many contrasts as I desired, but the contrasts RasTum_RasP.Tumor_vs_RasOnly.Normal does not make sense here to me, but it shown up there in the resultsNames(dds) . > > based on section 3.2, I seem be able to derived more from the above contrasts: > say: for contrast RasP.Tumor vs RasP.Normal, I can do: > resCtrst<-result(dds, contrast=c(0,0,-1,1,0,0,0,0); > > But is there any better way to do the above contrasts listed above that I desire? > > Thx again for your advice! > best > > Ming > > > > > > > > > ________________________________ > From: michaelisaiahlove at gmail.com > Date: Fri, 7 Mar 2014 13:23:56 -0500 > Subject: Re: Questions about multi-factor contrast setting in DESeq2 > To: yi02 at hotmail.com > CC: bioconductor at r-project.org > > hi Ming, > > I'm confused why you are not following the instructions in the vignette section 1.5 Multifactor designs? You should not and we do not recommend pasting together columns like this, nor inserting + 0 into the design. Please have a look at what we do recommend in this section. > > Extracting contrasts is covered in vignette section 3.2 Contrasts. First take a look at the entire vignette, as we've spent a lot of time writing the documentation to try to answer user questions. > > we are happy to discuss the best approach for your experiment, but first we need to hear more about your aims and experiment e.g. what hypotheses you wish to test, what kind of genes you are looking to find. It's hard for us to reverse engineer a recommendation rather than to go at it from basic aims. > > Mike > > > On Fri, Mar 7, 2014 at 12:52 PM, Ming Yi <yi02 at="" hotmail.com=""> wrote: > > > Hi, Mike and All: > > I am testing DESeq2 for multi-factor contrast setting for my own data with more complex meta data but currently use the simpler dataset from the user guide for testing purpose, and run into some issues that need your input and advice. Here are the commands (only show some more relevant outputs): > > library("DESeq2") > library("Biobase") > library("pasilla") > data("pasillaGenes") > countData <- counts(pasillaGenes) > colData <- pData(pasillaGenes)[,c("condition","type")] > colData<-data.frame(colData,paste(colData$condition,colData$type,sep =".")) > colnames(colData)[3]<-"condition_type"; > > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ condition_type) > > colData(dds) > DataFrame with 7 rows and 4 columns > condition type condition_type sizeFactor > <factor> <factor> <factor> <numeric> > treated1fb treated single-read treated.single-read 1.5116926 > treated2fb treated paired-end treated.paired-end 0.7843521 > treated3fb treated paired-end treated.paired-end 0.8958321 > untreated1fb untreated single-read untreated.single-read 1.0499961 > untreated2fb untreated single-read untreated.single-read 1.6585559 > untreated3fb untreated paired-end untreated.paired-end 0.7117763 > untreated4fb untreated paired-end untreated.paired-end 0.783745 > >dds <- DESeq(dds) > estimating size factors > estimating dispersions > gene-wise dispersion estimates > mean-dispersion relationship > final dispersion estimates > fitting generalized linear model > > resultsNames(dds) > [1] "Intercept" > [2] "condition_type_treated.single.read_vs_treated.paired.end" > [3] "condition_type_untreated.paired.end_vs_treated.paired.end" > [4] "condition_type_untreated.single.read_vs_treated.paired.end" > > colData(dds) > DataFrame with 7 rows and 4 columns > condition type condition_type sizeFactor > <factor> <factor> <factor> <numeric> > treated1fb treated single-read treated.single-read 1.5116926 > treated2fb treated paired-end treated.paired-end 0.7843521 > treated3fb treated paired-end treated.paired-end 0.8958321 > untreated1fb untreated single-read untreated.single-read 1.0499961 > untreated2fb untreated single-read untreated.single-read 1.6585559 > untreated3fb untreated paired-end untreated.paired-end 0.7117763 > untreated4fb untreated paired-end untreated.paired-end 0.7837458 > > Then I tried a slight diiffermry setting of the design: > > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~0+ condition_type) > > dds <- DESeq(dds) > estimating size factors > estimating dispersions > gene-wise dispersion estimates > mean-dispersion relationship > final dispersion estimates > fitting generalized linear model > 580 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > > resultsNames(dds) > [1] "condition_typetreated.paired.end" > [2] "condition_type_treated.single.read_vs_treated.paired.end" > [3] "condition_type_untreated.paired.end_vs_treated.paired.end" > [4] "condition_type_untreated.single.read_vs_treated.paired.end" > > colData(dds) > DataFrame with 7 rows and 4 columns > condition type condition_type sizeFactor > <factor> <factor> <factor> <numeric> > treated1fb treated single-read treated.single-read 1.5116926 > treated2fb treated paired-end treated.paired-end 0.7843521 > treated3fb treated paired-end treated.paired-end 0.8958321 > untreated1fb untreated single-read untreated.single-read 1.0499961 > untreated2fb untreated single-read untreated.single-read 1.6585559 > untreated3fb untreated paired-end untreated.paired-end 0.7117763 > untreated4fb untreated paired-end untreated.paired-end 0.7837458 > > Then supposedly, I can use results(dds, "condition_type_treated.single.read_vs_treated.paired.end") to get DEGs for each contrast shown in resultsNames(dds). > > here are my questions: > 1. I used design = ~0+ condition_type instead of design = ~ condition_type in 2nd case, try to skip the intercept so that I can easily get all possible contrasts, but seem not working the way I want. > 2. I tried to get all possible contrasts: but besides the contrasts shown in resultsNames(dds) in both cases, the contrasts like untreated.single.read vs treated.single.read, untreated.paired.end vs untreated.single.read not even exists in the resultsNames(dds). also I like the contrast generally like: treated (including both treated.single.read and treated.paired-end) vs untreated (including both untreated.single-read and untreated paired-end). I know for this case, we can just to design = ~condition, but I wish to do this in the same roof of one single design model although I can do a separate design. In limma and edgeR, there is a function like: con.matrix<-makeContrasts() where one can set up any contrasts under the design at will. Is there anythign like that in DESeq2? I understand we can do design(dds) <- formula(~ condition_type), but no contrast setting can be made at will. Anything in DESeq2 can get around that? > 3. Also for simple contrast, I understand one can use relevel(colData(dds)$condition,"control") kind of command to set base level, but for multiple-factors contrasts as I am after, I almost need some kind of makeContrasts() mechanism to set up contrasts at will or have to do that individually one by one, which obvioulsy would be tedious and also these contrasts won't be in a single model roof. Anything can get around like that as well? if question 2 is addressed, this one shall be no problem. > > Thanks in advance for your help! Appreciated very much! > > Best > > Ming > ATRF/NCI-Frederick, > Maryland, USA. > > > > > > > > > >
ADD REPLYlink written 3.7 years ago by Michael Love15k
Hi, Mike: Thx a lot for the input and advice, which seems a great idea. Appreciated very much! I also come up a way using combined terms of Type and RasType, which seems rather easy to derive the desired contrasts: > targets<-data.frame(targets,paste(targets$RasType, targets$Type,sep=".")); > colnames(targets)[ncol(targets)]<-"RasTum"; ... > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~RasTum); >colData(dds)$RasTum<-relevel(colData(dds)$RasTum,"RasP1Hit.Normal") > dds<-DESeq(dds, betaPrior=FALSE); estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting model and testing > show(resultsNames(dds)); [1] "Intercept" [2] "RasTum_RasOnly.Normal_vs_RasP1Hit.Normal" [3] "RasTum_RasOnly.Tumor_vs_RasP1Hit.Normal" [4] "RasTum_RasP.Normal_vs_RasP1Hit.Normal" [5] "RasTum_RasP.Tumor_vs_RasP1Hit.Normal" [6] "RasTum_RasP1Hit.Tumor_vs_RasP1Hit.Normal" [7] "RasTum_RasPNot.Normal_vs_RasP1Hit.Normal" [8] "RasTum_RasPNot.Tumor_vs_RasP1Hit.Normal" then for contrast: RasOnly.Tumor_vs_RasPNot.Tumor, res <- results(dds, contrast=c(0,0,1,0,0,0,0,-1)); for contrast: RasOnly.Normal_vs_RasPNot.Normal res <- results(dds, contrast=c(0,1,0,0,0,0,-1,0)); I will check if these two ways would give the same results. Thanks again and best Ming > From: michaelisaiahlove@gmail.com > Date: Mon, 10 Mar 2014 19:15:20 -0400 > Subject: Re: Questions about multi-factor contrast setting in DESeq2 > To: yi02@hotmail.com > CC: bioconductor@r-project.org > > hi Ming, > > > On Mon, Mar 10, 2014 at 10:27 AM, Ming Yi <yi02@hotmail.com> wrote: > > > > Hi, Mike: > > > > Thanks for the help. Now we have updated the R and bioconductor version as well as the DESeq2. Here is sessionInfo() > > R version 3.0.3 (2014-03-06) > > Platform: x86_64-unknown-linux-gnu (64-bit) > > [1] DESeq2_1.2.10 > > > > All the function calls now seem working, But I still got some issues for the contrasts I desired; > > > > > > > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~Type + RasType + Type:RasType); > > Usage note: the following factors have 3 or more levels: > > RasType > > For DESeq2 versions < 1.3, if you plan on extracting results for > > these factors, we recommend using betaPrior=FALSE as an argument > > when calling DESeq(). > > ... > > just FYI, here are mailing list threads addressing this note in version 1.2: > http://permalink.gmane.org/gmane.science.biology.informatics.conduct or/51749 > http://permalink.gmane.org/gmane.science.biology.informatics.conduct or/52331 > > > > > > colData(dds)$RasType <- factor(colData(dds)$RasType,levels=c("Ra sOnly","RasP","RasP1Hit","RasPNot")); > > > colData(dds)$Type <- factor(colData(dds)$Type,levels=c("Normal","Tumor")) > > > dds<-DESeq(dds, betaPrior=FALSE); > > > > estimating size factors > > estimating dispersions > > gene-wise dispersion estimates > > mean-dispersion relationship > > final dispersion estimates > > fitting model and testing > > > > > show(resultsNames(dds)); > > [1] "Intercept" "Type_Tumor_vs_Normal" > > [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" > > [5] "RasType_RasPNot_vs_RasOnly" "TypeTumor.RasTypeRasP" > > [7] "TypeTumor.RasTypeRasP1Hit" "TypeTumor.RasTypeRasPNot" > > > > > res_RasP_vs_RasOnly <- results(dds,"RasType_RasP_vs_RasOnly") > > > res_RasP_vs_RasOnly <- results(dds,name="RasType_RasP_vs_RasOnly") > > > res_Tumor_vs_Normal<-results(dds,contrast=c("Type","Tumor","Normal")) > > > res_RasP_vs_RasOnly <- results(dds,contrast=c("RasType","RasP","RasOnly")) > > > res_Tumor.RasP_vs_Tumor.RasPNot<-results(dds,contrast=c(0,0,0,0, 0,1,0,-1)) > > I can get above contrast results working and got the DEGs lists. However, although I can get contrast result Tumor.RasP_vs_Tumor.RasPNot using contrast=c(0,0,0,0,0,1,0,-1) as guide indicated, some of desired contrasts I am not sure how to get: > > I have 4 levels of RasType: "RasOnly","RasP","RasP1Hit","RasPNot", it seems that RasOnly might be in intercept, since I did not see it in show(resultsNames(dds)). And I am interested in contrast: Tumor.RasOnly_Tumor.RasPNot, how do I get the results for this contrast? > > To generate the contrast in tumor group between RasOnly and RasPNot > you can consider what the specifications for these groups would be in > the model matrix, and then subtract them to obtain your contrast. > > The model matrix columns are given by resultsNames, so: > > 1 "Intercept" > 2 "Type_Tumor_vs_Normal" > 3 "RasType_RasP_vs_RasOnly" > 4 "RasType_RasP1Hit_vs_RasOnly" > 5 "RasType_RasPNot_vs_RasOnly" > 6 "TypeTumor.RasTypeRasP" > 7 "TypeTumor.RasTypeRasP1Hit" > 8 "TypeTumor.RasTypeRasPNot" > > the groups you want to compare would be then specified by the > following model matrix rows: > > tumor and RasOnly = 1,1,0,0,0,0,0,0 > tumor and RasPNot = 1,1,0,0,1,0,0,1 > > subtracting the first line from the second line (because you asked for > "Tumor-RasOnly vs Tumor-RasPNot") gives: > > contrast=c(0,0,0,0,-1,0,0,-1) > > which can be supplied as a contrast to results() > > You can follow such steps to produce your contrasts of interest. > > Mike > > > my best guess is if TypeTumor.RasTypeRasOnly is at intercept, I can potentially get using below: > > Res_Tumor.RasOnly_vs_Tumor.RasPNot <-results(dds, contrast=c(1,0,0,0,0,0,0,-1). > > > > However, I like to get the results of contrast: RasOnly.Normal_vs_RasPNot.Normal just to check normal contrast as background, then I am stuck, since no TypeNormal.RasTypeRasP contrast terms etc shown up in show(resultsNames(dds)). > > > > Any advice here? > > > > Thanks so much! > > Best > > Ming > > > > > > > > > > > > > > > > > > > > > > > > > > > > ________________________________ > > From: michaelisaiahlove@gmail.com > > > > Date: Sun, 9 Mar 2014 13:35:34 -0400 > > Subject: Re: Questions about multi-factor contrast setting in DESeq2 > > To: yi02@hotmail.com > > CC: bioconductor@r-project.org > > > > hi Ming, > > > > Are you using the latest release of DESeq2, version 1.2.x? The contrast functionality was implemented in this release. > > > > You can check the help for ?results to debug, i.e. to see if there is a 'contrast' argument in your installed version. You can check your versions with sessionInfo(). You can install latest versions with biocLite("DESeq2"), but you might need to upgrade to the latest release version of R, see the installation help on the Bioc website. > > > > best > > > > Mike > > > > > > On Sun, Mar 9, 2014 at 1:11 PM, Ming Yi <yi02@hotmail.com> wrote: > > > > Hi, Mike: > > > > Thx for your advice again and I did try what you suggested as below: > > > > > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~Type + RasType + Type:RasType); > > > > > dds <- DESeq(dds); > > estimating size factors > > estimating dispersions > > gene-wise dispersion estimates > > mean-dispersion relationship > > final dispersion estimates > > fitting generalized linear model > > 105 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > > > show(resultsNames(dds)); > > > > [1] "Intercept" "Type_Tumor_vs_Normal" > > [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" > > [5] "RasType_RasPNot_vs_RasOnly" "TypeTumor.RasTypeRasP" > > [7] "TypeTumor.RasTypeRasP1Hit" "TypeTumor.RasTypeRasPNot" > > > > certainly we can use > > > res_RasP_vs_RasOnly <- results(dds,"RasType_RasP_vs_RasOnly") > > > res_RasP_vs_RasOnly <- results(dds,name="RasType_RasP_vs_RasOnly") > > > res_Tumor_vs_Normal<-results(dds,"Type_Tumor_vs_Normal") > > > res_Tumor_vs_Normal<-results(dds,name="Type_Tumor_vs_Normal") > > > > but I can not do the following with contrast as suggested in section 3.2: > > > res_Tumor_vs_Normal<-results(dds,contrast=c("Type","Tumor","Normal")) > > Error in results(dds, contrast = c("Type", "Tumor", "Normal")) : > > unused argument (contrast = c("Type", "Tumor", "Normal")) > > > res_RasP_vs_RasOnly <- results(dds,contrast=c("RasType","RasP","RasOnly")) > > Error in results(dds, contrast = c("RasType", "RasP", "RasOnly")) : > > unused argument (contrast = c("RasType", "RasP", "RasOnly")) > > > > also I can not get contrast like Tumor.RasP_Tumor.RasPNot: > > > res_Tumor.RasP_Tumor.RasPNot<-results(dds,contrast=c(0,0,0,0,0,1 ,0,-1)) > > Error in results(dds, contrast = c(0, 0, 0, 0, 0, 1, 0, -1)) : > > unused argument (contrast = c(0, 0, 0, 0, 0, 1, 0, -1)) > > > > it seems the interaction terms in the design (design = ~Type + RasType + Type:RasType) changed the behavior of results()? > > > > Any idea or advice? > > Thanks again for your time and help! > > > > Ming > > > > ________________________________ > > From: michaelisaiahlove@gmail.com > > Date: Fri, 7 Mar 2014 19:12:23 -0500 > > > > Subject: Re: Questions about multi-factor contrast setting in DESeq2 > > To: yi02@hotmail.com > > CC: bioconductor@r-project.org > > > > hi Ming, > > > > To follow up on the question about contrasts, the way to perform comparisons like "RasOnly.Tumor vs RasOnly.Normal", would be a design: > > > > Type + RasType + Type:RasType > > > > where: > > > > results(dds, contrast=c("Type","Tumor","Normal")) > > > > tests for the general effect, > > > > and then the results for the interactions -- which are present in resultsNames(dds) and can be extracted using the 'name' argument to results() -- tests for an effect in a specific RasType which is different than the general effect. > > > > Mike > > > > > > On Fri, Mar 7, 2014 at 5:31 PM, Michael Love <michaelisaiahlove@gmail.com> wrote: > > > > Hi Ming, > > > > Exploratory data analysis might be a more fruitful approach here rather than brute force combinatorics and testing. > > > > Copying from Wolfgang's recommendation in a similar situation: > > > > "my advice here would be to put less emphasis on the testing and move straight to clustering, using one of the transformations described in the DESeq2 vignette to bring the data to a 'well-behaved' (log-like) scale." > > > > "To filter out the genes that vary not much, use the range (max- min) or IQR and a subjective cutoff (e.g. retain the top 20% of genes), then use standard clustering functions (e.g. pam from the cluster package), and other exploratory data analyses (e.g. PCA) to see the types of behaviours." > > > > You might also try constructing a heatmap, as shown in the vignette, using a subset of genes which vary the most, and then explore the grouping of samples in the hierarchical clustering on the columns. For ease of visualization, this subset should probably be in the 100s. > > > > Mike > > > > On Mar 7, 2014 2:59 PM, "Ming Yi" <yi02@hotmail.com> wrote: > > > > Hi, Mike: > > > > Thx for the info, indeed I did try the following before with the following for the data in user guide: > > > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ condition+type) > > > dds <- DESeq(dds) > > > resultsNames(dds) > > [1] "Intercept" "condition_untreated_vs_treated" > > [3] "type_single.read_vs_paired.end" > > > > As you can see, the contrast I can get here is overall untreated_vs_treated and overall single.read_vs_paired.end, there is no subtype contrast such as treated.single.read vs treated.paired-end etc. > > > > I would love to discuss briefly what I need here. I have a dataset which has tumors and matched normal samples from many patients, and there are subtypes of the tumors, say RasOnly, RasP, RasPNot types of tumors, of course, corresponding matched would be also with subtypes of RasOnly, RasP,RasP1Hit, RasPNot, and the metadata like this: > > > > Subject SampleName Type RasType RasTum > > T6745_01A 49_6745 T6745_01A Tumor RasP RasP.Tumor > > N6745_11A 49_6745 N6745_11A Normal RasP RasP.Normal > > T6761_01A 49_6761 T6761_01A Tumor RasPNot RasPNot.Tumor > > N6761_11A 49_6761 N6761_11A Normal RasPNot RasPNot.Normal > > T5930_01A 50_5930 T5930_01A Tumor RasP1Hit RasP1Hit.Tumor > > N5930_11A 50_5930 N5930_11A Normal RasP1Hit RasP1Hit.Normal > > T5932_01A 50_5932 T5932_01A Tumor RasOnly RasOnly.Tumor > > N5932_11A 50_5932 N5932_11A Normal RasOnly RasOnly.Normal > > ........ > > > > Here are the contrasts I am interested to get DEGs: > > RasOnly.Tumor vs RasOnly.Normal > > RasP.Tumor vs RasP.Normal > > RasP.Tumor + RasP1Hit.Tumor vs RasP.Normal+RasP1Hit.Normal > > RasPNot.Tumor vs RasPNot.Normal > > RasP1Hit.Tumor vs RasP1Hit.Normal > > RasP.Tumor vs RasPNot.Tumor > > RasP.Tumor+RasP1Hit.Tumor vs RasPNot.Tumor > > RasOnly.Tumor vs RasPNot.Tumor > > RasOnly.Normal vs RasPNot.Normal > > RasP.Normal vs RasPNot.Normal > > RasPRasP1Hit.Normal vs RasPNot.Normal, > > Tumor vs Normal > > > > The last item Tumor vs Normal certainly can easily use design = ~ type to deal with. But many of the contrasts listed above not easy unless use the RasTum of the metadata shown above. I did try to use design=~Type+RasType > > here are the commands: > > > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~Type+RasType); > > > show(resultsNames(dds)) > > character(0) > > > dds <- DESeq(dds); > > > resultsNames(dds) > > [1] "Intercept" "Type_Tumor_vs_Normal" > > [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" > > [5] "RasType_RasPNot_vs_RasOnly" > > > > as you can see, I can only derive overall subtype contrasts from this way but not something like RasP.Tumor vs RasOnly.Tumor, the overall subtype contrasts for example, RasType_RasP_vs_RasOnly, consider both tumor and normal of RasP compared with those of RasOnly, which is certainly not what we want here. > > user guide section 3.2 did show > > resCtrst <- results(ddsCtrst, contrast=c("treatment","OHT","DPN")) > > resCtrst <- results(ddsCtrst, contrast=c(0,0,0,0,-1,1)) > > > > So besides RasTum, if you have a better way using just design = ~Type+RasType, that would be great. > > I did try the following: > > > > > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~RasTum); > > > dds <- DESeq(dds); > > estimating size factors > > estimating dispersions > > gene-wise dispersion estimates > > mean-dispersion relationship > > final dispersion estimates > > fitting generalized linear model > > 172 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > > > resultsNames(dds) > > [1] "Intercept" > > [2] "RasTum_RasOnly.Tumor_vs_RasOnly.Normal" > > [3] "RasTum_RasP.Normal_vs_RasOnly.Normal" > > [4] "RasTum_RasP.Tumor_vs_RasOnly.Normal" > > [5] "RasTum_RasP1Hit.Normal_vs_RasOnly.Normal" > > [6] "RasTum_RasP1Hit.Tumor_vs_RasOnly.Normal" > > [7] "RasTum_RasPNot.Normal_vs_RasOnly.Normal" > > [8] "RasTum_RasPNot.Tumor_vs_RasOnly.Normal" > > > > I did get many contrasts as I desired, but the contrasts RasTum_RasP.Tumor_vs_RasOnly.Normal does not make sense here to me, but it shown up there in the resultsNames(dds) . > > > > based on section 3.2, I seem be able to derived more from the above contrasts: > > say: for contrast RasP.Tumor vs RasP.Normal, I can do: > > resCtrst<-result(dds, contrast=c(0,0,-1,1,0,0,0,0); > > > > But is there any better way to do the above contrasts listed above that I desire? > > > > Thx again for your advice! > > best > > > > Ming > > > > > > > > > > > > > > > > > > ________________________________ > > From: michaelisaiahlove@gmail.com > > Date: Fri, 7 Mar 2014 13:23:56 -0500 > > Subject: Re: Questions about multi-factor contrast setting in DESeq2 > > To: yi02@hotmail.com > > CC: bioconductor@r-project.org > > > > hi Ming, > > > > I'm confused why you are not following the instructions in the vignette section 1.5 Multifactor designs? You should not and we do not recommend pasting together columns like this, nor inserting + 0 into the design. Please have a look at what we do recommend in this section. > > > > Extracting contrasts is covered in vignette section 3.2 Contrasts. First take a look at the entire vignette, as we've spent a lot of time writing the documentation to try to answer user questions. > > > > we are happy to discuss the best approach for your experiment, but first we need to hear more about your aims and experiment e.g. what hypotheses you wish to test, what kind of genes you are looking to find. It's hard for us to reverse engineer a recommendation rather than to go at it from basic aims. > > > > Mike > > > > > > On Fri, Mar 7, 2014 at 12:52 PM, Ming Yi <yi02@hotmail.com> wrote: > > > > > > Hi, Mike and All: > > > > I am testing DESeq2 for multi-factor contrast setting for my own data with more complex meta data but currently use the simpler dataset from the user guide for testing purpose, and run into some issues that need your input and advice. Here are the commands (only show some more relevant outputs): > > > > library("DESeq2") > > library("Biobase") > > library("pasilla") > > data("pasillaGenes") > > countData <- counts(pasillaGenes) > > colData <- pData(pasillaGenes)[,c("condition","type")] > > colData<-data.frame(colData,paste(colData$condition,colData$type,s ep=".")) > > colnames(colData)[3]<-"condition_type"; > > > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ condition_type) > > > colData(dds) > > DataFrame with 7 rows and 4 columns > > condition type condition_type sizeFactor > > <factor> <factor> <factor> <numeric> > > treated1fb treated single-read treated.single-read 1.5116926 > > treated2fb treated paired-end treated.paired-end 0.7843521 > > treated3fb treated paired-end treated.paired-end 0.8958321 > > untreated1fb untreated single-read untreated.single-read 1.0499961 > > untreated2fb untreated single-read untreated.single-read 1.6585559 > > untreated3fb untreated paired-end untreated.paired-end 0.7117763 > > untreated4fb untreated paired-end untreated.paired-end 0.783745 > > >dds <- DESeq(dds) > > estimating size factors > > estimating dispersions > > gene-wise dispersion estimates > > mean-dispersion relationship > > final dispersion estimates > > fitting generalized linear model > > > resultsNames(dds) > > [1] "Intercept" > > [2] "condition_type_treated.single.read_vs_treated.paired.end" > > [3] "condition_type_untreated.paired.end_vs_treated.paired.end" > > [4] "condition_type_untreated.single.read_vs_treated.paired.end" > > > colData(dds) > > DataFrame with 7 rows and 4 columns > > condition type condition_type sizeFactor > > <factor> <factor> <factor> <numeric> > > treated1fb treated single-read treated.single-read 1.5116926 > > treated2fb treated paired-end treated.paired-end 0.7843521 > > treated3fb treated paired-end treated.paired-end 0.8958321 > > untreated1fb untreated single-read untreated.single-read 1.0499961 > > untreated2fb untreated single-read untreated.single-read 1.6585559 > > untreated3fb untreated paired-end untreated.paired-end 0.7117763 > > untreated4fb untreated paired-end untreated.paired-end 0.7837458 > > > > Then I tried a slight diiffermry setting of the design: > > > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~0+ condition_type) > > > dds <- DESeq(dds) > > estimating size factors > > estimating dispersions > > gene-wise dispersion estimates > > mean-dispersion relationship > > final dispersion estimates > > fitting generalized linear model > > 580 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > > > resultsNames(dds) > > [1] "condition_typetreated.paired.end" > > [2] "condition_type_treated.single.read_vs_treated.paired.end" > > [3] "condition_type_untreated.paired.end_vs_treated.paired.end" > > [4] "condition_type_untreated.single.read_vs_treated.paired.end" > > > colData(dds) > > DataFrame with 7 rows and 4 columns > > condition type condition_type sizeFactor > > <factor> <factor> <factor> <numeric> > > treated1fb treated single-read treated.single-read 1.5116926 > > treated2fb treated paired-end treated.paired-end 0.7843521 > > treated3fb treated paired-end treated.paired-end 0.8958321 > > untreated1fb untreated single-read untreated.single-read 1.0499961 > > untreated2fb untreated single-read untreated.single-read 1.6585559 > > untreated3fb untreated paired-end untreated.paired-end 0.7117763 > > untreated4fb untreated paired-end untreated.paired-end 0.7837458 > > > > Then supposedly, I can use results(dds, "condition_type_treated.single.read_vs_treated.paired.end") to get DEGs for each contrast shown in resultsNames(dds). > > > > here are my questions: > > 1. I used design = ~0+ condition_type instead of design = ~ condition_type in 2nd case, try to skip the intercept so that I can easily get all possible contrasts, but seem not working the way I want. > > 2. I tried to get all possible contrasts: but besides the contrasts shown in resultsNames(dds) in both cases, the contrasts like untreated.single.read vs treated.single.read, untreated.paired.end vs untreated.single.read not even exists in the resultsNames(dds). also I like the contrast generally like: treated (including both treated.single.read and treated.paired-end) vs untreated (including both untreated.single-read and untreated paired-end). I know for this case, we can just to design = ~condition, but I wish to do this in the same roof of one single design model although I can do a separate design. In limma and edgeR, there is a function like: con.matrix<-makeContrasts() where one can set up any contrasts under the design at will. Is there anythign like that in DESeq2? I understand we can do design(dds) <- formula(~ condition_type), but no contrast setting can be made at will. Anything in DESeq2 can get around that? > > 3. Also for simple contrast, I understand one can use relevel(colData(dds)$condition,"control") kind of command to set base level, but for multiple-factors contrasts as I am after, I almost need some kind of makeContrasts() mechanism to set up contrasts at will or have to do that individually one by one, which obvioulsy would be tedious and also these contrasts won't be in a single model roof. Anything can get around like that as well? if question 2 is addressed, this one shall be no problem. > > > > Thanks in advance for your help! Appreciated very much! > > > > Best > > > > Ming > > ATRF/NCI-Frederick, > > Maryland, USA. > > > > > > > > > > > > > > > > > > > > [[alternative HTML version deleted]]
ADD REPLYlink written 3.7 years ago by Ming Yi340
On Sun, Mar 9, 2014 at 1:11 PM, Ming Yi <yi02@hotmail.com> wrote: > Hi, Mike: > > Thx for your advice again and I did try what you suggested as below: > > > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = > ~Type + RasType + Type:RasType); > > dds <- DESeq(dds); > estimating size factors > estimating dispersions > gene-wise dispersion estimates > mean-dispersion relationship > final dispersion estimates > fitting generalized linear model > 105 rows did not converge in beta, labelled in mcols(object)$betaConv. Use > larger maxit argument with nbinomWaldTest > > show(resultsNames(dds)); > [1] "Intercept" "Type_Tumor_vs_Normal" > [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" > [5] "RasType_RasPNot_vs_RasOnly" "TypeTumor.RasTypeRasP" > [7] "TypeTumor.RasTypeRasP1Hit" "TypeTumor.RasTypeRasPNot" > > certainly we can use > > res_RasP_vs_RasOnly <- results(dds,"RasType_RasP_vs_RasOnly") > > res_RasP_vs_RasOnly <- results(dds,name="RasType_RasP_vs_RasOnly") > > res_Tumor_vs_Normal<-results(dds,"Type_Tumor_vs_Normal") > > res_Tumor_vs_Normal<-results(dds,name="Type_Tumor_vs_Normal") > > but I can not do the following with contrast as suggested in section 3.2: > > res_Tumor_vs_Normal<-results(dds,contrast=c("Type","Tumor","Normal")) > Error in results(dds, contrast = c("Type", "Tumor", "Normal")) : > unused argument (contrast = c("Type", "Tumor", "Normal")) > > res_RasP_vs_RasOnly <- > results(dds,contrast=c("RasType","RasP","RasOnly")) > Error in results(dds, contrast = c("RasType", "RasP", "RasOnly")) : > unused argument (contrast = c("RasType", "RasP", "RasOnly")) > > also I can not get contrast like Tumor.RasP_Tumor.RasPNot: > > res_Tumor.RasP_Tumor.RasPNot<-results(dds,contrast=c(0,0,0,0,0,1,0 ,-1)) > Error in results(dds, contrast = c(0, 0, 0, 0, 0, 1, 0, -1)) : > unused argument (contrast = c(0, 0, 0, 0, 0, 1, 0, -1)) > > it seems the interaction terms in the design (design = ~Type + RasType + > Type:RasType) changed the behavior of results()? > > Hi, Ming. Be sure to include the output of sessionInfo() when you report errors. The information from that command tells us what version of R and OS you are using and what packages and package versions you have loaded. Sean [[alternative HTML version deleted]]
ADD REPLYlink written 3.7 years ago by Sean Davis21k
Hi, Sean: Thx for the reminder. I usually included those info, just too many info for a single email. And turns out to be the key this time. Thx and best Ming Date: Sun, 9 Mar 2014 16:29:01 -0400 Subject: Re: [BioC] Questions about multi-factor contrast setting in DESeq2 From: sdavis2@mail.nih.gov To: yi02@hotmail.com CC: michaelisaiahlove@gmail.com; bioconductor@r-project.org On Sun, Mar 9, 2014 at 1:11 PM, Ming Yi <yi02@hotmail.com> wrote: Hi, Mike: Thx for your advice again and I did try what you suggested as below: > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~Type + RasType + Type:RasType); > dds <- DESeq(dds); estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model 105 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > show(resultsNames(dds)); [1] "Intercept" "Type_Tumor_vs_Normal" [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" [5] "RasType_RasPNot_vs_RasOnly" "TypeTumor.RasTypeRasP" [7] "TypeTumor.RasTypeRasP1Hit" "TypeTumor.RasTypeRasPNot" certainly we can use > res_RasP_vs_RasOnly <- results(dds,"RasType_RasP_vs_RasOnly") > res_RasP_vs_RasOnly <- results(dds,name="RasType_RasP_vs_RasOnly") > res_Tumor_vs_Normal<-results(dds,"Type_Tumor_vs_Normal") > res_Tumor_vs_Normal<-results(dds,name="Type_Tumor_vs_Normal") but I can not do the following with contrast as suggested in section 3.2: > res_Tumor_vs_Normal<-results(dds,contrast=c("Type","Tumor","Normal")) Error in results(dds, contrast = c("Type", "Tumor", "Normal")) : unused argument (contrast = c("Type", "Tumor", "Normal")) > res_RasP_vs_RasOnly <- results(dds,contrast=c("RasType","RasP","RasOnly")) Error in results(dds, contrast = c("RasType", "RasP", "RasOnly")) : unused argument (contrast = c("RasType", "RasP", "RasOnly")) also I can not get contrast like Tumor.RasP_Tumor.RasPNot: > res_Tumor.RasP_Tumor.RasPNot<-results(dds,contrast=c(0,0,0,0,0,1,0,- 1)) Error in results(dds, contrast = c(0, 0, 0, 0, 0, 1, 0, -1)) : unused argument (contrast = c(0, 0, 0, 0, 0, 1, 0, -1)) it seems the interaction terms in the design (design = ~Type + RasType + Type:RasType) changed the behavior of results()? Hi, Ming. Be sure to include the output of sessionInfo() when you report errors. The information from that command tells us what version of R and OS you are using and what packages and package versions you have loaded. Sean [[alternative HTML version deleted]]
ADD REPLYlink written 3.7 years ago by Ming Yi340
Hi, Mike: Thx for the info and advice. I generally agreed that what you suggested is one of the many ways to explore and then diasect the data for genes that behaved in certain ways. But what I am looking for here is more of non-biased approach looking for a certain refined contrasts and then on top of that, we can explore further those genes. another reason is: in limma or edgeR, it seems relatively easy to set contrasts like what I desired using makeContrasts() function, and so I thought it shall be straightforward to do the same in DESeq2 as well. Any way, Thanks again for your input and advice! appreciated very much! best Ming Date: Fri, 7 Mar 2014 17:31:26 -0500 Subject: RE: Questions about multi-factor contrast setting in DESeq2 From: michaelisaiahlove@gmail.com To: yi02@hotmail.com CC: bioconductor@r-project.org Hi Ming, Exploratory data analysis might be a more fruitful approach here rather than brute force combinatorics and testing. Copying from Wolfgang's recommendation in a similar situation: "my advice here would be to put less emphasis on the testing and move straight to clustering, using one of the transformations described in the DESeq2 vignette to bring the data to a 'well-behaved' (log-like) scale." "To filter out the genes that vary not much, use the range (max-min) or IQR and a subjective cutoff (e.g. retain the top 20% of genes), then use standard clustering functions (e.g. pam from the cluster package), and other exploratory data analyses (e.g. PCA) to see the types of behaviours." You might also try constructing a heatmap, as shown in the vignette, using a subset of genes which vary the most, and then explore the grouping of samples in the hierarchical clustering on the columns. For ease of visualization, this subset should probably be in the 100s. Mike On Mar 7, 2014 2:59 PM, "Ming Yi" <yi02@hotmail.com> wrote: Hi, Mike: Thx for the info, indeed I did try the following before with the following for the data in user guide: > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ condition+type) > dds <- DESeq(dds) > resultsNames(dds) [1] "Intercept" "condition_untreated_vs_treated" [3] "type_single.read_vs_paired.end" As you can see, the contrast I can get here is overall untreated_vs_treated and overall single.read_vs_paired.end, there is no subtype contrast such as treated.single.read vs treated.paired-end etc. I would love to discuss briefly what I need here. I have a dataset which has tumors and matched normal samples from many patients, and there are subtypes of the tumors, say RasOnly, RasP, RasPNot types of tumors, of course, corresponding matched would be also with subtypes of RasOnly, RasP,RasP1Hit, RasPNot, and the metadata like this: Subject SampleName Type RasType RasTum T6745_01A 49_6745 T6745_01A Tumor RasP RasP.Tumor N6745_11A 49_6745 N6745_11A Normal RasP RasP.Normal T6761_01A 49_6761 T6761_01A Tumor RasPNot RasPNot.Tumor N6761_11A 49_6761 N6761_11A Normal RasPNot RasPNot.Normal T5930_01A 50_5930 T5930_01A Tumor RasP1Hit RasP1Hit.Tumor N5930_11A 50_5930 N5930_11A Normal RasP1Hit RasP1Hit.Normal T5932_01A 50_5932 T5932_01A Tumor RasOnly RasOnly.Tumor N5932_11A 50_5932 N5932_11A Normal RasOnly RasOnly.Normal ........ Here are the contrasts I am interested to get DEGs: RasOnly.Tumor vs RasOnly.Normal RasP.Tumor vs RasP.Normal RasP.Tumor + RasP1Hit.Tumor vs RasP.Normal+RasP1Hit.Normal RasPNot.Tumor vs RasPNot.Normal RasP1Hit.Tumor vs RasP1Hit.Normal RasP.Tumor vs RasPNot.Tumor RasP.Tumor+RasP1Hit.Tumor vs RasPNot.Tumor RasOnly.Tumor vs RasPNot.Tumor RasOnly.Normal vs RasPNot.Normal RasP.Normal vs RasPNot.Normal RasPRasP1Hit.Normal vs RasPNot.Normal, Tumor vs Normal The last item Tumor vs Normal certainly can easily use design = ~ type to deal with. But many of the contrasts listed above not easy unless use the RasTum of the metadata shown above. I did try to use design=~Type+RasType here are the commands: > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~Type+RasType); > show(resultsNames(dds)) character(0) > dds <- DESeq(dds); > resultsNames(dds) [1] "Intercept" "Type_Tumor_vs_Normal" [3] "RasType_RasP_vs_RasOnly" "RasType_RasP1Hit_vs_RasOnly" [5] "RasType_RasPNot_vs_RasOnly" as you can see, I can only derive overall subtype contrasts from this way but not something like RasP.Tumor vs RasOnly.Tumor, the overall subtype contrasts for example, RasType_RasP_vs_RasOnly, consider both tumor and normal of RasP compared with those of RasOnly, which is certainly not what we want here. user guide section 3.2 did show resCtrst <- results(ddsCtrst, contrast=c("treatment","OHT","DPN")) resCtrst <- results(ddsCtrst, contrast=c(0,0,0,0,-1,1)) So besides RasTum, if you have a better way using just design = ~Type+RasType, that would be great. I did try the following: > dds <- DESeqDataSetFromMatrix(countData = countD,colData = colD,design = ~RasTum); > dds <- DESeq(dds); estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model 172 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > resultsNames(dds) [1] "Intercept" [2] "RasTum_RasOnly.Tumor_vs_RasOnly.Normal" [3] "RasTum_RasP.Normal_vs_RasOnly.Normal" [4] "RasTum_RasP.Tumor_vs_RasOnly.Normal" [5] "RasTum_RasP1Hit.Normal_vs_RasOnly.Normal" [6] "RasTum_RasP1Hit.Tumor_vs_RasOnly.Normal" [7] "RasTum_RasPNot.Normal_vs_RasOnly.Normal" [8] "RasTum_RasPNot.Tumor_vs_RasOnly.Normal" I did get many contrasts as I desired, but the contrasts RasTum_RasP.Tumor_vs_RasOnly.Normal does not make sense here to me, but it shown up there in the resultsNames(dds) . based on section 3.2, I seem be able to derived more from the above contrasts: say: for contrast RasP.Tumor vs RasP.Normal, I can do: resCtrst<-result(dds, contrast=c(0,0,-1,1,0,0,0,0); But is there any better way to do the above contrasts listed above that I desire? Thx again for your advice! best Ming From: michaelisaiahlove@gmail.com Date: Fri, 7 Mar 2014 13:23:56 -0500 Subject: Re: Questions about multi-factor contrast setting in DESeq2 To: yi02@hotmail.com CC: bioconductor@r-project.org hi Ming, I'm confused why you are not following the instructions in the vignette section 1.5 Multifactor designs? You should not and we do not recommend pasting together columns like this, nor inserting + 0 into the design. Please have a look at what we do recommend in this section. Extracting contrasts is covered in vignette section 3.2 Contrasts. First take a look at the entire vignette, as we've spent a lot of time writing the documentation to try to answer user questions. we are happy to discuss the best approach for your experiment, but first we need to hear more about your aims and experiment e.g. what hypotheses you wish to test, what kind of genes you are looking to find. It's hard for us to reverse engineer a recommendation rather than to go at it from basic aims. Mike On Fri, Mar 7, 2014 at 12:52 PM, Ming Yi <yi02@hotmail.com> wrote: Hi, Mike and All: I am testing DESeq2 for multi-factor contrast setting for my own data with more complex meta data but currently use the simpler dataset from the user guide for testing purpose, and run into some issues that need your input and advice. Here are the commands (only show some more relevant outputs): library("DESeq2") library("Biobase") library("pasilla") data("pasillaGenes") countData <- counts(pasillaGenes) colData <- pData(pasillaGenes)[,c("condition","type")] colData<-data.frame(colData,paste(colData$condition,colData$type,sep=" .")) colnames(colData)[3]<-"condition_type"; > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ condition_type) > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.783745 >dds <- DESeq(dds) estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model > resultsNames(dds) [1] "Intercept" [2] "condition_type_treated.single.read_vs_treated.paired.end" [3] "condition_type_untreated.paired.end_vs_treated.paired.end" [4] "condition_type_untreated.single.read_vs_treated.paired.end" > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.7837458 Then I tried a slight diiffermry setting of the design: > dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~0+ condition_type) > dds <- DESeq(dds) estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting generalized linear model 580 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest > resultsNames(dds) [1] "condition_typetreated.paired.end" [2] "condition_type_treated.single.read_vs_treated.paired.end" [3] "condition_type_untreated.paired.end_vs_treated.paired.end" [4] "condition_type_untreated.single.read_vs_treated.paired.end" > colData(dds) DataFrame with 7 rows and 4 columns condition type condition_type sizeFactor <factor> <factor> <factor> <numeric> treated1fb treated single-read treated.single-read 1.5116926 treated2fb treated paired-end treated.paired-end 0.7843521 treated3fb treated paired-end treated.paired-end 0.8958321 untreated1fb untreated single-read untreated.single-read 1.0499961 untreated2fb untreated single-read untreated.single-read 1.6585559 untreated3fb untreated paired-end untreated.paired-end 0.7117763 untreated4fb untreated paired-end untreated.paired-end 0.7837458 Then supposedly, I can use results(dds, "condition_type_treated.single.read_vs_treated.paired.end") to get DEGs for each contrast shown in resultsNames(dds). here are my questions: 1. I used design = ~0+ condition_type instead of design = ~ condition_type in 2nd case, try to skip the intercept so that I can easily get all possible contrasts, but seem not working the way I want. 2. I tried to get all possible contrasts: but besides the contrasts shown in resultsNames(dds) in both cases, the contrasts like untreated.single.read vs treated.single.read, untreated.paired.end vs untreated.single.read not even exists in the resultsNames(dds). also I like the contrast generally like: treated (including both treated.single.read and treated.paired-end) vs untreated (including both untreated.single-read and untreated paired-end). I know for this case, we can just to design = ~condition, but I wish to do this in the same roof of one single design model although I can do a separate design. In limma and edgeR, there is a function like: con.matrix<-makeContrasts() where one can set up any contrasts under the design at will. Is there anythign like that in DESeq2? I understand we can do design(dds) <- formula(~ condition_type), but no contrast setting can be made at will. Anything in DESeq2 can get around that? 3. Also for simple contrast, I understand one can use relevel(colData(dds)$condition,"control") kind of command to set base level, but for multiple-factors contrasts as I am after, I almost need some kind of makeContrasts() mechanism to set up contrasts at will or have to do that individually one by one, which obvioulsy would be tedious and also these contrasts won't be in a single model roof. Anything can get around like that as well? if question 2 is addressed, this one shall be no problem. Thanks in advance for your help! Appreciated very much! Best Ming ATRF/NCI-Frederick, Maryland, USA. [[alternative HTML version deleted]]
ADD REPLYlink written 3.7 years ago by Ming Yi340
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 272 users visited in the last hour