DESeq2 basemean and log2foldchange

0

Entering edit mode

rob yang ▴ 50

@rob-yang-6681

Last seen 9.7 years ago

Hello DESeq community, I wanted to ask how basemean and log2foldchange are calculated. 1) From mcol(mcol(dds_fit)), the basemean is denoted as "basemean over all rows". But I can't pin down what "all rows" this means. For a single transcript, my manual mean calculation of all counts across all replicates, all conditions do not equal to the DESeq reported basemean for a single transcript. 2) log2foldchange seems to be changing with the contrasting variable. This is a little surprising since I'd expect log2foldchange to be a static number when comparing two conditions, and the contrasting variables only dictate testing for significance, ie., p-value. Thank you in advance. -Rob [[alternative HTML version deleted]]

DESeq DESeq • 12k views

ADD COMMENT • link updated 9.8 years ago by Michael Love 42k • written 9.8 years ago by rob yang ▴ 50

0

Entering edit mode

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 14 months ago

United States

Hi, On Fri, Aug 1, 2014 at 11:20 AM, rob yang <nextgame at="" hotmail.com=""> wrote: > Hello DESeq community, > > I wanted to ask how basemean and log2foldchange are calculated. > > 1) From mcol(mcol(dds_fit)), the basemean is denoted as "basemean over all rows". But I can't pin down what "all rows" this means. I reckon it should really say "columns" instead of "rows", but they use the "rowMeans" and "rowVars" functions, so ... perhaps some mismatch in thinking. Anyway, you got the right idea. > For a single transcript, my manual mean calculation of all counts across all replicates, all conditions do not equal to the DESeq reported basemean for a single transcript. You'd have to show us what your manual calculations are. Take a look at the `DESeq2:::getBaseMeansAndVariances` function to see how they do it. You will see that it's really just the rowMeans of all the counts, ie: baseMean = unname(rowMeans(counts(object, normalized = TRUE))) > 2) log2foldchange seems to be changing with the contrasting variable. This is a little surprising since I'd expect log2foldchange to be a static number when comparing two conditions, and the contrasting variables only dictate testing for significance, ie., p-value. What version of DESeq2 are you using? If I recall correctly, there was an issue in changing logFC estimates when factors were releveled, but I think this issue has been fixed (or highly mitigated, at least) in recent versions -- if you are running the current version of DESeq2, I think this problem should be gone.. HTH, -steve -- Steve Lianoglou Computational Biologist Genentech

ADD COMMENT • link 9.8 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Michael Love 42k

@mikelove

Last seen 13 hours ago

United States

Hi Rob, On Aug 1, 2014 2:20 PM, "rob yang" <nextgame@hotmail.com> wrote: > > Hello DESeq community, > > I wanted to ask how basemean and log2foldchange are calculated. > > 1) From mcol(mcol(dds_fit)), the basemean is denoted as "basemean over all rows". But I can't pin down what "all rows" this means. For a single transcript, my manual mean calculation of all counts across all replicates, all conditions do not equal to the DESeq reported basemean for a single transcript. > Yes that is kind of funny, I should change it to say "columns" or "samples". It is just as Steve described, the mean of normalized counts for all samples in the dds. > 2) log2foldchange seems to be changing with the contrasting variable. This is a little surprising since I'd expect log2foldchange to be a static number when comparing two conditions, and the contrasting variables only dictate testing for significance, ie., p-value. > I don't follow. Can you say more what you mean by changing with contrasting variable Mike > Thank you in advance. > > -Rob > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]

ADD COMMENT • link 9.8 years ago Michael Love 42k

0

Entering edit mode

Thanks Steve and Mike. Ah, yes, thanks for the clarification that it is the mean of the NORMALIZED counts, it all makes sense now. I was comparing to the mean of raw counts. I guess- interpret the the "rows" being samples, but yes the word "samples" clarifies things definitively. After thinking a bit on the LFC question, I think I have a logical error in constructing the contrast, and would like to get some help. My design is 2 cell lines (C1, C2) and 4 treatments (T1,T2,T3,TVeh). ---My intention is to compare C1.T1 vs. C2.T1, and ask for the cell- specific effect of treatment T1--- Normally this is broken down into two steps (below) but I wondered about 1-step comparison using interaction terms: 1) first find sig DEs of (C1.T1 over C1.TVeh), and conversely (C2.T1 over C2.TVeh). 2) then compare these delta DEs (over vehicle) from their respective cell lines, to find delta delta DEs. So for the LFC, I want to take the straight normalized counts fold change: log2(C1.T1/C2.T1). It is the p-value that will change depending what hypotheses I am testing. Now as I am typing it out loud, I realize that the LFC and p-value are both dependent on the contrasting terms that I put in, which brings to my massive confusion with interaction terms. inputFormula=formula(~celltype+treatment+celltype:treatment) dds_fit=DESeqDataSetFromMatrix(countData=inputData,colData=inputDesign ,design=inputFormula) dds_fit=nbinomWaldTest(dds_fit,betaPrior=TRUE,maxit=local_maxit,useOpt im=TRUE) res1=results(dds_fit, contrast=list("celltypeC1.treatmentT1","celltypeC2.treatmentT1")) res2=results(dds_fit, contrast=list(c("treatmentT1","celltypeC2.treatmentT1"), c("treatmentTVeh","celltypeC2.treatmentTVeh"))) res3=results(dds_fit, contrast=list("celltypeC1.treatmentT1","celltypeC1.treatmentTVeh")) My interpretation is that: res1 is saying, contrasting C1-specific T1 effect over C2-specific T1 effect that is different from main effect of T1. res2 is saying, contrasting T1 effect over veh in cell type C2 that is different from any other cell types (C1 in this case) res3 is saying, contrasting C1-specific T1 effect over C1-specific TVeh effect that is different from those of C2-specific counter parts To me, all 3 of the above contrasts answer my intended question of C1.T1 vs C2.T1. Could someone help me with answering my intended question. And also corrections to my interpretations to the above contrast will also be very much appreciated. -Rob Date: Fri, 1 Aug 2014 19:38:12 -0400 Subject: Re: [BioC] DESeq2 basemean and log2foldchange From: michaelisaiahlove@gmail.com To: nextgame@hotmail.com CC: bioconductor@r-project.org Hi Rob, On Aug 1, 2014 2:20 PM, "rob yang" <nextgame@hotmail.com> wrote: > > Hello DESeq community, > > I wanted to ask how basemean and log2foldchange are calculated. > > 1) From mcol(mcol(dds_fit)), the basemean is denoted as "basemean over all rows". But I can't pin down what "all rows" this means. For a single transcript, my manual mean calculation of all counts across all replicates, all conditions do not equal to the DESeq reported basemean for a single transcript. > Yes that is kind of funny, I should change it to say "columns" or "samples". It is just as Steve described, the mean of normalized counts for all samples in the dds. > 2) log2foldchange seems to be changing with the contrasting variable. This is a little surprising since I'd expect log2foldchange to be a static number when comparing two conditions, and the contrasting variables only dictate testing for significance, ie., p-value. > I don't follow. Can you say more what you mean by changing with contrasting variable Mike > Thank you in advance. > > -Rob > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]

ADD REPLY • link 9.8 years ago rob yang ▴ 50

0

Entering edit mode

hi Rob, As you have a clear base level of treatment (vehicle), I'd recommend using a standard model matrix with base levels of C1 and TVeh. I will make a note to include more documentation on this option. dds$celltype <- relevel(dds$celltype, "C1") dds$treatment <- relevel(dds$treatment, "TVeh") dds <- DESeq(dds, modelMatrixType="standard") After this, you can test if the T1 effect is different in celltype 2 with the interaction: results(dds, name="celltypeC2.treatmentT1") The T1 effect in celltype 1 (the main effect, as celltype 1 is the base level) is: results(dds, contrast=c("treatment","T1","TVeh")) The T1 effect in celltype 2 is the main effect above plus the interaction: results(dds, contrast=list(c("treatment_T1_vs_TVeh","celltypeC2.treatm entT1"),character())) Mike On Fri, Aug 1, 2014 at 10:57 PM, rob yang <nextgame at="" hotmail.com=""> wrote: > Thanks Steve and Mike. > > Ah, yes, thanks for the clarification that it is the mean of the NORMALIZED > counts, it all makes sense now. I was comparing to the mean of raw counts. I > guess- interpret the the "rows" being samples, but yes the word "samples" > clarifies things definitively. > > After thinking a bit on the LFC question, I think I have a logical error in > constructing the contrast, and would like to get some help. > > My design is 2 cell lines (C1, C2) and 4 treatments (T1,T2,T3,TVeh). > > ---My intention is to compare C1.T1 vs. C2.T1, and ask for the cell- specific > effect of treatment T1--- > > Normally this is broken down into two steps (below) but I wondered about > 1-step comparison using interaction terms: > 1) first find sig DEs of (C1.T1 over C1.TVeh), and conversely (C2.T1 over > C2.TVeh). > 2) then compare these delta DEs (over vehicle) from their respective cell > lines, to find delta delta DEs. > > So for the LFC, I want to take the straight normalized counts fold change: > log2(C1.T1/C2.T1). It is the p-value that will change depending what > hypotheses I am testing. Now as I am typing it out loud, I realize that the > LFC and p-value are both dependent on the contrasting terms that I put in, > which brings to my massive confusion with interaction terms. > > inputFormula=formula(~celltype+treatment+celltype:treatment) > dds_fit=DESeqDataSetFromMatrix(countData=inputData,colData=inputDesi gn,design=inputFormula) > dds_fit=nbinomWaldTest(dds_fit,betaPrior=TRUE,maxit=local_maxit,useO ptim=TRUE) > > res1=results(dds_fit, > contrast=list("celltypeC1.treatmentT1","celltypeC2.treatmentT1")) > res2=results(dds_fit, > contrast=list(c("treatmentT1","celltypeC2.treatmentT1"), > > c("treatmentTVeh","celltypeC2.treatmentTVeh"))) > res3=results(dds_fit, > contrast=list("celltypeC1.treatmentT1","celltypeC1.treatmentTVeh")) > > > My interpretation is that: > res1 is saying, contrasting C1-specific T1 effect over C2-specific T1 effect > that is different from main effect of T1. > res2 is saying, contrasting T1 effect over veh in cell type C2 that is > different from any other cell types (C1 in this case) > res3 is saying, contrasting C1-specific T1 effect over C1-specific TVeh > effect that is different from those of C2-specific counter parts > > To me, all 3 of the above contrasts answer my intended question of C1.T1 vs > C2.T1. > > Could someone help me with answering my intended question. And also > corrections to my interpretations to the above contrast will also be very > much appreciated. > > -Rob > > ________________________________ > Date: Fri, 1 Aug 2014 19:38:12 -0400 > Subject: Re: [BioC] DESeq2 basemean and log2foldchange > From: michaelisaiahlove at gmail.com > To: nextgame at hotmail.com > CC: bioconductor at r-project.org > > > Hi Rob, > On Aug 1, 2014 2:20 PM, "rob yang" <nextgame at="" hotmail.com=""> wrote: >> >> Hello DESeq community, >> >> I wanted to ask how basemean and log2foldchange are calculated. >> >> 1) From mcol(mcol(dds_fit)), the basemean is denoted as "basemean over all >> rows". But I can't pin down what "all rows" this means. For a single >> transcript, my manual mean calculation of all counts across all replicates, >> all conditions do not equal to the DESeq reported basemean for a single >> transcript. >> > > Yes that is kind of funny, I should change it to say "columns" or "samples". > It is just as Steve described, the mean of normalized counts for all samples > in the dds. > >> 2) log2foldchange seems to be changing with the contrasting variable. This >> is a little surprising since I'd expect log2foldchange to be a static number >> when comparing two conditions, and the contrasting variables only dictate >> testing for significance, ie., p-value. >> > > I don't follow. Can you say more what you mean by changing with contrasting > variable > > Mike > >> Thank you in advance. >> >> -Rob >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 9.8 years ago Michael Love 42k

0

Entering edit mode

Thanks very much Mike. This has been very very helpful. Let me strike the iron while it's hot once for all, please bear with me one more time. > As you have a clear base level of treatment (vehicle), I'd recommend > using a standard model matrix with base levels of C1 and TVeh. I will > make a note to include more documentation on this option. > > dds$celltype <- relevel(dds$celltype, "C1") > dds$treatment <- relevel(dds$treatment, "TVeh") > dds <- DESeq(dds, modelMatrixType="standard") This was very clear in the current documentation (v1.4.5). This was indeed what I did, but didn't include in the original post to cut down the length. > > After this, you can test if the T1 effect is different in celltype 2 > with the interaction: > > results(dds, name="celltypeC2.treatmentT1") Got it. My mistake was to ignore the implicitly defined base levels, and was doing: results(dds,contrast=list("celltypeC2.treatmentT1","celltypeC1.treatme ntT1")) Out of curiosity, what does my contrast statement above do, if sensible at all? > > The T1 effect in celltype 1 (the main effect, as celltype 1 is the > base level) is: > > results(dds, contrast=c("treatment","T1","TVeh")) Clear. > > The T1 effect in celltype 2 is the main effect above plus the interaction: > > results(dds, contrast=list(c("treatment_T1_vs_TVeh","celltypeC2.trea tmentT1"),character())) syntax-wise, will this be? results(dds,contrast=list(c("treatmentT1","celltypeC2.treatmentT1"),ch aracter())) additional questions: 1) to compare baseline (Veh) DEs b/w two cell lines: results(dds,name="celltypeC2") or results(dds,contrast=list(c("celltypeC2","celltypeC2.treatmentVeh"), c("celltypeC1","celltypeC1.treatmentVeh")) 2) if I have 3 treatments in total, T1, T2, T3, plus TVeh, and I want to compare T1 vs. T2 in C2, would i do: results(dds,contrast=list("celltypeC2.treatmentT1","celltypeC.treatmen tT2")) -Rob [[alternative HTML version deleted]]

ADD REPLY • link 9.8 years ago rob yang ▴ 50

0

Entering edit mode

On Sun, Aug 3, 2014 at 12:19 AM, rob yang <nextgame at="" hotmail.com=""> wrote: > Thanks very much Mike. This has been very very helpful. Let me strike the > iron while it's hot once for all, please bear with me one more time. > >> As you have a clear base level of treatment (vehicle), I'd recommend >> using a standard model matrix with base levels of C1 and TVeh. I will >> make a note to include more documentation on this option. >> >> dds$celltype <- relevel(dds$celltype, "C1") >> dds$treatment <- relevel(dds$treatment, "TVeh") >> dds <- DESeq(dds, modelMatrixType="standard") > > This was very clear in the current documentation (v1.4.5). This was indeed > what I did, but didn't include in the original post to cut down the length. > >> >> After this, you can test if the T1 effect is different in celltype 2 >> with the interaction: >> >> results(dds, name="celltypeC2.treatmentT1") > Got it. My mistake was to ignore the implicitly defined base levels, and was > doing: > results(dds,contrast=list("celltypeC2.treatmentT1","celltypeC1.treat mentT1")) > Out of curiosity, what does my contrast statement above do, if sensible at > all? > You won't have an interaction term "celltypeC1.treatmentT1" if you use the code I suggested above, i.e. dds <- DESeq(dds, modelMatrixType="standard") because C1 is the base level. > >> >> The T1 effect in celltype 1 (the main effect, as celltype 1 is the >> base level) is: >> >> results(dds, contrast=c("treatment","T1","TVeh")) > Clear. > > >> >> The T1 effect in celltype 2 is the main effect above plus the interaction: >> >> results(dds, >> contrast=list(c("treatment_T1_vs_TVeh","celltypeC2.treatmentT1"),ch aracter())) > syntax-wise, will this be? > results(dds,contrast=list(c("treatmentT1","celltypeC2.treatmentT1"), character())) > > additional questions: > 1) to compare baseline (Veh) DEs b/w two cell lines: > results(dds,name="celltypeC2") or > results(dds,contrast=list(c("celltypeC2","celltypeC2.treatmentVeh"), > > c("celltypeC1","celltypeC1.treatmentVeh")) > this is just results(dds, contrast=c("celltype","C2","C1")) or equivalently results(dds, name="celltype_C2_vs_C1") > 2) if I have 3 treatments in total, T1, T2, T3, plus TVeh, and I want to > compare T1 vs. T2 in C2, would i do: > results(dds,contrast=list("celltypeC2.treatmentT1","celltypeC.treatm entT2")) > This is the main effect for T1 plus the interaction term for C2 and T1 over the main effect for T2 plus the interaction term for C2 and T2. results(dds, contrast=list(c("treatment_T1_vs_TVeh","celltypeC2.treatmentT1"), c("treatment_T2_vs_TVeh","celltypeC2.treatmentT2"))) Mike > -Rob > >

ADD REPLY • link 9.8 years ago Michael Love 42k

0

Entering edit mode

Great, crystal clear! Everything makes sense now. Thanks Mike. -Rob > From: michaelisaiahlove@gmail.com > Date: Sun, 3 Aug 2014 21:51:06 -0400 > Subject: Re: [BioC] DESeq2 basemean and log2foldchange > To: nextgame@hotmail.com > CC: bioconductor@r-project.org > > On Sun, Aug 3, 2014 at 12:19 AM, rob yang <nextgame@hotmail.com> wrote: > > Thanks very much Mike. This has been very very helpful. Let me strike the > > iron while it's hot once for all, please bear with me one more time. > > > >> As you have a clear base level of treatment (vehicle), I'd recommend > >> using a standard model matrix with base levels of C1 and TVeh. I will > >> make a note to include more documentation on this option. > >> > >> dds$celltype <- relevel(dds$celltype, "C1") > >> dds$treatment <- relevel(dds$treatment, "TVeh") > >> dds <- DESeq(dds, modelMatrixType="standard") > > > > This was very clear in the current documentation (v1.4.5). This was indeed > > what I did, but didn't include in the original post to cut down the length. > > > >> > >> After this, you can test if the T1 effect is different in celltype 2 > >> with the interaction: > >> > >> results(dds, name="celltypeC2.treatmentT1") > > Got it. My mistake was to ignore the implicitly defined base levels, and was > > doing: > > results(dds,contrast=list("celltypeC2.treatmentT1","celltypeC1.tre atmentT1")) > > Out of curiosity, what does my contrast statement above do, if sensible at > > all? > > > > You won't have an interaction term "celltypeC1.treatmentT1" if you use > the code I suggested above, i.e. > > dds <- DESeq(dds, modelMatrixType="standard") > > because C1 is the base level. > > > > > >> > >> The T1 effect in celltype 1 (the main effect, as celltype 1 is the > >> base level) is: > >> > >> results(dds, contrast=c("treatment","T1","TVeh")) > > Clear. > > > > > >> > >> The T1 effect in celltype 2 is the main effect above plus the interaction: > >> > >> results(dds, > >> contrast=list(c("treatment_T1_vs_TVeh","celltypeC2.treatmentT1"), character())) > > syntax-wise, will this be? > > results(dds,contrast=list(c("treatmentT1","celltypeC2.treatmentT1" ),character())) > > > > additional questions: > > 1) to compare baseline (Veh) DEs b/w two cell lines: > > results(dds,name="celltypeC2") or > > results(dds,contrast=list(c("celltypeC2","celltypeC2.treatmentVeh"), > > > > c("celltypeC1","celltypeC1.treatmentVeh")) > > > > this is just > > results(dds, contrast=c("celltype","C2","C1")) > > or equivalently > > results(dds, name="celltype_C2_vs_C1") > > > 2) if I have 3 treatments in total, T1, T2, T3, plus TVeh, and I want to > > compare T1 vs. T2 in C2, would i do: > > results(dds,contrast=list("celltypeC2.treatmentT1","celltypeC.trea tmentT2")) > > > > This is the main effect for T1 plus the interaction term for C2 and T1 > over the main effect for T2 plus the interaction term for C2 and T2. > > results(dds, contrast=list(c("treatment_T1_vs_TVeh","celltypeC2.treatmentT1"), > c("treatment_T2_vs_TVeh","celltypeC2.treatmentT2"))) > > > Mike > > > -Rob > > > > [[alternative HTML version deleted]]

ADD REPLY • link 9.8 years ago rob yang ▴ 50

Login before adding your answer.