Question: phyloseq/DESeq gives negative transformed values
Michael Love (23k), United States, wrote 5.1 years ago:
hi Sophie,

On Mon, May 5, 2014 at 5:24 PM, Sophie Josephine Weiss <sophie.weiss at colorado.edu> wrote:
> Makes sense, thanks for your help. In the DESeq manual, it looks like all
> we need to do for e.g. clustering, or pcoa, is the estimateSizeFactors. Is
> this correct?
>
> Or would it be also ok to use the values from estimateDispersions with the
> negatives set to zero or constant shifted? I would do the above, but it
> looks like McMurdie et al. do both for their clustering simulation - so
> thought I would ask.

I don't follow your question. If you want to do clustering or PCA using DESeq2, I assume you are applying one of the two transformations we have implemented, as described in the vignette. If size factors are not already estimated, both transformations will estimate them internally, likewise for dispersions.

These transformations return objects with matrices in the assay slot which are appropriate for calculating distances or PCA. We show demonstrations of both calculating distance and PCA in the vignette.

Please look the vignette over again, as this is the recommended usage. These transformed values have been corrected for size factor.

You should not use any DESeq2 functions on the matrix of transformed values. The transformation is the last step within DESeq2, then we assume the user is doing something "downstream" with these values.

Mike

> Thanks again!
> Sophie
>
> On Thu, Apr 24, 2014 at 6:54 AM, Wolfgang Huber <whuber at embl.de> wrote:
>>
>> Hi Sophie
>>
>> as this issue comes up periodically, let me point out that
>>
>> log (cx) = log(c) + log(x)
>>
>> That means, if you think of 'x' as your data matrix and 'c' as a single
>> positive number, you can always add or subtract a constant to your
>> transformed data, for instance, to make it more agreeable to you by having
>> all positive signs, and all that amounts to is an overall scaling
>> (multiplication) of the data on the untransformed scale.
>> An analogous idea applies to the rlog or vst transformations of DESeq2.
>>
>> A reasonable distance metric between samples or genes should probably not
>> depend on such an overall constant c.
>>
>> Best wishes
>> Wolfgang
>>
>> On 23 Apr 2014, at 23:44, Sophie Josephine Weiss
>> <sophie.weiss at colorado.edu> wrote:
>>
>> > Thanks Michael,
>> > The entire dataset (attached code and .biom) is negatives - there was an
>> > error of "out of vertex space" as described
>> > here <http://seqanswers.com/forums/showthread.php?p=18620>,
>> > so I tried setting maxk=300 as suggested.
>> > Commands are below.
>> > Thanks again!
>> > Sophie
>> >
>> > source("http://bioconductor.org/biocLite.R")
>> > biocLite("phyloseq")
>> > biocLite("DESeq")
>> >
>> > library("phyloseq")
>> > library("DESeq")
>> > library("biom")
>> >
>> > file = "~/Downloads/study_449_closed_reference_otu_table.biom"
>> > x = import_biom(file)
>> > source("~/Downloads/deseq_varstab.R")
>> > DESeq_data = deseq_varstab(x, method = "blind", sharingMode = "maximum",
>> > fitType = "local", locfit_extra_args=list(maxk=300))
>> > write_biom(make_biom(DESeq_data@otu_table
>> > ),"~/Desktop/449_Costello_DESeq.biom.tsv")
>> >
>> > On Sat, Apr 19, 2014 at 11:29 AM, Michael Love
>> > <michaelisaiahlove at gmail.com> wrote:
>> >
>> >> hi Sophie,
>> >>
>> >> You are getting negative values from the transformation for the
>> >> reasons I mentioned earlier, the transformation is log2-like.
>> >>
>> >> If you want to do something downstream of our software which requires
>> >> non-negative values, below is some example code of how to threshold
>> >> negative values for a matrix in R.
>> >>
>> >> The question of what is the best distance to use for taxa counts, or
>> >> whether ANOVA on variance stabilized data is a good idea for taxa
>> >> counts, depends on the properties of the data, and this is an area of
>> >> active research. As I don't have experience analyzing this kind of
>> >> data, I don't want to make any guesses.
>> >>
>> >>> m <- matrix(-2:5, ncol=2)
>> >>> m
>> >>      [,1] [,2]
>> >> [1,]   -2    2
>> >> [2,]   -1    3
>> >> [3,]    0    4
>> >> [4,]    1    5
>> >>> m[m < 0] <- 0
>> >>> m
>> >>      [,1] [,2]
>> >> [1,]    0    2
>> >> [2,]    0    3
>> >> [3,]    0    4
>> >> [4,]    1    5
>> >>
>> >> On Fri, Apr 18, 2014 at 3:32 PM, Sophie Josephine Weiss
>> >> <sophie.weiss at colorado.edu> wrote:
>> >>> Hi Mike,
>> >>> Could you please check whether I am running this correctly? I have double
>> >>> checked all the parameters, but for some reason, I am getting negatives
>> >>> using the R script on the attached .biom dataset. There are no replicates
>> >>> in this microbial dataset.
>> >>> Thanks for your advice,
>> >>> Sophie
>> >>>
>> >>> On Wed, Apr 16, 2014 at 4:02 PM, Sophie Josephine Weiss
>> >>> <sophie.weiss at colorado.edu> wrote:
>> >>>>
>> >>>> Thanks Mike, that is what I thought. What if we wanted to perform kruskal
>> >>>> wallis, or is it possible to perform anova on the variance-stabilized
>> >>>> matrix?
>> >>>>
>> >>>> On Wed, Apr 16, 2014 at 2:29 PM, Michael Love
>> >>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>
>> >>>>> hi Sophie,
>> >>>>>
>> >>>>> We recommend using the standard DESeq() function for differential
>> >>>>> expression.
>> >>>>>
>> >>>>> This is mentioned in the first line of the vignette section on
>> >>>>> transformations:
>> >>>>>
>> >>>>> "In order to test for differential expression, we operate on raw
>> >>>>> counts and use discrete distributions as
>> >>>>> described in the previous section"
>> >>>>>
>> >>>>> Also, in the McMurdie and Holmes, they are using the DESeq() function,
>> >>>>> as shown in their supplemental material:
>> >>>>>
>> >>>>> http://joey711.github.io/waste-not-supplemental/simulation-differential-abundance/simulation-differential-abundance-server.html
>> >>>>>
>> >>>>> On Wed, Apr 16, 2014 at 3:22 PM, Sophie Josephine Weiss
>> >>>>> <sophie.weiss at colorado.edu> wrote:
>> >>>>>> Please help with this? Thanks again.
>> >>>>>>
>> >>>>>> On Mon, Apr 14, 2014 at 6:02 PM, Sophie Josephine Weiss
>> >>>>>> <sophie.weiss at colorado.edu> wrote:
>> >>>>>>>
>> >>>>>>> Thanks again Mike - would it be ok to do chi-2 and other significance
>> >>>>>>> tests on the DESeq transformed datasets using independent code, or is
>> >>>>>>> it necessary to do the differential expression tests strictly within
>> >>>>>>> DESeq2?
>> >>>>>>>
>> >>>>>>> Sophie
>> >>>>>>>
>> >>>>>>> On Mon, Apr 14, 2014 at 5:41 PM, Michael Love
>> >>>>>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>> hi Sophie,
>> >>>>>>>>
>> >>>>>>>> The VST code is the same in DESeq and DESeq2. The estimation of
>> >>>>>>>> dispersion is slightly different (details are in the vignette
>> >>>>>>>> "Changes from DESeq to DESeq2"), but the fitted line (which is used
>> >>>>>>>> by the VST) should be very similar.
>> >>>>>>>>
>> >>>>>>>> Mike
>> >>>>>>>>
>> >>>>>>>> On Mon, Apr 14, 2014 at 6:27 PM, Sophie Josephine Weiss
>> >>>>>>>> <sophie.weiss at colorado.edu> wrote:
>> >>>>>>>>> Hi Mike,
>> >>>>>>>>> The McMurdie and Holmes paper uses DESeq for matrix normalization -
>> >>>>>>>>> do you think that is ok, or would it be better to use DESeq 2?
>> >>>>>>>>> Thanks again,
>> >>>>>>>>> Sophie
>> >>>>>>>>>
>> >>>>>>>>> On Mon, Apr 14, 2014 at 3:40 PM, Michael Love
>> >>>>>>>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> hi Sophie,
>> >>>>>>>>>>
>> >>>>>>>>>> On Mon, Apr 14, 2014 at 1:15 PM, Sophie Josephine Weiss
>> >>>>>>>>>> <sophie.weiss at colorado.edu> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> Hi Mike,
>> >>>>>>>>>>> Thanks for the references. By "threshold at 0" do you mean set
>> >>>>>>>>>>> any negative values equal to 0?
>> >>>>>>>>>>
>> >>>>>>>>>> yes.
>> >>>>>>>>>>
>> >>>>>>>>>>> Do you think this is the best approach?
>> >>>>>>>>>>
>> >>>>>>>>>> I haven't explored this area, and would defer to the McMurdie and
>> >>>>>>>>>> Holmes paper for the best combinations of distance and
>> >>>>>>>>>> transformation.
>> >>>>>>>>>>
>> >>>>>>>>>>> Thanks again,
>> >>>>>>>>>>> Sophie
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Mon, Apr 14, 2014 at 11:01 AM, Michael Love
>> >>>>>>>>>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I tried poking around here
>> >>>>>>>>>>>> http://joey711.github.io/phyloseq/distance
>> >>>>>>>>>>>> but couldn't see if the authors did anything for distances
>> >>>>>>>>>>>> requiring non-negative data. It appears
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003531
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> that VST was tested with Bray-Curtis distance. I think the
>> >>>>>>>>>>>> distance is designed for counts, but you could always threshold
>> >>>>>>>>>>>> at 0 to insist that the log2-like quantity act more like a count.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Mon, Apr 14, 2014 at 12:23 PM, Sophie Josephine Weiss
>> >>>>>>>>>>>> <sophie.weiss at colorado.edu> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Hi Mike,
>> >>>>>>>>>>>>> Thanks for explaining more. I am used to working with rarefied
>> >>>>>>>>>>>>> microbial datasets, that is why. Instead of rarefying I would
>> >>>>>>>>>>>>> like to use the DESeq method.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> How would you then suggest going about calculating bray-curtis
>> >>>>>>>>>>>>> distance, or summarized taxa diagrams with these new
>> >>>>>>>>>>>>> transformed matrices with negative values?
>> >>>>>>>>>>>>> Thanks again,
>> >>>>>>>>>>>>> Sophie
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Mon, Apr 14, 2014 at 7:17 AM, Michael Love
>> >>>>>>>>>>>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> hi Sophie,
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Can you explain why you don't want negative values in the
>> >>>>>>>>>>>>>> transformed values? Adding one to the raw counts is not
>> >>>>>>>>>>>>>> sufficient. I should have said in my previous email, "the
>> >>>>>>>>>>>>>> expected counts on the common scale". If the size factor for
>> >>>>>>>>>>>>>> a sample is 2, then an expected count of 1 leads to an
>> >>>>>>>>>>>>>> expected count of 1/2 on the common scale (after accounting
>> >>>>>>>>>>>>>> for size factors).
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Sun, Apr 13, 2014 at 11:50 PM, Sophie Josephine Weiss
>> >>>>>>>>>>>>>> <sophie.weiss at colorado.edu> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Hi Mike,
>> >>>>>>>>>>>>>>> Thanks for your reply! Ok, makes sense, but I added 1 to
>> >>>>>>>>>>>>>>> all my matrix values, so the lowest value in the matrix is
>> >>>>>>>>>>>>>>> 1 - there are still negatives?
>> >>>>>>>>>>>>>>> Thanks again,
>> >>>>>>>>>>>>>>> Sophie
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Sun, Apr 13, 2014 at 9:01 PM, Michael Love
>> >>>>>>>>>>>>>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> hi Sophie,
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> The transformations in DESeq and DESeq2 are log2-like
>> >>>>>>>>>>>>>>>> transformations. If the expected count is between 0 and 1,
>> >>>>>>>>>>>>>>>> the values can be negative, this does not indicate a
>> >>>>>>>>>>>>>>>> problem.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Mike
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Sun, Apr 13, 2014 at 5:17 PM, Sophie Josephine Weiss
>> >>>>>>>>>>>>>>>> <sophie.weiss at colorado.edu> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Hello,
>> >>>>>>>>>>>>>>>>> I have microbiome data with no replicates, from different
>> >>>>>>>>>>>>>>>>> conditions. I am trying to transform the data using the
>> >>>>>>>>>>>>>>>>> DESeq method, as described in McMurdie and Holmes 2014.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> The attached file is the definition I am using, as per the
>> >>>>>>>>>>>>>>>>> supplemental info in McMurdie and Holmes 2014, and the
>> >>>>>>>>>>>>>>>>> .biom file I am using.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Thank you for your help,
>> >>>>>>>>>>>>>>>>> Sophie
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> _______________________________________________
>> >>>>>>>>>>>>>>>>> Bioconductor mailing list
>> >>>>>>>>>>>>>>>>> Bioconductor at r-project.org
>> >>>>>>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >>>>>>>>>>>>>>>>> Search the archives:
>> >>>>>>>>>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
modified 5.1 years ago by Sophie Josephine Weiss (130) • written 5.1 years ago by Michael Love (23k)
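The two points raised in this thread (negative values from a log2-like transformation, and the effect of shifting or thresholding them) can be illustrated with a minimal base-R sketch. Everything here is made up for illustration: the matrix `counts` and the vector `size_factors` are hypothetical, and plain `log2` is used only as a stand-in for the VST or rlog transformations, which it approximates but does not reproduce.

```r
# Hypothetical raw counts: 3 taxa (rows) x 2 samples (columns).
counts <- matrix(c(1, 8, 32,
                   4, 2, 16), ncol = 2,
                 dimnames = list(paste0("taxon", 1:3), c("A", "B")))

# Hypothetical size factors: sample A has size factor 2.
size_factors <- c(A = 2, B = 1)

# Divide each column by its size factor to reach the common scale.
common <- sweep(counts, 2, size_factors, "/")

# Plain log2 as a stand-in for a log2-like transformation.
logged <- log2(common)
logged["taxon1", "A"]  # a count of 1 with size factor 2 gives log2(1/2) = -1

# Option 1: shift everything by one constant so the minimum becomes 0.
shifted <- logged - min(logged)

# Option 2: threshold negative values at 0.
thresholded <- pmax(logged, 0)

# Euclidean distances between the two samples under each choice.
d_orig   <- dist(t(logged))
d_shift  <- dist(t(shifted))
d_thresh <- dist(t(thresholded))
```

In this toy case `d_orig` and `d_shift` are identical, because a constant shift cancels in every difference, while `d_thresh` is smaller, because thresholding alters the values themselves. Bray-Curtis dissimilarity divides by the sum of the values, so it is not invariant to such a shift.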
Answer: phyloseq/DESeq gives negative transformed values
Sophie Josephine Weiss (130) wrote 5.1 years ago:
Makes sense, thanks for your help. In the DESeq manual, it looks like all we need to do for e.g. clustering, or pcoa, is the estimateSizeFactors. Is this correct?

Or would it be also ok to use the values from estimateDispersions with the negatives set to zero or constant shifted? I would do the above, but it looks like McMurdie et al. do both for their clustering simulation - so thought I would ask.

Thanks again!
Sophie
written 5.1 years ago by Sophie Josephine Weiss (130)
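For readers following the clustering/PCA question, the usage described in the reply (apply a transformation, take the matrix from the assay slot, then work downstream with generic R functions) might look like the sketch below. It assumes a current DESeq2 installation and uses simulated data from `makeExampleDESeqDataSet()` rather than the data in this thread; the object names `vsd`, `mat`, `sample_dist`, `clustering` and `pca` are arbitrary.

```r
library(DESeq2)

# Simulated example data generated by DESeq2 (not the data from this thread).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Variance stabilizing transformation. Size factors and dispersions are
# estimated internally if they are not already present on the object.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# The transformed values sit in the assay slot as an ordinary matrix.
mat <- assay(vsd)

# Downstream steps use generic R functions, not DESeq2 functions.
sample_dist <- dist(t(mat))         # Euclidean distances between samples
clustering  <- hclust(sample_dist)  # hierarchical clustering of samples
pca         <- prcomp(t(mat))       # PCA with samples as observations
```

`rlog(dds, blind = TRUE)` can be substituted for the VST call and returns the same kind of object, so the remaining lines stay unchanged.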
Answer: phyloseq/DESeq gives negative transformed values
Sophie Josephine Weiss (130) wrote 5.1 years ago:
Hi Mike,
Sorry, I was using DESeq. However, I am switching to DESeq2, and was running the rlog transformation - it seems to take a long time (>5hrs on a laptop), even on an ordinary-sized dataset - is this usual?
Thanks again,
Sophie

On Mon, May 5, 2014 at 6:16 PM, Michael Love <michaelisaiahlove@gmail.com> wrote:
> hi Sophie,
>
> On Mon, May 5, 2014 at 5:24 PM, Sophie Josephine Weiss
> <sophie.weiss@colorado.edu> wrote:
> > Makes sense, thanks for your help. In the DESeq manual, it looks like all
> > we need to do for e.g. clustering, or pcoa, is the estimateSizeFactors. Is
> > this correct?
> >
> > Or would it be also ok to use the values from estimateDispersions with the
> > negatives set to zero or constant shifted? I would do the above, but it
> > looks like McMurdie et al. do both for their clustering simulation - so
> > thought I would ask.
>
> I don't follow your question. If you want to do clustering or PCA
> using DESeq2, I assume you are applying one of the two transformations
> we have implemented, as described in the vignette. If size factors are
> not already estimated, both transformations will estimate them
> internally, likewise for dispersions.
>
> These transformations return objects with matrices in the assay slot
> which are appropriate for calculating distances or PCA. We show
> demonstrations of both calculating distance and PCA in the vignette.
>
> Please look the vignette over again, as this is the recommended usage.
> These transformed values have been corrected for size factor.
>
> You should not use any DESeq2 functions on the matrix of transformed
> values. The transformation is the last step within DESeq2, then we
> assume the user is doing something "downstream" with these values.
>
> Mike
>
> > Thanks again!
> > Sophie
> >> >>>>>>>>>>>>> > >> >>>>>>>>>>>>> How would you then suggest going about calculating > >> >> bray-curtis > >> >>>>>>>>>>>>> distance, or summarized taxa diagrams with these new > >> >>>>>>>>>>>>> transformed > >> >>>>>>>>>>>>> matrices > >> >>>>>>>>>>>>> with negative values? > >> >>>>>>>>>>>>> Thanks again, > >> >>>>>>>>>>>>> Sophie > >> >>>>>>>>>>>>> > >> >>>>>>>>>>>>> > >> >>>>>>>>>>>>> On Mon, Apr 14, 2014 at 7:17 AM, Michael Love > >> >>>>>>>>>>>>> <michaelisaiahlove@gmail.com> wrote: > >> >>>>>>>>>>>>>> > >> >>>>>>>>>>>>>> hi Sophie, > >> >>>>>>>>>>>>>> > >> >>>>>>>>>>>>>> Can you explain why you don't want negative values in the > >> >>>>>>>>>>>>>> transformed > >> >>>>>>>>>>>>>> values? Adding one to the raw counts is not sufficient. > I > >> >>>>>>>>>>>>>> should > >> >>>>>>>>>>>>>> have said > >> >>>>>>>>>>>>>> in my previous email, "the expected counts on the common > >> >>>>>>>>>>>>>> scale". > >> >>>>>>>>>>>>>> If the > >> >>>>>>>>>>>>>> size factor for a sample is 2, then an expected count of > 1 > >> >>>>>>>>>>>>>> leads > >> >>>>>>>>>>>>>> to an > >> >>>>>>>>>>>>>> expected count of 1/2 on the common scale (after > accounting > >> >>>>>>>>>>>>>> for > >> >>>>>>>>>>>>>> size > >> >>>>>>>>>>>>>> factors). > >> >>>>>>>>>>>>>> > >> >>>>>>>>>>>>>> > >> >>>>>>>>>>>>>> On Sun, Apr 13, 2014 at 11:50 PM, Sophie Josephine Weiss > >> >>>>>>>>>>>>>> <sophie.weiss@colorado.edu> wrote: > >> >>>>>>>>>>>>>>> > >> >>>>>>>>>>>>>>> Hi Mike, > >> >>>>>>>>>>>>>>> Thanks for your reply! Ok, makes sense, but I added 1 > to > >> >>>>>>>>>>>>>>> all my > >> >>>>>>>>>>>>>>> matrix values, so the lowest value in the matrix is 1 - > >> >>>>>>>>>>>>>>> there > >> >>>>>>>>>>>>>>> are still > >> >>>>>>>>>>>>>>> negatives? 
> >> >>>>>>>>>>>>>>> Thanks again, > >> >>>>>>>>>>>>>>> Sophie > >> >>>>>>>>>>>>>>> > >> >>>>>>>>>>>>>>> > >> >>>>>>>>>>>>>>> On Sun, Apr 13, 2014 at 9:01 PM, Michael Love > >> >>>>>>>>>>>>>>> <michaelisaiahlove@gmail.com> wrote: > >> >>>>>>>>>>>>>>>> > >> >>>>>>>>>>>>>>>> hi Sophie, > >> >>>>>>>>>>>>>>>> > >> >>>>>>>>>>>>>>>> The transformations in DESeq and DESeq2 are log2-like > >> >>>>>>>>>>>>>>>> transformations. If the expected count is between 0 and > >> >> 1, > >> >>>>>>>>>>>>>>>> the > >> >>>>>>>>>>>>>>>> values can be > >> >>>>>>>>>>>>>>>> negative, this does not indicate a problem. > >> >>>>>>>>>>>>>>>> > >> >>>>>>>>>>>>>>>> Mike > >> >>>>>>>>>>>>>>>> > >> >>>>>>>>>>>>>>>> > >> >>>>>>>>>>>>>>>> On Sun, Apr 13, 2014 at 5:17 PM, Sophie Josephine Weiss > >> >>>>>>>>>>>>>>>> <sophie.weiss@colorado.edu> wrote: > >> >>>>>>>>>>>>>>>>> > >> >>>>>>>>>>>>>>>>> Hello, > >> >>>>>>>>>>>>>>>>> I have microbiome data with no replicates, from > >> >> different > >> >>>>>>>>>>>>>>>>> conditions. I am > >> >>>>>>>>>>>>>>>>> trying to transform the data using the DESeq method, > as > >> >>>>>>>>>>>>>>>>> described > >> >>>>>>>>>>>>>>>>> in > >> >>>>>>>>>>>>>>>>> McMurdie and Holmes 2014. > >> >>>>>>>>>>>>>>>>> > >> >>>>>>>>>>>>>>>>> The attached file is the definition I am using, as per > >> >> the > >> >>>>>>>>>>>>>>>>> supplemental > >> >>>>>>>>>>>>>>>>> info in McMurdie and Holmes 2014, and the .biom file I > >> >> am > >> >>>>>>>>>>>>>>>>> using. 
> >> >>>>>>>>>>>>>>>>> > >> >>>>>>>>>>>>>>>>> Thank you for your help, > >> >>>>>>>>>>>>>>>>> Sophie > >> >>>>>>>>>>>>>>>>> > >> >>>>>>>>>>>>>>>>> _______________________________________________ > >> >>>>>>>>>>>>>>>>> Bioconductor mailing list > >> >>>>>>>>>>>>>>>>> Bioconductor@r-project.org > >> >>>>>>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> >>>>>>>>>>>>>>>>> Search the archives: > >> >>>>>>>>>>>>>>>>> > >> >>>>>>>>>>>>>>>>> > >> >>>>>>>>>>>>>>>>> > >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> >>>>>>>>>>>>>>>> > >> >>>>>>>>>>>>>>>> > >> >>>>>>>>>>>>>>> > >> >>>>>>>>>>>>>> > >> >>>>>>>>>>>>> > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>> > >> >>>> > >> >>> > >> >> > >> > _______________________________________________ > >> > Bioconductor mailing list > >> > Bioconductor@r-project.org > >> > https://stat.ethz.ch/mailman/listinfo/bioconductor > >> > Search the archives: > >> > http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > > [[alternative HTML version deleted]]
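The size-factor argument in the thread can be checked with a few lines of base R. This is only a sketch under stated assumptions: the toy counts and the size factors are made up for illustration, and a plain log2 stands in for the log2-like transformation (the actual VST is not a plain log2). It shows why adding 1 to the raw counts does not rule out negative values, and compares thresholding at 0 with a constant shift.

```r
# Hypothetical toy data: 3 taxa (rows) x 2 samples (columns),
# with a minimum raw count of 1 (as if 1 had been added everywhere).
counts <- matrix(c(1, 1,
                   8, 4,
                   16, 32),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("taxon", 1:3), c("s1", "s2")))

# Assumed size factors, picked by hand for illustration only
# (in a real analysis these would be estimated from the data).
size_factors <- c(s1 = 1, s2 = 2)

# Counts on the common scale: divide each column by its size factor.
common <- sweep(counts, 2, size_factors, "/")

# Plain log2 as a stand-in for a log2-like transformation.
# A count of 1 with size factor 2 becomes 1/2, and log2(1/2) is -1.
log_common <- log2(common)

# Option 1: threshold negative values at 0, as in the thread.
thresholded <- log_common
thresholded[thresholded < 0] <- 0

# Option 2: shift everything by a constant so the minimum is 0.
shifted <- log_common - min(log_common)

# Euclidean distance between the two samples under each option.
d_original    <- as.numeric(dist(t(log_common)))
d_thresholded <- as.numeric(dist(t(thresholded)))
d_shifted     <- as.numeric(dist(t(shifted)))
```

In this toy case the constant shift leaves the Euclidean distance between samples unchanged, while thresholding changes it. Whether either choice is appropriate for Bray-Curtis or other count-based distances is a separate question that this sketch does not settle.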
written 5.1 years ago by Sophie Josephine Weiss (130)
Answer: phyloseq/DESeq gives negative transformed values
5.1 years ago by
Michael Love23k
United States
Michael Love23k wrote:
Hi Sophie,

That's not usual; in my testing it typically takes less than a minute. I wonder if you are using version 1.4? You should always include the output of sessionInfo() when reporting software issues, so that the maintainer knows which versions of the software you are using.

Alternatively, you can just use the VST if the rlog is slow.

Mike

> Hi Mike,
> Sorry, I was using DESeq. However, I am switching to DESeq2, and was
> running the rlog transformation - it seems to take a long time (>5hrs
> on a laptop), even on an ordinary-sized dataset - is this usual?
> Thanks again,
> Sophie
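For readers who want to try the VST route suggested in this reply, here is a minimal sketch, assuming DESeq2 is installed from Bioconductor. The simulated dataset from makeExampleDESeqDataSet() is only a stand-in for a real count table, and the variable names are illustrative rather than taken from the thread.

```r
library("DESeq2")

set.seed(1)
# Simulated counts (1000 features x 6 samples) as a stand-in for real data.
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Variance stabilizing transformation; size factors and dispersions
# are estimated internally if they are not already present.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# The transformed values live in the assay slot as an ordinary matrix.
mat <- assay(vsd)

# Downstream use outside DESeq2, for example sample-to-sample distances.
sample_dist <- dist(t(mat))

# Version information to include when reporting a problem.
sessionInfo()
```

The matrix returned by assay() is meant for downstream steps such as distance calculation or PCA, not for further DESeq2 functions.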
written 5.1 years ago by Michael Love (23k)