Question: phyloseq/DESeq gives negative transformed values

Michael Love wrote:

hi Sophie,
On Mon, May 5, 2014 at 5:24 PM, Sophie Josephine Weiss
<sophie.weiss at colorado.edu> wrote:
> Makes sense, thanks for your help. In the DESeq manual, it looks like all
> we need to do for, e.g., clustering or PCoA is estimateSizeFactors. Is
> this correct?
>
> Or would it also be OK to use the values from estimateDispersions with the
> negatives set to zero or shifted by a constant? I would do the above, but
> it looks like McMurdie et al. do both for their clustering simulation, so
> I thought I would ask.
I don't follow your question. If you want to do clustering or PCA
using DESeq2, I assume you are applying one of the two transformations
we have implemented, as described in the vignette. If size factors have
not already been estimated, both transformations will estimate them
internally; the same goes for dispersions.

These transformations return objects with matrices in the assay slot
which are appropriate for calculating distances or PCA. We demonstrate
both calculating distances and PCA in the vignette. Please look the
vignette over again, as this is the recommended usage.

These transformed values have been corrected for size factors. You
should not use any DESeq2 functions on the matrix of transformed
values. The transformation is the last step within DESeq2; after it, we
assume the user is doing something "downstream" with these values.

Mike
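The downstream step Mike describes can be sketched in base R. Here, `mat` is a hypothetical stand-in for the matrix in the assay slot of a transformed object, which in DESeq2 is obtained with `assay()`. Features are in rows and samples in columns. The random values are made up only so that the block runs. Distances and PCA compare samples, so the matrix is transposed first.

```r
# Hypothetical stand-in for a transformed (log2-like) matrix:
# rows are features (taxa or genes), columns are samples.
set.seed(1)
mat <- matrix(rnorm(40, mean = 3), nrow = 10,
              dimnames = list(paste0("otu", 1:10), paste0("s", 1:4)))

# Euclidean distances between samples; dist() compares rows, so transpose.
sample_dists <- dist(t(mat))

# Hierarchical clustering of the samples from those distances.
hc <- hclust(sample_dists)

# PCA of the samples; prcomp() centers each feature by default.
pca <- prcomp(t(mat))
pc_scores <- pca$x[, 1:2]

print(round(as.matrix(sample_dists), 2))
print(round(pc_scores, 2))
```

The DESeq2 vignette walks through the equivalent steps on a real transformed object. This sketch only shows the base-R calls involved.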
>
> Thanks again!
> Sophie
>
>
> On Thu, Apr 24, 2014 at 6:54 AM, Wolfgang Huber <whuber at embl.de> wrote:
>>
>> Hi Sophie
>>
>> as this issue comes up periodically, let me point out that
>>
>> log(cx) = log(c) + log(x)
>>
>> That means, if you think of "x" as your data matrix and "c" as a single
>> positive number, you can always add or subtract a constant to your
>> transformed data, for instance to make it more agreeable to you by having
>> all positive signs, and all that amounts to is an overall scaling
>> (multiplication) of the data on the untransformed scale.
>> An analogous idea applies to the rlog or vst transformations of DESeq2.
>>
>> A reasonable distance metric between samples or genes should probably not
>> depend on such an overall constant c.
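Wolfgang's identity, and what it implies for distances, can be checked numerically in base R. The matrix `x` and the constant `c_scale` are hypothetical values. The constant is named `c_scale` so that it does not mask R's `c()`. The block uses `log2` because the DESeq2 transformations are log2-like, but any base behaves the same way.

```r
# Hypothetical positive data matrix (rows = features, columns = samples)
# and an arbitrary positive scaling constant.
x <- matrix(c(1, 4, 16, 2, 8, 32), nrow = 3)
c_scale <- 4

# Scaling by c on the raw scale equals adding log2(c) on the log scale.
lhs <- log2(c_scale * x)
rhs <- log2(c_scale) + log2(x)
print(all.equal(lhs, rhs))

# Euclidean distances between samples ignore an overall additive shift.
d_original <- dist(t(log2(x)))
d_shifted  <- dist(t(log2(x) + 10))
print(as.numeric(d_original))
print(as.numeric(d_shifted))
```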
>>
>> Best wishes
>> Wolfgang
>>
>>
>>
>>
>>
>>
>> On 23 Apr 2014, at 23:44, Sophie Josephine Weiss
>> <sophie.weiss at colorado.edu> wrote:
>>
>> > Thanks Michael,
>> > The entire dataset (attached code and .biom) gives negatives. There was
>> > an "out of vertex space" error, as described here
>> > <http://seqanswers.com/forums/showthread.php?p=18620>,
>> > so I tried setting maxk=300 as suggested.
>> > Commands are below.
>> > Thanks again!
>> > Sophie
>> >
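>> > # biocLite() was the Bioconductor installer in 2014; current releases
>> > # use BiocManager::install() instead.
>> > source("http://bioconductor.org/biocLite.R")
>> > biocLite("phyloseq")
>> > biocLite("DESeq")
>> >
>> > library("phyloseq")
>> > library("DESeq")
>> > library("biom")
>> >
>> > file = "~/Downloads/study_449_closed_reference_otu_table.biom"
>> > x = import_biom(file)
>> > # deseq_varstab() is the definition from the McMurdie and Holmes (2014)
>> > # supplemental material.
>> > source("~/Downloads/deseq_varstab.R")
>> > # maxk = 300 is set as suggested for the "out of vertex space" error.
>> > DESeq_data = deseq_varstab(x, method = "blind", sharingMode = "maximum",
>> >                            fitType = "local",
>> >                            locfit_extra_args = list(maxk = 300))
>> > # otu_table() is the phyloseq accessor for the transformed OTU table.
>> > write_biom(make_biom(otu_table(DESeq_data)),
>> >            "~/Desktop/449_Costello_DESeq.biom.tsv")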
>> >
>> >
>> > On Sat, Apr 19, 2014 at 11:29 AM, Michael Love
>> > <michaelisaiahlove at gmail.com> wrote:
>> >
>> >> hi Sophie,
>> >>
>> >> You are getting negative values from the transformation for the
>> >> reasons I mentioned earlier: the transformation is log2-like.
>> >>
>> >> If you want to do something downstream of our software which requires
>> >> non-negative values, below is some example code showing how to
>> >> threshold negative values for a matrix in R.
>> >>
>> >> The question of which distance is best for taxa counts, or whether
>> >> ANOVA on variance-stabilized data is a good idea for taxa counts,
>> >> depends on the properties of the data, and this is an area of active
>> >> research. As I don't have experience analyzing this kind of data, I
>> >> don't want to make any guesses.
>> >>
>> >> > m <- matrix(-2:5, ncol=2)
>> >> > m
>> >>      [,1] [,2]
>> >> [1,]   -2    2
>> >> [2,]   -1    3
>> >> [3,]    0    4
>> >> [4,]    1    5
>> >> > m[m < 0] <- 0
>> >> > m
>> >>      [,1] [,2]
>> >> [1,]    0    2
>> >> [2,]    0    3
>> >> [3,]    0    4
>> >> [4,]    1    5
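Sophie also asks whether to shift by a constant instead of thresholding. The base-R sketch below compares the two options on Mike's example matrix. The helper names `threshold_at_zero()` and `shift_to_nonnegative()` are hypothetical; they are not functions from DESeq2 or phyloseq. Thresholding merges every negative entry into 0. A shift keeps all differences between entries, which relates to the overall-constant point Wolfgang makes in this thread.

```r
# Hypothetical helpers, not functions from DESeq2 or phyloseq.
threshold_at_zero <- function(m) {
  m[m < 0] <- 0          # same operation as in Mike's example
  m
}

shift_to_nonnegative <- function(m) {
  m - min(0, min(m))     # subtract the most negative value, if any
}

m <- matrix(-2:5, ncol = 2)
m_thr   <- threshold_at_zero(m)
m_shift <- shift_to_nonnegative(m)

print(m_thr)
print(m_shift)

# Thresholding collapses -2, -1 and 0 in column 1 into a single value;
# shifting keeps the spacing between entries.
print(unique(m_thr[, 1]))
print(diff(m_shift[, 1]))
```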
>> >>
>> >> On Fri, Apr 18, 2014 at 3:32 PM, Sophie Josephine Weiss
>> >> <sophie.weiss at colorado.edu> wrote:
>> >>> Hi Mike,
>> >>> Could you please check whether I am running this correctly? I have
>> >>> double-checked all the parameters, but for some reason I am getting
>> >>> negatives using the R script on the attached .biom dataset. There are
>> >>> no replicates in this microbial dataset.
>> >>> Thanks for your advice,
>> >>> Sophie
>> >>>
>> >>>
>> >>> On Wed, Apr 16, 2014 at 4:02 PM, Sophie Josephine Weiss
>> >>> <sophie.weiss at colorado.edu> wrote:
>> >>>>
>> >>>> Thanks Mike, that is what I thought. What if we wanted to perform a
>> >>>> Kruskal-Wallis test, or is it possible to perform ANOVA on the
>> >>>> variance-stabilized matrix?
>> >>>>
>> >>>>
>> >>>> On Wed, Apr 16, 2014 at 2:29 PM, Michael Love
>> >>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>
>> >>>>> hi Sophie,
>> >>>>>
>> >>>>> We recommend using the standard DESeq() function for differential
>> >>>>> expression.
>> >>>>>
>> >>>>> This is mentioned in the first line of the vignette section on
>> >>>>> transformations:
>> >>>>>
>> >>>>> "In order to test for differential expression, we operate on raw
>> >>>>> counts and use discrete distributions as described in the previous
>> >>>>> section"
>> >>>>>
>> >>>>> Also, in the McMurdie and Holmes paper, they use the DESeq()
>> >>>>> function, as shown in their supplemental material:
>> >>>>>
>> >>>>> http://joey711.github.io/waste-not-supplemental/simulation-differential-abundance/simulation-differential-abundance-server.html
>> >>>>>
>> >>>>> On Wed, Apr 16, 2014 at 3:22 PM, Sophie Josephine Weiss
>> >>>>> <sophie.weiss at colorado.edu> wrote:
>> >>>>>> Please help with this? Thanks again.
>> >>>>>>
>> >>>>>>
>> >>>>>> On Mon, Apr 14, 2014 at 6:02 PM, Sophie Josephine Weiss
>> >>>>>> <sophie.weiss at colorado.edu> wrote:
>> >>>>>>>
>> >>>>>>> Thanks again Mike. Would it be OK to do chi-squared and other
>> >>>>>>> significance tests on the DESeq-transformed datasets using
>> >>>>>>> independent code, or is it necessary to do the differential
>> >>>>>>> expression tests strictly within DESeq2?
>> >>>>>>>
>> >>>>>>> Sophie
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Mon, Apr 14, 2014 at 5:41 PM, Michael Love
>> >>>>>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>> hi Sophie,
>> >>>>>>>>
>> >>>>>>>> The VST code is the same in DESeq and DESeq2. The estimation of
>> >>>>>>>> dispersion is slightly different (details are in the vignette
>> >>>>>>>> "Changes from DESeq to DESeq2"), but the fitted line (which is
>> >>>>>>>> used by the VST) should be very similar.
>> >>>>>>>>
>> >>>>>>>> Mike
>> >>>>>>>>
>> >>>>>>>> On Mon, Apr 14, 2014 at 6:27 PM, Sophie Josephine Weiss
>> >>>>>>>> <sophie.weiss at colorado.edu> wrote:
>> >>>>>>>>> Hi Mike,
>> >>>>>>>>> The McMurdie and Holmes paper uses DESeq for matrix normalization.
>> >>>>>>>>> Do you think that is OK, or would it be better to use DESeq2?
>> >>>>>>>>> Thanks again,
>> >>>>>>>>> Sophie
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Mon, Apr 14, 2014 at 3:40 PM, Michael Love
>> >>>>>>>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> hi Sophie,
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Mon, Apr 14, 2014 at 1:15 PM, Sophie Josephine Weiss
>> >>>>>>>>>> <sophie.weiss at colorado.edu> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> Hi Mike,
>> >>>>>>>>>>> Thanks for the references. By "threshold at 0" do you mean
>> >>>>>>>>>>> setting any negative values equal to 0?
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> yes.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> Do you think this is the best approach?
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> I haven't explored this area, and would defer to the McMurdie
>> >>>>>>>>>> and Holmes paper for the best combinations of distance and
>> >>>>>>>>>> transformation.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> Thanks again,
>> >>>>>>>>>>> Sophie
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Mon, Apr 14, 2014 at 11:01 AM, Michael Love
>> >>>>>>>>>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I tried poking around here
>> >>>>>>>>>>>> http://joey711.github.io/phyloseq/distance
>> >>>>>>>>>>>> but couldn't see whether the authors did anything for
>> >>>>>>>>>>>> distances requiring non-negative data. It appears from
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003531
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> that VST was tested with Bray-Curtis distance. I think that
>> >>>>>>>>>>>> distance is designed for counts, but you could always
>> >>>>>>>>>>>> threshold at 0 to insist that the log2-like quantity act more
>> >>>>>>>>>>>> like a count.
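The base-R sketch below shows why Bray-Curtis expects non-negative input. It computes the dissimilarity sum(|x - y|) / sum(x + y) for two samples x and y. `bray_curtis()` is a hypothetical helper for illustration only; in practice, `vegan::vegdist(..., method = "bray")` is the usual implementation. With negative entries, the denominator can reach zero, so the value stops being a bounded dissimilarity. Thresholding at 0 first brings it back into [0, 1].

```r
# Hypothetical helper: Bray-Curtis dissimilarity between two samples,
# sum(|x - y|) / sum(x + y). Only meaningful for non-negative values.
bray_curtis <- function(x, y) {
  sum(abs(x - y)) / sum(x + y)
}

# Non-negative counts: the result lies in [0, 1].
bc_counts <- bray_curtis(c(1, 0, 3), c(0, 2, 3))

# Log2-like values with negatives: here sum(x + y) is 0, giving Inf.
x <- c(-1, 2)
y <- c(1, -2)
bc_negative <- bray_curtis(x, y)

# Thresholding at 0 first, as suggested in the thread.
bc_thresholded <- bray_curtis(pmax(x, 0), pmax(y, 0))

print(c(counts = bc_counts, negative = bc_negative,
        thresholded = bc_thresholded))
```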
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Mon, Apr 14, 2014 at 12:23 PM, Sophie Josephine Weiss
>> >>>>>>>>>>>> <sophie.weiss at colorado.edu> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Hi Mike,
>> >>>>>>>>>>>>> Thanks for explaining more. I am used to working with rarefied
>> >>>>>>>>>>>>> microbial datasets; that is why. Instead of rarefying, I would
>> >>>>>>>>>>>>> like to use the DESeq method.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> How would you then suggest calculating Bray-Curtis distance,
>> >>>>>>>>>>>>> or summarized taxa diagrams, with these new transformed
>> >>>>>>>>>>>>> matrices that contain negative values?
>> >>>>>>>>>>>>> Thanks again,
>> >>>>>>>>>>>>> Sophie
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Mon, Apr 14, 2014 at 7:17 AM, Michael Love
>> >>>>>>>>>>>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> hi Sophie,
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Can you explain why you don't want negative values in the
>> >>>>>>>>>>>>>> transformed data? Adding one to the raw counts is not
>> >>>>>>>>>>>>>> sufficient. I should have said in my previous email, "the
>> >>>>>>>>>>>>>> expected counts on the common scale". If the size factor
>> >>>>>>>>>>>>>> for a sample is 2, then an expected count of 1 leads to an
>> >>>>>>>>>>>>>> expected count of 1/2 on the common scale (after accounting
>> >>>>>>>>>>>>>> for size factors).
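A few lines of base R can check Mike's arithmetic. Dividing by the size factor mirrors DESeq's normalized counts (raw counts divided by size factors). The plain `log2()` here is only a stand-in for the log2-like transformation, not the actual DESeq VST. The counts and size factors are hypothetical.

```r
# Hypothetical raw counts for one taxon in three samples, after adding 1
# to every entry (so the smallest raw value is 1).
raw_plus_one <- c(1, 1, 5)

# Hypothetical size factors for the three samples.
size_factors <- c(2, 0.5, 1)

# Counts on the common scale: divide each sample by its size factor.
common_scale <- raw_plus_one / size_factors

# Plain log2 as a stand-in for the log2-like transformation.
log_values <- log2(common_scale)

print(common_scale)
print(log_values)
```

The first sample has a size factor of 2. Its pseudocount-adjusted value of 1 therefore becomes 1/2 on the common scale and lands at -1 after the log, so adding 1 to the raw counts does not rule out negatives.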
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Sun, Apr 13, 2014 at 11:50 PM, Sophie Josephine Weiss
>> >>>>>>>>>>>>>> <sophie.weiss at colorado.edu> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Hi Mike,
>> >>>>>>>>>>>>>>> Thanks for your reply! OK, makes sense, but I added 1 to all
>> >>>>>>>>>>>>>>> my matrix values, so the lowest value in the matrix is 1.
>> >>>>>>>>>>>>>>> Why are there still negatives?
>> >>>>>>>>>>>>>>> Thanks again,
>> >>>>>>>>>>>>>>> Sophie
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Sun, Apr 13, 2014 at 9:01 PM, Michael Love
>> >>>>>>>>>>>>>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> hi Sophie,
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> The transformations in DESeq and DESeq2 are log2-like
>> >>>>>>>>>>>>>>>> transformations. If the expected count is between 0 and 1,
>> >>>>>>>>>>>>>>>> the values can be negative; this does not indicate a problem.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Mike
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Sun, Apr 13, 2014 at 5:17 PM, Sophie Josephine Weiss
>> >>>>>>>>>>>>>>>> <sophie.weiss at colorado.edu> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Hello,
>> >>>>>>>>>>>>>>>>> I have microbiome data with no replicates, from different
>> >>>>>>>>>>>>>>>>> conditions. I am trying to transform the data using the
>> >>>>>>>>>>>>>>>>> DESeq method, as described in McMurdie and Holmes 2014.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> The attached files are the function definition I am using,
>> >>>>>>>>>>>>>>>>> as per the supplemental info in McMurdie and Holmes 2014,
>> >>>>>>>>>>>>>>>>> and the .biom file I am using.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Thank you for your help,
>> >>>>>>>>>>>>>>>>> Sophie
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> _______________________________________________
>> >>>>>>>>>>>>>>>>> Bioconductor mailing list
>> >>>>>>>>>>>>>>>>> Bioconductor at r-project.org
>> >>>>>>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >>>>>>>>>>>>>>>>> Search the archives:
>> >>>>>>>>>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>

Written 5.5 years ago by Michael Love; modified 5.5 years ago by Sophie Josephine Weiss.