Question: TMM Normalization: DESeqDataSet: some values in assay are negative
neekonsu wrote, 4 months ago:

Hi all,

I am working with the publicly available RNA-seq data from the GTEx database at https://storage.googleapis.com/gtexanalysisv7/rnaseqdata/GTExAnalysis2016-01-15v7RNASeQCv1.1.8genereads.gct.gz

I have normalized the count data using edgeR's calcNormFactors() and cpm(x, log=TRUE) functions, and I am trying to run my differential analysis with DESeq2. The DESeqDataSetFromMatrix() function returns "some values in assay are negative" when I pass it the normalized counts, and I am not sure how to resolve this error.

Is it possible to use the normalized data with DESeq(2)? If so, I would love to see how. The full pipeline is attached below, and I greatly appreciate any help!

Thanks,

Neekon

countdata <- unname(t(data.frame))
head(countdata, 10)

coldata["condition"] = condition
coldata["color"] = condition.color
coldata["cluster"] = condition.cluster
head(coldata, 10)

head(condition, 10)

y <- DGEList(counts=countdata)
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
data.scaled <- cpm(y, log=TRUE)

fviz_pca_ind(df.pca, label="none", habillage = condition.color, geom.ind="point")

dds <- DESeqDataSetFromMatrix(countData = t(data.scaled), colData = coldata, design= ~ condition)

dds.color <- DESeqDataSetFromMatrix(data.scaled = t(data.scaled), colData = coldata, design= ~ color)

dds.cluster <- DESeqDataSetFromMatrix(countData = t(data.scaled), colData = coldata, design= ~ cluster)

DESeq(dds)

DESeq(dds.color)

DESeq(dds.cluster)

res <- results(dds, name = "results")
summary(res)

res.color <- results(dds.color, name = "results.color")
summary(res.color)

res.cluster <- results(dds.cluster, name = "results.cluster")
summary(res.cluster)

normalization deseq deseq2 • 243 views
Answer: TMM Normalization: DESeqDataSet: some values in assay are negative
Simon Anders (Zentrum für Molekularbiologie, Universität Heidelberg) wrote, 4 months ago:

Your question is asked quite often, strangely enough, so let me first repeat our standard answer:

DESeq2 wants raw unnormalized counts. Do not supply anything else, ever.

Obviously, the easiest approach would be to run the whole analysis (normalization and DE testing) using either only edgeR or only DESeq2.

It is possible to mix both, by extracting the normalization coefficients from edgeR and handing them over to DESeq2. If you really need that, I can dig out how to do it, but I would then be curious why. This is a very unusual approach, and unless you know very well what you are doing and why, I would advise against it.


That is great to know, thanks for the speedy answer! The reason I was looking to mix both functions is that I am comparing the edgeR and DESeq(2) pipelines, and I wanted to keep the normalization step constant in my comparison. However, since I am comparing performance, I think I will stick with the recommended implementation of DESeq. I would still hugely appreciate it if you could tell me how to use norm factors from edgeR in DESeq. It's not a huge deal, but I am curious. Thanks again for your advice!

-Neekon

Reply written 4 months ago by neekonsu

If you want to truly compare pipeline performance, you should let each method do the normalization its own way. Each method has been designed with its own philosophy and underlying assumptions, so mixing parts of one method with another is likely to give you sub-optimal performance. (Or, if you have enough time, you could test both mixed and unmixed and see for yourself.)

Reply written 4 months ago by wunderl
Answer: TMM Normalization: DESeqDataSet: some values in assay are negative
wunderl wrote, 4 months ago:

You are likely getting this error because of log=TRUE in your call to cpm(). The log of anything less than 1 will be negative, so if after normalization you end up with any fractional counts they will produce a negative number when you take the log. This issue also occurs with DESeq2's rlog transformation.
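
To make the point concrete, here is a minimal base-R sketch using made-up CPM values (illustrative numbers, not GTEx data). cpm(y, log=TRUE) reports values on a log2 scale, so any CPM below 1 becomes negative:

```r
# Hypothetical CPM values for one gene across four samples (illustrative only)
cpm.values <- c(0.25, 0.5, 1, 8)

# On the log2 scale, every value below 1 becomes negative
log.cpm <- log2(cpm.values)
print(log.cpm)

# A matrix containing such values is not a count matrix, which is why
# DESeqDataSetFromMatrix() complains about negative values in the assay
print(any(log.cpm < 0))
```

The same reasoning applies to any log-scale expression matrix, which is one reason it cannot stand in for raw counts.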

As was mentioned in the other response, DESeq will only work with raw, unnormalized counts. For information on why, see here.

If you really want to use normalization factors from edgeR, then you still need to provide DESeq with the raw counts and then provide the normalization factors separately.

I am not familiar with edgeR, so how you provide the normalization factors will depend on whether edgeR returns per-sample normalization factors (a vector with one entry for each sample) or per-gene normalization factors (a matrix of the form genes x samples). For official documentation on this process, see here.

Sample based normalization factors

dds = DESeqDataSetFromMatrix(countData=rawCounts,.....)
sizeFactors(dds) = normalizationFactorsFromEdgeR
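
One caveat for the sample-based case: edgeR's calcNormFactors() returns factors that rescale the library sizes, so the quantity comparable to a DESeq2 size factor is the effective library size, lib.size * norm.factors, rather than norm.factors alone. A minimal base-R sketch follows, with made-up numbers standing in for y$samples$lib.size and y$samples$norm.factors. Dividing by the geometric mean is a common convention that I am assuming here; it is not something stated in this thread:

```r
# Hypothetical stand-ins for y$samples$lib.size and y$samples$norm.factors
lib.size     <- c(2.0e7, 3.0e7, 2.5e7)
norm.factors <- c(0.95, 1.10, 0.96)

# Effective library size per sample (library size scaled by the TMM factor)
eff.lib <- lib.size * norm.factors

# Rescale so the geometric mean is 1, which puts the values on the
# usual scale of size factors (assumed convention)
size.factors <- eff.lib / exp(mean(log(eff.lib)))
print(size.factors)

# With a real DESeqDataSet built from raw counts, one would then assign:
# sizeFactors(dds) <- size.factors
```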

Gene based normalization factors

dds = DESeqDataSetFromMatrix(countData=rawCounts,.....)
normalizationFactors(dds) = normalizationFactorsFromEdgeR

Note: The authors of DESeq recommend transforming the normalization factors so that the geometric mean of each row (i.e., each gene) is 1, so that the mean of normalized counts for a gene is close to the mean of the unnormalized counts. If you want to follow this recommendation, then you would do the following:

dds = DESeqDataSetFromMatrix(countData=rawCounts,.....)

normFactors = normalizationFactorsFromEdgeR
normFactors = normFactors / exp(rowMeans(log(normFactors)))

normalizationFactors(dds) = normFactors
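
As a quick check of what the rescaling line does, here is a small base-R sketch on a made-up genes x samples matrix (the values and the gene and sample names are hypothetical). After the division, each row's geometric mean should be 1:

```r
# Hypothetical genes x samples matrix of normalization factors (illustrative values)
normFactors <- matrix(c(0.5, 1.0, 2.0,
                        1.2, 0.9, 1.1),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))

# exp(rowMeans(log(x))) is the per-row geometric mean. Dividing a matrix by a
# vector of length nrow(x) recycles down the columns, so row i is divided by
# its own geometric mean.
normFactors <- normFactors / exp(rowMeans(log(normFactors)))

# Every row's geometric mean is now 1 (up to floating-point error)
row.geo.means <- exp(rowMeans(log(normFactors)))
print(row.geo.means)
```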

After setting the normalization factors using one of the methods above, continue with your regular downstream analysis, and DESeq should use the provided normalization factors instead of calculating its own. I recommend checking the help pages for normalizationFactors and sizeFactors if you need more information on how each function works.
