Question: DESeq2 Normalized Counts for Mfuzz
0
10 months ago by
rbenel0
rbenel0 wrote:

Hi,

I would like to perform temporal patterning using the package Mfuzz, which assumes the given expression data are fully pre-processed, including data normalization and averaging of replicates.

As @Michael Love explained in the question "expression profiles between DESeq2 and Mfuzz": "the normalized counts in DESeq2 are just provided for visualization, they aren't actually used in the DESeq2 model which uses all samples on the original counts scale and keeps track of size factors on the right-hand side of the equations as shown in the DESeq2 paper".

So, is performing one of DESeq2's transformations and then adjusting for a covariate with limma's removeBatchEffect() function on the transformed values, as shown in "How do I extract read counts from DESeq2", still the most effective way to extract normalized counts from a dds object for downstream use?

Thanks!

rnaseq mfuzz deseq2 • 324 views
modified 10 months ago by Michael Love24k • written 10 months ago by rbenel0
Answer: DESeq2 Normalized Counts for Mfuzz
0
10 months ago by
Michael Love24k
United States
Michael Love24k wrote:

This really depends on the downstream use.

Count-based methods will want unnormalized counts. If a method wants normalized, variance-stabilized values, we provide those; see the transformations section of the vignette. Normalized counts are mostly useful for visualization.
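For reference, a minimal sketch of the outputs being contrasted here, assuming `dds` is a DESeqDataSet that has already been run through DESeq() (so size factors exist); all object names are placeholders:

```r
library(DESeq2)

# Size-factor-scaled counts: provided mainly for visualization
norm.counts <- counts(dds, normalized = TRUE)

# Variance-stabilized values for downstream use
vsd <- vst(dds, blind = FALSE)

# rlog, which the vignette suggests for small sample sizes
rld <- rlog(dds, blind = FALSE)
```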

OK. This specific method wants normalized counts, and since the sample size is n < 30, the vignette suggests using rlog. So if my design is ~ replicate + day, is it in my best interest to use

mat <- assay(rld)

mat2 <- removeBatchEffect(mat, rld$replicate)

and then use mat2 as the input for Mfuzz?

If it wants normalized counts, you don't want to use log-transformed, VST or rlog counts.

I guess you should ask the Mfuzz developers how to deal with batch. I can't give too much advice because I don't know what they expect as input.

OK, thanks for the input. I tagged Mfuzz as well, but not every developer is as active as you are ;)

On the topic of downstream analysis and normalized counts: if I wanted to look at the correlation between genes at the expression level, is using one of the transformed counts appropriate?

as always, thank you for your time!

1

I would look at correlation on the log2 scale. I tend to use vst() for transformation, and if I want to remove batches, I use removeBatchEffect on the matrix in assay(vsd).
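A small sketch of that workflow, assuming a DESeqDataSet `dds` whose colData contains a `replicate` column (the column name is a placeholder from the design discussed above):

```r
library(DESeq2)
library(limma)

vsd <- vst(dds, blind = FALSE)                     # log2-scale, variance-stabilized
mat <- assay(vsd)
mat <- removeBatchEffect(mat, batch = vsd$replicate)

# Pairwise gene-gene correlation on the VST scale
gene.cor <- cor(t(mat))
```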

Just to update anyone who might be interested in using the Mfuzz package on normalized DESeq2 counts:

From the developers of Mfuzz: "For RNA-seq, you could input log-transformed normalised cpm and then use the standardise function. If you are looking specifically for peaks, you can spare the log transformation, since it reduces the dynamic range... TPM would also be OK, as TPM and CPM only differ by the inverse length of the transcript; since Mfuzz::standardise() divides by the standard deviation for each gene, this factor is taken out again."
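A hedged sketch of that recipe, assuming a raw count matrix `counts` (genes x samples) and using edgeR for the CPM step; the object names are placeholders:

```r
library(edgeR)
library(Biobase)
library(Mfuzz)

logcpm <- cpm(counts, log = TRUE)          # log2 CPM with a small prior count
eset <- ExpressionSet(assayData = logcpm)
eset.s <- standardise(eset)                # per-gene mean 0, sd 1, as Mfuzz expects
```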

Hi, rbenel

Could you tell me how you imported the normalized objects (rld or vsd) into Mfuzz? Just eset <- new('ExpressionSet', exprs = mat2, phenoData = phenodf)? But if so, how did you include the group information?

It was mentioned in GOexpress: use DESeq2 data object as input

I am not sure about the link you posted, but I don't import the vsd object directly. After correcting for batch effects, I save the object as .RData, and then when working with Mfuzz I create an ExpressionSet as shown below.

# The clustering is based solely on the exprs matrix; no information is used from the phenoData,
# so it is possible to create the ExpressionSet using Biobase alone (see below).

library(Mfuzz)
library(Biobase)

exp.set <- ExpressionSet(assayData = as.matrix(vsd.matrix))
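A minimal sketch of how the clustering might proceed from here, assuming exp.set was built as above; the cluster number c = 6 is an illustrative choice, not a recommendation:

```r
library(Mfuzz)

eset.s <- standardise(exp.set)         # per-gene standardization (mean 0, sd 1)
m <- mestimate(eset.s)                 # data-driven estimate of the fuzzifier m
cl <- mfuzz(eset.s, c = 6, m = m)      # soft clustering into 6 temporal patterns
mfuzz.plot2(eset.s, cl = cl, x11 = FALSE)
```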