I want to check contrasts between certain samples in my dataset, and I was told that DESeq2 needs data that has already been normalized rather than raw counts. I don't have real data to practice on, but I have been going through the code snippets I obtained to understand it better. However, I am confused by one thing.
I've read that DESeq2 is to be used on unnormalized data so it can properly estimate size factors. So was the advice I was given earlier, that it should not be used on raw data, incorrect?
In any case, DESeq2 will normalize the data afterwards, so this leads me to think that, for my first question, the data needs to be raw counts. But when running contrasts (which is what I will be doing), I see the code does not use the normalized counts dataframe, but instead calls dds, which isn't normalized yet. The code is below, keeping in mind I will be using this for species data, which tends to be very 0-inflated.
library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = some.counts.data,
                              colData = some.env.coldata,
                              design = ~ condition)
dds <- estimateSizeFactors(dds, type="poscounts") # for 0-inflated data use poscounts
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
dds
normalized_counts <- counts(dds, normalized=TRUE) # can extract this as a dataframe later on
#Run below when comparing between groups of interest
contrast1 <- as.data.frame(results(dds, contrast=c("condition", "CONDITION1", "CONTROL")))
When running the contrasts and calling dds, the counts are not normalized at this point, right? So would this be incorrect? Shouldn't contrasts be done after everything is normalized so it's properly compared?
Ok, so the comment I was given, that DESeq2 should be run with normalized counts, is incorrect.
The normalized_counts variable is then just for other visualizations as you mentioned, just for plotting.
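For anyone else practicing without real data, here is a small check I would sketch to confirm this. It is only an illustration under assumptions: the counts are simulated with makeExampleDESeqDataSet() (which ships with DESeq2), and its condition levels are "A" and "B", standing in for CONTROL and CONDITION1 in my code above. It shows that dds keeps the raw counts alongside the size factors, that the normalized counts are just the raw counts divided by those size factors, and that results() is called on the fitted dds rather than on the normalized matrix.

```r
library(DESeq2)

set.seed(1)
# Simulated data (assumption: stands in for real species counts).
# The 'condition' factor in this example object has levels "A" and "B".
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Same stepwise pipeline as in the post
dds <- estimateSizeFactors(dds, type = "poscounts")
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)

raw  <- counts(dds)                     # still the raw integer counts
norm <- counts(dds, normalized = TRUE)  # raw counts scaled by size factors

# Rebuild the normalized counts by hand: divide each sample (column)
# by its size factor and compare with what DESeq2 returns
manual <- sweep(raw, 2, sizeFactors(dds), "/")
print(all.equal(norm, manual, check.attributes = FALSE))

# The contrast is extracted from the model stored in dds, which was fit on
# the raw counts with the size factors accounting for sequencing depth
res <- as.data.frame(results(dds, contrast = c("condition", "B", "A")))
head(res)
```

So the normalization is not skipped when calling results(dds, ...); it is already built into the fitted model through the size factors, and the normalized matrix is only a convenience for plots and other downstream summaries.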