Please, could you help me. I have some doubts about the DESeq2 differential analysis.
We were performing a DEA analysis (report: http://rpubs.com/tiagochst/DESeq2) between some cell lines, and the gene log2FC for the ST13P12 gene caught our attention.
In the section ST13P12 of the report you can see that group E has the higher average between all groups, while group A has 0 counts for all samples. When we perform the DEA using as reference the group "A"
Since the vignette says "estimates are of the logarithmic fold change log2(treated/untreated)" for condition treated vs untreated. I would expect E vs A to be log2(E/A) > 0 in the results group2_E_vs_A object. However, is is negative (-29.72).
Please, am I wrong ? Is my code wrong? Or there is an explanation for that change of signal ?
Thanks!
Yes. But:
- DEA section, second chunk, third line. I use E as reference
- DEA section, forth chunk, first line. I use A as reference
But E is more expressed than A for gene ST13P12 (section Gene ST13P12,
avg.count), and setting E as reference gives me log2FC > 0. It should be the opposite no ?
(DEA section, third chunk output)There is a bit of re-ordering going on with
all_cell_lines_ordered
andMetaData_File
, and it's hard for me to keep track. Can you just doSo the plot is correlated with results. But is it possible that a 0 count for all samples in a group increase if normalized ? It seems the number of samples in each group are correct. http://rpubs.com/tiagochst/392569
I think you are saying that the plot goes along with the results table. That's how I see it as well.
This means that there is a sample mixup in how you are preparing the data. I had a hard time following what was going on in these lines, so what I'd recommend is to use very simple R code when you provide counts and metadata to DESeqDataSet constructors, and use checks to make sure these are in the same order.
You can also check by looking at
cbind(counts(dds)[gene.id,], dds$group2)
to see if it confirms how you think the samples should line up with the counts.I'm scanning your code and the counts and metadata are not in the same order when you provide them to the DESeqDataSet constructor. You need to make sure these are in the same order.
Thanks! I thought it was going to do the same thing Summarized Experiment does, it checks if rownames and colnames match (even though my code was not going to work since rownames were empty)
DESeq2 does check this, and will give an error with a mis-match:
But that's weird because I did not get that. And the rownames and colnames were different.
Like SummarizedExperiment, it only works (you get a useful error) if you have rownames on the colData.
Neither DESeq2 nor SummarizedExperiment will complain if they don't exist. DESeq2 will give the SummarizedExperiment error ("assay colnames() must be NULL or equal colData rownames()") if the rownames exist and are entirely different. This is because it calls SummarizedExperiment directly.
DESeq2 will give the above extra useful error if the rownames exist and are the same values but in a wrong order.
Thank you very much for the help!
I added the data here: https://drive.google.com/drive/folders/1p5CDMR2OOMCbT2WXRNtNRUrv4f3FhCNo?usp=sharing