Dear DESeq2 community,
I am trying to use DESeq2 to detect DEG for my RNAseq data because it can deal with multi-factor designs, although there is not only multi-factor analysis in my experiment data. Let me clarify my datasets firstly.
For my first and second dataset, there is only one factor with 2 levels (They are unbalanced experiments. One treatment level is of 4 biological replication and the other is of 5). So, what I need is just to do 2 pairwise comparisons for each of them by reading the count table (the counts are raw counts got from Htseq-count) by
STR_dds <- DESeqDataSetFromMatrix(countData = STR_count, colData = STR_colData, design = ~ trt )
PFC_dds <- DESeqDataSetFromMatrix(countData = PFC_count, colData = PFC_colData, design = ~ trt ), then doing the differential expression analysis
In the meantime, I also used Cuffdiff to do the tests. However, when I got the results from these 2 pipelines (Cuffdiff and DESeq2), they are so different. Both of them are with FDR at 0.05. For one of the dataset, there are 325 genes detected as DEG from Cuffdiff, while it is only 6 genes that are differentially expressed by DESeq2. For another dataset, there are 25 DEG from Cuffdiff, while it is only 1 gene from DESeq2. It looks like either Cufflinks is too liberal, or makes false positive calls, or HTseq-DEseq2 is to conservative, or makes false negative calls. It is hard for me to make a conclusion based on the different results.
So, I am trying to see the expression levels for each gene and check the correlation between cuffdiff and DESeq2 results. However, it is RPKM (my data is single-end) from cuffdiff (cufflinks) but baseMean from DESeq2. So, my question is how DESeq2 to calculate the baseMean for each gene? Is there any relationship to RPKM? For baseMean, it is a just average of the normalized count values, taken over all samples. On my understanding, it is normalized only by library size not genes length, right? Could I get FPKM from DESeq2? If I can't, I can also calculate it manually based on gene count and length. But, I wonder know if I could get it from DESeq2 and my answer is no but I don't know the exact reason.
For my third dataset, it is multi-factor design. There are 2 factors in my experiment, genotype and condition, with 2 levels for each of them. I fit a model with interaction. Here are the codes that I used to read data in:
GT_Con_colData <- data.frame(row.names = colnames(GT_Con_countData), GT = c(rep('WT', 11), rep('NEO', 11)), Con = c(rep('MS', 5), rep('NMS', 6), rep('MS', 6), rep('NMS', 5)))
GT_Con_dds <- DESeqDataSetFromMatrix(countData = GT_Con_countData , colData = GT_Con_colData, design = formula(~ GT+Con+GT:Con) )
Then, I did test by
GT_Con_dds <- DESeq(GT_Con_dds)
GT_Con_dds_res <- results(GT_Con_dds)
The result shows that there is nothing significant. How to explain the results and what is the next step based on this non-significant results. Is it showing there is no interaction effect for my data? Actually, for this test, my full model is GT+Con+GT:Con (H1), and my reduced model is GT+Con (H0). Since we fail to reject our H0, we can conclude that the interaction effect is not significant for this dataset. Is my understanding correct? If yes, then the null model is enough to fit my data. Could I just drop/remove the interaction from my model, then the full model is GT+Con? Does is it matter the order of GT and Con because I see from vignette the variable of interest should be put at the end of the formula? In this case, what is the reduced model? Also, could you illuminate what is the code should I use if I am interested in all of contrasts based on this non-signifciant interaction effect?
I hope I make it clear for my questions in this big email (actually, I did some research online and in DESeq2 community. It really helps but I am still confused about my questions). Thank you so much!