Hi all,
I was wondering if somebody could provide some input. I am struggling to run a differential expression (DE) analysis with DESeq2 on data from a count matrix. With the code below I get no significant results whatsoever, and I suspect that I have built the coldata incorrectly. Here is my code for the design, followed by the DE output (which shows no significance). I have looked around but cannot find an answer to my problem. Any input would be greatly appreciated, and I am happy to add any other code required.
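# Keep columns 3 onward of RNAseq as the count matrix
countdata <- RNAseq[, 3:ncol(RNAseq)]
countdata <- as.matrix(countdata)
storage.mode(countdata) <- "integer"
head(countdata)
colnames(countdata)
# Group labels, in the same order as the columns of countdata (5 + 7 + 7 samples)
condition <- factor(c(rep("con1", 5), rep("con2", 7), rep("con3", 7)))
treatment <- factor(c(rep("WT", 5), rep("genot1", 7), rep("genot2", 7)))
# Sample table: one row per column of countdata
coldata <- data.frame(row.names = colnames(countdata), condition, treatment)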
coldata
condition treatment
SJMMNORM016986_G1 con1 WT
SJMMNORM016994_G1 con1 WT
SJMMNORM016996_G1 con1 WT
SJMMNORM016997_G1 con1 WT
SJMMNORM016999_G1 con1 WT
SJMMNORM016977_G1 con2 genot1
SJMMNORM016978_G1 con2 genot1
SJMMNORM016979_G1 con2 genot1
SJMMNORM016983_G1 con2 genot1
SJMMNORM016985_G1 con2 genot1
SJMMNORM016989_G1 con2 genot1
SJMMNORM016995_G1 con2 genot1
SJMMNORM016981_G1 con3 genot2
SJMMNORM016982_G1 con3 genot2
SJMMNORM016984_G1 con3 genot2
SJMMNORM016988_G1 con3 genot2
SJMMNORM016990_G1 con3 genot2
SJMMNORM016992_G1 con3 genot2
SJMMNORM016993_G1 con3 genot2
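A quick sanity check on a sample table like the one above can be sketched as follows. This is only an illustration under stated assumptions: it uses a toy matrix with hypothetical gene and sample names (geneA, s1 to s6) rather than the real data. It checks that the rows of coldata line up with the columns of the count matrix, and it sets the reference level of the factor explicitly, because factor() sorts its levels and "WT" is therefore not guaranteed to come first. Note also that in the table above condition and treatment label the same three groups, so a design using either one describes the same grouping.

```r
# Toy stand-in for the real objects; gene and sample names are hypothetical.
countdata <- matrix(1:12, nrow = 2,
                    dimnames = list(c("geneA", "geneB"), paste0("s", 1:6)))
treatment <- factor(c(rep("WT", 2), rep("genot1", 2), rep("genot2", 2)))
coldata <- data.frame(row.names = colnames(countdata), treatment)

# Rows of coldata must be in the same order as the columns of countdata.
stopifnot(identical(rownames(coldata), colnames(countdata)))

# How many samples fall in each group.
table(coldata$treatment)

# factor() sorts its levels, so the first level (the reference) may not be
# "WT"; relevel() makes the choice of reference explicit.
levels(coldata$treatment)
coldata$treatment <- relevel(coldata$treatment, ref = "WT")
levels(coldata$treatment)
```

After relevel(), "WT" is the first level, so fold changes reported for this factor are relative to WT.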
dds <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ treatment)
dds
dds <- DESeq(dds)
res <- results(dds)
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
ENSMUSG00000064052 0.0000000 NA NA NA NA NA
ENSMUSG00000037169 2.2600347 1.10287749 0.6880776 1.6028387331 0.1089703 0.9998955
ENSMUSG00000077976 0.1544783 0.00299096 3.8710595 0.0007726463 0.9993835 0.9998955
ENSMUSG00000086031 0.0000000 NA NA NA NA NA
ENSMUSG00000000197 0.1026481 -1.15111679 3.8710595 -0.2973647864 0.7661880 0.9998955
... ... ... ... ... ... ...
ENSMUSG00000093086 0.00000000 NA NA NA NA NA
ENSMUSG00000065511 0.00000000 NA NA NA NA NA
ENSMUSG00000076628 0.05757301 0.002998166 3.87106 0.0007745079 0.999382 0.9998955
ENSMUSG00000076626 0.05757301 0.002998166 3.87106 0.0007745079 0.999382 0.9998955
ENSMUSG00000077841 0.00000000 NA NA NA NA NA
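Two things may be worth checking against output like the above. First, results(dds) called without arguments returns a single comparison (the last level of the last design variable against the reference level), so with three groups each pairwise comparison has to be requested explicitly. Second, many of the genes shown have a baseMean at or near zero, and such genes can be filtered out before fitting. The sketch below illustrates both on simulated counts. The data, the gene and sample names, and the filtering threshold of 10 are assumptions made for the example, not values taken from the real experiment.

```r
library(DESeq2)

set.seed(1)
# Simulated counts standing in for the real matrix (500 genes x 9 samples).
n_genes <- 500
gene_mu <- 2^runif(n_genes, 4, 10)   # one mean expression level per gene
counts <- matrix(rnbinom(n_genes * 9, mu = rep(gene_mu, times = 9), size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", 1:9)))
coldata <- data.frame(
  row.names = colnames(counts),
  treatment = factor(rep(c("WT", "genot1", "genot2"), each = 3),
                     levels = c("WT", "genot1", "genot2"))  # WT as reference
)

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ treatment)
# Drop genes with almost no reads before fitting (the threshold is arbitrary).
dds <- dds[rowSums(counts(dds)) >= 10, ]
dds <- DESeq(dds)

resultsNames(dds)  # lists the coefficients that were fitted

# One results table per pairwise comparison: c(factor, numerator, denominator)
res_g1_wt <- results(dds, contrast = c("treatment", "genot1", "WT"))
res_g2_wt <- results(dds, contrast = c("treatment", "genot2", "WT"))
res_g2_g1 <- results(dds, contrast = c("treatment", "genot2", "genot1"))

summary(res_g1_wt)                       # overview of up/down-regulated genes
sum(res_g1_wt$padj < 0.1, na.rm = TRUE)  # number of genes below the cutoff
```

Because the simulated groups are drawn from the same distribution, few or no genes should pass the cutoff here; the point is only the shape of the calls.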
Many thanks again!
Many thanks! And yes, I was just about to go through these steps (PCA specifically) to check for variability, but I decided to post because a colleague is getting significantly differentially expressed genes between the groups in Excel. That is why I thought my coldata was fundamentally wrong, and that the "design" was therefore not actually comparing the 3 groups of replicates (con1 vs con2, con2 vs con3, etc.).
I am indeed following the workflow; I just stopped after results(dds) because I am really concerned by the lack of significance (literally not a single gene)...
Many thanks again!
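For the PCA check mentioned above, a minimal sketch could look like the following. It again runs on simulated counts with hypothetical gene and sample names, so the numbers mean nothing in themselves. It uses varianceStabilizingTransformation() because the toy matrix is small; on a full-size dataset the faster vst() wrapper is the usual choice.

```r
library(DESeq2)

set.seed(1)
# Simulated counts standing in for the real matrix (500 genes x 9 samples).
n_genes <- 500
gene_mu <- 2^runif(n_genes, 4, 10)   # one mean expression level per gene
counts <- matrix(rnbinom(n_genes * 9, mu = rep(gene_mu, times = 9), size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", 1:9)))
coldata <- data.frame(
  row.names = colnames(counts),
  treatment = factor(rep(c("WT", "genot1", "genot2"), each = 3))
)
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ treatment)

# Transform the counts so that variance is roughly independent of the mean;
# blind = TRUE ignores the design, which suits quality-control plots.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# returnData = TRUE gives the PC coordinates instead of drawing the plot.
pca <- plotPCA(vsd, intgroup = "treatment", returnData = TRUE)
head(pca)
attr(pca, "percentVar")  # fraction of variance explained by PC1 and PC2

# Draw the plot itself, coloured by group.
plotPCA(vsd, intgroup = "treatment")
```

If the samples do not separate by group along the first components, a lack of significant genes may reflect the data rather than a mistake in coldata.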