Hi, I have DESeqDataSet object 'ddsHTSeq' which contain multiple sample types. But i want to perform differential gene expression analysis only at a subset of this data. I have subset the data and ran DESeq, but the result is not what i was expected, since i knew some genes which were among the most differentially expressed genes from microarray data. if i could get any comments or suggest where i could go wrong, it would be much appreciated. Thank you in advance.
##Extract coldata
>coldata_1 = colData(ddsHTSeq)
## subset the coldata
>coldata = coldata_1[coldata_1$Subgroup %in% c("MB.WNT","MB.SHH","MB.Grp3","MB.Grp4"),]
## Extract and subset count data
>countdata = assay(ddsHTSeq)
>countdata = countdata[, coldata_1$Subgroup %in% c("MB.WNT","MB.SHH","MB.Grp3","MB.Grp4")]
## Make variables for design, to perform differential expression of MB.Grp4 compared to all others combined
>Subgroup = as.character(coldata$Subgroup)
>Subgroup[Subgroup !="MB.Grp4"] = "other"
>Subgroup = factor(Subgroup, levels = c("other","MB.Grp4"))
## Re-assign the coldata Subgroup
>coldata$Subgroup = Subgroup
## Create a new DESeq object
>dds = DESeqDataSetFromMatrix(countdata = countdata, coldata = coldata, design =~Subgroup)
>dds = DESeq(dds)
>res = results(dds, alpha = 0.01)
## Check the top differentially expressed genes
>rownames(res) [1:50]