Hello,
I am working on RNA-seq data that comes from different patient derived samples and are then treated with four different possible treatments:
directory <- "human_count_files"
sampleFiles <- grep("counts", list.files(directory), value=TRUE)
sampleTreatment <- factor(rep(c(rep("control",3), rep("A",3), rep("B",3), rep("A_B",3)),3), levels=c("control","A", "B", "A_B"))
samplePatient <- factor(c(rep("p1",12), rep("p2",12), rep("p3",12)), levels=c("c1","c2","c3"))
sampleName <- 1:12
sample_df <- data.frame(sampleName, sampleFiles, sampleTreatment, sampleCord)
sample_df
sampleName sampleFiles sampleTreatment samplePatient
3 1 lib11.Aligned.sortedByCoord.out.counts control p1
6 2 lib14.Aligned.sortedByCoord.out.counts A p1
9 3 lib17.Aligned.sortedByCoord.out.counts B p1
12 4 lib2.Aligned.sortedByCoord.out.counts A_B p1
15 5 lib22.Aligned.sortedByCoord.out.counts control p2
18 6 lib25.Aligned.sortedByCoord.out.counts A p2
21 7 lib28.Aligned.sortedByCoord.out.counts B p2
24 8 lib30.Aligned.sortedByCoord.out.counts A_B p2
27 9 lib33.Aligned.sortedByCoord.out.counts control p3
30 10 lib36.Aligned.sortedByCoord.out.counts A p3
33 11 lib6.Aligned.sortedByCoord.out.counts B p3
36 12 lib9.Aligned.sortedByCoord.out.counts A_B p3
Because of the variability involved in patient samples, I expect that some gene expression changes after a given treatment, while they may be changing in the same direction (i.e. they are up-regulated in all treatments vs controls within a patient sample) are not going to be consistent in the magnitude of change from patient sample to patient sample. Indeed, when I look at my normalized counts data I see something like the following:
dds <- DESeqDataSetFromHTSeqCount(sampleTable=sample_df, directory=directory, design= ~samplePatient + sampleTreatment)
dds <- estimateSizeFactors(dds)
counts <- data.frame(counts(dds, normalized=TRUE))
colnames(counts) <- paste0(dds$samplePatient,"_",dds$sampleTreatment)
counts[5613,]
counts[5613,]
p1_control p1_A p1_B p1_A_B p2_control p2_A p2_B p2_A_B p3_control p3_A p3_B p3_A_B
ENSG00000125124.12 398.8299 949.9343 575.5421 1573.744 514.532 5442.532 930.1038 1558.219 1117.352 1952.98 1701.638 1966.304
Now looking at treatment A vs control in the context of each patient separately, you can see that there are approximately 2.5-fold, 10-fold, and 2-fold up regulations from control to A if you look at the patients individually. However, because of the inconsistency in the increase, this gene is not considered significantly changed with an alpha value of 0.1 when I perform the DESeq analysis using design= ~samplePatient + sampleTreatment
and results(dds, contrast = c("sampleTreatment", "A", "control")
. I know that this particular design is meant to model batch effect for the patients, and that to fit this model well, the changes in genes should be consistent. However, how might I find genes such as the one I've shown above, where the change might not be to the same magnitude consistently, but they are always changing in the same direction and seem to be "big enough" changes to warrant follow-up (I know that "big" isn't very specific, but hopefully it gets the point across).
I'm looking for something similar to This previous question. However, I attempted to do as the poster here did, which was to combine the patient and treatment factors and then look at the average of the effect of A vs control in each patient. When I do this, I receive an error:
sample_df$combine <- factor(paste0(sample_df$samplePatient, "_", sample_df$sampleTreatment))
dds <- DESeqDataSetFromHTSeqCount(sampleTable=sample_df, directory=directory, design= ~combine)
dds<- DESeq(dds)
estimating size factors
estimating dispersions
Error in checkForExperimentalReplicates(object, modelMatrix) :
The design matrix has the same number of samples and coefficients to fit,
so estimation of dispersion is not possible. Treating samples
as replicates was deprecated in v1.20 and no longer supported since v1.22.
Did I misunderstand how the previous poster set up their analysis, or is this an approach that isn't possible with DESeq2 since the release of v1.22? If this doesn't work anymore, is there another approach you would recommend?
Thanks, Jim