Looking at interaction term in DESeq2 while also controlling for individual variable
1
0
Entering edit mode
hmgeiger ▴ 10
@hmgeiger-9902
Last seen 3.8 years ago

I am running an experiment where we have samples from a number of individuals, pre and post treatment. These individuals were then also grouped according to features found in the DNA. So, overall the design looks like this. I named pre treatment "before" so that DESeq2 will automatically take it as the first level in the factor. All three groups are of interest to compare to the other two (none of the 3 is a control).

          Sample  Patient Treatment  Group

1   PatientA-pre PatientA    before Group3

2  PatientA-post PatientA      post Group3

3   PatientB-pre PatientB    before Group3

4  PatientB-post PatientB      post Group3

5   PatientC-pre PatientC    before Group1

6  PatientC-post PatientC      post Group1

7   PatientD-pre PatientD    before Group3

8  PatientD-post PatientD      post Group3

9   PatientE-pre PatientE    before Group1

10 PatientE-post PatientE      post Group1

11  PatientF-pre PatientF    before Group1

12 PatientF-post PatientF      post Group1

13  PatientG-pre PatientG    before Group1

14 PatientG-post PatientG      post Group1

15  PatientH-pre PatientH    before Group3

16 PatientH-post PatientH      post Group3

17  PatientI-pre PatientI    before Group1

18 PatientI-post PatientI      post Group1

19  PatientJ-pre PatientJ    before Group1

20 PatientJ-post PatientJ      post Group1

21  PatientK-pre PatientK    before Group1

22 PatientK-post PatientK      post Group1

23  PatientL-pre PatientL    before Group2

24 PatientL-post PatientL      post Group2

25  PatientM-pre PatientM    before Group2

26 PatientM-post PatientM      post Group2

27  PatientN-pre PatientN    before Group2

28 PatientN-post PatientN      post Group2

29  PatientO-pre PatientO    before Group2

30 PatientO-post PatientO      post Group2

31  PatientP-pre PatientP    before Group2

32 PatientP-post PatientP      post Group2

33  PatientQ-pre PatientQ    before Group1

34 PatientQ-post PatientQ      post Group1

35  PatientR-pre PatientR    before Group2

36 PatientR-post PatientR      post Group2

37  PatientS-pre PatientS    before Group3

38 PatientS-post PatientS      post Group3

We are interested in determining the interaction between groups and treatment. In other words, do certain genes have a different change between pre and post depending on group?

I know if I did not have to control for individual (Patient), I would run like so (maybe afterward also re-running with group1 not as the first level so can test that):

dds <- DESeq(DESeqDataSetFromMatrix(countData=featureCounts_pairs,colData=design_table,design=~Treatment+Group+Treatment:Group))

results_group2 <- results(dds,name="Treatmentpost.GroupGroup2")

results_group3 <- results(dds,name="Treatmentpost.GroupGroup3")

However, I would also like to control for patient differences within each group. So that if a gene goes up or down consistently between pre and post within all patients of one group but not the other, it will show up, even if there is a lot of variability between all "before" or all "post" samples within each group.

Adding "+Patient" to the design above does not solve this issue, as then I get the "model matrix not full rank" error since each of the two samples for the same patient are also in the same group.

deseq2 • 1.0k views
0
Entering edit mode

That makes sense, although for the last part ("now this matrix m1 can be provided to the full argument of DESeq") I am still unclear on the syntax. Can you take a look at my code and let me know?

First, I add the patient indices to the design table.

patient_indices <- rep("Index",times=nrow(design_table))

patient_indices[which(as.vector(design_table$Group) == "Group1")] <- paste0("Index",rep(1:(length(which(as.vector(design_table$Group) == "Group1"))/2),each=2))

patient_indices[which(as.vector(design_table$Group) == "Group2")] <- paste0("Index",rep(1:(length(which(as.vector(design_table$Group) == "Group2"))/2),each=2))

patient_indices[which(as.vector(design_table$Group) == "Group3")] <- paste0("Index",rep(1:(length(which(as.vector(design_table$Group) == "Group3"))/2),each=2))

design_table <- data.frame(design_table,Patient.index = patient_indices)

Next, I create the model matrix, and then remove columns with all zeroes.

design_table_model_matrix <- model.matrix(~Group + Group:Patient.index + Group:Treatment,design_table)

design_table_model_matrix <- design_table_model_matrix[,which(colSums(design_table_model_matrix) > 0)]

Now, I am unclear on the syntax of actually making the DESeq object. Here is what I tried so far. Does this make sense?

data_set <- DESeqDataSetFromMatrix(countData=raw_counts,colData=design_table,design=~1)

design_table_model_matrix_reduced <- model.matrix(~Treatment+Group,design_table)

dds <- DESeq(data_set,full=design_table_model_matrix,reduced=design_table_model_matrix_reduced,test="LRT")

0
Entering edit mode

Update: Tried running the following code afterwards, but get the same p-values no matter which two groups are being compared. Also there are way too many DEGs given the expected strength of the signal. Don't think the above design is correct, but not sure how to fix it.

ddsClean <- dds[which(mcols(dds)\$fullBetaConv == TRUE),]

results_group2.post.vs.pre.change_vs_group1.post.vs.pre.change <- results(ddsClean,contrast=list("GroupGroup2.Treatmentpost","GroupGroup1.Treatmentpost"))

results_group3.post.vs.pre.change_vs_group1.post.vs.pre.change <- results(ddsClean,contrast=list("GroupGroup3.Treatmentpost","GroupGroup1.Treatmentpost"))

results_group3.post.vs.pre.change_vs_group2.post.vs.pre.change <- results(ddsClean,contrast=list("GroupGroup3.Treatmentpost","GroupGroup2.Treatmentpost"))

0
Entering edit mode

See the section of ?results about the LRT. It will explain what you observed.

0
Entering edit mode

Finally got results that make sense. This is my code.

Only issue left is I am still getting 206 rows that do not converge in beta even after strict expression filtering and increasing maxit (following the tips here: https://support.bioconductor.org/p/65091/). Any other tips on how to get them all to converge? I know I can just remove those rows if I really need to, but I would prefer to just get them all to converge.

design_table_model_matrix <- model.matrix(~Group + Group:Patient.index + Group:Treatment,design_table)

design_table_model_matrix <- design_table_model_matrix[,which(colSums(design_table_model_matrix) > 0)]

design_table_model_matrix_reduced <- model.matrix(~Group + Group:Patient.index,design_table)

design_table_model_matrix_reduced <- design_table_model_matrix_reduced[,which(colSums(design_table_model_matrix_reduced) > 0)]

myDESeqDataSet <-  DESeqDataSetFromMatrix(countData=raw_counts,colData=design_table,design=~1)

dds <- estimateSizeFactors(myDESeqDataSet)

nc <- counts(dds, normalized=TRUE)

filter <- rowSums(nc >= 30) >= 3

dds <- dds[filter,]

dds <- estimateDispersions(dds)

dds <- nbinomWaldTest(dds, modelMatrix=design_table_model_matrix,maxit=500,betaPrior=FALSE)

results_group2.interaction_vs_group1.interaction <- results(dds,contrast=list("GroupGroup2.Treatmentpost","GroupGroup1.Treatmentpost"))

results_group3.interaction_vs_group1.interaction <- results(dds,contrast=list("GroupGroup3.Treatmentpost","GroupGroup1.Treatmentpost"))

results_group3.interaction_vs_group2.interaction <- results(dds,contrast=list("GroupGroup3.Treatmentpost","GroupGroup2.Treatmentpost"))
0
Entering edit mode

What do the counts look like for the rows that aren't converging? The convergence is slower when there are many coefficients to fit, but often this is where only a few samples have expression. You could change from 30 counts and 3 samples to less counts but more samples, e.g. 10 normalized counts but more samples required.

0
Entering edit mode

There are a few genes that are DE if you do not remove these rows, and for these the counts are often high in quite a few samples.

For example, here are normalized counts for one gene that did not converge, for group1 and group3. Note, in the real data there are 4 individuals for group 1 and 9 individuals for group3.

It also seems to make sense that this gene is showing up as DE for the interaction term. In group1, 3/4 individuals have strong downregulation of this gene in post, with no change pre/post in the 4th individual. In group3, although most individuals do also have downregulation in post, the change is not nearly as severe.

  Treatment  Group Patient.index Gene.normalized.count

1     before Group1        Index1                159.82

2       post Group1        Index1                  0.00

3     before Group1        Index2               1013.40

4       post Group1        Index2                  0.00

5     before Group1        Index3                170.32

6       post Group1        Index3                 10.98

7     before Group1        Index4                 57.91

8       post Group1        Index4                 36.34

9     before Group3        Index1                 42.39

10      post Group3        Index1                162.52

11    before Group3        Index2                143.11

12      post Group3        Index2                 60.79

13    before Group3        Index3                215.74

14      post Group3        Index3                482.29

15    before Group3        Index4                295.58

16      post Group3        Index4                 53.31

17    before Group3        Index5                764.68

18      post Group3        Index5                335.54

19    before Group3        Index6                241.39

20      post Group3        Index6                137.68

21    before Group3        Index7                171.07

22      post Group3        Index7                450.52

23    before Group3        Index8                 72.61

24      post Group3        Index8                 76.38

25    before Group3        Index9                440.54

26      post Group3        Index9                204.05

Here group3 indices 2,4,5,6, and 9 have downregulation in post, but the change is way less dramatic. Plus indices 1,3,and 7 actually have upregulation in post, with index 8 being constant.

0
Entering edit mode

Update: Also tried a super high number of iterations (maxit=10000) with only a few additional rows converging.

0
Entering edit mode

I think you can safely ignore the 'not converging' warning for now, which I think is overly conservative here. what's happening is a tradeoff between the post/before effect in a group and the per-group patient effect, which gives probably very small changes in beta that very slightly increase the log likelihood at each iteration. I'd guess the LFC and test statistics aren't changing much as you increase maxit.

0
Entering edit mode

I also realized just now that I did not supply the model matrix to estimateDispersions. The code should be estimateDispersions(dds,modelMatrix = design_table_model_matrix). I imagine this could have affected the results as well.

0
Entering edit mode

Yes, you need to supply any custom model matrix to each function (estimateDispersions(), and nbinom*), if you aren't using DESeq().

0
Entering edit mode
@mikelove
Last seen 4 hours ago
United States

It sounds like the following section of the vignette will help you with this design. Take a look and let me know:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups