Hello!
I am working on an RNA-Seq project where the task is to find DE genes between several conditions. I am fairly new to the field and have tried to understand how the analysis works by reading the vignette, the function manuals and answered posts from other community members. However, some things are still not quite clear to me, and I would appreciate any help. Here is the experimental setup:
We have 3 factors, with 2 levels each:
- sex: female and male
- disease status: disease and control
- tissue types: tissue A and tissue B.
The data is the following (each sample is one patient):
> data_bioc_2
      sex      disease_status tissue
 [1,] "female" "disease"      "B"
 [2,] "female" "disease"      "A"
 [3,] "female" "control"      "B"
 [4,] "female" "disease"      "B"
 [5,] "female" "disease"      "A"
 [6,] "female" "disease"      "B"
 [7,] "male"   "control"      "A"
 [8,] "male"   "control"      "B"
 [9,] "male"   "control"      "A"
[10,] "male"   "control"      "B"
[11,] "female" "disease"      "A"
[12,] "female" "disease"      "A"
[13,] "female" "control"      "B"
[14,] "male"   "control"      "B"
[15,] "male"   "control"      "B"
[16,] "female" "control"      "A"
[17,] "male"   "control"      "B"
[18,] "male"   "control"      "A"
[19,] "male"   "control"      "A"
[20,] "male"   "control"      "B"
[21,] "female" "control"      "A"
[22,] "female" "disease"      "A"
[23,] "female" "disease"      "B"
[24,] "male"   "control"      "A"
[25,] "male"   "control"      "B"
[26,] "male"   "control"      "A"
[27,] "male"   "control"      "B"
[28,] "female" "control"      "A"
[29,] "female" "control"      "B"
[30,] "male"   "disease"      "A"
[31,] "male"   "disease"      "B"
> sessionInfo()
DESeq2_1.8.2
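To make the imbalance in this design easier to see, here is a minimal base-R sketch that rebuilds the sample table above as a data frame and counts samples per sex / disease status / tissue cell. The helper `decode` and the one-letter codes are illustrative shorthand only (F/M, C/D, A/B, in the same row order as the table); `control` and tissue `A` are assumed to be the reference levels.

```r
# Hypothetical helper: expand a string of one-letter codes into a factor.
decode <- function(codes, levels, labels) {
  factor(strsplit(codes, "")[[1]], levels = levels, labels = labels)
}

# One character per sample, in the row order of data_bioc_2.
coldata <- data.frame(
  sex            = decode("FFFFFFMMMMFFFMMFMMMMFFFMMMMFFMM",
                          c("F", "M"), c("female", "male")),
  disease_status = decode("DDCDDDCCCCDDCCCCCCCCCDDCCCCCCDD",
                          c("C", "D"), c("control", "disease")),
  tissue         = decode("BABBABABABAABBBABAABAABABABABAB",
                          c("A", "B"), c("A", "B"))
)

# Samples per combination of the three factors.
cell_counts <- xtabs(~ sex + disease_status + tissue, data = coldata)
print(ftable(cell_counts))
```

The table printed this way shows directly how few male disease samples there are, which is what motivates the restriction in question 1.2 below.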
The questions I would like to answer are:
1. Which model should I use to find DE genes for each of the following:
1.1. overall difference between disease and control, not taking tissue type into account, but blocking for sex.
1.2. difference in expression between female and male in control only (in the disease state there are too few male samples, only 2).
1.3. difference in expression between tissue A and tissue B, blocking for sex and disease status.
1.4. difference in expression between tissue A and tissue B in control, but blocking for sex.
1.5. difference in expression between tissue A and tissue B in disease, but blocking for sex.
1.6. interaction between disease_status and tissue type.
2. Would it be better to test contrasts within subsets of the data, and thus use simpler models? For example, for 1.2, should I give only the control samples as input and use the model
count~sex, and then look at the difference female vs. male?
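The subsetting idea in question 2 can be sketched in base R as follows. The small `coldata` table here is a hypothetical toy, not the real samples, and the point is only the mechanics: keep the control rows, drop the now-empty factor level, and build the `~ sex` design. With a DESeqDataSet the analogous step would presumably be column subsetting (something like `dds[, dds$disease_status == "control"]`) followed by setting the design to `~ sex`, but that part is an assumption and is not run here.

```r
# Hypothetical toy sample table (NOT the real data).
coldata <- data.frame(
  sex            = factor(rep(c("female", "male"), times = 4)),
  disease_status = factor(rep(c("control", "disease"), each = 4))
)

# Keep control samples only and drop the unused "disease" level,
# so that no empty level lingers in the subset.
controls <- droplevels(coldata[coldata$disease_status == "control", ])

# Design for the subset: intercept (female) plus a male-vs-female column.
design_controls <- model.matrix(~ sex, data = controls)
print(colnames(design_controls))
```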
3. Since I am new to this field: is it acceptable, within one analysis, to use different models (different designs) just to obtain different contrasts? I ask because to answer e.g. 1.4 I will probably need a design with an interaction (counts~sex+disease_status+tissue+disease_status:tissue). If the interaction coefficient is not significantly different from 0, should I remove the interaction and work with the model counts~sex+disease_status+tissue (but then I cannot answer question 1.4)? I guess that if I compared the two models, the coefficient β1 (female vs. male) would differ between them (because of the different fit). Which one is then the correct design?
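For the interaction design mentioned in question 3, one way to see which combination of coefficients corresponds to 1.4 and 1.5 is to difference rows of the model matrix. The sketch below uses base R only and assumes standard treatment coding with `female`, `control` and `A` as reference levels; `row_for` is a hypothetical helper. A numeric vector of this kind could in principle be passed as a numeric `contrast` to `results()`, provided the fitted coefficients follow the same column order, which may not hold if the fit used a different coding (for example expanded model matrices).

```r
# One row per factor combination; the first level listed is the reference.
grid <- expand.grid(
  sex            = c("female", "male"),
  disease_status = c("control", "disease"),
  tissue         = c("A", "B")
)

X <- model.matrix(~ sex + disease_status + tissue + disease_status:tissue,
                  data = grid)

# Hypothetical helper: model-matrix row for one factor combination.
row_for <- function(sex, disease_status, tissue) {
  X[grid$sex == sex & grid$disease_status == disease_status &
      grid$tissue == tissue, ]
}

# Tissue B vs A within control, and within disease (sex held fixed).
tissue_in_control <- row_for("female", "control", "B") -
  row_for("female", "control", "A")
tissue_in_disease <- row_for("female", "disease", "B") -
  row_for("female", "disease", "A")

print(rbind(tissue_in_control, tissue_in_disease))
```

Under these assumptions the tissue effect in control is the `tissueB` coefficient alone, while the tissue effect in disease is `tissueB` plus the interaction coefficient; holding sex at `male` instead gives the same two vectors.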
4. Actually the most important question: how can I tell from resultsNames which name represents which contrast, and which combination of names represents which contrast? That is, is there a way to know this without writing out all the formulas for the linear model, calculating the coefficients, and then matching them to the resultsNames?
E.g. would it be easier if each name in resultsNames showed directly which contrast it represents, e.g. conditionA_vs_conditionB_in_setX (example taken from ?results) instead of "conditionA_vs_conditionB"?
5. Why do results(dds, contrast=list("conditionB","conditionA")) and results(dds, contrast=list(c("conditionB","conditionA"))) give different results? What is the difference between the two calls?
This refers to an example in ?results for Example 3: two conditions, three sets.
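As far as I understand the documentation of the `contrast` argument in ?results, a list contrast holds the names for the numerator in its first element and the names for the denominator in its second, and a list of length 1 is treated as having an empty denominator. The sketch below mimics that reading with a hypothetical helper, `contrast_weights` (not part of DESeq2), and assumed coefficient names, to show how the two calls in question 5 would then translate into different numeric weights.

```r
# Hypothetical helper (NOT a DESeq2 function): turn a list-style contrast
# into numeric weights over the coefficient names.
contrast_weights <- function(contrast, coef_names) {
  numerator   <- contrast[[1]]
  denominator <- if (length(contrast) >= 2) contrast[[2]] else character(0)
  stopifnot(all(c(numerator, denominator) %in% coef_names))
  w <- setNames(numeric(length(coef_names)), coef_names)
  w[numerator]   <- 1   # effects that are added
  w[denominator] <- -1  # effects that are subtracted
  w
}

# Assumed coefficient names, for illustration only.
coef_names <- c("Intercept", "conditionA", "conditionB")

two_elements <- contrast_weights(list("conditionB", "conditionA"), coef_names)
one_element  <- contrast_weights(list(c("conditionB", "conditionA")), coef_names)

print(rbind(two_elements, one_element))
```

On this reading, the first call is the difference conditionB minus conditionA, while the second adds the two effects together and compares their sum with zero.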
6.1. Is the notation consistent across all models? For example, with two 2-level factors and an interaction (set: X, Y; condition: A, B), the name "setY.conditionB" means the contrast (B vs A) vs (Y vs X). What does the same name mean when the factor set has 3 levels (X, Y and Z), and what do similar names such as "setZ.conditionA" and "setZ.conditionB" mean?
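A quick way to see which interaction names can occur at all is to list the model-matrix columns for the 2-level and the 3-level case. This base-R sketch assumes standard treatment coding with X and A as reference levels; replacing ":" with "." is only meant to mimic the dotted names quoted above. Under this coding each interaction column pairs a non-reference set level with condition B, so "setZ.conditionB" would read as (B vs A in Z) minus (B vs A in X). A name such as "setZ.conditionA" does not arise here, which suggests it comes from a different coding (possibly the expanded model matrices of this DESeq2 version, though that is an assumption).

```r
# All factor combinations for a 2-level and a 3-level "set" factor.
two_level   <- expand.grid(set = c("X", "Y"),      condition = c("A", "B"))
three_level <- expand.grid(set = c("X", "Y", "Z"), condition = c("A", "B"))

names_two   <- colnames(model.matrix(~ set * condition, data = two_level))
names_three <- colnames(model.matrix(~ set * condition, data = three_level))

# Interaction columns only, written with "." instead of ":".
interaction_terms <- gsub(":", ".", grep(":", names_three, value = TRUE),
                          fixed = TRUE)

print(names_two)
print(names_three)
print(interaction_terms)
```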
6.2. How can we tell from the model matrix that "setY.conditionB" means (B vs A) vs (Y vs X), given that the ones (1s) in the model-matrix column "setY.conditionB" appear only for samples that have both set Y and condition B? More precisely, should that column not also have 1s for samples with both set X and condition A, since (B vs A) vs (Y vs X) = BY - AY - BX + AX? (That is, 1s for BY, 0s for AY, 0s for BX, and again 1s for AX, so that we could interpret "setY.conditionB" as (B vs A) vs (Y vs X).)
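One way to look at 6.2, sketched in base R with made-up cell means: the 1s in a model-matrix column only say which samples receive that coefficient, whereas the weights of a coefficient on the cell means come from the inverse of the cell-level model matrix. The numbers in `cell_means` are hypothetical, and treatment coding with X and A as reference levels is assumed.

```r
# One row per cell; rows are labelled AX, AY, BX, BY.
cells <- expand.grid(set = c("X", "Y"), condition = c("A", "B"))
rownames(cells) <- paste0(cells$condition, cells$set)

X <- model.matrix(~ set * condition, data = cells)

# Inverting the 4 x 4 cell-level model matrix expresses each coefficient
# as a weighted sum of the four cell means.
weights <- solve(X)
dimnames(weights) <- list(colnames(X), rownames(cells))

# Hypothetical cell means, for illustration only.
cell_means <- c(AX = 5, AY = 7, BX = 6, BY = 11)
beta <- drop(weights %*% cell_means)

print(X[, "setY:conditionB"])        # 1 only for the BY cell
print(weights["setY:conditionB", ])  # weights on AX, AY, BX, BY
print(beta)
```

In this sketch the interaction column has a single 1 (for BY), yet the interaction coefficient still weights the cell means as AX - AY - BX + BY, because the other three coefficients already absorb the AX, AY and BX cells.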
Thanks a lot for any help!
Best regards,
Mislav