which contrasts to use? matching contrasts with resultsNames
sabomislav • 0
@sabomislav-12804

Hello!

I am working on an RNA-Seq project where the task is to find DE genes between several conditions. I am fairly new to the field and have tried to understand how it works by reading the vignette, the function manuals and other community members' answered posts. However, some things are still not clear to me, and I would appreciate any help. Here is the experimental setup:

we have 3 factors, with 2 levels each:

- sex: female and male

- disease status: disease and control

- tissue types: tissue A and tissue B.

The data is the following (each sample is one patient): 

> data_bioc_2
      sex      disease_status tissue
 [1,] "female" "disease"      "B"
 [2,] "female" "disease"      "A"
 [3,] "female" "control"      "B"
 [4,] "female" "disease"      "B"
 [5,] "female" "disease"      "A"
 [6,] "female" "disease"      "B"
 [7,] "male"   "control"      "A"
 [8,] "male"   "control"      "B"
 [9,] "male"   "control"      "A"
[10,] "male"   "control"      "B"
[11,] "female" "disease"      "A"
[12,] "female" "disease"      "A"
[13,] "female" "control"      "B"
[14,] "male"   "control"      "B"
[15,] "male"   "control"      "B"
[16,] "female" "control"      "A"
[17,] "male"   "control"      "B"
[18,] "male"   "control"      "A"
[19,] "male"   "control"      "A"
[20,] "male"   "control"      "B"
[21,] "female" "control"      "A"
[22,] "female" "disease"      "A"
[23,] "female" "disease"      "B"
[24,] "male"   "control"      "A"
[25,] "male"   "control"      "B"
[26,] "male"   "control"      "A"
[27,] "male"   "control"      "B"
[28,] "female" "control"      "A"
[29,] "female" "control"      "B"
[30,] "male"   "disease"      "A"
[31,] "male"   "disease"      "B"


>sessionInfo () 

DESeq2_1.8.2  

 

The questions I would like to answer are: 

1. which model to use to find DE genes between:

    1.1. overall difference between disease and control, not taking into account tissue type, but blocking for sex.

    1.2. difference in expression between female and male only in control (in the disease state there are too few male samples, only 2).

    1.3. difference in expression between tissue A and tissue B, blocking for sex and disease status.

    1.4. difference in expression between tissue A and tissue B in control, but blocking for sex.

    1.5. difference in expression between tissue A and tissue B in disease, but blocking for sex.

    1.6. interaction between disease_status and tissue type.

2. Would it be better to test contrasts in subsets of the data, and thus use simpler models? For example, for 1.2, should I use only the control samples as input, fit the model

count~sex, and then look at the female vs. male difference?

3. Since I am new to this field: is it acceptable within one analysis to use different models (i.e. set different designs) just to obtain the different contrasts of interest? I ask because to answer e.g. 1.4 I will probably need a design with an interaction (counts~sex+disease_status+tissue+disease_status:tissue). If the interaction coefficient is not significantly different from 0, should I remove the interaction and work with the model counts~sex+disease_status+tissue (in which case I cannot answer question 1.4)? I expect that the coefficient ß1 (female vs. male) will differ between the two models (with and without interaction) because they are fitted differently. Which one is then the correct design?

4. Actually the most important question: how can I tell from resultsNames which name represents which contrast, and which combination of names represents which contrast? That is, is there a way to know this without writing out all the formulas for the linear model, calculating the coefficients, and then matching them to the resultsNames?

E.g. would it be easier if resultsNames showed directly which contrast each name represents, e.g. conditionA_vs_conditionB_in_setX (example taken from ?results) instead of "conditionA_vs_conditionB"?

5. What is the difference between results(dds, contrast=list("conditionB","conditionA")) and results(dds, contrast=list(c("conditionB","conditionA")))?

This refers to an example in ?results for Example 3: two conditions, three sets.

6.1. Is the notation consistent across all models? For example, the notation "setY.conditionB" for the case of two 2-level factors with interaction (set: Y, X; condition: A, B) means the contrast (B vs A)vs(Y vs X). What does the same notation mean when the factor set has 3 levels (X, Y and Z), and what do similar notations like "setZ.conditionA" and "setZ.conditionB" mean?

6.2. How can we know from the modelMatrix that the contrast "setY.conditionB" means (B vs A)vs(Y vs X), given that the ones (1s) in the modelMatrix column for "setY.conditionB" are at the positions where the sample has both set Y and condition B? More precisely, should we not also have 1s in that column where the sample has both set X and condition A, since (B vs A)vs(Y vs X)=BY-AY-BX+AX (i.e. for BY we should have 1s in the modelMatrix, for AY 0s, for BX 0s, and for AX again 1s, so that we can interpret the contrast "setY.conditionB" as (B vs A)vs(Y vs X))?

Thanks a lot for any help!

Best regards,

Mislav

@mikelove
United States

hi Mislav,

Re: (1)-(4),(6) You have a lot of contrasts you are interested in testing, so I would recommend that you partner with a local statistician who can help you design the analysis and construct the results tables. I can answer specific questions about the software, but you need more general statistical modeling help, which deserves bringing on a statistician collaborator and having face-to-face meetings to discuss these topics. There is nothing unique about how DESeq2 constructs the design matrix from the design formula, and anyone with experience building linear models in R could tell you how to set up the design and which contrasts to perform.
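As a rough illustration of the point about the design matrix, here is a minimal sketch in base R only (no DESeq2 call). It assumes a small balanced toy sample table, not the 31 samples from the question; the cell means and the object names (coldata, mm, fit) are made up for the example. It shows which columns model.matrix() creates for a design with an interaction, and that the interaction coefficient is a difference of differences even though its column contains 1s only for the samples that are both "disease" and tissue "B". The names reported by resultsNames(dds) are derived from such column names but may be formatted differently depending on the DESeq2 version.

```r
# Hypothetical, balanced toy sample table (not the poster's data):
# one sample per combination of the three two-level factors.
coldata <- expand.grid(
  sex = c("female", "male"),
  disease_status = c("control", "disease"),
  tissue = c("A", "B"),
  stringsAsFactors = TRUE
)

# Design with main effects plus the disease_status:tissue interaction.
design <- ~ sex + disease_status + tissue + disease_status:tissue
mm <- model.matrix(design, data = coldata)

# One column per coefficient; the first level of each factor is the reference.
print(colnames(mm))

# The interaction column is 1 only where the sample is disease AND tissue B.
print(cbind(coldata, interaction = mm[, "disease_statusdisease:tissueB"]))

# Made-up, noise-free cell means to show what the coefficient measures.
cell_means <- c(control.A = 1, disease.A = 3, control.B = 2, disease.B = 7)
y <- cell_means[paste(coldata$disease_status, coldata$tissue, sep = ".")]

fit <- lm(y ~ disease_status * tissue, data = coldata)
interaction_coef <- unname(coef(fit)["disease_statusdisease:tissueB"])

# Difference of differences computed by hand from the cell means:
# (disease.B - disease.A) - (control.B - control.A)
diff_of_diffs <- unname(
  (cell_means["disease.B"] - cell_means["disease.A"]) -
  (cell_means["control.B"] - cell_means["control.A"])
)

print(c(interaction_coef = interaction_coef, diff_of_diffs = diff_of_diffs))
```

The two printed numbers agree. The 0/1 pattern in a design-matrix column says which samples receive that coefficient on top of the others, and the meaning of the coefficient follows from the whole parameterization rather than from that single column.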

Re: (5) The first element of the list is a vector which is added to the numerator of the contrast, the second element is added to the denominator. So you have B / A in the first case, and (B+A) / 1 in the second case. 
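The arithmetic behind the two list forms can be sketched in plain R. This is only an illustration under stated assumptions, not DESeq2's internal code: the coefficient names and values in beta are made up, and list_to_numeric() is a hypothetical helper. It assumes the default weights of +1 for numerator terms and -1 for denominator terms. Because the coefficients are on the log2 scale, "B / A" corresponds to a difference of coefficients, and putting both names in the first element corresponds to their sum.

```r
# Hypothetical fitted coefficients on the log2 scale (illustration only).
beta <- c(Intercept = 5, conditionA = -0.5, conditionB = 1.5)

# Hypothetical helper: turn a list(numerator, denominator) specification
# into a numeric contrast vector with +1 / -1 weights.
list_to_numeric <- function(contrast, coef_names) {
  numerator <- contrast[[1]]
  denominator <- if (length(contrast) >= 2) contrast[[2]] else character(0)
  v <- setNames(numeric(length(coef_names)), coef_names)
  v[numerator] <- 1      # terms added to the numerator
  v[denominator] <- -1   # terms added to the denominator
  v
}

# Two list elements: conditionB in the numerator, conditionA in the denominator.
c1 <- list_to_numeric(list("conditionB", "conditionA"), names(beta))

# One list element holding both names: both end up in the numerator.
c2 <- list_to_numeric(list(c("conditionB", "conditionA")), names(beta))

lfc1 <- sum(c1 * beta)  # conditionB - conditionA
lfc2 <- sum(c2 * beta)  # conditionB + conditionA

print(rbind(c1, c2))
print(c(lfc1 = lfc1, lfc2 = lfc2))
```

With these made-up values the first form gives 1.5 - (-0.5) = 2 and the second gives 1.5 + (-0.5) = 1, so the two calls test different quantities.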

You should update to the latest version of R and Bioconductor, because DESeq2 v1.8 is now quite out of date (October 2015). There will be a new version of Bioconductor available in less than a week, which will be DESeq2 v1.16.
