DESeq2 multi-factor comparison problem
1
0
Entering edit mode
Vivek.b ▴ 100
@vivekb-7661
Last seen 3.9 years ago
Germany

Hi Everyone

I have a DESeq2 design problem, where I have the counts for reads mapping to maternal and paternal allele, for the wildtype and knockdown samples. So the design is like this, where TF = transcription factor, rep=replicate and KD=knockdown. I have matching controls for each TF.

sampleName Sample Allele
Control_1_repA_mat Control_1 Maternal
Control_1_repB_mat Control_1 Maternal
Control_1_repC_mat Control_1 Maternal
Control_2_repA_mat Control_2 Maternal
Control_2_repB_mat Control_2 Maternal
Control_2_repC_mat Control_2 Maternal
KD_TF1_1_repA_mat KD_TF1_1 Maternal
KD_TF1_1_repB_mat KD_TF1_1 Maternal
KD_TF1_1_repC_mat KD_TF1_1 Maternal
KD_TF2_2_repA_mat KD_TF2_2 Maternal
KD_TF2_2_repB_mat KD_TF2_2 Maternal
KD_TF2_2_repC_mat KD_TF2_2 Maternal
Control_1_repA_pat Control_1 Paternal
Control_1_repB_pat Control_1 Paternal
Control_1_repC_pat Control_1 Paternal
Control_2_repA_pat Control_2 Paternal
Control_2_repB_pat Control_2 Paternal
Control_2_repC_pat Control_2 Paternal
KD_TF1_1_repA_pat KD_TF1_1 Paternal
KD_TF1_1_repB_pat KD_TF1_1 Paternal
KD_TF1_1_repC_pat KD_TF1_1 Paternal
KD_TF2_2_repA_pat KD_TF2_2 Paternal
KD_TF2_2_repB_pat KD_TF2_2 Paternal
KD_TF2_2_repC_pat KD_TF2_2 Paternal

 

Now I want to see, for each TF knockdown, the differential expression between maternal and paternal allele. But I also want to exclude the genes which show differential expression between maternal and paternal allele in Controls. Earlier I was splitting the samples into Control and Test, and use DESeq2 with Design ~ sample + allele, and later remove the genes which are diff expressed in both control and test, but it's not giving me expected results. 

So is it a better strategy to not split the samples and use the same Design formula? Also after running DESeq how shall I extract the differences? Shall I extract Mat over Pat difference for each sample (Control and KD) and then again simply remove the common diffExp genes, or is there a better way ( i.e for constructing the design matrix or extracting results), that takes care of this thing.

Thanks in advance

deseq2 rnaseq • 2.2k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 6 hours ago
United States

"the differential expression between maternal and paternal allele. But I also want to exclude the genes which show differential expression between maternal and paternal allele in Controls"

This is a standard interaction design, and you just want to test the interaction term (interactions are the extra difference which is present, for example for the paternal vs maternal difference after controlling for the difference observed in the reference level, here "control"). Make a column 'condition' which is a factor with levels "control", "KDTF1" and "KDTF2".

dds$condition <- relevel(dds$condition, "control")
design(dds) <- ~ allele + sample + allele:sample
dds <- DESeq(dds, betaPrior=FALSE) # I suggest no LFC prior for interaction designs

Then the following results() call is testing for differential Paternal vs Maternal effect in KDTF1 controlling for the difference in Control:

results(dds, name="Paternal.KDTF1")

The same for KDTF2:

results(dds, name="Paternal.KDTF2")
ADD COMMENT
0
Entering edit mode

Thanks Michael for the reply.. I saw the dealing with interactions section in the manual but couldn't understand that it's the same situation that I have. In that case, is it better to split the input by KD and the matching Control? or is it not important?

ADD REPLY
0
Entering edit mode

Sorry, I missed that in my first pass. How are the controls for TF1 and TF2 different?

ADD REPLY
0
Entering edit mode

Each TF has one matching control as the knock-down was performed by different people. They used different scrambled siRNA sequences. 

ADD REPLY
1
Entering edit mode

If you want to control each TF with its matching control, this is possible as well.

Create column data which looks like this (update: note that all columns should be factors)

TF condition allele
1  control   P
1  KD        P
1  control   M
1  KD        M
2  control   P
2  KD        P
2  control   M
2  KD        M

Then use a design of ~ TF + TF:condition + TF:allele + TF:condition:allele

The two interaction terms for TF:condition:allele are tests for differences in Paternal vs Maternal, controlling for differences in that TF's control. You will use results(dds, name=...) to extract each one separately.

ADD REPLY
0
Entering edit mode

So when I made the desing matrix like the one above, I found out that resultNames(dds) looks like this

"Intercept"                    "TF"                           "TF.conditionKD"             
"TF.allelePaternal"             "TF.conditionKD.allelePaternal"

Then I extracted results with name = "TF.conditionKD.allelePaternal". But this is maybe the combined one from two TFs?

I was wondering whether It would be good if I just divide my input into two separate data frames and run DESeq on them separately with the first design matrix you suggested above.

ADD REPLY
1
Entering edit mode

Can you post your column data and your design which produced these resultsNames?

I was thinking it would look like my example in the comment above.

ADD REPLY
0
Entering edit mode

My Design:

Row.names    condition    allele    TF
TF1_1_Mat    KD    Mat    1
TF1_2_Mat    KD    Mat    1
TF1_3_Mat    KD    Mat    1
TF1_1_Pat    KD    Pat    1
TF1_2_Pat    KD    Pat    1
TF1_3_Pat    KD    Pat    1
TF2_1_Mat    KD    Mat    2
TF2_2_Mat    KD    Mat    2
TF2_3_Mat    KD    Mat    2
TF2_1_Pat    KD    Pat    2
TF2_2_Pat    KD    Pat    2
TF2_3_Pat    KD    Pat    2
Scr2_1_Mat    Control    Mat    2
Scr2_2_Mat    Control    Mat    2
Scr2_3_Mat    Control    Mat    2
Scr2_1_Pat    Control    Pat    2
Scr2_2_Pat    Control    Pat    2
Scr2_3_Pat    Control    Pat    2
Scr1_1_Mat    Control    Mat    1
Scr1_2_Mat    Control    Mat    1
Scr1_3_Mat    Control    Mat    1
Scr1_1_Pat    Control    Pat    1
Scr1_2_Pat    Control    Pat    1
Scr1_3_Pat    Control    Pat    1

where row.names(design) = colnames(featureCount.Result). Condition,allele and TF are factors.

Then I run the following:

design$allele = relevel(design$allele),"Mat")
ase.deseq <- DESeqDataSetFromMatrix(featureCount.Result,design,design = ~ TF + TF:condition + TF:allele + TF:condition:allele)

ase.deseq <- DESeq(ase.deseq,betaPrior = FALSE)
ase.deseq <- DESeq(ase.deseq,betaPrior = FALSE)

Runs without any trouble..

colData(ase.deseq)

DataFrame with 24 rows and 4 columns
               condition   allele        TF sizeFactor
                <factor> <factor> <numeric>  <numeric>
TF1_1_Mat           KD    Mat         1  0.9903505
TF1_2_Mat           KD    Mat         1  0.8977864
TF1_3_Mat           KD    Mat         1  1.0096987
TF1_1_Pat         KD  Pat         1  0.9316722
TF1_2_Pat         KD  Pat         1  0.8573930
...                  ...      ...       ...        ...
Scr2_2_Mat     Control    Mat         1  0.9526810
Scr2_3_Mat     Control    Mat         1  1.1222974
Scr2_1_Pat   Control  Pat         1  1.1081599
Scr2_2_Pat   Control  Pat         1  0.8673878
Scr2_3_Pat   Control  Pat         1  1.0313058
resultsNames(ase.deseq)
[1] "Intercept"                    "TF"                           "TF.conditionKD"              
[4] "TF.allelePat"             "TF.conditionKD.allelePat"
 
ADD REPLY
1
Entering edit mode

The TF column needs to be a factor.

dds$TF = factor( dds$TF)
ADD REPLY
0
Entering edit mode

Thanks a lot, it was my mistake.

One last thing, I ran DESeq with this design, I get 40 significant genes for TF1 . Then I run it with previous design that I mentioned.(i.e. split the input by TF and fit the following design formula you suggested).

~ allele + sample + allele:sample

 and I get 35 genes for TF1. (34 of them common bw them).

Can you please tell where this difference comes from, and which one is a better strategy then? I assume splitting by TF and running DESeq seperately is also correct, as it just means I am treating TF1 and TF2 as separate experiments.

ADD REPLY
1
Entering edit mode

Small fluctuations are expected when you change the samples involved in the analysis. 

p-values are tail probabilities and therefore very sensitive to small changes in parameter estimates.

Remember also that the set of genes with FDR < alpha are not the "true set of DE genes", but at most a set enriched with the genes for which you had power to detect DE. By changing the samples you change the power as well.

ADD REPLY
0
Entering edit mode

I see.. Alright , thanks a lot Michael.. 

ADD REPLY

Login before adding your answer.

Traffic: 680 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6