Question

DESeq2 multi-factor comparison problem

Vivek.b (@vivekb-7661)

Hi everyone,

I have a DESeq2 design problem. I have counts of reads mapping to the maternal and paternal alleles for wild-type and knockdown samples. The design is shown below, where TF = transcription factor, rep = replicate and KD = knockdown. Each TF has its own matching control.

sampleName	Sample	Allele
Control_1_repA_mat	Control_1	Maternal
Control_1_repB_mat	Control_1	Maternal
Control_1_repC_mat	Control_1	Maternal
Control_2_repA_mat	Control_2	Maternal
Control_2_repB_mat	Control_2	Maternal
Control_2_repC_mat	Control_2	Maternal
KD_TF1_1_repA_mat	KD_TF1_1	Maternal
KD_TF1_1_repB_mat	KD_TF1_1	Maternal
KD_TF1_1_repC_mat	KD_TF1_1	Maternal
KD_TF2_2_repA_mat	KD_TF2_2	Maternal
KD_TF2_2_repB_mat	KD_TF2_2	Maternal
KD_TF2_2_repC_mat	KD_TF2_2	Maternal
Control_1_repA_pat	Control_1	Paternal
Control_1_repB_pat	Control_1	Paternal
Control_1_repC_pat	Control_1	Paternal
Control_2_repA_pat	Control_2	Paternal
Control_2_repB_pat	Control_2	Paternal
Control_2_repC_pat	Control_2	Paternal
KD_TF1_1_repA_pat	KD_TF1_1	Paternal
KD_TF1_1_repB_pat	KD_TF1_1	Paternal
KD_TF1_1_repC_pat	KD_TF1_1	Paternal
KD_TF2_2_repA_pat	KD_TF2_2	Paternal
KD_TF2_2_repB_pat	KD_TF2_2	Paternal
KD_TF2_2_repC_pat	KD_TF2_2	Paternal

Now I want to see, for each TF knockdown, the differential expression between the maternal and paternal alleles. But I also want to exclude genes that show differential expression between the maternal and paternal alleles in the controls. Earlier I split the samples into control and test, ran DESeq2 on each with the design ~ sample + allele, and then removed the genes that were differentially expressed in both control and test, but this is not giving me the expected results.

So is it a better strategy not to split the samples and to use the same design formula? Also, after running DESeq, how should I extract the differences? Should I extract the Mat over Pat difference for each sample (control and KD) and then again simply remove the genes that are differentially expressed in both, or is there a better way (i.e. in constructing the design matrix or extracting results) that takes care of this?

Thanks in advance

Tags: deseq2, rnaseq

Answer (2015-05-09)

Michael Love (@mikelove)

"the differential expression between maternal and paternal allele. But I also want to exclude the genes which show differential expression between maternal and paternal allele in Controls"

This is a standard interaction design, and you just want to test the interaction term. An interaction is the extra difference that is present, for example in the Paternal vs Maternal difference, after controlling for the difference observed at the reference level (here "control"). Make a column 'condition' which is a factor with levels "control", "KDTF1" and "KDTF2".
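A minimal sketch of deriving that column in R, assuming the Sample values from the question's table (Control_1, Control_2, KD_TF1_1, KD_TF2_2). Both controls are pooled into a single "control" level, as proposed here. The vector names are illustrative; with a real DESeqDataSet you would start from dds$Sample and assign the result to dds$condition.

```r
# Hypothetical subset of the question's Sample column (12 of the 24 rows)
Sample <- rep(c("Control_1", "Control_2", "KD_TF1_1", "KD_TF2_2"), each = 3)

# Pool both controls into "control"; turn "KD_TF1_1" into "KDTF1", etc.
condition <- ifelse(grepl("^Control", Sample), "control",
                    sub("^KD_(TF[0-9]+)_.*$", "KD\\1", Sample))

# Explicit levels, so that "control" is the reference level
condition <- factor(condition, levels = c("control", "KDTF1", "KDTF2"))
table(condition)
```

Setting the levels explicitly makes "control" the reference regardless of alphabetical order, so the relevel() call that follows is then a no-op safeguard.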

dds$condition <- relevel(dds$condition, "control")
dds$allele <- relevel(dds$allele, "Maternal") # effects are then Paternal vs Maternal
design(dds) <- ~ allele + condition + allele:condition
dds <- DESeq(dds, betaPrior=FALSE) # I suggest no LFC prior for interaction designs

The following results() call then tests whether the Paternal vs Maternal difference in KDTF1 differs from the Paternal vs Maternal difference in the control:

results(dds, name="allelePaternal.conditionKDTF1")

The same for KDTF2:

results(dds, name="allelePaternal.conditionKDTF2")

The exact coefficient names can be checked with resultsNames(dds).
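To make the "extra difference" concrete, here is a minimal base-R sketch with invented, noise-free log2 values for a single gene. It uses lm() only to show how the coefficients of ~ allele + condition + allele:condition are parametrized; DESeq2 itself fits a negative binomial GLM to the counts, so this illustrates the design, not DESeq2's estimation. The interaction coefficient equals the Paternal-minus-Maternal difference in the knockdown minus the same difference in the control, and a knockdown effect shared by both alleles does not enter it.

```r
# Hypothetical design: 3 replicates per allele x condition cell
cd <- expand.grid(rep = c("A", "B", "C"),
                  allele = c("Maternal", "Paternal"),
                  condition = c("control", "KDTF1", "KDTF2"))
cd$allele <- relevel(cd$allele, "Maternal")
cd$condition <- relevel(cd$condition, "control")

# Invented log2 means: a maternal baseline per condition (the knockdown
# shifts both alleles) plus a Paternal-minus-Maternal difference per condition
maternal <- c(control = 5, KDTF1 = 6, KDTF2 = 4)
pat_diff <- c(control = 1, KDTF1 = 3, KDTF2 = 1)
cond <- as.character(cd$condition)
cd$y <- maternal[cond] + ifelse(cd$allele == "Paternal", pat_diff[cond], 0)

fit <- lm(y ~ allele + condition + allele:condition, data = cd)
round(coef(fit), 6)
# allelePaternal = 1 (Paternal vs Maternal in control)
# conditionKDTF1 = 1, conditionKDTF2 = -1 (shifts shared by both alleles)
# allelePaternal:conditionKDTF1 = 3 - 1 = 2
# allelePaternal:conditionKDTF2 = 1 - 1 = 0
```

Only KDTF1 changes the allelic difference relative to the control, and only its interaction coefficient is non-zero. In DESeq2 the interaction coefficients are reported in resultsNames(dds) with ":" written as ".", e.g. allelePaternal.conditionKDTF1.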

Reply (Vivek.b)

Thanks for the reply, Michael. I had seen the section on dealing with interactions in the manual but did not realize it matched my situation. In that case, is it better to split the input by KD and its matching control, or does that not matter?

Reply (Michael Love)

Sorry, I missed that in my first pass. How are the controls for TF1 and TF2 different?

Reply (Vivek.b)

Each TF has its own matching control because the knockdowns were performed by different people, who used different scrambled siRNA sequences.

Reply (Michael Love)

If you want to control each TF with its matching control, this is possible as well.

Create column data which looks like this (update: note that all columns should be factors)

TF condition allele
1  control   P
1  KD        P
1  control   M
1  KD        M
2  control   P
2  KD        P
2  control   M
2  KD        M

Then use a design of ~ TF + TF:condition + TF:allele + TF:condition:allele

The two TF:condition:allele interaction terms (one per TF) test for differences in the Paternal vs Maternal effect, controlling for the difference in that TF's own control. Use results(dds, name=...) to extract each one separately.
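Before running DESeq(), it can help to check which coefficients this design yields. A minimal sketch with base R's model.matrix(), assuming a hypothetical column data table laid out as above, with all columns as factors; the level names (M/P, control/KD, 1/2) come from the table, while 3 replicates per cell is an assumption.

```r
# Hypothetical column data: 2 TFs x 2 conditions x 2 alleles x 3 replicates
cd <- expand.grid(rep = factor(1:3),
                  allele = factor(c("M", "P")),
                  condition = factor(c("control", "KD"), levels = c("control", "KD")),
                  TF = factor(c("1", "2")))

mm <- model.matrix(~ TF + TF:condition + TF:allele + TF:condition:allele, data = cd)
colnames(mm)
# Because there is no main effect for condition or allele, TF is coded with
# one indicator per level inside the interaction terms, giving one
# TF:condition:allele column per TF: "TF1:conditionKD:alleleP", "TF2:conditionKD:alleleP"

# 8 coefficients for 2 x 2 x 2 groups, and the matrix is full rank
ncol(mm)
qr(mm)$rank
```

In resultsNames(dds) these columns should appear with ":" written as ".", for example TF1.conditionKD.alleleP.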

Reply (Vivek.b)

When I set up the design as above, resultsNames(dds) looks like this:

"Intercept" "TF" "TF.conditionKD" "TF.allelePaternal" "TF.conditionKD.allelePaternal"

I then extracted results with name = "TF.conditionKD.allelePaternal". But is this perhaps a single combined term for both TFs?

I was wondering whether it would be better to divide my input into two separate data frames (one per TF) and run DESeq on each separately with the first design you suggested above.

Reply (Michael Love)

Can you post your column data and your design which produced these resultsNames?

I was thinking it would look like my example in the comment above.

Reply (Vivek.b)

My column data (the data frame named design):

Row.names    condition    allele    TF
TF1_1_Mat    KD    Mat    1
TF1_2_Mat    KD    Mat    1
TF1_3_Mat    KD    Mat    1
TF1_1_Pat    KD    Pat    1
TF1_2_Pat    KD    Pat    1
TF1_3_Pat    KD    Pat    1
TF2_1_Mat    KD    Mat    2
TF2_2_Mat    KD    Mat    2
TF2_3_Mat    KD    Mat    2
TF2_1_Pat    KD    Pat    2
TF2_2_Pat    KD    Pat    2
TF2_3_Pat    KD    Pat    2
Scr2_1_Mat    Control    Mat    2
Scr2_2_Mat    Control    Mat    2
Scr2_3_Mat    Control    Mat    2
Scr2_1_Pat    Control    Pat    2
Scr2_2_Pat    Control    Pat    2
Scr2_3_Pat    Control    Pat    2
Scr1_1_Mat    Control    Mat    1
Scr1_2_Mat    Control    Mat    1
Scr1_3_Mat    Control    Mat    1
Scr1_1_Pat    Control    Pat    1
Scr1_2_Pat    Control    Pat    1
Scr1_3_Pat    Control    Pat    1

where rownames(design) are identical to colnames(featureCount.Result). Condition, allele and TF are factors.

Then I run the following:

design$allele <- relevel(design$allele, "Mat")

ase.deseq <- DESeqDataSetFromMatrix(countData = featureCount.Result,
                                    colData = design,
                                    design = ~ TF + TF:condition + TF:allele + TF:condition:allele)

ase.deseq <- DESeq(ase.deseq, betaPrior = FALSE)

This runs without any errors.

colData(ase.deseq)

DataFrame with 24 rows and 4 columns
           condition   allele        TF sizeFactor
            <factor> <factor> <numeric>  <numeric>
TF1_1_Mat         KD      Mat         1  0.9903505
TF1_2_Mat         KD      Mat         1  0.8977864
TF1_3_Mat         KD      Mat         1  1.0096987
TF1_1_Pat         KD      Pat         1  0.9316722
TF1_2_Pat         KD      Pat         1  0.8573930
...              ...      ...       ...        ...
Scr2_2_Mat   Control      Mat         1  0.9526810
Scr2_3_Mat   Control      Mat         1  1.1222974
Scr2_1_Pat   Control      Pat         1  1.1081599
Scr2_2_Pat   Control      Pat         1  0.8673878
Scr2_3_Pat   Control      Pat         1  1.0313058

resultsNames(ase.deseq)
[1] "Intercept"                    "TF"                           "TF.conditionKD"              
[4] "TF.allelePat"             "TF.conditionKD.allelePat"

Reply (Michael Love)

The TF column needs to be a factor. Convert it before calling DESeq():

dds$TF <- factor(dds$TF)
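Why a numeric TF column gives only one interaction coefficient: R treats a numeric variable as a single slope, so each term involving TF gets one column instead of one column per TF. A minimal sketch with base R's model.matrix() on hypothetical column data shaped like the one posted above (3 replicates per cell assumed):

```r
# Hypothetical column data: 2 TFs x 2 conditions x 2 alleles x 3 replicates
cd <- expand.grid(rep = factor(1:3),
                  allele = factor(c("Mat", "Pat")),
                  condition = factor(c("Control", "KD"), levels = c("Control", "KD")),
                  TF = c(1, 2))  # numeric, as in the colData printed above
f <- ~ TF + TF:condition + TF:allele + TF:condition:allele

# Numeric TF: 5 columns, e.g. "TF:conditionKD:allelePat" (one slope for both TFs)
mm_num <- model.matrix(f, data = cd)
colnames(mm_num)

# Factor TF: 8 columns, including "TF1:conditionKD:allelePat" and "TF2:conditionKD:allelePat"
cd$TF <- factor(cd$TF)
mm_fac <- model.matrix(f, data = cd)
colnames(mm_fac)
```

The five columns of the numeric version correspond to the resultsNames posted above (with ":" written as "."), which is why only one TF.conditionKD.allelePat term appeared.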

Reply (Vivek.b)

Thanks a lot, it was my mistake.

One last thing: with this design I get 40 significant genes for TF1. I then ran the earlier approach I mentioned, i.e. splitting the input by TF and fitting the design formula you suggested first:

~ allele + sample + allele:sample

With this I get 35 genes for TF1 (34 of them shared between the two analyses).

Could you explain where this difference comes from, and which strategy is better? I assume splitting by TF and running DESeq separately is also correct, as it just means treating TF1 and TF2 as separate experiments.

Reply (Michael Love)

Small fluctuations are expected when you change the samples involved in the analysis.

p-values are tail probabilities and therefore very sensitive to small changes in parameter estimates.
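A small numeric illustration, assuming DESeq2's default Wald test (the estimate divided by its standard error, compared with a standard normal). With hypothetical log2 fold change estimates that differ by only 0.1 and a fixed hypothetical standard error, the two-sided p-values change several-fold:

```r
# Hypothetical LFC estimates and a shared hypothetical standard error
lfc <- c(0.8, 0.9, 1.0)
se  <- 0.4
z   <- lfc / se                 # Wald statistics: 2.0, 2.25, 2.5
p   <- 2 * pnorm(-abs(z))       # two-sided p-values
round(p, 4)                     # about 0.0455, 0.0244, 0.0124
```

A gene sitting near the FDR cutoff can therefore move in or out of the significant set when the samples in the fit, and hence its estimates, change slightly.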

Remember also that the set of genes with FDR < alpha is not the "true set of DE genes", but at most a set enriched for the genes for which you had power to detect DE. By changing the samples, you change the power as well.