Hello,

I am working on RNA-seq data which consists of 15 samples:

| Sample | Condition | Type |
|--------|-----------|------|
| X1.1 | LType1 | UR |
| X1.2 | LType1 | UR |
| X1.3 | LType1 | UR |
| X2.1 | LType2 | UR |
| X2.2 | LType2 | UR |
| X2.3 | LType2 | UR |
| X3.1 | LType2 | UR |
| X3.2 | LType2 | UR |
| X3.3 | LType2 | UR |
| X4.1 | LType1 | DR |
| X4.2 | LType1 | DR |
| X4.3 | LType1 | DR |
| X5.1 | LType2 | DR |
| X5.2 | LType2 | DR |
| X5.3 | LType2 | DR |

The ligand type (LType, the Condition column) was used in the design rather than Sample or sample set (e.g. X1, X2), because using either of those gave a “model matrix not full rank” error. The following formula was used: design=~Type+Condition+Type:Condition
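To see why a sample-set term cannot be added, here is a minimal base-R sketch that rebuilds the sample sheet from the table and checks whether each design's model matrix has full column rank. The column name `Set` and the helper `is_full_rank` are hypothetical, for illustration only. Because each set (X1 to X5) is entirely UR or entirely DR and has a single ligand type, the set indicators already span the Type and Condition columns.

```r
# Sample sheet rebuilt from the table; `Set` is a hypothetical column name
sampleTable <- data.frame(
  Set       = rep(c("X1", "X2", "X3", "X4", "X5"), each = 3),
  Condition = rep(c("LType1", "LType2", "LType2", "LType1", "LType2"), each = 3),
  Type      = rep(c("UR", "UR", "UR", "DR", "DR"), each = 3),
  stringsAsFactors = TRUE
)
# Make UR the reference level so the Type coefficient reads DR vs UR
sampleTable$Type <- relevel(sampleTable$Type, ref = "UR")

# Hypothetical helper: TRUE when no column of the design matrix is redundant
is_full_rank <- function(design, data) {
  mm <- model.matrix(design, data)
  qr(mm)$rank == ncol(mm)
}

is_full_rank(~ Type + Condition + Type:Condition, sampleTable)  # TRUE: all four Type x Condition cells are observed
is_full_rank(~ Type + Set, sampleTable)                         # FALSE: TypeDR = SetX4 + SetX5
```

The same check fails for any design that contains both Set and Type (or Set and Condition), since Set is nested within both.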

The comparison we’re interested in is UR vs. DR, accounting for the differences between samples/conditions.

The commands used are:

    dds = DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~Type+Condition+Type:Condition)
    dds = DESeq(dds, test="LRT", reduced=~Type:Condition)
    res = results(dds, name="Type_DR_vs_UR")
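For readers following along, here is a minimal sketch, on simulated counts, of how DESeq2's likelihood ratio test pairs a full and a reduced design. The simulated matrix, the sample names `S1` to `S15`, and the two alternative reduced formulas are assumptions for illustration, not the analysis above. Per the DESeq2 documentation for `results`, LRT p-values depend only on the full-vs-reduced comparison, while `name` just selects which log2 fold change is shown.

```r
library(DESeq2)

set.seed(1)
# Hypothetical simulated counts standing in for the HTSeq files: 500 genes x 15 samples
counts <- matrix(rnbinom(500 * 15, mu = 100, size = 5), ncol = 15,
                 dimnames = list(paste0("gene", 1:500), paste0("S", 1:15)))

# Sample sheet mirroring the table; UR is the reference level of Type
coldata <- data.frame(
  Condition = factor(rep(c("LType1", "LType2", "LType2", "LType1", "LType2"), each = 3)),
  Type      = factor(rep(c("UR", "UR", "UR", "DR", "DR"), each = 3), levels = c("UR", "DR")),
  row.names = colnames(counts)
)

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ Type + Condition + Type:Condition)

# The LRT tests the terms in the full design that are absent from `reduced`.
# Dropping every Type term asks whether Type (main effect + interaction) matters:
dds_type <- DESeq(dds, test = "LRT", reduced = ~ Condition)

# Dropping only the interaction asks whether Type:Condition is needed:
dds_int <- DESeq(dds, test = "LRT", reduced = ~ Type + Condition)

print(resultsNames(dds_type))  # coefficient names accepted by `name`
res_type <- results(dds_type, name = "Type_DR_vs_UR")  # LFC shown for DR vs UR
res_int  <- results(dds_int)
```

Fitting the two reduced models separately keeps the two hypotheses (any Type effect vs. interaction only) explicit.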

I have 3 questions:

1) Is this the correct way to test the comparison I am interested in?

2) Is the inclusion of an interaction term justified or not?

3) Is there a way in DESeq2 to obtain a single goodness-of-fit statistic for the model?

Many thanks for any comments!

R version 3.3.1, DESeq2_1.14.1

Hello, Michael!

A question about goodness of fit: can I use the number of DEGs identified to decide whether the model should include a certain variable? That is, if the number of DEGs increases substantially after adding a variable, does that mean I should add this variable to the model?

I'm not a fan of choosing the design based on the number of DEGs.

My approach is to include variables that I believe may affect the counts. If there are a lot of technical variables and not many samples then I use SVA or RUV.
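A minimal sketch of the SVA route, assuming the Bioconductor `sva` package and following the pattern of the DESeq2/rnaseqGene workflow: estimate surrogate variables from normalized counts, then add them to the design ahead of the terms of interest. The simulated counts and the choice `n.sv = 2` are assumptions for illustration; RUV (e.g. the `RUVSeq` package) would be an alternative and is not shown.

```r
library(DESeq2)
library(sva)

set.seed(2)
# Hypothetical simulated counts (2000 genes x 15 samples) in place of real data
counts <- matrix(rnbinom(2000 * 15, mu = 200, size = 10), ncol = 15,
                 dimnames = list(paste0("gene", 1:2000), paste0("S", 1:15)))
coldata <- data.frame(
  Condition = factor(rep(c("LType1", "LType2", "LType2", "LType1", "LType2"), each = 3)),
  Type      = factor(rep(c("UR", "UR", "UR", "DR", "DR"), each = 3), levels = c("UR", "DR")),
  row.names = colnames(counts)
)
dds <- DESeqDataSetFromMatrix(counts, coldata, design = ~ Type + Condition + Type:Condition)
dds <- estimateSizeFactors(dds)

# svaseq works on normalized counts; drop near-empty genes first
dat <- counts(dds, normalized = TRUE)
dat <- dat[rowMeans(dat) > 1, ]

# Full model = variables of interest; null model = intercept only
mod  <- model.matrix(~ Type + Condition + Type:Condition, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svs  <- svaseq(dat, mod, mod0, n.sv = 2)  # n.sv = 2 is an arbitrary choice here

# Add the surrogate variables as covariates, then refit with DESeq(dds) as usual
dds$SV1 <- svs$sv[, 1]
dds$SV2 <- svs$sv[, 2]
design(dds) <- ~ SV1 + SV2 + Type + Condition + Type:Condition
```

With only 15 samples, each added surrogate variable costs one residual degree of freedom, so keeping `n.sv` small is a deliberate trade-off.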