Hello,
I am working on RNA-seq data which consists of 15 samples:
Sample | Condition | Type |
X1.1 | LType1 | UR |
X1.2 | LType1 | UR |
X1.3 | LType1 | UR |
X2.1 | LType2 | UR |
X2.2 | LType2 | UR |
X2.3 | LType2 | UR |
X3.1 | LType2 | UR |
X3.2 | LType2 | UR |
X3.3 | LType2 | UR |
X4.1 | LType1 | DR |
X4.2 | LType1 | DR |
X4.3 | LType1 | DR |
X5.1 | LType2 | DR |
X5.1 | LType2 | DR |
X5.2 | LType2 | DR |
X5.3 | LType2 | DR |
To avoid the “model matrix not full rank” error, the ligand type (LType, the Condition column) was used in the design rather than Sample or sample set (e.g. X1, X2). The following formula was used: design=~Type+Condition+Type:Condition
The comparison we’re interested in is between the UR and the DR, accounting for the differences in Sample/Condition.
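For readers wondering where the “model matrix not full rank” error comes from, here is a minimal base R sketch (it does not call DESeq2). It assumes the sample sets are X1 to X5 with three replicates each, as the table suggests; the variable names `set`, `type` and `condition` are made up for the illustration. Because each sample set belongs to exactly one Type, the set columns already determine the Type column, so one coefficient cannot be estimated.

```r
# Hypothetical reconstruction of the sample annotation from the table above.
set       <- factor(rep(paste0("X", 1:5), each = 3))
type      <- factor(rep(c("UR", "DR"), times = c(9, 6)))
condition <- factor(rep(c("LType1", "LType2", "LType2", "LType1", "LType2"),
                        each = 3))

# Design with sample set: set is nested within type.
mm_set     <- model.matrix(~ type + set)
n_cols_set <- ncol(mm_set)      # coefficients requested
rank_set   <- qr(mm_set)$rank   # coefficients that can be estimated
rank_deficient <- rank_set < n_cols_set

# Design with ligand type instead of sample set.
mm_cond   <- model.matrix(~ type + condition + type:condition)
n_cols_cond <- ncol(mm_cond)
rank_cond   <- qr(mm_cond)$rank

cat("with set:       ", n_cols_set, "columns, rank", rank_set, "\n")
cat("with condition: ", n_cols_cond, "columns, rank", rank_cond, "\n")
```

When the rank is smaller than the number of columns, the design is not full rank; replacing the sample set with the ligand type removes the redundancy.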
The commands used are:
dds = DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~Type+Condition+Type:Condition)
dds = DESeq(dds, test="LRT", reduced=~Type:Condition)
res = results(dds, name="type_DR_vs_UR")
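A likelihood ratio test compares a full design against a reduced one, and what gets tested is whatever the full design has that the reduced one lacks. The base R sketch below (again without DESeq2, and with the sample annotation reconstructed from the table as an assumption) compares the posted design with two candidate reduced formulas. The object names `full`, `additive` and `posted` are hypothetical.

```r
# Hypothetical sample annotation rebuilt from the table in the question.
sampleTable <- data.frame(
  Type      = factor(rep(c("UR", "DR"), times = c(9, 6))),
  Condition = factor(rep(c("LType1", "LType2", "LType2", "LType1", "LType2"),
                         each = 3))
)

full     <- model.matrix(~ Type + Condition + Type:Condition, sampleTable)
additive <- model.matrix(~ Type + Condition, sampleTable)   # interaction dropped
posted   <- model.matrix(~ Type:Condition, sampleTable)     # reduced formula as posted

# Coefficients in the full design; with default alphabetical levels,
# DR is the reference level of Type, so the Type column is "TypeUR".
full_names <- colnames(full)
print(full_names)

# Degrees of freedom each comparison would test.
df_interaction <- qr(full)$rank - qr(additive)$rank
df_posted      <- qr(full)$rank - qr(posted)$rank

# The column present in the full design but absent from the additive one.
tested_term <- setdiff(colnames(full), colnames(additive))

cat("full vs additive:", df_interaction, "df, term:", tested_term, "\n")
cat("full vs posted reduced formula:", df_posted, "df\n")
```

In this sketch the reduced formula ~Type:Condition spans the same space as the full design, so it leaves nothing to test, whereas ~Type+Condition isolates the interaction term. In DESeq2 itself, resultsNames(dds) lists the coefficient names that were actually fitted, which is worth checking before passing a name to results().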
I have 3 questions:
1) Is this the correct way to assess the comparison I am interested in?
2) Is the inclusion of an interaction term justified or not?
3) Is there a way in DESeq2 to obtain a single goodness-of-fit statistic for the model?
Many thanks for any comments!
R version 3.3.1, DESeq2_1.14.1
Hello, Michael!
I have a question about goodness of fit. Can I use the number of DEGs identified to decide whether the model should include a certain variable? That is, if the number of DEGs increases significantly after adding a variable, does that mean I should add this variable to the model?
I'm not a fan of determining the design by the number of DEGs.
My approach is to include variables that I believe may affect the counts. If there are a lot of technical variables and not many samples then I use SVA or RUV.