Handling missing data, DESeq2
Hello All,

I have RNA-seq data in which a couple of samples are low quality or look dissimilar to the rest in the sample distance analysis, and these need to be removed. The samples were collected before, during and after a treatment. My question: should only the problematic samples be excluded, or should all data from the affected subjects be removed from the DESeq2 analysis? Thanks.

Regards, Guan

Looks like you have 3 time points, but how many samples for each time point do you have? I think that as long as you have at least 3 samples per time point, you should be fine?

Please post your entire colData (masked if necessary) so we can better answer your question.

I am posting the entire colData (masked) below for further evaluation. It covers 5 time points per subject for 10 subjects. Sample 40 (marked in bold), from S8, is dissimilar to all the other samples in the sample distance analysis. This is in line with what we expected, given that a smaller volume of Sample 40 was carried over during RNA-seq library prep. In this case, should only Sample 40 or all S8 samples (i.e. 5 samples) be excluded from the DESeq2 analysis? Thanks.

    Subject Condition
1   S1  Base1
2   S1  During1
3   S1  During2
4   S1  Base2
5   S1  Post
6   S2  Base2
7   S2  During1
8   S2  Base1
9   S2  During2
10  S2  Post
11  S3  Base1
12  S3  Base2
13  S3  During1
14  S3  During2
15  S3  Post
16  S4  Base1
17  S4  Base2
18  S4  During1
19  S4  During2
20  S4  Post
21  S5  Base1
22  S5  During1
23  S5  Base2
24  S5  During2
25  S5  Post
26  S6  Base2
27  S6  During1
28  S6  During2
29  S6  Base1
30  S6  Post
31  S7  Base2
32  S7  During1
33  S7  During2
34  S7  Base1
35  S7  Post
36  S8  Base2
37  S8  During1
38  S8  Base1
39  S8  During2
**40    S8  Post**
41  S9  Base2
42  S9  During1
43  S9  Base1
44  S9  During2
45  S9  Post
46  S10 Base1
47  S10 During1
48  S10 During2
49  S10 Base2
50  S10 Post

You can just use a design of ~ Subject + Condition and then extract results for Condition. You can include the subjects with partial data, so only Sample 40 needs to be removed, not all of S8.
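
To make that concrete, here is a rough sketch in R rather than a definitive recipe. It rebuilds a colData with the same Subject and Condition columns as posted (row order simplified), drops only the S8 Post sample, and checks that the paired design is still full rank. The counts are simulated stand-ins for the real count matrix, the `sampleN` row names are made up for illustration, and the Post vs Base1 contrast is just one example comparison.

```r
# Rebuild a colData like the one posted: 10 subjects x 5 conditions.
# Row order is simplified; row 40 is still S8 / Post.
conditions <- c("Base1", "Base2", "During1", "During2", "Post")
subjects <- paste0("S", 1:10)
coldata <- expand.grid(Condition = conditions, Subject = subjects,
                       stringsAsFactors = FALSE)
coldata$Subject <- factor(coldata$Subject, levels = subjects)
coldata$Condition <- factor(coldata$Condition, levels = conditions)
rownames(coldata) <- paste0("sample", seq_len(nrow(coldata)))  # hypothetical names

# Drop only the outlier (S8, Post); the other four S8 samples stay in.
keep <- !(coldata$Subject == "S8" & coldata$Condition == "Post")
coldata_kept <- droplevels(coldata[keep, ])

# With one sample missing, the design ~ Subject + Condition is still full rank.
design <- model.matrix(~ Subject + Condition, data = coldata_kept)
design_rank <- qr(design)$rank
n_coef <- ncol(design)

if (requireNamespace("DESeq2", quietly = TRUE)) {
  set.seed(1)
  # Simulated counts (assumption): replace with the real count matrix,
  # with columns in the same order as the rows of coldata_kept.
  n_genes <- 200
  gene_mu <- 2^runif(n_genes, 4, 10)
  counts <- matrix(rnbinom(n_genes * nrow(coldata_kept), mu = gene_mu, size = 5),
                   nrow = n_genes,
                   dimnames = list(paste0("gene", seq_len(n_genes)),
                                   rownames(coldata_kept)))

  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData = coldata_kept,
                                        design = ~ Subject + Condition)
  dds <- DESeq2::DESeq(dds)

  # Example comparison: Post vs Base1, controlling for subject.
  res <- DESeq2::results(dds, contrast = c("Condition", "Post", "Base1"))
}
```

Keeping the remaining S8 samples means they still contribute to the subject effect and to the comparisons among the other conditions.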


