I have whole-proteome mass spectrometry data with a design of 3 conditions, 4 timepoints and 3 replicates. My data (normalized LFQ values) contain missing values that are both Missing At Random (MAR) and Missing Not At Random (MNAR). I am sure of this because many proteins have missing values in only one condition, and it is biologically relevant that they are not detected in that condition because their abundance there is very low.
My problem is that I am looking for a way to impute MAR and MNAR values differently. If I impute all of them from a fitted normal distribution, the data are skewed too far toward low intensity values, which affects the downstream differential expression analysis.
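To make the distinction concrete, here is a minimal Python sketch of how the two kinds of missingness could be flagged per protein and condition. It is only an illustration under stated assumptions, not an established method: the function names, the labels, and the `mnar_max_observed` threshold are hypothetical, zeros and NaN are both treated as missing, and a condition is called "MNAR-like" simply when (almost) nothing was observed in it.

```python
import math


def is_missing(value):
    """Treat None, NaN and 0 as missing (an assumption about the data encoding)."""
    return value is None or math.isnan(value) or value == 0


def classify_missingness(intensities_by_condition, mnar_max_observed=0):
    """Label each condition of one protein as 'complete', 'MNAR-like' or 'MAR-like'.

    intensities_by_condition maps a condition name to its list of intensities.
    mnar_max_observed is a hypothetical cutoff: a condition with at most this
    many observed values is flagged as MNAR-like.
    """
    labels = {}
    for condition, values in intensities_by_condition.items():
        observed = sum(1 for v in values if not is_missing(v))
        if observed == len(values):
            labels[condition] = "complete"
        elif observed <= mnar_max_observed:
            labels[condition] = "MNAR-like"
        else:
            labels[condition] = "MAR-like"
    return labels


# Hypothetical toy protein with four samples per condition.
nan = float("nan")
toy_protein = {
    "Fed": [20.1, 20.3, nan, 19.9],   # one scattered gap
    "Fasted": [nan, nan, nan, nan],   # never detected in this condition
    "Refed": [21.0, 21.2, 20.8, 21.1],
}
toy_labels = classify_missingness(toy_protein)
print(toy_labels)
```

With labels like these, values in "MNAR-like" conditions could be imputed from the low end of the intensity range and values in "MAR-like" conditions from the observed neighbours, but which imputation to use for each group is exactly the open question here.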
Here is my design matrix:
Sample         Cond    Time  Batch  Rep
Fed_2_B_2      Fed     2     B      2
Fed_2_D_1      Fed     2     D      1
Fed_2_D_2      Fed     2     D      2
Fed_4_A_1      Fed     4     A      1
Fed_4_B_2      Fed     4     B      2
Fed_4_C_3      Fed     4     C      3
Fed_6_A_1      Fed     6     A      1
Fed_6_D_1      Fed     6     D      1
Fed_6_D_2      Fed     6     D      2
Fed_24_A_1     Fed     24    A      1
Fed_24_C_3     Fed     24    C      3
Fed_24_D_1     Fed     24    D      1
Fasted_2_A_1   Fasted  2     A      1
Fasted_2_C_3   Fasted  2     C      3
Fasted_2_D_1   Fasted  2     D      1
Fasted_4_B_2   Fasted  4     B      2
Fasted_4_C_3   Fasted  4     C      3
Fasted_4_D_1   Fasted  4     D      1
Fasted_6_A_1   Fasted  6     A      1
Fasted_6_B_2   Fasted  6     B      2
Fasted_6_C_3   Fasted  6     C      3
Fasted_24_B_2  Fasted  24    B      2
Fasted_24_C_3  Fasted  24    C      3
Fasted_24_D_1  Fasted  24    D      1
Refed_2_A_1    Refed   2     A      1
Refed_2_C_3    Refed   2     C      3
Refed_2_D_1    Refed   2     D      1
Refed_4_B_2    Refed   4     B      2
Refed_4_D_1    Refed   4     D      1
Refed_4_D_2    Refed   4     D      2
Refed_6_B_2    Refed   6     B      2
Refed_6_C_3    Refed   6     C      3
Refed_6_D_1    Refed   6     D      1
Refed_24_A_1   Refed   24    A      1
Refed_24_C_3   Refed   24    C      3
Refed_24_D_1   Refed   24    D      1
I was trying to use proDA (https://github.com/const-ae/proDA) to see if it can do a better job of filling in these missing values. However, I am not sure whether it can work with more than 2 conditions, or whether I should filter the data before passing it to proDA. So far I have been filtering the data to keep only those proteins that have at least 8 out of 12 non-zero values in at least 2 of the 3 conditions. This filtering reduces the number of rows (proteins) from 4128 to 868.
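For clarity, the filter described above can be sketched as follows. This is an illustrative Python sketch rather than the actual pipeline: the function names and the toy proteins are hypothetical, and it assumes that missing values are encoded as NaN or 0 and that each condition has 12 samples (4 timepoints x 3 replicates).

```python
import math


def count_observed(values):
    """Count values that are present, i.e. not None, not NaN and not zero."""
    return sum(
        1 for v in values
        if v is not None and not math.isnan(v) and v != 0
    )


def passes_filter(intensities_by_condition, min_observed=8, min_conditions=2):
    """Return True if enough conditions have enough observed values.

    A protein is kept when at least min_conditions conditions each contain
    at least min_observed observed (non-missing, non-zero) intensities.
    """
    well_observed = sum(
        1 for values in intensities_by_condition.values()
        if count_observed(values) >= min_observed
    )
    return well_observed >= min_conditions


# Hypothetical toy proteins, 12 samples per condition.
nan = float("nan")
protein_a = {
    "Fed": [20.1] * 12,
    "Fasted": [19.8] * 9 + [nan] * 3,
    "Refed": [nan] * 12,            # absent in one condition only
}
protein_b = {
    "Fed": [21.0] * 12,
    "Fasted": [nan] * 12,
    "Refed": [20.5] * 5 + [0.0] * 7,  # too sparse in a second condition
}
keep_a = passes_filter(protein_a)
keep_b = passes_filter(protein_b)
print(keep_a, keep_b)
```

Note that a protein like `protein_a`, which is completely missing in one condition but well observed in the other two, survives this filter, so the retained 868 proteins can still contain the condition-specific missing values described earlier.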
Any help would be appreciated.