Question

Missing value imputation


kamal.fartiyal84

Cancer Research UK Cambridge Institute

Hi,

I want to perform missing value imputation on TMT-tag-based quantitative proteomics data. I will be performing mixed imputation, applying two different methods (MCAR/MNAR) to two different groups within the same dataset. Should I perform the imputation on raw or log-transformed peptide intensities?

Kamal

msnbase


Answer 1 · 2017-07-26

I don't think it matters, as long as you don't use zero imputation for MNAR values (a zero on the raw scale has no finite value after log transformation).
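One way to see why the scale matters little for a minimum-based MNAR replacement: log transformation is monotone, so the smallest observed intensity lands on the same point whichever scale you impute on, whereas a zero has no finite logarithm. A minimal stdlib-only Python sketch, using made-up intensities for a single peptide (None marks a missing channel; all names are illustrative):

```python
import math

# Made-up raw intensities for one peptide across five channels; None = missing.
raw = [5.2e6, 4.8e6, None, 6.1e6, None]

def impute_min(values):
    """Replace each missing value with the smallest observed value."""
    smallest = min(v for v in values if v is not None)
    return [smallest if v is None else v for v in values]

def log2_all(values):
    """log2-transform observed values, leaving missing values as None."""
    return [None if v is None else math.log2(v) for v in values]

impute_then_log = log2_all(impute_min(raw))  # impute on the raw scale first
log_then_impute = impute_min(log2_all(raw))  # log-transform first, then impute

# Zero imputation has no log2 counterpart: math.log2(0.0) raises ValueError
# (NumPy's np.log2 would instead return -inf with a warning).
try:
    math.log2(0.0)
    zero_has_log = True
except ValueError:
    zero_has_log = False

print(all(math.isclose(a, b) for a, b in zip(impute_then_log, log_then_impute)))  # True
print(zero_has_log)  # False
```

The same argument does not carry over to mean-based replacement, because the log of a mean is not the mean of the logs.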

However, as you use TMT tags, one would expect your missing values to be the result of absent peptides rather than features missed by the mass spectrometer, because the samples were combined before acquisition. In a typical shotgun experiment of this kind, one wouldn't expect many missing values; some features can have many missing values, and these should probably be filtered out completely.
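Filtering out features with many missing values amounts to a per-feature threshold on the proportion of missing values. For MSnSet objects, MSnbase provides filterNA() for this (its pNA argument sets the maximum allowed proportion; check its documentation). The plain-Python sketch below only illustrates the logic, with hypothetical peptide names and an assumed threshold of 0.5:

```python
# Hypothetical peptide x channel table; None marks a missing value.
peptides = {
    "PEPTIDEA": [1.0e6, 1.2e6, 0.9e6, 1.1e6, 1.0e6, 1.3e6],
    "PEPTIDEB": [2.0e5, None, 2.2e5, 1.9e5, 2.1e5, 2.0e5],
    "PEPTIDEC": [None, None, None, 3.0e5, None, None],
}

def filter_by_missingness(table, max_missing_fraction=0.5):
    """Keep features whose fraction of missing values is <= the threshold."""
    kept = {}
    for name, values in table.items():
        n_missing = sum(v is None for v in values)
        if n_missing / len(values) <= max_missing_fraction:
            kept[name] = values
    return kept

filtered = filter_by_missingness(peptides, max_missing_fraction=0.5)
print(sorted(filtered))  # ['PEPTIDEA', 'PEPTIDEB']
```

PEPTIDEC (5 of 6 channels missing) is dropped; the threshold itself is a judgment call that depends on the experiment.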

Laurent Gatto · Answer 2 · 2017-07-27

Thanks, Laurent, for your reply. In my dataset, one condition is expected to have more missing peptides than the other for biological reasons. The filtering strategy I am using is as follows:

Condition A (5 replicates) is expected to have more peptides than Condition B (5 replicates).

-> Filter out all peptides that are completely missing from both Condition A and Condition B.
-> Keep peptides that are present in at least 3 replicates of Condition A.
-> No such restriction on Condition B.
-> Apply MAR imputation to Condition A and MNAR imputation to Condition B (where the values are expected to be biologically missing).
-> Alternatively, for each peptide in Condition A, replace its missing values with the mean intensity of its observed Condition A replicates; for each peptide in Condition B, replace its missing values with the minimum intensity of its observed Condition B replicates.
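The outline above could be sketched as follows, assuming intensities are held in a dict of lists with the five Condition A channels first and the five Condition B channels after them, and None marking missing values (a hypothetical layout). It implements the alternative mean/min rule; a proper MAR/MNAR method (for example those offered by MSnbase's impute()) would replace the two fill rules. The outline does not say what to do when a peptide has no observed Condition B value at all, so the fallback to the lowest observed Condition B intensity in the whole table is my assumption:

```python
from statistics import mean

N_A = 5  # assumed: first 5 channels are Condition A, the rest Condition B

def split(values):
    return values[:N_A], values[N_A:]

def observed(values):
    return [v for v in values if v is not None]

def filter_peptides(table, min_observed_a=3):
    """Apply the outline's filters; no restriction is placed on Condition B."""
    kept = {}
    for name, values in table.items():
        a, _ = split(values)
        if not observed(values):  # missing from every replicate of A and B
            continue
        if len(observed(a)) < min_observed_a:  # too sparse in Condition A
            continue
        kept[name] = values
    return kept

def impute_mean_min(table):
    """Condition A: per-peptide mean; Condition B: per-peptide minimum."""
    # Assumption (not in the outline): fallback when a peptide has no observed B value.
    all_b = [v for values in table.values() for v in observed(split(values)[1])]
    fallback_b = min(all_b) if all_b else None
    out = {}
    for name, values in table.items():
        a, b = split(values)
        a_fill = mean(observed(a))  # non-empty after filter_peptides
        obs_b = observed(b)
        b_fill = min(obs_b) if obs_b else fallback_b
        out[name] = ([a_fill if v is None else v for v in a]
                     + [b_fill if v is None else v for v in b])
    return out

# Made-up intensities for illustration only.
peptides = {
    "P1": [10.0, 11.0, None, 12.0, 11.0, 9.0, None, 8.0, None, 7.0],
    "P2": [20.0, None, None, 22.0, 21.0, None, None, None, None, None],
    "P3": [None, None, None, 5.0, 6.0, 4.0, 4.5, 5.0, 4.0, 4.2],
    "P4": [None] * 10,
}
filtered = filter_peptides(peptides)
imputed = impute_mean_min(filtered)
```

Here P3 (only 2 observed Condition A replicates) and P4 (missing everywhere) are removed. Note that the mean is computed on whatever scale the intensities are stored on, so raw and log-transformed input give different fill values.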

I usually remove missing values in all my analyses, but in this dataset the biology means I have to keep them. As this is the first time I am using imputation, I would highly appreciate your feedback on the outline above.

Thanks.

Kamal