I'm going to use the limma squeezeVar() function to estimate protein variances. Its inputs are the sample variances and the degrees of freedom for those variances, and I would like to know whether I should take missing values into account when computing them. For example, the values in group 1 are 10.5, 11, 11.2, NA, NA and in group 2 are 15, 15.1, 15.5, NA, NA. Is the df 2n-2 = 8 (with n = 5 per group), or should I ignore the NAs, in which case it would be 4? Thank you!
I am not quite clear why you're not using the limma package directly, because it will compute all the variances and the df automatically for you for any high-throughput proteomics dataset.
If you are using squeezeVar() for a different, more bespoke research project, then you need to explain the purpose of the study. I can't advise on whether to remove NAs or not without knowing anything about the data or the purposes for which it is being analysed. If you don't remove the NAs, then the sample variance would simply be NA, so there are no df at all.
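To make the NA point concrete, here is a minimal base R sketch using the values from the question. It assumes a simple two-group design with a pooled within-group variance; the variable names are illustrative only, and this is not what limma does internally.

```r
# Values from the question; NA marks a missing intensity.
group1 <- c(10.5, 11, 11.2, NA, NA)
group2 <- c(15, 15.1, 15.5, NA, NA)

# With the NAs left in, the sample variance is NA.
v_with_na <- var(group1)

# Count only the observed values in each group.
n1 <- sum(!is.na(group1))
n2 <- sum(!is.na(group2))

# Residual df from observed values only: (3 - 1) + (3 - 1) = 4, not 2 * 5 - 2 = 8.
df_resid <- (n1 - 1) + (n2 - 1)

# Pooled within-group variance after dropping the NAs.
s2_pooled <- ((n1 - 1) * var(group1, na.rm = TRUE) +
              (n2 - 1) * var(group2, na.rm = TRUE)) / df_resid
```

Under these assumptions, `s2_pooled` and `df_resid` are the per-protein quantities that would go into the `var` and `df` arguments of squeezeVar().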
Hi Gordon,
Thank you for your reply. I'm working on a proteomics data imputation project. I would like to describe the intensity distribution of a given protein within an experimental group, so the protein's sample variance needs to be estimated. I found that squeezeVar() shrinks the observed sample variances towards a prior value, but I'm not sure whether I should take NAs into account when my data look like this:
group1: 16.56412567 NA NA NA 16.38149395; group2: 16.64612271 NA 16.22489667 NA NA
Best wishes, Mengchun
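For data like the example above, one possible way to get a variance and df per protein is sketched below in base R. The helper `pooled_var_df()` is hypothetical (it is not part of limma), and it assumes a pooled within-group variance in which each group contributes one df fewer than its number of observed values, so a group with a single observed value contributes nothing.

```r
# Hypothetical helper (not a limma function): pooled within-group variance
# and residual df for one protein, using observed values only.
pooled_var_df <- function(y, group) {
  obs <- !is.na(y)
  y <- y[obs]
  group <- factor(group[obs])                      # drops groups with no observed values
  n <- tabulate(group, nbins = nlevels(group))     # observed count per group
  df <- sum(pmax(n - 1, 0))                        # each group contributes n_g - 1
  if (df == 0) return(list(var = NA_real_, df = 0))
  # Within-group sum of squares around each group's own mean.
  ss <- sum(tapply(y, group, function(v) sum((v - mean(v))^2)))
  list(var = ss / df, df = df)
}

# The protein from the post: two observed values in each group.
y <- c(16.56412567, NA, NA, NA, 16.38149395,
       16.64612271, NA, 16.22489667, NA, NA)
group <- rep(c("group1", "group2"), each = 5)

res <- pooled_var_df(y, group)
res$df   # (2 - 1) + (2 - 1) = 2
res$var
```

Applied to every protein, this would give one variance and one df per protein, which is the form that the `var` and `df` arguments of squeezeVar() expect. Note that squeezeVar() estimates its prior from many variances at once, so it is not meaningful to call it on a single protein.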