I have final-report methylation data with ~800k rows (CpG sites) for 50 samples. The dataset contains many missing values (NA), which cause problems when I try to normalize it and run a differential methylation analysis in limma. Here is a small sample of my data (each cell is a beta-value):
CpG-sites | sample1 | sample2 | sample3 | sample4 | sample5 |
cg01017367 | 0.6735 | 0.7229 | 0.6696 | 0.6561 | 0.6043 |
cg01485780 | NA | 0.7923 | 0.7458 | NA | 0.7526 |
cg02276259 | 0.4328 | 0.4618 | 0.4860 | 0.4493 | 0.3947 |
cg04315069 | 0.7968 | NA | 0.7816 | 0.8490 | 0.7797 |
cg06291348 | 0.3715 | 0.3593 | NA | 0.3172 | 0.2958 |
cg07495256 | 0.8986 | 0.9079 | 0.9192 | 0.9116 | 0.8012 |
cg07920074 | 0.7049 | 0.7388 | 0.7777 | 0.7039 | NA |
My question is: how should I handle the missing data (NA) in this large dataset (~800k rows, 50 columns)? Is there an R package that deals with missing data? Is there a fast way to impute missing data in R? Thanks in advance for any advice.
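One common first step, before any imputation, is to measure how much is missing per CpG site and drop the sites that are missing in too many samples. The sketch below uses only base R on a few rows copied from the table above. The name max_na_frac and its value of 0.25 are hypothetical choices for illustration, not a recommended threshold.

```r
# Toy beta-value matrix built from a few rows of the table in the question
beta <- rbind(
  cg01017367 = c(0.6735, 0.7229, 0.6696, 0.6561, 0.6043),
  cg01485780 = c(NA,     0.7923, 0.7458, NA,     0.7526),
  cg02276259 = c(0.4328, 0.4618, 0.4860, 0.4493, 0.3947),
  cg04315069 = c(0.7968, NA,     0.7816, 0.8490, 0.7797)
)
colnames(beta) <- paste0("sample", 1:5)

# Fraction of missing values per CpG site (row)
na_frac <- rowMeans(is.na(beta))

# Hypothetical cutoff: keep sites missing in at most 25% of samples
max_na_frac <- 0.25
beta_kept <- beta[na_frac <= max_na_frac, , drop = FALSE]
```

With 50 samples the same two lines scale to the full ~800k-row matrix, since rowMeans and is.na are vectorized. The sites that remain may still contain some NA values, which is a separate decision (leave them in, or impute them).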
limma handles missing values naturally in lmFit. You'll have to be more precise about the nature of your problem.

But when I run lmFit, I get the following error in limma:
fit=lmFit(CpG, design)
Error in rowMeans(y$exprs, na.rm = TRUE) : 'x' must be numeric
I also have a problem normalizing these beta-values.
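The message 'x' must be numeric suggests that CpG is not a numeric matrix, for example a data frame that still contains the CpG ID column, or values that were read in as text. That is a guess, since the thread does not show how CpG was created. A minimal sketch of the conversion, where CpG_df and its contents are made up for illustration:

```r
# Hypothetical data frame as it might look after reading the report file:
# probe IDs in the first column, values stored as character strings
CpG_df <- data.frame(
  CpG_sites = c("cg01017367", "cg01485780", "cg02276259"),
  sample1   = c("0.6735", "NA", "0.4328"),
  sample2   = c("0.7229", "0.7923", "0.4618"),
  stringsAsFactors = FALSE
)

# Drop the ID column, coerce the values to a numeric matrix,
# and keep the probe IDs as row names
CpG <- as.matrix(CpG_df[, -1])
storage.mode(CpG) <- "double"      # the string "NA" becomes a real NA
rownames(CpG) <- CpG_df$CpG_sites

# Optional: convert beta-values to M-values, M = log2(beta / (1 - beta));
# NA entries simply stay NA
M <- log2(CpG / (1 - CpG))

# With limma loaded and a design matrix defined, the fit would then be:
# fit <- lmFit(M, design)
```

Checking is.numeric(CpG) and class(CpG) before calling lmFit should show quickly whether this is the cause.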