Filter rows by frequency of missing values
rgescudero



Does anybody have an R script to filter rows (genes) from a gene expression data matrix, depending on the frequency of missing data across the samples? Let's say that I only want to analyze genes that have non-missing values in at least 90% of the samples.

Many thanks

Ramon

James W. MacDonald

United States

You don't need an R script per se, unless you call a one-liner a script. Assume you have a matrix called 'thematrix', with genes in rows and samples in columns.

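# Max NAs allowed per gene: 10% of samples, rounded down so at least 90% are non-missing
thecount <- floor(ncol(thematrix) * 0.1)
# Keep genes (rows) within that limit; drop = FALSE keeps a matrix even if only one gene passes
filtered <- thematrix[rowSums(is.na(thematrix)) <= thecount, , drop = FALSE]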
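
To try the filter on something small, here is a minimal sketch with a made-up 4 x 10 matrix. The values, the gene and sample names, and the min_present variable are assumptions for illustration only. It expresses the same cutoff as a fraction of non-missing samples via rowMeans(), rather than as a count.

```r
# Toy expression matrix: 4 genes (rows) x 10 samples (columns); values are made up
set.seed(1)
thematrix <- matrix(rnorm(40), nrow = 4,
                    dimnames = list(paste0("gene", 1:4), paste0("s", 1:10)))
thematrix["gene2", 1]   <- NA   # 1 of 10 missing: 90% present, should be kept
thematrix["gene3", 1:2] <- NA   # 2 of 10 missing: 80% present, should be dropped
thematrix["gene4", ]    <- NA   # all missing, should be dropped

# Required fraction of non-missing samples per gene (hypothetical name)
min_present <- 0.9

# Fraction of samples with a value, per gene
frac_present <- rowMeans(!is.na(thematrix))

# drop = FALSE keeps the result a matrix even when a single gene passes
filtered <- thematrix[frac_present >= min_present, , drop = FALSE]

rownames(filtered)
# [1] "gene1" "gene2"
```

Working with a fraction avoids having to decide how to round the allowed number of NAs when the number of samples is not a multiple of 10.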