ExpressionSet: Remove NA values above a certain threshold
philipp24 ▴ 30
@philipp24-8672
Last seen 7.5 years ago
Germany

Dear all,

I built the following simple ExpressionSet in R:

library(Biobase)  # provides ExpressionSet() and the AnnotatedDataFrame class

dataDirectory <- system.file("extdata", package = "Biobase")
exprsFile <- "path to expression data.txt"
exprs <- as.matrix(read.table(exprsFile, header = TRUE, sep = "\t", row.names = 1, as.is = TRUE))

pDataFile <- "path to phenotype data.txt"
pData <- read.table(pDataFile, row.names=1, header=TRUE, sep="\t")
phenoData <- new("AnnotatedDataFrame",data=pData)

exampleSet <- ExpressionSet(assayData=exprs, phenoData=phenoData)

Now I want to remove all samples with more than 80% NA values, and also every gene with more than 50% NA values. Is there a simple way to do that?

Thanks in advance for your help!

expressionset • 3.1k views
Aaron Lun ★ 28k
@alun
Last seen 11 hours ago
The city by the bay

To identify the samples with too many NAs (here exprs is the expression matrix from the question, with genes in rows and samples in columns):

bad.sample <- colMeans(is.na(exprs)) > 0.8

To identify the genes with too many NAs (assuming you don't want to recalculate the proportions after removing samples):

bad.gene <- rowMeans(is.na(exprs)) > 0.5
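
As a small illustration of why these two lines give proportions: is.na() returns a logical matrix, and the mean of a logical vector is the fraction of TRUE values. The toy matrix below is made up for the example (the names m, na.per.sample and na.per.gene are not from the original post), so treat it as a sketch rather than real data.

```r
# Toy matrix: 4 genes (rows) x 3 samples (columns); values are invented.
# matrix() fills column by column, so s1 = (1, NA, NA, NA), and so on.
m <- matrix(c(1, NA, NA, NA,
              2,  3, NA,  4,
              5,  6,  7,  8),
            nrow = 4,
            dimnames = list(paste0("g", 1:4), paste0("s", 1:3)))

# Fraction of NA entries in each column (sample) and each row (gene).
na.per.sample <- colMeans(is.na(m))
na.per.gene <- rowMeans(is.na(m))

# Apply the thresholds from the question.
bad.sample <- na.per.sample > 0.8   # no sample exceeds 80% here
bad.gene <- na.per.gene > 0.5       # only g3 (2 of 3 values missing)
```

Note that the comparison is strict, so a sample with exactly 80% NAs (or a gene with exactly 50%) is kept.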

To remove them:

newExampleSet <- exampleSet[!bad.gene,!bad.sample]
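
The approach above computes the gene proportions on the full matrix. If you would rather recalculate them after the bad samples have been dropped, something like the following might work. This is only a sketch: na.filter.sequential is a hypothetical helper name, not a Biobase function, and it is shown on a made-up toy matrix so that it runs without any expression data.

```r
# Hypothetical helper: flag bad samples first, then compute per-gene NA
# proportions using only the samples that survive.
na.filter.sequential <- function(x, max.sample.na = 0.8, max.gene.na = 0.5) {
  stopifnot(is.matrix(x))
  bad.sample <- colMeans(is.na(x)) > max.sample.na
  # drop = FALSE keeps a matrix even if only one sample remains.
  kept <- x[, !bad.sample, drop = FALSE]
  bad.gene <- rowMeans(is.na(kept)) > max.gene.na
  list(bad.gene = bad.gene, bad.sample = bad.sample)
}

# Toy data: 5 genes x 3 samples; s1 is entirely NA, g1 is also NA in s2.
m <- matrix(as.numeric(1:15), nrow = 5,
            dimnames = list(paste0("g", 1:5), paste0("s", 1:3)))
m[, "s1"] <- NA
m["g1", "s2"] <- NA

flags <- na.filter.sequential(m)
filtered <- m[!flags$bad.gene, !flags$bad.sample, drop = FALSE]
```

In this toy case g1 is missing in 2 of 3 samples on the full matrix, so the one-pass approach would drop it, but after s1 is removed it is missing in only 1 of 2 samples and is kept. For the ExpressionSet in the question, the same flags could presumably be computed from the expression matrix and applied as exampleSet[!flags$bad.gene, !flags$bad.sample]. If every sample is flagged, the gene proportions come out as NaN, so that case would need separate handling.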