Question

ExpressionSet: Remove NA values above a certain threshold

philipp24 ▴ 30

@philipp24-8672

Last seen 7.5 years ago

Germany

Dear all,

I built the following simple ExpressionSet in R:

library(Biobase)  # provides ExpressionSet and AnnotatedDataFrame
dataDirectory <- system.file("extdata", package = "Biobase")
exprsFile <- "path to expression data.txt"
exprs <- as.matrix(read.table(exprsFile, header = TRUE, sep = "\t", row.names = 1, as.is = TRUE))

pDataFile <- "path to phenotype data.txt"
pData <- read.table(pDataFile, row.names=1, header=TRUE, sep="\t")
phenoData <- new("AnnotatedDataFrame",data=pData)

exampleSet <- ExpressionSet(assayData=exprs, phenoData=phenoData)

Now I want to remove every sample with more than 80% NA values, and also every gene with more than 50% NA values. Is there a simple way to do that?

Thanks in advance for your help!

expressionset • 3.1k views

updated 8.7 years ago by Aaron Lun ★ 28k • written 8.7 years ago by philipp24 ▴ 30

score 1 · Answer 1 · 2015-08-24

Aaron Lun ★ 28k

@alun

Last seen 11 hours ago

The city by the bay

To identify the samples with too many NAs (more than 80% of their values missing):

bad.sample <- colMeans(is.na(exprs)) > 0.8

To identify the genes with too many NAs (more than 50% missing, assuming you don't want to recalculate the proportions after removing samples):

bad.gene <- rowMeans(is.na(exprs)) > 0.5
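As a quick illustration of why these one-liners work: is.na() returns a logical matrix, and the mean of a logical vector is the fraction of TRUE values, so colMeans() and rowMeans() give the proportion of NAs per sample and per gene. The toy matrix below and all of its names are made up for the example and are not part of the original data.

```r
# Toy matrix: 4 genes (rows) x 3 samples (columns); values are invented
toy <- matrix(c(1,  NA, NA,
                2,  NA, 5,
                NA, NA, 6,
                4,  NA, NA),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

# is.na() yields TRUE/FALSE; averaging treats TRUE as 1 and FALSE as 0
na.per.sample <- colMeans(is.na(toy))  # fraction of NA genes in each sample
na.per.gene   <- rowMeans(is.na(toy))  # fraction of NA samples in each gene

# Same thresholds as in the question (strictly greater than)
toy.bad.sample <- na.per.sample > 0.8
toy.bad.gene   <- na.per.gene > 0.5

print(na.per.sample)
print(na.per.gene)
```

Here only sample2 (entirely NA) is flagged as a bad sample, while gene1, gene3 and gene4 are flagged as bad genes because two of their three values are missing.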

To remove them:

newExampleSet <- exampleSet[!bad.gene,!bad.sample]
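If you would rather recalculate the gene proportions after the bad samples are gone, a minimal sketch could look like the following. The helper filterNaSequential is hypothetical (it is not a Biobase function) and works on a plain expression matrix, such as the one returned by Biobase::exprs(exampleSet); the toy data are invented.

```r
# Hypothetical helper, not part of Biobase: drop samples first, then
# recompute the per-gene NA fraction on the remaining samples only.
filterNaSequential <- function(x, sample.cutoff = 0.8, gene.cutoff = 0.5) {
  keep.sample <- colMeans(is.na(x)) <= sample.cutoff
  x <- x[, keep.sample, drop = FALSE]
  keep.gene <- rowMeans(is.na(x)) <= gene.cutoff
  x[keep.gene, , drop = FALSE]
}

# Invented toy matrix: 4 genes x 3 samples, sample2 is entirely NA
toy <- matrix(c(1,  NA, NA,
                2,  NA, 5,
                NA, NA, 6,
                4,  NA, NA),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

# Sequential filtering: gene proportions are computed after removing sample2
filtered <- filterNaSequential(toy)

# Simultaneous filtering, as in the answer above: both masks come from the
# unfiltered matrix
simultaneous <- toy[!(rowMeans(is.na(toy)) > 0.5),
                    !(colMeans(is.na(toy)) > 0.8), drop = FALSE]

print(dim(filtered))
print(dim(simultaneous))
```

On this toy input the two approaches differ: once sample2 is removed, no gene exceeds 50% NAs, so the sequential version keeps all four genes, whereas the simultaneous version keeps only gene2. Which behaviour is appropriate depends on your analysis.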
