How to exclude an outlier (microarray) from PCA in R
waiyin0923 • 0
@8954fa9d
Last seen 3.7 years ago
Hong Kong

I have done the quality controls like PCA and hierarchical clustering and found an outlier, which is Cancer5.CEL. How can I remove this outlier for Differential Gene Expression Analysis? I don't know the code, please help. Thank you!


my code:

crc.fac<- factor(c(rep("Cancer", 7),rep("Healthy",6)))
crc.df <- data.frame(crc = crc.fac,row.names = paste(crc.fac, rep(1:13, 1), sep = ''))
crc.mData <- data.frame(labelDescription = c("gene regulation"))
crc.mData
crc.pData <- new("AnnotatedDataFrame", data = crc.df, varMetadata = crc.mData)
validObject(crc.pData)
# [1] TRUE
list.files(path = ".", pattern = ".CEL")
crc.df <- data.frame(crc.fac, filename = list.files(path =".",pattern=".CEL"),row.names =paste(crc.fac, rep(1:13, 1), sep = ''))
crc.affy <- read.affybatch(filename = list.files(path =".",pattern=".CEL", full.names = TRUE),
                           phenoData = crc.pData)
View(crc.affy)
crc_calls.eSet <- mas5calls.AffyBatch(crc.affy)
crc_calls.mx <- exprs(crc_calls.eSet)
crc.eSet <- rma(crc.affy)
crc_log2.mx <- exprs(crc.eSet)
head(crc_log2.mx)
boxplot(as.data.frame(crc_log2.mx), xlab = "", ylab = "Log2 rma signal", las = 2, main = "Sample Distributions")
crc_P_rate.nv <- apply(crc_calls.mx == "P", 2, sum) / nrow (crc_calls.mx)

quality controls:

check potential physical defects in the arrays

image(crc.affy[, 1])

PCA
pca <- prcomp(t(crc_log2.mx))
eigs <- pca$sdev^2
varexplained <- eigs/sum(eigs)
varexplained
barplot(varexplained * 100, ylab="% variance explained", xlab="principal components")
box()
# Color by group: the first 7 samples are Cancer, the next 6 are Healthy
plot(pca$x[, 1], pca$x[, 2], col=rep(rainbow(2), times=c(7, 6)), xlab="PC1", ylab="PC2", cex=3)
text(pca$x[, 1], pca$x[, 2], labels = colnames(crc_log2.mx))

I found the outlier-removal code below, but it doesn't seem to work in my case. Please help!

remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs=c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}

# Removes all outliers from a data set
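remove_all_outliers <- function(df){
  # We only want the numeric columns
  num <- sapply(df, is.numeric)
  a <- lapply(df[, num, drop = FALSE], remove_outliers)
  b <- df[, !num, drop = FALSE]
  # Recombine column-wise; merge() without shared keys would build a cross join
  d <- cbind(as.data.frame(a), b)
  d
}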

# Removes all outliers from a data set
remove_all_outliers1 <- function(df){
  # We only want the numeric columns
  df[,sapply(df, is.numeric)] <- lapply(df[,sapply(df, is.numeric)], remove_outliers)
  df
}

remove_all_outliers2 <- function(df){
  df[] <- lapply(df, function(x) if (is.numeric(x))
    remove_outliers(x) else x)
  df
}


If I can't exclude the outlier, it may affect the downstream differential gene expression analysis. Please help!

OutlierD pcaMethods MicroarrayData Microarray • 2.4k views
Kevin Blighe ★ 4.0k
@kevin
Last seen 6 weeks ago
Republic of Ireland

Hey,

There is no strong evidence that it is an outlier. What percentage of the variation is explained by PC2? This should be contained in your varexplained variable.
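
For reference, the share of variance on PC2 can also be read from the `prcomp` summary. This is a minimal sketch on simulated data: the matrix `expr` is a made-up stand-in for `crc_log2.mx`, not the poster's data, so the number it gives says nothing about the real arrays.

```r
set.seed(42)
# Simulated stand-in for crc_log2.mx: 100 probes x 13 samples (assumption)
expr <- matrix(rnorm(100 * 13, mean = 8), nrow = 100)
pca <- prcomp(t(expr))

# Same calculation as in the question
varexplained <- pca$sdev^2 / sum(pca$sdev^2)

# summary() reports the same proportions (rounded), named by component
imp <- summary(pca)$importance
pc2_percent <- 100 * imp["Proportion of Variance", "PC2"]
pc2_percent
```

A small PC2 percentage would mean that separation along that axis reflects only a minor part of the overall variation.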

I found the outlier-removal code below, but it doesn't seem to work in my case. Please help!

You found some code, ran it, and do not understand how it functions? This code that you found appears to remove outliers based on IQR - if it identified no sample as outlier, then, as I mentioned, it supports the idea that there is (are) no statistical outlier(s) in your dataset.
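
To illustrate what the quoted IQR helper does, here is a small sketch on a made-up toy vector (the values are hypothetical). It flags individual values lying more than 1.5 × IQR outside the quartiles and replaces them with `NA`; it does not drop anything.

```r
# Helper as quoted in the question
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}

# Toy vector (made up): one value far from the rest
x <- c(1, 2, 3, 4, 100)
masked <- remove_outliers(x)

# The extreme value becomes NA; the vector keeps its length
is.na(masked)
```

Applied column by column to an expression matrix, this would blank single values within each array rather than remove a whole array such as Cancer5.CEL.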

Kevin


Actually, I don't know which outlier-removal code can be applied to my case.
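
If a sample does need to be excluded, one common approach is to drop its column by name and repeat the PCA on the remaining samples. This is only a sketch on simulated data: the matrix `expr`, the grouping `group`, and the column name `"Cancer5"` are assumptions for illustration, and the real sample name should be checked with `colnames(crc_log2.mx)`.

```r
set.seed(1)
# Simulated stand-in for crc_log2.mx: 200 probes x 13 samples (assumption)
sample_names <- c(paste0("Cancer", 1:7), paste0("Healthy", 1:6))
expr <- matrix(rnorm(200 * 13, mean = 8), nrow = 200,
               dimnames = list(paste0("probe", 1:200), sample_names))
group <- factor(c(rep("Cancer", 7), rep("Healthy", 6)))

# Hypothetical name of the sample to exclude
outlier <- "Cancer5"
keep <- colnames(expr) != outlier

# Drop the column and the matching group label together
expr_kept <- expr[, keep, drop = FALSE]
group_kept <- droplevels(group[keep])

# Repeat the PCA without the excluded sample
pca_kept <- prcomp(t(expr_kept))
dim(pca_kept$x)
```

The same logical vector can index the columns of the AffyBatch, in the style of `crc.affy[, 1]` used in the question, so that normalization with `rma` is rerun on the retained arrays only. The group factor used for the differential expression design has to be subset in the same way so that it stays aligned with the samples.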
