Entering edit mode
Hi all,
I have read the archives and found many useful information for
analyzing microarray data, and many thanks to you and those posts. Now
I have two main questions after I did some analyses using limma and
oligo (please see my code attached).
(1) In the topTable, some symbols are NAs. From the archives, I found
James said that any probesets that interrogate multiple transcripts
are set to return NA by default. I am wondering if we just leave those
NAs there??
> topTable(fit, coef="typeT",adjust="BH")
ID Symbol logFC
AveExpr t P.Value adj.P.Val
B
21074 8066214 TGM2 1.299535 9.693098
11.406010 3.642842e-08 0.0007610873 8.813239
5960 7914580 FNDC5 -1.940412 8.141962
-9.336465 3.810047e-07 0.0021143857 6.793619
569 7893074 <na> 1.057330 8.412720
8.940184 6.250135e-07 0.0026535447 6.351810
31030 8161865 PRUNE2 -2.528722 7.987195
-8.883848 6.714628e-07 0.0026535447 6.287414
I also did the following, and found 21934 out of 33295 transcript
symbols were mapped. I am curious what we should do for those
unmapped?? The results exclude controls as I read, right??
> x<- hugene10sttranscriptclusterSYMBOL
> length(x)
[1] 33295
> xx<- x[mappedkeys(x)]
> length(xx)
[1] 21934
(2) I saw someone did nonspecific filtering on Human Gene ST 1.0
Arrays like below. I tried ??nsFilter to understand more. But Is it
necessary to do this filtering?? My analyses of illumina array and
affy gene array were based on similar patient population, but the
results were quite different. I got ~6000 significant transcripts
(p.adjust()<0.05, which is z in my code) from the gene array data
analysis whereas only 12 significant genes from the illuimna array
data analysis. Do I need to do some filtering?? Please help!!
> nsFilter(subEset,var.cutoff=0.5)
# Removes AFFX genes, those without EntrezGene IDs, etc.
Code attached:
Main purpose: compare gene expression between tumor and normal samples
(paired samples)
(1) Illumina microarray - genomestudio "HumanHT-12 v4 Gene
Expression BeadChip":
library(limma)
x<-read.ilmn(files="irina_second_first_sample0222-analysis3.txt",ctrlf
iles="control_probe_file_2.txt",other.columns="Detection")
targets <- readTargets()
y <- neqc(x)
expressed <- apply(y$other$Detection < 0.05,1,any)
y <- y[expressed,]
subject <- factor(targets$subject)
type <- factor(targets$type, levels=c("N","T","M"))
design <- model.matrix(~subject+type)
fit <- lmFit(y, design)
fit <- eBayes(fit)
topTable(fit, coef="typeT")
#heatmap: compare between Tumor and Normal
ind1 <- p.adjust(fit$p.value[, 16], method = "BH") <0.05
z<-y[ind1,]
heatmap.2(z$E, col=greenred(75), scale="row", key=T,
density.info="none", trace="none")
(2) Affymetrix human gene array ST 1.0 (examine gene level,
similar patient samples as above):
library(oligo)
library(annotate)
library(hugene10sttranscriptcluster.db)
mydata <- read.celfiles(list.celfiles())
geneCore <- rma(mydata, target = "core")
featureData(geneCore) <- getNetAffx(geneCore, "transcript")
ID <- featureNames(geneCore)
Symbol <- getSYMBOL(ID,"hugene10sttranscriptcluster.db")
fData(geneCore) <- data.frame(ID=ID,Symbol=Symbol)
library(limma)
subject <- c("S4","S4","S405","S405","S430","S430","S465","S465","S518
","S518","S495","S495","S540","S540","S545","S545","S548","S548","S615
","S615","SP12","SP12")
type <- rep(c("T","N"),11)
subject <- factor(subject)
type <- factor(type)
design <- model.matrix(~subject+type)
fit <- lmFit(geneCore, design)
fit <- eBayes(fit)
topTable(fit, coef="typeT",adjust="BH")
ind1 <- p.adjust(fit$p.value[, 12], method = "BH") <0.05
z <- geneCore[ind1,]
z2 <- exprs(z)
heatmap.2(z2, col=greenred(75), scale="row", key=T,
density.info="none", trace="none")
I am using the latest R.
Thanks a lot!
Best regards,
Xiayu
[[alternative HTML version deleted]]