Question

symbol and filtering -affy hugene ST 1.0 -oligo limma

0

Entering edit mode

Rao,Xiayu ▴ 550

@raoxiayu-6003

Last seen 8.9 years ago

United States

Hi all, I have read the archives and found many useful information for analyzing microarray data, and many thanks to you and those posts. Now I have two main questions after I did some analyses using limma and oligo (please see my code attached). (1) In the topTable, some symbols are NAs. From the archives, I found James said that any probesets that interrogate multiple transcripts are set to return NA by default. I am wondering if we just leave those NAs there?? > topTable(fit, coef="typeT",adjust="BH") ID Symbol logFC AveExpr t P.Value adj.P.Val B 21074 8066214 TGM2 1.299535 9.693098 11.406010 3.642842e-08 0.0007610873 8.813239 5960 7914580 FNDC5 -1.940412 8.141962 -9.336465 3.810047e-07 0.0021143857 6.793619 569 7893074 <na> 1.057330 8.412720 8.940184 6.250135e-07 0.0026535447 6.351810 31030 8161865 PRUNE2 -2.528722 7.987195 -8.883848 6.714628e-07 0.0026535447 6.287414 I also did the following, and found 21934 out of 33295 transcript symbols were mapped. I am curious what we should do for those unmapped?? The results exclude controls as I read, right?? > x<- hugene10sttranscriptclusterSYMBOL > length(x) [1] 33295 > xx<- x[mappedkeys(x)] > length(xx) [1] 21934 (2) I saw someone did nonspecific filtering on Human Gene ST 1.0 Arrays like below. I tried ??nsFilter to understand more. But Is it necessary to do this filtering?? My analyses of illumina array and affy gene array were based on similar patient population, but the results were quite different. I got ~6000 significant transcripts (p.adjust()<0.05, which is z in my code) from the gene array data analysis whereas only 12 significant genes from the illuimna array data analysis. Do I need to do some filtering?? Please help!! > nsFilter(subEset,var.cutoff=0.5) # Removes AFFX genes, those without EntrezGene IDs, etc. Code attached: Main purpose: compare gene expression between tumor and normal samples (paired samples) (1) Illumina microarray - genomestudio "HumanHT-12 v4 Gene Expression BeadChip": library(limma) x<-read.ilmn(files="irina_second_first_sample0222-analysis3.txt",ctrlf iles="control_probe_file_2.txt",other.columns="Detection") targets <- readTargets() y <- neqc(x) expressed <- apply(y$other$Detection < 0.05,1,any) y <- y[expressed,] subject <- factor(targets$subject) type <- factor(targets$type, levels=c("N","T","M")) design <- model.matrix(~subject+type) fit <- lmFit(y, design) fit <- eBayes(fit) topTable(fit, coef="typeT") #heatmap: compare between Tumor and Normal ind1 <- p.adjust(fit$p.value[, 16], method = "BH") <0.05 z<-y[ind1,] heatmap.2(z$E, col=greenred(75), scale="row", key=T, density.info="none", trace="none") (2) Affymetrix human gene array ST 1.0 (examine gene level, similar patient samples as above): library(oligo) library(annotate) library(hugene10sttranscriptcluster.db) mydata <- read.celfiles(list.celfiles()) geneCore <- rma(mydata, target = "core") featureData(geneCore) <- getNetAffx(geneCore, "transcript") ID <- featureNames(geneCore) Symbol <- getSYMBOL(ID,"hugene10sttranscriptcluster.db") fData(geneCore) <- data.frame(ID=ID,Symbol=Symbol) library(limma) subject <- c("S4","S4","S405","S405","S430","S430","S465","S465","S518 ","S518","S495","S495","S540","S540","S545","S545","S548","S548","S615 ","S615","SP12","SP12") type <- rep(c("T","N"),11) subject <- factor(subject) type <- factor(type) design <- model.matrix(~subject+type) fit <- lmFit(geneCore, design) fit <- eBayes(fit) topTable(fit, coef="typeT",adjust="BH") ind1 <- p.adjust(fit$p.value[, 12], method = "BH") <0.05 z <- geneCore[ind1,] z2 <- exprs(z) heatmap.2(z2, col=greenred(75), scale="row", key=T, density.info="none", trace="none") I am using the latest R. Thanks a lot! Best regards, Xiayu [[alternative HTML version deleted]]

Microarray limma Microarray limma • 1.1k views

ADD COMMENT • link 10.8 years ago Rao,Xiayu ▴ 550