How to extract expressed genes from Affymetrix data

0

Entering edit mode

michelle_low ▴ 50

@michelle_low-5267

Last seen 9.6 years ago

Hi all, I have done the differential expression analysis below but I wanna know the total number of expressed genes/transcripts in each cell (wild type and the mir223 knockout cell). Can somebody help? Thanks in advance.. Regards, Michelle R version 2.13.2 (2011-09-30) Copyright (C) 2011 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: i386-apple-darwin9.8.0/i386 (32-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. [R.app GUI 1.42 (5910) i386-apple-darwin9.8.0] [Workspace restored from /Users/mlow/.RData] [History restored from /Users/mlow/.Rapp.history] > library(affy) Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material. To view, type 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")' and for packages 'citation("pkgname")'. > library(limma) > pd=read.AnnotatedDataFrame("phenodata.txt",header=TRUE,sep="",row.n ames=1) > a=ReadAffy(filenames=rownames(pData(pd)),phenoData=pd,verbose=TRUE) 1 reading wildrep1.CEL ...instantiating an AffyBatch (intensity a 1004004x6 matrix)...done. Reading in : wildrep1.CEL Reading in : wildrep2.CEL Reading in : wildrep3.CEL Reading in : miR223rep1.CEL Reading in : miR223rep2.CEL Reading in : miR223rep3.CEL > x=rma(a) Background correcting Normalizing Calculating Expression > c=paste(pd$treatment,pd$n,sep="") > f=factor(c) > design=model.matrix(~0+f) > colnames(design)=levels(f) > fit=lmFit(x,design) > library(mouse4302.db) Loading required package: AnnotationDbi Loading required package: org.Mm.eg.db Loading required package: DBI > fit$genes$Symbol <- getSYMBOL(fit$genes$ID,"mouse4302.db") Error: could not find function "getSYMBOL" > library(annotate) > fit$genes$Symbol <- getSYMBOL(fit$genes$ID,"mouse4302.db") > contrast.matrix=makeContrasts(E="present-absent") Error in .Internal(inherits(x, what, which)) : 'x' is missing > contrast.matrix=makeContrasts(E="present-absent",levels=design) > fit2=contrasts.fit(fit,contrast.matrix) > fit2=eBayes(fit2) > results1 <-topTable (fit2, coef=1, p.value=0.3,number=nrow(fit2)) > dim(results1) [1] 889 8 > results1 <-topTable (fit2, coef=1, p.value=0.5,number=nrow(fit2)) > dim(results1) [1] 2997 8 > results1 <-topTable (fit2, coef=2, p.value=0.5,number=nrow(fit2)) Error in toptable(fit = fit[c("coefficients", "stdev.unscaled")], coef = coef, : subscript out of bounds > write.table(results1, file="WT-KO.txt") > results1 <-topTable (fit2, coef=2, p.value=0.4,number=nrow(fit2)) Error in toptable(fit = fit[c("coefficients", "stdev.unscaled")], coef = coef, : subscript out of bounds > results1 <-topTable (fit2, coef=1, p.value=0.4,number=nrow(fit2)) > dim(results1) [1] 1668 8 > write.table(results1, file="WT-KO.txt")

GUI GUI • 1.9k views

ADD COMMENT • link updated 11.8 years ago by Belinda Phipson ▴ 130 • written 11.8 years ago by michelle_low ▴ 50

0

Entering edit mode

Belinda Phipson ▴ 130

@belinda-phipson-5333

Last seen 9.6 years ago

Hi Michelle The function propexpr() in the limma package will estimate the proportion of expressed probes in each array, as long as there are some negative controls present in your data. You can also estimate the number of probes that are not differentially expressed between WT and knock-out using the convest() function in limma. It takes a vector of p-values. The proportion of differentially expressed genes will be 1-convest(p-vals). Cheers, Belinda -----Original Message----- From: bioconductor-bounces@r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of michelle_low Sent: Monday, 23 July 2012 3:56 PM To: bioconductor at r-project.org Subject: [BioC] How to extract expressed genes from Affymetrix data Hi all, I have done the differential expression analysis below but I wanna know the total number of expressed genes/transcripts in each cell (wild type and the mir223 knockout cell). Can somebody help? Thanks in advance.. Regards, Michelle R version 2.13.2 (2011-09-30) Copyright (C) 2011 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: i386-apple-darwin9.8.0/i386 (32-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. [R.app GUI 1.42 (5910) i386-apple-darwin9.8.0] [Workspace restored from /Users/mlow/.RData] [History restored from /Users/mlow/.Rapp.history] > library(affy) Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material. To view, type 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")' and for packages 'citation("pkgname")'. > library(limma) > pd=read.AnnotatedDataFrame("phenodata.txt",header=TRUE,sep="",row.name s=1) > a=ReadAffy(filenames=rownames(pData(pd)),phenoData=pd,verbose=TRUE) 1 reading wildrep1.CEL ...instantiating an AffyBatch (intensity a 1004004x6 matrix)...done. Reading in : wildrep1.CEL Reading in : wildrep2.CEL Reading in : wildrep3.CEL Reading in : miR223rep1.CEL Reading in : miR223rep2.CEL Reading in : miR223rep3.CEL > x=rma(a) Background correcting Normalizing Calculating Expression > c=paste(pd$treatment,pd$n,sep="") > f=factor(c) > design=model.matrix(~0+f) > colnames(design)=levels(f) > fit=lmFit(x,design) > library(mouse4302.db) Loading required package: AnnotationDbi Loading required package: org.Mm.eg.db Loading required package: DBI > fit$genes$Symbol <- getSYMBOL(fit$genes$ID,"mouse4302.db") Error: could not find function "getSYMBOL" > library(annotate) > fit$genes$Symbol <- getSYMBOL(fit$genes$ID,"mouse4302.db") > contrast.matrix=makeContrasts(E="present-absent") Error in .Internal(inherits(x, what, which)) : 'x' is missing > contrast.matrix=makeContrasts(E="present-absent",levels=design) > fit2=contrasts.fit(fit,contrast.matrix) > fit2=eBayes(fit2) > results1 <-topTable (fit2, coef=1, p.value=0.3,number=nrow(fit2)) > dim(results1) [1] 889 8 > results1 <-topTable (fit2, coef=1, p.value=0.5,number=nrow(fit2)) > dim(results1) [1] 2997 8 > results1 <-topTable (fit2, coef=2, p.value=0.5,number=nrow(fit2)) Error in toptable(fit = fit[c("coefficients", "stdev.unscaled")], coef = coef, : subscript out of bounds > write.table(results1, file="WT-KO.txt") > results1 <-topTable (fit2, coef=2, p.value=0.4,number=nrow(fit2)) Error in toptable(fit = fit[c("coefficients", "stdev.unscaled")], coef = coef, : subscript out of bounds > results1 <-topTable (fit2, coef=1, p.value=0.4,number=nrow(fit2)) > dim(results1) [1] 1668 8 > write.table(results1, file="WT-KO.txt") _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD COMMENT • link 11.8 years ago Belinda Phipson ▴ 130

0

Entering edit mode

Hi Belinda, Thanks for the reply. After reading the manual, I'm still not quite sure how to use the functions. If I start from the beginning and use only one cell (wild type) with 3 replicates. I wanna get only the expressed genes in this cell. I am not sure if the following script is correct to find these genes (I used P/M/A calling and extract only the probes with 'present' ). I also have issue with the annotation. I can't seem to make it work. After writing them into text file or excel file, they were in mess. I appreciate very much for any help.. Thanks, Michelle R version 2.13.2 (2011-09-30) Copyright (C) 2011 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: i386-apple-darwin9.8.0/i386 (32-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. [R.app GUI 1.42 (5910) i386-apple-darwin9.8.0] [Workspace restored from /Users/mlow/.RData] [History restored from /Users/mlow/.Rapp.history] > library(affy) Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material. To view, type 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")' and for packages 'citation("pkgname")'. > arrays=ReadAffy() > arrayRMA=rma(arrays) Background correcting Normalizing Calculating Expression > arrayRMAtable=exprs(arrayRMA) > write.exprs(arrayRMA, "RMAvalue.txt") > arrayPMA=mas5calls(arrays) Getting probe level data... Computing p-values Making P/M/A Calls write.exprs(arrayRMA, "RMAvalue > write.exprs(arrayPMA, "PMAvalue.txt") > arrayPMA=exprs(arrayPMA) > present=rowSums(arrayPMA=='P') > present=which(present==ncol(arrayPMA)) > rmafilter=arrayRMA[present,] > write.exprs(rmafilter,"result.txt") > dim(rmafilter) Features Samples 16485 3 > library(mouse4302.db) Loading required package: AnnotationDbi Loading required package: org.Mm.eg.db Loading required package: DBI > all=mget(names(rmafilter),mouse4302GENENAME) > write.table(all, "all.txt") ---- On Sun, 22 Jul 2012 23:48:06 -0700 Belinda Phipson wrote ---- >Hi Michelle > >The function propexpr() in the limma package will estimate the proportion of >expressed probes in each array, as long as there are some negative controls >present in your data. > >You can also estimate the number of probes that are not differentially >expressed between WT and knock-out using the convest() function in limma. It >takes a vector of p-values. The proportion of differentially expressed genes >will be 1-convest(p-vals). > >Cheers, >Belinda > >-----Original Message----- >From: bioconductor-bounces at r-project.org >[mailto:bioconductor-bounces at r-project.org] On Behalf Of michelle_low >Sent: Monday, 23 July 2012 3:56 PM >To: bioconductor at r-project.org >Subject: [BioC] How to extract expressed genes from Affymetrix data > >Hi all, > >I have done the differential expression analysis below but I wanna know the >total number of expressed genes/transcripts in each cell (wild type and the >mir223 knockout cell). Can somebody help? Thanks in advance.. > > >Regards, >Michelle > > > > R version 2.13.2 (2011-09-30) >Copyright (C) 2011 The R Foundation for Statistical Computing >ISBN 3-900051-07-0 >Platform: i386-apple-darwin9.8.0/i386 (32-bit) > >R is free software and comes with ABSOLUTELY NO WARRANTY. >You are welcome to redistribute it under certain conditions. >Type 'license()' or 'licence()' for distribution details. > > Natural language support but running in an English locale > >R is a collaborative project with many contributors. >Type 'contributors()' for more information and >'citation()' on how to cite R or R packages in publications. > >Type 'demo()' for some demos, 'help()' for on-line help, or >'help.start()' for an HTML browser interface to help. >Type 'q()' to quit R. > >[R.app GUI 1.42 (5910) i386-apple-darwin9.8.0] > >[Workspace restored from /Users/mlow/.RData] >[History restored from /Users/mlow/.Rapp.history] > >> library(affy) >Loading required package: Biobase > >Welcome to Bioconductor > > Vignettes contain introductory material. To view, type > 'browseVignettes()'. To cite Bioconductor, see > 'citation("Biobase")' and for packages 'citation("pkgname")'. > >> library(limma) >> >pd=read.AnnotatedDataFrame("phenodata.txt",header=TRUE,sep="",row.nam es=1) >> a=ReadAffy(filenames=rownames(pData(pd)),phenoData=pd,verbose=TRUE) >1 reading wildrep1.CEL ...instantiating an AffyBatch (intensity a 1004004x6 >matrix)...done. >Reading in : wildrep1.CEL >Reading in : wildrep2.CEL >Reading in : wildrep3.CEL >Reading in : miR223rep1.CEL >Reading in : miR223rep2.CEL >Reading in : miR223rep3.CEL >> x=rma(a) >Background correcting >Normalizing >Calculating Expression >> c=paste(pd$treatment,pd$n,sep="") >> f=factor(c) >> design=model.matrix(~0+f) >> colnames(design)=levels(f) >> fit=lmFit(x,design) >> library(mouse4302.db) >Loading required package: AnnotationDbi >Loading required package: org.Mm.eg.db >Loading required package: DBI > > >> fit$genes$Symbol <- getSYMBOL(fit$genes$ID,"mouse4302.db") >Error: could not find function "getSYMBOL" >> library(annotate) >> fit$genes$Symbol <- getSYMBOL(fit$genes$ID,"mouse4302.db") >> contrast.matrix=makeContrasts(E="present-absent") >Error in .Internal(inherits(x, what, which)) : 'x' is missing >> contrast.matrix=makeContrasts(E="present-absent",levels=design) >> fit2=contrasts.fit(fit,contrast.matrix) >> fit2=eBayes(fit2) >> results1 <-topTable (fit2, coef=1, p.value=0.3,number=nrow(fit2)) >> dim(results1) >[1] 889 8 >> results1 <-topTable (fit2, coef=1, p.value=0.5,number=nrow(fit2)) >> dim(results1) >[1] 2997 8 >> results1 <-topTable (fit2, coef=2, p.value=0.5,number=nrow(fit2)) >Error in toptable(fit = fit[c("coefficients", "stdev.unscaled")], coef = >coef, : > subscript out of bounds >> write.table(results1, file="WT-KO.txt") >> results1 <-topTable (fit2, coef=2, p.value=0.4,number=nrow(fit2)) >Error in toptable(fit = fit[c("coefficients", "stdev.unscaled")], coef = >coef, : > subscript out of bounds >> results1 <-topTable (fit2, coef=1, p.value=0.4,number=nrow(fit2)) >> dim(results1) >[1] 1668 8 >> write.table(results1, file="WT-KO.txt") > >_______________________________________________ >Bioconductor mailing list >Bioconductor at r-project.org >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor > > >_____________________________________________________________________ _ >The information in this email is confidential and inten...{{dropped:5}}

ADD REPLY • link 11.7 years ago michelle_low ▴ 50

Login before adding your answer.