RE: create a new AnnotatedDataFrame


Yad Ghavi-Helm ▴ 30

@yad-ghavi-helm-2284

Last seen 10.5 years ago

Hi Martin,

> AnnotatedDataFrame coordinates a data.frame with its metadata. From
> your naming convention, I'm guessing that what your command is doing
> is trying to coordinate an expression matrix with its varMetadata.

In fact, that's exactly what I want to do. Since I don't have an annotation package for the chips I'm using, I would like to add useful information (for example, the correspondence between oligo IDs and gene names) in the mData.

My MetaData file looks like:

                  gene
    A_75_P0000001 TEL01L
    A_75_P0000002 YAL067W-A
    A_75_P0000003 YAL067C
    A_75_P0000004 YAL067C
    A_75_P0000005 YAL067C
    A_75_P0000006 YAL067C
    A_75_P0000007 YAL067C
    A_75_P0000008 YAL067C

with as many features as my assayData (exprs2).

I think it is possible to do this, because I read:

"It is also possible to record information about features that are unique to the experiment (e.g., flagging particularly relevant features). This is done by creating or modifying an AnnotatedDataFrame like that for phenoData but with rownames of the AnnotatedDataFrame matching rows of the assayData."

in the Biobase "ExpressionSetIntroduction.pdf" manual.

Yad.

-------- Original message --------
From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
Date: Wed. 08/08/2007 18:54
To: GHAVI-HELM Yad
Cc: Bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] create an new AnnotatedDataFrame

"GHAVI-HELM Yad" <yad.ghavi-helm at cea.fr> writes:

> exprsFile<-"D:/exprsData.txt"
> exprs<-read.table(exprsFile,header=TRUE,sep="",as.is=TRUE)
>
> pDataFile<-"D:/pData.txt"
> pData<-read.table(pDataFile,header=TRUE, sep="", as.is=TRUE)
>
> metaData<-"D:/mData.txt"
> mData<-read.table(metaData,header=TRUE,sep="",as.is=TRUE)
> metData<-new("AnnotatedDataFrame",data=exprs2,varMetadata=mData)
>
> At this step I have the following error:
> Error in `row.names<-.data.frame`(`*tmp*`, value = c("A", "B")) :
>   length of 'row.names' incorrect
>
> It seems strange because "A" and "B" are the colnames of exprsData
> (or the rownames of pData).

AnnotatedDataFrame coordinates a data.frame with its metadata. From your naming convention, I'm guessing that what your command is doing is trying to coordinate an expression matrix with its varMetadata. I think what you want to do is

    > phenoData = new("AnnotatedDataFrame", data=pData, varMetadata=mData)

You might then use this to create an ExpressionSet (for example)

    > new("ExpressionSet", exprs=exprs, phenoData=phenoData)

The read.AnnotatedDataFrame page might provide some additional hints on reading data from files; a warning is that read.AnnotatedDataFrame will change (hopefully for the better) in the next release of Bioconductor.

Hope that helps,

Martin

> I tried to do:
>
> metData<-new("AnnotatedDataFrame",data=exprs2,varMetadata=mData, row.names=1)
>
> or
>
> rown=rownames(exprs)
> metData<-new("AnnotatedDataFrame",data=exprs2,varMetadata=mData, row.names=rown)
>
> but I still got the same error.
>
> Hope anyone could help me...
>
> > sessionInfo()
> R version 2.5.0 (2007-04-23)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252
>
> attached base packages:
> [1] "tcltk" "splines" "tools" "stats" "graphics" "grDevices" "utils" "datasets" "methods" "base"
>
> other attached packages:
>    YEAST  convert   marray tkWidgets   DynDoc widgetTools arrayMagic genefilter survival     vsn     affy  affyio    limma  Biobase
> "1.16.0" "1.10.0" "1.14.0"  "1.14.0" "1.14.0"    "1.12.0"   "1.14.0"   "1.14.1"   "2.31" "2.2.0" "1.14.2" "1.4.1" "2.10.5" "1.14.1"
>
> Yad.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org
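One plausible reading of the error quoted above is that the varMetadata needs one row per column of the data (here the columns "A" and "B"), whereas mData has one row per feature. The sketch below illustrates the shape Martin's phenoData suggestion expects. The sample table, its column names (strain, batch) and the descriptions are hypothetical stand-ins, not the contents of the files in this thread.

```r
library(Biobase)

# Hypothetical sample table: one row per sample ("A", "B"),
# one column per recorded variable.
pData <- data.frame(
  strain = c("wild-type", "mutant"),
  batch = c(1L, 1L),
  row.names = c("A", "B"),
  stringsAsFactors = FALSE
)

# varMetadata describes the COLUMNS of pData: one row per column,
# with a column named 'labelDescription'.
pMeta <- data.frame(
  labelDescription = c("Yeast strain", "Hybridization batch"),
  row.names = colnames(pData),
  stringsAsFactors = FALSE
)

# The shapes must agree: rows of varMetadata == columns of data.
stopifnot(nrow(pMeta) == ncol(pData))

phenoData <- new("AnnotatedDataFrame", data = pData, varMetadata = pMeta)
varMetadata(phenoData)
```

A per-feature table such as the oligo-to-gene mapping would not fit as varMetadata here, because it describes rows of the expression matrix rather than columns of the sample table.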

Annotation Survival Yeast Biobase genefilter tkWidgets marray convert arrayMagic affyio • 3.6k views

updated 17.5 years ago by Martin Morgan • written 17.5 years ago by Yad Ghavi-Helm


Martin Morgan 25k

@martin-morgan-1513

Last seen 29 days ago

United States

Hi Yad --

An AnnotatedDataFrame is a data frame, with additional information about the columns of the data frame (such as a longer description of the column name). The 'additional information about the columns' is the 'varMetadata' (i.e., *Metadata* about the *var*iables in the data frame). The varMetadata is itself a data frame, and it must have a column named 'labelDescription'.

You need to create an AnnotatedDataFrame for your feature data. Suppose the 'MetaData' file is D:/MetaData.txt, then read the data into a data.frame with something like

    > featureDataFrame <- read.table("D:/MetaData.txt", header=TRUE,
    +                                sep="", as.is=TRUE)

You might also have 'meta-data' about the columns in the featureDataFrame, which you would arrange in another data.frame, but this is not essential. Then create an AnnotatedDataFrame

    > featureData <- new("AnnotatedDataFrame", data=featureDataFrame)

and finally use this, along with the matrix of expression values, to create an ExpressionSet

    > es <- new("ExpressionSet", exprs=exprs, featureData=featureData)

If you had phenotype data as well, you could do the same steps

    > phenoDataFile <- "D:/phenoData.txt"
    > phenoDataFrame <- read.table(phenoDataFile,
    +                              header=TRUE, sep="", as.is=TRUE)
    > phenoData <- new("AnnotatedDataFrame", data=phenoDataFrame)

and create an ExpressionSet with both phenoData and featureData

    > es <- new("ExpressionSet", exprs=exprs, phenoData=phenoData,
    +           featureData=featureData)

Notice that featureData is really meant to mark up features with information _unique to the experiment_; what you might really want to do is to create an annotation package, as this can then be used by other experiments on the same chip, and by other Bioconductor software packages that expect something in the 'annotation' slot of an ExpressionSet. This could be fairly challenging, but would be better in the long run.
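The featureData steps described above can be sketched end to end without any files. This is only an illustration under stated assumptions: the three-feature table, the expression values and the names featureMeta and exprsMatrix are hypothetical, and the optional varMetadata is included just to show its shape. Note also that read.table returns a data.frame, so a table read from file would presumably need as.matrix() before being passed as exprs.

```r
library(Biobase)

# Hypothetical feature table: row names are the oligo IDs,
# the single column holds the gene name.
featureDataFrame <- data.frame(
  gene = c("TEL01L", "YAL067W-A", "YAL067C"),
  row.names = c("A_75_P0000001", "A_75_P0000002", "A_75_P0000003"),
  stringsAsFactors = FALSE
)

# Optional varMetadata: one row per column of featureDataFrame,
# with a column named 'labelDescription'.
featureMeta <- data.frame(
  labelDescription = "Systematic gene name for the oligo",
  row.names = "gene",
  stringsAsFactors = FALSE
)

featureData <- new("AnnotatedDataFrame",
                   data = featureDataFrame,
                   varMetadata = featureMeta)

# Hypothetical expression matrix: rows are features, columns are the
# samples "A" and "B". Row names must match those of featureDataFrame.
exprsMatrix <- matrix(
  c(7.1, 8.3, 6.5, 7.4, 8.0, 6.9),
  nrow = 3,
  dimnames = list(rownames(featureDataFrame), c("A", "B"))
)

es <- new("ExpressionSet", exprs = exprsMatrix, featureData = featureData)

# The gene names now travel with the expression values.
fData(es)
```

Keeping the oligo IDs as row names of both the matrix and the feature table is what lets the ExpressionSet line the two up.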
Also, ExpressionSets are produced or readily constructed from several of the preprocessing steps that are often used, so perhaps creating an expression set 'by hand' is not really what you want to be doing.

Martin

--
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org

written 17.5 years ago by Martin Morgan
