read.phenoData vs read.AnnotatedDataFrame

Johnstone, Alice ▴ 410

@johnstone-alice-2290

Last seen 9.6 years ago

For interest's sake, I have found out why I wasn't getting my expected results when using read.AnnotatedDataFrame.

It turns out the error was in the ReadAffy command, where I specified the filenames to be read from my AnnotatedDataFrame object. There was a typo: a capital N ($FileName) rather than the lowercase n ($Filename) used in my target file... whoops. However, this meant the filenames argument was ignored without any error message(!). Instead of using the information in the AnnotatedDataFrame object (which included the filenames, but not in alphabetical order), it read the .cel files in alphabetical order from the working directory. Hence each file was given the wrong label (taken from the order of the AnnotatedDataFrame object), and my comparisons were confused without it being obvious why or where.

Our solution: after correcting $Filename, pass the filenames through as.character so that each file is assigned to the correct target, now that we are using read.AnnotatedDataFrame rather than read.phenoData.

  Data<-ReadAffy(filenames=as.character(pData(pd)$Filename),phenoData=pd)

Hurrah!

It may be beneficial to others if, when the filenames argument isn't specified, the filenames were read from the phenoData object when it includes them.

Thanks!

-----Original Message-----
From: Martin Morgan [mailto:mtmorgan@fhcrc.org]
Sent: Thursday, 26 July 2007 11:49 a.m.
To: Johnstone, Alice
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame

Hi Alice --

"Johnstone, Alice" <alice.johnstone at esr.cri.nz> writes:

> Using R2.5.0 and Bioconductor I have been following code to analysis
> Affymetrix expression data: 2 treatments vs control. The original
> code was run last year and used the read.phenoData command, however
> with the newer version I get the error message
> Warning messages:
> read.phenoData is deprecated, use read.AnnotatedDataFrame instead
> The phenoData class is deprecated, use AnnotatedDataFrame (with
> ExpressionSet) instead
>
> I use the read.AnnotatedDataFrame command, but when it comes to the
> end of the analysis the comparison of the treatment to the controls
> gets mixed up compared to what you get using the original
> read.phenoData ie it looks like the 3 groups get labelled wrong and so
> the comparisons are different (but they can still be matched up).
> My questions are,
> 1) do you need to set up your target file differently when using
> read.AnnotatedDataFrame - what is the standard format?

I can't quite tell where things are going wrong for you, so it would help if you can narrow down where the problem occurs. I think read.AnnotatedDataFrame should be comparable to read.phenoData. Does

  > pData(pd)

look right? What about

  > pData(Data)

and

  > pData(eset.rma)

? It's not important but pData(pd)$Target is the same as pd$Target. Since the analysis is on eset.rma, it probably makes sense to use the pData from there to construct your design matrix

  > targs<-factor(eset.rma$Target)
  > design<-model.matrix(~0+targs)
  > colnames(design)<-levels(targs)

Does design look right?

> I have three columns sample, filename and target.
> 2) do you need to use a different model matrix to what I have?
> 3) do you use a different command for making the contrasts?

Depends on the question! If you're performing the same analysis as last year, then the model matrix and contrasts have to be the same!

> I have included my code below if that is of any assistance.
> Many Thanks!
> Alice
>
> ##Read data
> pd<-read.AnnotatedDataFrame("targets.txt",header=T,row.name="sample")
> Data<-ReadAffy(filenames=pData(pd)$FileName,phenoData=pd)
> ##normalisation
> eset.rma<-rma(Data)
> ##analysis
> targs<-factor(pData(pd)$Target)
> design<-model.matrix(~0+targs)
> colnames(design)<-levels(targs)
> fit<-lmFit(eset.rma,design)
> cont.wt<-makeContrasts("treatment1-control","treatment2-control",levels=design)
> fit2<-contrasts.fit(fit,cont.wt)
> fit2.eb<-eBayes(fit2)
> testconts<-classifyTestsF(fit2.eb,p.value=0.01)
> topTable(fit2.eb,coef=2,n=300)
> topTable(fit2.eb,coef=1,n=300)
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org
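The silent failure described in this post (a misspelled column yielding NULL, which is then passed on as the filenames argument) can be guarded against before any files are read. The following is only a minimal base-R sketch: the `targets` data frame and its values are made up for illustration, and `checked_filenames()` is a hypothetical helper, not part of affy or Biobase.

```r
# Hypothetical targets table; the column names mirror the "sample",
# "Filename" and "Target" columns mentioned in the thread, the values
# are invented for this example.
targets <- data.frame(
  sample   = c("s1", "s2", "s3"),
  Filename = c("c.CEL", "a.CEL", "b.CEL"),
  Target   = c("control", "treatment1", "treatment2"),
  stringsAsFactors = FALSE
)

# With `$`, a misspelled column name gives NULL and no error.
misspelled <- targets$FileName

# Hypothetical helper: stop with a clear message instead of letting
# NULL (or empty names) travel further down the pipeline.
checked_filenames <- function(df, column = "Filename") {
  if (!column %in% names(df)) {
    stop("no column named '", column, "'; available columns: ",
         paste(names(df), collapse = ", "))
  }
  files <- as.character(df[[column]])
  if (length(files) == 0L || any(is.na(files)) || any(!nzchar(files))) {
    stop("column '", column, "' has missing or empty file names")
  }
  files
}

files <- checked_filenames(targets)
# The checked vector could then be used as in the thread, e.g.
# Data <- ReadAffy(filenames = files, phenoData = pd)
```

Because the helper returns the names in the order of the targets table, the file-to-label assignment no longer depends on how the working directory happens to be sorted.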

updated 16.7 years ago by Seth Falcon ★ 7.4k • written 16.7 years ago by Johnstone, Alice ▴ 410

Steven McKinney ▴ 310

@steven-mckinney-1754

Last seen 9.6 years ago

Hi Alice,

A coding alternative that helps in debugging those mis-spelled column names is to use the square bracket "[" extractor instead of the dollar "$" extractor for data frames, e.g.

  > Data<-ReadAffy(filenames=as.character(pData(pd)[T, "Filename"]),phenoData=pd)

instead of

  > Data<-ReadAffy(filenames=as.character(pData(pd)$Filename),phenoData=pd)

This will throw an error when you try to reference a column that does not exist, instead of silently returning NULL as you experienced.

pData(pd)[T, "Filename"] will return the same result as pData(pd)$Filename if the column exists.

The "T" or "TRUE" in the row argument position ensures that an error message is returned, as pData(pd)[, "FileName"] with no row index argument will also silently return a NULL, as does pData(pd)$FileName.

Example:

  > foo <- data.frame(Filename = c("a", "b"))
  > foo$FileName
  NULL
  > foo[, "FileName"]
  NULL
  > foo[T, "FileName"]
  Error in `[.data.frame`(foo, T, "FileName") :
          undefined columns selected
  > all.equal(foo$Filename, foo[T, "Filename"])
  [1] TRUE
  > sessionInfo()
  R version 2.5.1 (2007-06-27)
  i386-pc-mingw32

  locale:
  LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

  attached base packages:
  [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
  [7] "base"

Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
email: smckinney +at+ bccrc +dot+ ca
tel: 604-675-8000 x7561

BCCRC Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. V5Z 1L3
Canada
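The difference between the two extractors can be checked in a script without stopping it, by capturing the error. This is a small sketch under a few assumptions: the variable names `dollar_result`, `bracket_result` and `same` are made up, and TRUE is written out in full. Note that the NULL reported above for `foo[, "FileName"]` comes from an R 2.5.1 session; other R versions may behave differently for that form, so the sketch relies only on the `$` case and the explicit row-index case.

```r
foo <- data.frame(Filename = c("a", "b"), stringsAsFactors = FALSE)

# `$` with a misspelled name returns NULL without complaint.
dollar_result <- foo$FileName

# The bracket form with an explicit row index raises an error;
# tryCatch() captures its message so the script can continue.
bracket_result <- tryCatch(
  foo[TRUE, "FileName"],
  error = function(e) conditionMessage(e)
)

# With the correct spelling both forms return the same vector.
same <- identical(foo$Filename, foo[TRUE, "Filename"])
```

In an analysis script the error would normally be left uncaught, so that the run stops at the misspelled column rather than continuing with NULL.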

written 16.7 years ago by Steven McKinney ▴ 310

Seth Falcon ★ 7.4k

@seth-falcon-992

Last seen 9.6 years ago

"Steven McKinney" <smckinney at bccrc.ca> writes:

> Hi Alice,
>
> A coding alternative that helps in debugging
> those mis-spelled column names is to use the
> square bracket "[" extractor instead of the
> dollar "$" extractor for data frames, e.g.
>
> > Data<-ReadAffy(filenames=as.character(pData(pd)[T, "Filename"]),phenoData=pd)
>
> instead of
>
> > Data<-ReadAffy(filenames=as.character(pData(pd)$Filename),phenoData=pd)
>
> This will throw an error when you try to reference a column
> that does not exist, instead of silently returning NULL
> as you experienced.
>
> pData(pd)[T, "Filename"] will return the same result as
> pData(pd)$Filename if the column exists.
>
> The "T" or "TRUE" in the row argument position ensures that an
> error message is returned, as

I like this suggestion, but want to add that using T instead of TRUE is a bad idea: T = FALSE is a perfectly valid assignment whereas TRUE is a keyword.

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/
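To illustrate the point about T, here is a short sketch of what can go wrong. The function `demo()` and the variable names are made up for this example; the reassignment of T is kept inside the function so it does not leak into the rest of the session.

```r
# T is an ordinary variable whose default value is TRUE, so it can
# be reassigned. Here that happens locally inside a function.
demo <- function() {
  T <- FALSE
  foo <- data.frame(Filename = c("a", "b"), stringsAsFactors = FALSE)
  list(
    with_T    = foo[T, "Filename"],     # T is FALSE here: no rows selected
    with_TRUE = foo[TRUE, "Filename"]   # all rows selected
  )
}
res <- demo()

# Trying to assign to TRUE itself is rejected by R; the error is
# captured so the script keeps running.
true_assign <- tryCatch(
  eval(parse(text = "TRUE <- FALSE")),
  error = function(e) "error"
)
```

With T redefined, the row index silently selects nothing and an empty vector of filenames is returned, which is the same kind of quiet failure the bracket form was meant to prevent.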

written 16.7 years ago by Seth Falcon ★ 7.4k
