Manual annotation of ExpressionSet object created from scratch


Michael Muratet ▴ 420

@michael-muratet-3076

Last seen 11.3 years ago

Greetings,

I have an ExpressionSet object that I created from scratch, with expression data for features identified by Ensembl transcript IDs. The ExpressionSet constructor wants a character string for annotation data. Is there another way to populate the slot? From an AnnotatedDataFrame? Should I write a function that pulls in the data with biomaRt?

Thanks,
Mike


Updated 17.2 years ago by Sean Davis • written 17.2 years ago by Michael Muratet


Sean Davis 21k

@sean-davis-490

Last seen 10 months ago

United States

On Mon, Oct 13, 2008 at 5:34 PM, Michael Muratet <mmuratet at hudsonalpha.org> wrote:

> Greetings
>
> I have an ExpressionSet object that I created from scratch with expression
> data for features identified with Ensembl transcript IDs. The ExpressionSet
> constructor wants a character string for annotation data. Is there another
> way to populate the slot? From an AnnotatedDataFrame? Should I write a
> function that pulls in the data with biomaRt?

Hi, Mike.

Perhaps you can show us what you mean. If you are talking about the annotation slot, that is meant to hold the string name of the annotation data package associated with the array. I guess that you do not have an annotation data package for the array, so you can leave out that slot when creating the ExpressionSet. If you have problems, it is best to post the code and, of course, your sessionInfo().

Sean
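To illustrate the point about leaving the slot out, here is a minimal sketch, assuming a recent Biobase release. The toy matrix and its values are hypothetical stand-ins for the poster's data, and "org.Hs.eg.db" is used only as an example of a package name; nothing here checks that the package is installed.

```r
library(Biobase)

# Hypothetical toy matrix standing in for the real expression data
exprMatrix <- matrix(
  c(0, 0.0029,
    0, 0.0004),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ENST00000369829", "ENST00000393415"),
                  c("SL265", "SL310"))
)

# No annotation argument is supplied, so the slot is simply left empty
eset <- ExpressionSet(assayData = exprMatrix)
annotation(eset)

# The slot only holds the *name* of an annotation package, stored as a string.
# It can be filled in later if a suitable package becomes available.
annotation(eset) <- "org.Hs.eg.db"
annotation(eset)
```

The slot stores a label, not the annotation itself, which is why an AnnotatedDataFrame cannot go there.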

Sean Davis • 17.2 years ago


On Oct 13, 2008, at 4:48 PM, Sean Davis wrote:

> Hi, Mike. Perhaps you can show us what you mean. If you are talking
> about the annotation data slot, that is meant to be the string name of
> the annotation data package associated with the array. I guess that
> you do not have an annotation data package for the array, so you can
> leave out that slot when creating the ExpressionSet. If you have
> problems, it is best to post the code and, of course, your
> sessionInfo().
>
> Sean

Here's what I'm trying to do:

    library("Biobase")
    exprMatrix <- as.matrix(read.table("exprset.txt", header=TRUE,
                                       sep="\t", row.names=1, as.is=TRUE))
    pData <- read.table("phenoData.txt", row.names=1, header=TRUE,
                        sep="\t")
    phenoData <- new("AnnotatedDataFrame", data=pData)
    rnaseq_exprs <- new("ExpressionSet", exprs=exprMatrix,
                        phenoData=phenoData)
    save(rnaseq_exprs, file="rnaseq_data.Robj")

The data consists of RNAseq reads that I have mapped to Ensembl transcripts and normalized appropriately, e.g.,

                     SL265  SL264  SL266  SL310                 SL312                 SL313
    ENST00000369829  0      0      0      0.00288159443768686   0.000696405393229021  0.000473063478950364
    ENST00000393415  0      0      0      0.000428628056614047  0.000621528594887718  0.00047497519763826

So far this looks like a fairly useful way of looking at the data.

I'd like to be able to use all the functionality I see in the docs for annotation of ExpressionSets. The ExpressionSet vignette talks about using an AnnotatedDataFrame, but it doesn't really say where it goes. I haven't seen an annotation data package for Ensembl, although I see how you might be able to create one with biomaRt. I'm looking for some expert advice so I don't go down any blind alleys.

Thanks,
Mike

Michael Muratet • 17.2 years ago


On Mon, Oct 13, 2008 at 6:00 PM, Michael Muratet <mmuratet at hudsonalpha.org> wrote:

> I'd like to be able to use all the functionality I see in the docs for
> annotation of ExpressionSets. The ExpressionSet vignette talks about using
> an AnnotatedData frame but it doesn't really say where it goes. I haven't
> seen an annotation data package for Ensembl although I see how you might be
> able to create one with biomaRt. I'm looking for some expert advice so I
> don't go down any blind alleys.

For building annotation packages, see the AnnotationDbi package and the SQLForge vignette. See the vignettes in Biobase for discussion of AnnotatedDataFrame.

In short, though, an ExpressionSet contains two AnnotatedDataFrames: one for the sample information (the phenoData) and the other for the features on the array (the featureData). The featureData slot is often redundant if you build an annotation data package. However, you could use it to store a data frame of data from Ensembl if you like.

Sean
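The two-AnnotatedDataFrame layout described above can be sketched as follows. This is only an illustration under stated assumptions: the expression values, the `condition` column, and the `gene_symbol` and `biotype` columns (including the placeholder symbols "GENE_A" and "GENE_B") are all hypothetical, and in practice the feature table might be filled from an Ensembl query instead.

```r
library(Biobase)

# Hypothetical toy expression matrix: transcripts in rows, samples in columns
exprMatrix <- matrix(
  c(0, 0.0029,
    0, 0.0004),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ENST00000369829", "ENST00000393415"),
                  c("SL265", "SL310"))
)

# Sample information: row names must match the matrix column names
pheno <- data.frame(
  condition = c("control", "treated"),
  row.names = colnames(exprMatrix)
)

# Feature information: row names must match the matrix row names.
# The columns and values below are placeholders, not real annotation.
featureInfo <- data.frame(
  gene_symbol = c("GENE_A", "GENE_B"),
  biotype = c("protein_coding", "protein_coding"),
  row.names = rownames(exprMatrix),
  stringsAsFactors = FALSE
)

eset <- ExpressionSet(
  assayData = exprMatrix,
  phenoData = AnnotatedDataFrame(pheno),
  featureData = AnnotatedDataFrame(featureInfo)
)

pData(eset)   # per-sample table
fData(eset)   # per-feature table

# Subsetting the ExpressionSet keeps the feature annotation aligned
fData(eset["ENST00000393415", ])
```

Storing the table in featureData means the annotation travels with the object and is subset along with the expression values, at the cost of having to refresh it by hand when the source annotation changes.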

Sean Davis • 17.2 years ago


> For building annotation packages, see the AnnotationDbi package and
> the SQLForge vignette. See the Vignettes in Biobase for discussion of
> AnnotatedDataFrame. In short, though, an ExpressionSet contains two
> AnnotatedDataFrames, one for the sample information (the phenoData)
> and the other for the features on the array (the featureData). The
> featureData slot is often redundant if you build an annotation data
> package. However, you could use it to store a data frame of data from
> ensembl if you like.
>
> Sean

Hi Michael,

If you really want to make an annotation package where Ensembl IDs are the main IDs for everything, then you are going to have to first make a mapping of the Ensembl IDs to Entrez Gene IDs. This information is available for a lot of species already, so it can probably be found in the organism package that matches the critter you are working on (org.Hs.eg.db for human). Then you could use that mapping to make a custom annotation package where the Ensembl IDs are basically presented as if they were the "probes". The mappings in that case should be OK.

However, I think it's worth noting that unless you have a more complete Ensembl-to-Entrez ID mapping from another source, this all just represents a reprocessing of the existing data that can already be found in the mapping of the appropriate organism package.

Marc
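As a rough sketch of the mapping step, assuming a current org.Hs.eg.db and AnnotationDbi (the `select()` interface and the "ENSEMBLTRANS" key type belong to present-day releases and may not match what was available when this thread was written), the Ensembl transcript IDs from the example data could be looked up like this. Whether these particular IDs are present depends on the installed release, so the code filters them first.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)

# Transcript IDs taken from the example data earlier in the thread
transcript_ids <- c("ENST00000369829", "ENST00000393415")

# Keep only IDs that the installed release knows about;
# select() raises an error when none of the supplied keys is valid
all_keys <- keys(org.Hs.eg.db, keytype = "ENSEMBLTRANS")
known <- transcript_ids[transcript_ids %in% all_keys]

if (length(known) > 0) {
  # Map Ensembl transcript IDs to Entrez Gene IDs and gene symbols
  mapping <- AnnotationDbi::select(
    org.Hs.eg.db,
    keys = known,
    columns = c("ENTREZID", "SYMBOL"),
    keytype = "ENSEMBLTRANS"
  )
  print(mapping)
} else {
  message("None of the example IDs are present in this org.Hs.eg.db release")
}
```

The resulting data frame could serve as the Ensembl-to-Entrez mapping mentioned above, or be stored directly in the featureData of the ExpressionSet.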

Marc Carlson • 17.2 years ago
