problem about hgu133plus2 annotation


Gina Liao ▴ 10

@gina-liao-4178

Last seen 9.6 years ago

Dear All,

I have 20 chips, and I used R to standardize the CEL files. I then obtained the expression values for all chips. I also downloaded the annotation file in CSV format from NetAffx (HG-U133_Plus_2 Annotations, CSV format, Release 30 (22 MB, 11/15/09)).

Here's my code:

    test = justRMA()
    eset.st = standardise(test)
    exprs.st = exprs(eset.st)
    e.out = exprs.st
    dim(e.out)  # 54675 20

However, I found that the order of rownames(e.out) differs slightly from the row names of hgu133plus2.csv. Rows 54630 to 54640 are not in the same order in the two. They should be the same, right? Is "hgu133plus2cdf" the problem? How could I solve it?

Thanks!

Best,
Gina

hgu133plus2 hgu133plus2 • 1.1k views

updated 13.8 years ago by James W. MacDonald 65k • written 13.8 years ago by Gina Liao ▴ 10


Marc Carlson ★ 7.2k

@marc-carlson-2264

Last seen 7.7 years ago

United States

Hi Gina,

I am afraid it's a little hard to tell what is going on here. For example, I don't see the output of sessionInfo(), so it is hard to tell what you were running, and I only have enough code to speculate wildly about what you were doing. You might want to see our posting guide here:

http://www.bioconductor.org/docs/postingGuide.html

Marc

On 07/22/2010 02:11 AM, Gina Liao wrote:
> [...]

13.8 years ago Marc Carlson ★ 7.2k


James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 51 minutes ago

United States

Hi Gina,

On 7/22/2010 5:11 AM, Gina Liao wrote:
> [...]
> They should be the same, right? Is "hgu133plus2cdf" the problem? How could I solve it?

I would recommend you use the annotation packages that are available from Bioconductor rather than downloading the annotation files from Affymetrix. The BioC annotation packages contain the same information, but are designed to be easily used from within R, and you will find that the .csv files you can get from Affy are not as user-friendly.

You can get the annotation package using biocLite():

    biocLite("hgu133plus2.db")

Note that there is no reason to expect that the order of the annotation data will be the same as the order of the expression data. Re-ordering things is exceedingly simple in R, so this point is irrelevant.

Using the annotation packages will take some reading on your part, but once you get the hang of things, I think you will like how they work. You might start with

    library(hgu133plus2.db)
    ?hgu133plus2.db

as well as

    openVignette()

and choose the AnnotationDbi vignette.

If you are interested in annotating the set of interesting genes from your experiment, you will want to look at the annaffy package, which will allow you to output both HTML and text files with your results and annotations for each gene.

In addition, you might want to look at the affycoretools package, which helps automate some of the steps required to annotate results. This package is also integrated with limma, so you can go straight from your linear model fits to output in one function call.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
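The re-ordering described above can be sketched with match() in base R. This is only an illustration on made-up data, not the original poster's code: the matrix `e.out` stands in for the result of exprs(), and the `annot` data frame with its column names `probe_id` and `symbol` is an assumption standing in for a table read from the Affymetrix CSV. The optional last step assumes the hgu133plus2.db package is installed (in current Bioconductor releases installation goes through BiocManager::install() rather than biocLite()) and looks up annotation by probeset ID with AnnotationDbi::select().

```r
# Toy stand-in for exprs(eset): 4 probesets x 2 arrays (values are made up).
e.out <- matrix(
  c(7.1, 7.3, 5.2, 5.0, 9.8, 9.9, 6.4, 6.1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("1007_s_at", "1053_at", "117_at", "121_at"),
                  c("chip1", "chip2"))
)

# Toy annotation table with the same probesets in a different order.
# Column names are hypothetical; the symbols are for illustration only.
annot <- data.frame(
  probe_id = c("117_at", "1007_s_at", "121_at", "1053_at"),
  symbol   = c("HSPA6", "DDR1", "PAX8", "RFC2"),
  stringsAsFactors = FALSE
)

# For each expression row, find the position of its ID in the annotation table.
idx <- match(rownames(e.out), annot$probe_id)

# Re-order the annotation so that row i describes row i of e.out.
annot.ordered <- annot[idx, ]

# Sanity check: the two objects are now aligned by probeset ID.
stopifnot(identical(annot.ordered$probe_id, rownames(e.out)))

# Optional: the same lookup against the Bioconductor annotation package.
# Runs only if hgu133plus2.db is installed.
if (requireNamespace("hgu133plus2.db", quietly = TRUE)) {
  db.annot <- AnnotationDbi::select(
    hgu133plus2.db::hgu133plus2.db,
    keys    = rownames(e.out),
    columns = c("SYMBOL", "GENENAME"),
    keytype = "PROBEID"
  )
  print(head(db.annot))
}
```

Because match() returns NA for IDs that are absent from the annotation table, such rows keep their position and simply get NA annotation fields, so the alignment with the expression matrix is never shifted.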

13.8 years ago James W. MacDonald 65k


Hi Gina,

I do agree with Jim; have a look at the packages he mentioned. However, if for some reason you still want to use the CSV file from Affy, have a look at the merge() function. That should solve your problem.

By the way, which one has 54630 rows and which 54640? If you use the Affy CSV file, you might want to check whether the missing probeset IDs are AFFX-xxxx probesets or transcripts you want to keep in your data set. You can ignore the AFFX probesets, because they are only internal controls that you won't need for differential expression analysis.

Cheers,
Benjamin

PS: Another thing: if you prefer an interface for annotating gene lists that are not too big, instead of coding R functions, have a look at BioMart on the Ensembl web page. For my part, I think R code has the advantage that you can save it and use it as a kind of logbook, so you can always come back to it and check what you did and how.

On 22.07.2010 at 18:41, James W. MacDonald wrote:
> [...]

--
Benjamin Otto, PhD
University Medical Center Hamburg-Eppendorf
Institute For Clinical Chemistry / Central Laboratories
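The merge() route and the AFFX check suggested above might look like the following sketch. It runs on made-up data: `e.out` stands in for the expression matrix, and the `annot` data frame with columns `probe_id` and `symbol` is a hypothetical stand-in for the table read from the Affymetrix CSV, whose real column names may differ.

```r
# Toy expression matrix: two regular probesets and one AFFX control
# (values are made up).
e.out <- matrix(
  c(7.1, 7.3, 5.2, 5.0, 11.4, 11.6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1007_s_at", "1053_at", "AFFX-BioB-5_at"),
                  c("chip1", "chip2"))
)

# Toy annotation table: different order, and no entry for the control probeset.
annot <- data.frame(
  probe_id = c("1053_at", "1007_s_at"),
  symbol   = c("RFC2", "DDR1"),
  stringsAsFactors = FALSE
)

# Which expression rows have no annotation, and are they all AFFX controls?
missing.ids <- setdiff(rownames(e.out), annot$probe_id)
is.control  <- grepl("^AFFX", missing.ids)

# merge() joins on the ID column, so the row order of the inputs is irrelevant.
# all.x = TRUE keeps expression rows that lack annotation (fields become NA).
expr.df <- data.frame(probe_id = rownames(e.out), e.out,
                      stringsAsFactors = FALSE)
merged <- merge(expr.df, annot, by = "probe_id", all.x = TRUE)

# Drop the AFFX control probesets before downstream analysis.
merged.noctrl <- merged[!grepl("^AFFX", merged$probe_id), ]
```

If `all(is.control)` is TRUE, every unannotated row is a control probeset and can be dropped safely; otherwise the remaining IDs in `missing.ids` deserve a closer look before filtering.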

13.8 years ago Benjamin Otto ▴ 830
