Why are some data missing when I use R to analyze NimbleGen array data?

陈娟 ▴ 70

Dear Professor,

It seems that I have found the core of my problem. When I loaded the raw data (.xys files) into R and preprocessed them with rma(), the data for the B73 genome (with names like GRMZM2G130813_T01) seemed to be lost, and I can't extract them with grep(). There are 92,492 features in the database (GEO), whereas there are only 69,557 features in my ExpressionSet. I suspect the data I am interested in are among the missing ones. Maybe the method I used is responsible, but I don't know how to fix it. Could you tell me how to solve this?

Looking forward to your reply! Thank you very much!

Best regards,
Maggie

Posted 12.8 years ago by 陈娟

Benilton Carvalho ★ 4.3k

Brazil/Campinas/UNICAMP

Maggie,

The annotation package that you built contains the chip information (X/Y coordinates, probeset IDs). The preprocessing, through rma(), summarizes the intensities using the probeset IDs, and the probeset IDs aren't necessarily transcript IDs. In the NDF there is a column called SEQ_ID, which contains the probeset IDs. Open that file and look for the ID you refer to (GRMZM2G130813_T01). If you find it in that column, then it must be in the output object (the object you get from rma()) as well.

If the NDF (or any other source) contains further information that you want to use in the downstream analyses (for example, gene associations or transcript IDs that you can link to probeset IDs), then you need to load the NDF manually, appropriately extract the bits of information of interest and merge them with the preprocessed data. This is what I meant by 'genomic annotation' (apologies for not being careful in my initial explanation).

About the phenoData slot: check the Biobase documentation (Section 4.2): http://www.bioconductor.org/packages/2.8/bioc/vignettes/Biobase/inst/doc/ExpressionSetIntroduction.pdf

    library(Biobase)
    # pdata: one row per array; its row names should match sampleNames() of the ExpressionSet
    phenoData <- new("AnnotatedDataFrame", data = data.frame(pdata))

best,
b

ps: your email messages are not displayed as they should be, and I believe this is because your email client is set to send them in HTML format. It would be nice if you could set it to send messages as plain text, so that they are not garbled.
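The steps described above (looking up an ID in the SEQ_ID column, reading the NDF by hand, and attaching NDF columns to the preprocessed values) can be sketched in base R roughly as follows. This is an illustrative sketch, not the code behind rma(): it assumes the NDF is a tab-delimited text file whose header row includes SEQ_ID, and the helper names (read_ndf, ids_in_ndf, count_units, summarize_by_probeset, annotate_rows), the PROBE_ID and DESIGN_NOTE columns, and all values are made up for the example. summarize_by_probeset mimics only the median-polish summarization step of RMA (no background correction or quantile normalization), to show why a summarized object has one row per probeset rather than one row per probe.

```r
# Sketch only, base R. Assumption: the NDF is tab-delimited with a header
# row that includes SEQ_ID (the probeset ID). Other column names used here
# (PROBE_ID, DESIGN_NOTE) are illustrative.

# Load the NDF by hand, keeping IDs as character strings.
read_ndf <- function(path) {
  read.delim(path, stringsAsFactors = FALSE, check.names = FALSE,
             quote = "", comment.char = "")
}

# TRUE for each query ID that appears in the SEQ_ID column.
ids_in_ndf <- function(ndf, ids) ids %in% ndf$SEQ_ID

# Probe-level rows versus distinct probesets; a summarized object
# (as produced by summarize_by_probeset below) has one row per SEQ_ID.
count_units <- function(ndf) {
  c(features = nrow(ndf), probesets = length(unique(ndf$SEQ_ID)))
}

# Median-polish summary of log2 probe intensities per probeset (the RMA
# summarization step only). Rows of log2_probe are probes, in the same
# order as seq_id; columns are arrays. Estimate = overall + column effect.
summarize_by_probeset <- function(log2_probe, seq_id) {
  ids <- unique(seq_id)
  out <- matrix(NA_real_, nrow = length(ids), ncol = ncol(log2_probe),
                dimnames = list(ids, colnames(log2_probe)))
  for (id in ids) {
    fit <- medpolish(log2_probe[seq_id == id, , drop = FALSE],
                     trace.iter = FALSE)
    out[id, ] <- fit$overall + fit$col
  }
  out
}

# Attach extra NDF columns to a probeset-level matrix whose row names are
# SEQ_IDs, keeping the matrix's row order.
annotate_rows <- function(expr, ndf, cols) {
  annot <- ndf[!duplicated(ndf$SEQ_ID), c("SEQ_ID", cols), drop = FALSE]
  idx <- match(rownames(expr), annot$SEQ_ID)
  data.frame(SEQ_ID = rownames(expr), annot[idx, cols, drop = FALSE], expr,
             check.names = FALSE, stringsAsFactors = FALSE, row.names = NULL)
}

# Usage with a tiny made-up NDF and made-up log2 intensities.
tmp <- tempfile(fileext = ".ndf")
writeLines(c("PROBE_ID\tSEQ_ID\tDESIGN_NOTE",
             "P001\tDESIGN_ID1\tnote1",
             "P002\tDESIGN_ID1\tnote1",
             "P003\tDESIGN_ID2\tnote2"), tmp)
ndf <- read_ndf(tmp)
probe_log2 <- matrix(c(8, 6, 5, 10, 8, 5.5), nrow = 3,
                     dimnames = list(ndf$PROBE_ID, c("array1", "array2")))

ids_in_ndf(ndf, c("DESIGN_ID1", "GRMZM2G130813_T01"))  # TRUE FALSE
count_units(ndf)                                       # 3 features, 2 probesets
est <- summarize_by_probeset(probe_log2, ndf$SEQ_ID)
annotate_rows(est, ndf, "DESIGN_NOTE")
```

With real data, the path would point at the NDF used to build the pd package, and the matrix passed to annotate_rows would be exprs() of the rma() output, whose row names are the probeset IDs.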

Posted 12.8 years ago by Benilton Carvalho

陈娟 ▴ 70

Dear Professor,

Thank you for your patience and kindness. I had thought that the NDF was loaded along with the .xys files by read.xysfiles(), or by library(pd.090319.zea.kr.exptil), which was built from the .ndf and .xys files. But you meant that I should load it manually; which command does that?

I have searched the NDF and couldn't find any IDs like "GRMZM2G046829_T01", which are B73 genome names. Instead, there are names like DESIGN_ID1, DESIGN_ID2 and other control probes. Do you know why the NDF excludes the B73 genome IDs? In that case, how can I find the IDs corresponding to the ones I am interested in?

Truly yours,
Maggie

Posted 12.8 years ago by 陈娟

Maggie,

This is not what I said. What I said was: "If the NDF (or any other source) contains further information that you want to use in the downstream analyses (for example, gene associations or transcript IDs that you can link to probeset IDs), then you need to load the NDF manually, appropriately extract the bits of information of interest and merge them with the preprocessed data."

It's *if* the NDF contained information that you find relevant, which does not seem to be the case. What you will need to do is get the featureNames() of the output object (after rma()) and map them to your units of interest (transcript ID / gene ID). One tool that will help you with this is BioMart (or the R package biomaRt).

It also appears that you would benefit a lot from the help of a local bioinformatics team, who should be able to assist you not only with the analyses but also with the basic concepts and the practical handling of microarray data.

b
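Mapping the featureNames() of the rma() output to transcript or gene IDs amounts to a join against a probeset-to-transcript table. The sketch below assumes such a table already exists (hypothetically exported from BioMart, or built some other way); its column names SEQ_ID and transcript_id, the helper names (transcript_to_gene, map_features, probesets_for_gene) and the example rows are all illustrative. It also assumes B73 transcript IDs follow the "<gene>_T<nn>" pattern seen in IDs such as GRMZM2G130813_T01.

```r
# Sketch only, base R. `mapping` is a hypothetical probeset-to-transcript
# table; its column names (SEQ_ID, transcript_id) are assumptions.

# Gene ID from a transcript ID, assuming the "<gene>_T<nn>" form
# of B73 IDs such as GRMZM2G130813_T01.
transcript_to_gene <- function(tx) sub("_T[0-9]+$", "", tx)

# Every (probeset, transcript, gene) row for probesets present in the
# data, e.g. feature_ids <- featureNames(eset) after rma().
map_features <- function(feature_ids, mapping,
                         from = "SEQ_ID", to = "transcript_id") {
  keep <- mapping[[from]] %in% feature_ids
  pairs <- unique(data.frame(probeset = mapping[[from]][keep],
                             transcript = mapping[[to]][keep],
                             stringsAsFactors = FALSE))
  pairs$gene <- transcript_to_gene(pairs$transcript)
  rownames(pairs) <- NULL
  pairs
}

# Probeset IDs linked to one gene of interest.
probesets_for_gene <- function(pairs, gene) {
  unique(pairs$probeset[pairs$gene == gene])
}

# Usage with made-up probeset IDs and mapping rows.
feature_ids <- c("DESIGN_ID1", "DESIGN_ID2", "DESIGN_ID3")
mapping <- data.frame(
  SEQ_ID = c("DESIGN_ID1", "DESIGN_ID3", "DESIGN_ID9"),
  transcript_id = c("GRMZM2G130813_T01", "GRMZM2G046829_T01", "EXAMPLE_T01"),
  stringsAsFactors = FALSE
)
pairs <- map_features(feature_ids, mapping)
pairs
probesets_for_gene(pairs, "GRMZM2G130813")  # "DESIGN_ID1"
```

On real data, feature_ids would be featureNames(eset), and the probeset IDs returned by probesets_for_gene could be used to index exprs(eset). map_features keeps every matching row rather than a single match, because one probeset may map to several transcripts.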

Posted 12.8 years ago by Benilton Carvalho
