processing Illumina HT12v4.0 expression data from GEO

Abhishek Pratap ▴ 190

@abhishek-pratap-4927

Last seen 9.7 years ago

United States

Hi Guys

I would like to know the basic analysis workflow for downloading and processing Illumina HT12 v4 expression data from GEO. I have seen the beadarray vignette but am not sure which normalization process to use.

For example, with Affy datasets I normally download the raw data and normalize it with the fRMA package to produce a final expression matrix of genes.

Here is some code, but basically the final goal is to produce a normalized expression matrix at the gene level.

    library(GEOquery)
    gse <- getGEO("GSE58037")
    gse <- gse[[1]]
    mat <- exprs(gse)

Appreciate any pointers

Thanks!
-Abhi

Normalization affy PROcess GEOquery frma • 3.7k views

updated 11.5 years ago by Sean Davis 21k • written 11.6 years ago by Abhishek Pratap ▴ 190

Sean Davis 21k

@sean-davis-490

Last seen 25 days ago

United States

Hi, Abhi.

The "mat" variable above will give you expression measures as submitted by the authors. NCBI GEO provides a description:

"The data were normalised using normal-exponential convolution model-based background correction and quantile normalization. Merging of the data, background removal and normalization processes were performed using the limma R package. All of the batches were normalized at once after excluding probes with low quality."

If you do not want to use those normalized values, then you will need to define for yourself what the best approach is. I don't know of an accepted "best" approach for such arrays.

Sean
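For reference, the processing quoted from GEO (normexp background correction followed by quantile normalization) corresponds to what limma's `neqc()` does. Here is a minimal sketch on simulated intensities, assuming limma is installed. The probe counts, the `status` labels, and the helper name `normalize_illumina` are made up for illustration, and this is not necessarily the exact call the authors ran:

```r
# Sketch only: normexp background correction + quantile normalization
# via limma::neqc(). All data below are simulated, not from GSE58037.
normalize_illumina <- function(raw, status) {
  # raw:    matrix of unlogged intensities (probes x samples)
  # status: "regular" or "negative" for each probe (row)
  limma::neqc(raw, status = status)
}

if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(1)
  n_regular <- 200
  n_negative <- 50
  n_samples <- 4
  n_probes <- n_regular + n_negative

  # Every probe carries background noise; only regular probes carry signal.
  background <- matrix(rnorm(n_probes * n_samples, mean = 100, sd = 10),
                       ncol = n_samples)
  signal <- rbind(
    matrix(rexp(n_regular * n_samples, rate = 1 / 500), ncol = n_samples),
    matrix(0, nrow = n_negative, ncol = n_samples)
  )
  raw <- background + signal
  colnames(raw) <- paste0("sample", seq_len(n_samples))
  status <- rep(c("regular", "negative"), c(n_regular, n_negative))

  norm <- normalize_illumina(raw, status)
  print(dim(norm))  # control probes are dropped; values are on the log2 scale
}
```

With real data, the negative control probes (or detection p-values) have to come from the Illumina output files, which is why it matters what the submitted files actually contain.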

11.5 years ago Sean Davis 21k

Hi Sean

Thanks for the details. I was actually wondering whether I can get the raw data so I can do my own normalization. For example, for Affy-based GEO studies I normally see the raw CEL files also present, which can be used with fRMA to produce normalized data.

In this specific study, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE58037, I don't see raw data similar to the CEL files in Affy studies. I am just wondering whether the raw data is missing in this particular case, or whether beadarray-based studies tend not to have raw data in GEO.

Cheers!
-Abhi

11.5 years ago Abhishek Pratap ▴ 190

Hi, Abhi.

It looks like the GSE record contains raw data. However, you may need to write to the authors to confirm what was done to create the .txt files that are present in the raw data archives.

Sean
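One way to see which supplementary (raw) files a series carries is GEOquery's `getGEOSuppFiles()`. This is a sketch, assuming network access to GEO and a GEOquery version that supports the `fetch_files` argument; the helper name `list_gse_supp_files` is made up for illustration:

```r
# Sketch only: list the supplementary files of a GEO series without
# downloading them. Requires the GEOquery package and network access.
list_gse_supp_files <- function(accession) {
  # With fetch_files = FALSE, a table of file names and URLs is returned
  # instead of the files being downloaded.
  GEOquery::getGEOSuppFiles(accession, fetch_files = FALSE)
}

# Usage (needs network access, so it is left commented out):
# supp <- list_gse_supp_files("GSE58037")
# print(supp)
# GEOquery::getGEOSuppFiles("GSE58037")  # downloads into a GSE58037/ folder
```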

11.5 years ago Sean Davis 21k

Hi Sean

I did download and open the files under raw data and, strangely enough, they have two files which seem like annotation for the Illumina probes.

http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE58037&format=file&file=GSE58037%5Fmeningiomas%2Eraw%2Etxt%2Egz

-A

11.5 years ago Abhishek Pratap ▴ 190

I think the file named

GSE58037_meningiomas.raw.txt.gz

is the file with the data in it. Again, since the Illumina software allows dumping either raw or normalized data, it is not clear what is actually in the file, and an email to the authors may be required.

Sean
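Before writing to the authors, a quick look at the value range can at least hint at whether a table holds unlogged intensities or log-scale values. Below is a rough heuristic sketch in base R, run on a small mock file. The tab-delimited layout, the column names (`ID_REF`, `sampleA`, `sampleA.Detection Pval`), the threshold of 100, and the helper name `looks_unlogged` are all assumptions for illustration, not the actual layout of GSE58037_meningiomas.raw.txt.gz:

```r
looks_unlogged <- function(path, threshold = 100) {
  # read.delim() decompresses .gz files transparently.
  tab <- read.delim(path, check.names = FALSE)
  # Keep numeric columns that are not detection p-values.
  is_signal <- vapply(tab, is.numeric, logical(1)) &
    !grepl("pval", names(tab), ignore.case = TRUE)
  values <- as.matrix(tab[, is_signal, drop = FALSE])
  # log2 intensities rarely exceed about 16, so a large maximum
  # suggests the values are on the unlogged scale.
  max(values, na.rm = TRUE) > threshold
}

# Mock file standing in for a downloaded supplementary table.
mock <- data.frame(
  ID_REF = c("ILMN_0000001", "ILMN_0000002", "ILMN_0000003"),
  sampleA = c(95.2, 1530.7, 20411.3),
  `sampleA.Detection Pval` = c(0.41, 0.00, 0.00),
  check.names = FALSE
)
path <- tempfile(fileext = ".txt.gz")
con <- gzfile(path, "w")
write.table(mock, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

looks_unlogged(path)
```

This only separates log-scale from unlogged values. It cannot tell unnormalized output from normalized-but-unlogged output, so it does not replace confirming with the authors.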

11.5 years ago Sean Davis 21k

Thanks Sean. I guess if I need to dig deeper I will have to follow up with the authors. For now I will just explore the normalized data that is present at GEO.

Cheers!
-Abhi
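To go from the probe-level matrix to a gene-level one, a common (though not the only) approach is to keep one probe per gene, for example the probe with the highest mean expression. Here is a base R sketch on a toy matrix. The helper name `collapse_to_genes` and the toy probe IDs and symbols are made up; with the real data the symbols would come from the platform annotation (for example a column of `fData(gse)`, whose name is an assumption to check):

```r
collapse_to_genes <- function(mat, symbols) {
  stopifnot(nrow(mat) == length(symbols))
  # Drop probes without a gene symbol.
  keep <- !is.na(symbols) & symbols != ""
  mat <- mat[keep, , drop = FALSE]
  symbols <- symbols[keep]
  # Order probes by decreasing mean, then keep the first probe seen per gene.
  avg <- rowMeans(mat, na.rm = TRUE)
  ord <- order(avg, decreasing = TRUE)
  best <- ord[!duplicated(symbols[ord])]
  out <- mat[best, , drop = FALSE]
  rownames(out) <- symbols[best]
  out[order(rownames(out)), , drop = FALSE]
}

# Toy probe-level matrix: two probes for GENEA, one for GENEB, one unannotated.
probe_mat <- matrix(
  c(7.1, 7.3,
    9.0, 9.4,
    5.2, 5.0,
    6.6, 6.8),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("probe1", "probe2", "probe3", "probe4"), c("s1", "s2"))
)
symbols <- c("GENEA", "GENEA", "GENEB", "")

gene_mat <- collapse_to_genes(probe_mat, symbols)
gene_mat
```

Averaging the probes of each gene instead (limma's `avereps()` does this) is an alternative; which is preferable depends on how much the probes for a gene are trusted to measure the same transcript.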

11.5 years ago Abhishek Pratap ▴ 190
