Conversion of Affymetrix CEL file to raw text file


viritha kaza ▴ 580


Hi group,

If I want to create a raw text file of microarray data from an Affymetrix CEL file, how do I create the expression set with raw signal intensities? I know that only CEL files in version 3 format can be opened in Excel, because that version is in ASCII format. In one such CEL file the intensities are listed as:

    CellHeader=X    Y    MEAN       STDV      NPIXELS
    0               0    137.3      25.1      36
    1               0    10730.5    2009.9    36
    2               0    136.3      21.2      36

But I am not sure how to assign the probe numbers to the CellHeader rows, and I would also like to know whether the raw intensity that is used is just the mean intensity. Can this be done in R?

Waiting for your response. Thank you in advance,
Viritha


ADD COMMENT • link 15.1 years ago viritha kaza ▴ 580
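For readers who want to look at the ASCII excerpt in the question directly, here is a minimal sketch in base R. It only parses the three rows quoted above. The helper `cell_index` and the chip width of 1002 columns are assumptions made for illustration, and the sketch does not attach probe or probeset names: that mapping comes from the CDF file, which packages such as affy read for you. As far as I know, the value that affy stores as the raw intensity of a cell is the MEAN column.

```r
# Toy excerpt of the intensity section of a version 3 (ASCII) CEL file,
# copied from the question; a real file has one row per cell on the chip.
cel_lines <- c(
  "CellHeader=X Y MEAN STDV NPIXELS",
  "0 0 137.3 25.1 36",
  "1 0 10730.5 2009.9 36",
  "2 0 136.3 21.2 36"
)

# Skip the header line and read the whitespace-separated values.
cells <- read.table(text = cel_lines[-1],
                    col.names = c("X", "Y", "MEAN", "STDV", "NPIXELS"))

# Hypothetical helper: 1-based cell index, counting along X first, then Y.
cell_index <- function(x, y, n_col) x + y * n_col + 1

# ASSUMPTION: 1002 columns, the layout usually quoted for Mouse430_2.
cells$index <- cell_index(cells$X, cells$Y, n_col = 1002)

print(cells)
```

The index is only a position on the chip. Turning positions into probeset membership still needs the CDF information.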


Hi Group,

Let me explain more clearly. I have the [Mouse430_2] Affymetrix Mouse Genome 430 2.0 Array. I want to create an unnormalised expression microarray data set. I have the CEL files and the CDF file for this array, and I want the intensities at the probe level. Is this possible in R or with any other tool? If not, how else can I get this expression data set?

Thank you in advance,
Viritha

ADD COMMENT • link 15.1 years ago viritha kaza ▴ 580


Hi Viritha,

On 12/16/2010 10:45 AM, viritha kaza wrote:
> I have the cell files and cdf file for this. I want the intensities in the
> probe level. Is this possible in R or any other source?

    library(affy)
    # read all CEL files found in the current working directory
    dat <- ReadAffy()
    # list with one matrix of PM intensities per probeset
    pms <- pm(dat, LISTRUE=TRUE)
    # label the rows of each matrix with the probeset name
    fun <- function(q,r){
      row.names(r) <- rep(q, ncol(r))
      r
    }
    pms <- mapply(fun, names(pms), pms, SIMPLIFY = FALSE)
    # stack the per-probeset matrices and write them out tab-delimited
    pms <- do.call("rbind", pms)
    write.table(pms, "Raw PM data.txt", quote = FALSE, row.names = TRUE,
                col.names = TRUE, sep = "\t")

You can do similar for MM probes if you desire.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues

ADD REPLY • link 15.1 years ago James W. MacDonald 68k


Hi James,

Thanks for your reply. I am new to R. Do I have to supply values for q or r? I ask because I get the following error when I run the mapply command:

    Error in dimnames(x) <- dn :
      length of 'dimnames' [1] not equal to array extent

There are 5 arrays in the experiment.

Thank you,
Viritha

ADD REPLY • link 15.1 years ago viritha kaza ▴ 580


Make that

    fun <- function(q,r){
      row.names(r) <- rep(q, nrow(r))
      r
    }

Which of course makes more sense.

Jim

ADD REPLY • link 15.1 years ago James W. MacDonald 68k
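The corrected helper can be tried without any CEL files. The sketch below is an illustration only: `toy_pms` is a made-up stand-in for the list returned by `pm(dat, LISTRUE=TRUE)`, with hypothetical probeset names, 11 probes per probeset and 5 arrays. Each matrix has 11 rows, so a vector of `ncol(r)` = 5 row names cannot fit, which is where the 'dimnames' error came from; `nrow(r)` gives the 11 names that are needed.

```r
# Toy stand-in for pm(dat, LISTRUE = TRUE): a named list with one matrix
# per probeset (rows = probes, columns = arrays). All values are made up.
set.seed(1)
toy_pms <- list(
  "probesetA_at" = matrix(runif(11 * 5, 50, 5000), nrow = 11, ncol = 5),
  "probesetB_at" = matrix(runif(11 * 5, 50, 5000), nrow = 11, ncol = 5)
)

# Label every probe row with the name of the probeset it belongs to.
label_rows <- function(q, r) {
  rownames(r) <- rep(q, nrow(r))  # nrow(r) is the number of probes
  r
}

labelled <- mapply(label_rows, names(toy_pms), toy_pms, SIMPLIFY = FALSE)

# Stack the per-probeset matrices into one probe-by-array table.
stacked <- do.call("rbind", labelled)
colnames(stacked) <- paste0("array", 1:5)

print(dim(stacked))
print(table(rownames(stacked)))
```

The stacked table has one row per probe, so each probeset name appears once for every probe in that probeset.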


Thanks James. There was no error. But I see that I get 11 values for the same probe. Why does that happen? If I do the same for MM, I would get yet another file. How do I finally get one value for each probe in an array?

Thanks,
Viritha

ADD REPLY • link 15.1 years ago viritha kaza ▴ 580


On 12/16/2010 3:35 PM, viritha kaza wrote:
> But I see that I get 11 values for the same probe. Why does it happen? If I
> perform MM as well then again I would get another file. How do I finally get
> one value for each probe in an array?

I think we need to back up a bit here. On Affy chips there are multiple probes used to interrogate a single transcript. As you note, for this particular chip there are usually 11 probes. All of the probes for a given transcript make up a probeset.

When we process these data, we first background correct and normalize the probe values to eliminate as much non-biological variability as possible, and then we summarize all the probes in each probeset to generate the final value, which we hope is proportional to the expression of the transcript we are trying to measure.

So we have to be precise about our terminology. You originally asked for a text file containing unnormalized probe values, which is what the code I supplied does. Evidently that is not what you wanted, so can you precisely state what it is that you do want?

Best,

Jim

ADD REPLY • link 15.1 years ago James W. MacDonald 68k
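To make the probe versus probeset distinction concrete, here is a small sketch of the summarization step on made-up data. It is not what `rma()` does internally (RMA fits a robust model by median polish); a plain per-probeset median of log2 values is swapped in here only because it is easy to follow. The probeset names, array names and intensities are all hypothetical.

```r
# Made-up probe-level table: 2 probesets x 11 probes, 5 arrays.
set.seed(2)
probeset <- rep(c("probesetA_at", "probesetB_at"), each = 11)
pm_values <- matrix(runif(22 * 5, 64, 16384), nrow = 22,
                    dimnames = list(probeset, paste0("array", 1:5)))

# Work on the log2 scale, as is usual for expression values.
log_pm <- log2(pm_values)

# Collapse the 11 probe rows of each probeset into one value per array.
# A simple median is used here as a stand-in for a real summarization method.
ids <- unique(probeset)
summarized <- t(vapply(ids, function(id) {
  apply(log_pm[probeset == id, , drop = FALSE], 2, median)
}, numeric(ncol(log_pm))))

print(dim(log_pm))      # probe level: one row per probe
print(dim(summarized))  # probeset level: one row per probeset
```

The probe-level table has 22 rows and the summarized table has 2, which is the "one value per probeset per array" that expression analyses normally start from.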


Hi James,

I am actually interested in getting a raw (unnormalised) microarray expression dataset. Since I want to do this for many datasets, I would like to normalize the data as one paper suggests, to remove bias due to sample preparation and different platforms:

"Briefly, for each expression data set, individual probe intensity of each array was divided by the averaged probe intensity across all arrays within the data set, then each value was log (base 2) transformed. For normalization, first, average expression value of all probes in each array was calculated. Then for each array, expression value of each probe was subtracted by the averaged expression value. By doing so, average expression value of all probes in each array in each expression data set will be zero."

To perform the above steps, I thought I would need a raw expression dataset from the CEL files, which I can then normalise with the above strategy to remove bias. So I am expecting to get a single value for each probe in an array. I hope this helps in understanding what exactly I want the expression dataset to be.

Thanks,
Viritha

ADD REPLY • link 15.1 years ago viritha kaza ▴ 580
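The procedure quoted from the paper can be written out in a few lines, which may help in judging what it does and does not correct. This is a sketch of one reading of that paragraph on made-up numbers: "averaged probe intensity across all arrays" is taken to mean the mean of each probe (row) over the arrays, and the matrix `raw` with its probe and array names is hypothetical.

```r
# Made-up raw intensities: 6 probes (rows) x 5 arrays (columns).
set.seed(3)
raw <- matrix(runif(6 * 5, 64, 16384), nrow = 6,
              dimnames = list(paste0("probe", 1:6), paste0("array", 1:5)))

# Step 1: divide each probe by its average intensity across all arrays.
ratio <- raw / rowMeans(raw)

# Step 2: log2-transform the ratios.
log_ratio <- log2(ratio)

# Step 3: subtract each array's mean so that every array averages zero.
centered <- sweep(log_ratio, 2, colMeans(log_ratio), "-")

print(round(colMeans(centered), 10))
```

Every column mean comes out as zero, as the paper states. Note that step 3 only shifts each array by a constant; it does not change the spread or shape of an array's distribution.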


Hi Viritha,

On 12/17/2010 11:11 AM, viritha kaza wrote:
> I would like to perform normalization as one of the paper suggests
> to remove bias due to the sample preparation and different platforms [...]

Two things here:

1.) That normalization is as naive as you can possibly get. We have gone _way_ past the stage where people think a simple location normalization is a reasonable thing to do. All this does is shift the data so the means line up, not taking into account that there might be more subtle technical artifacts that should be removed. You will be much better served by using the stock normalization in rma(), or if you really want to get fancy, you might want to use vsn. But you will be regressing to maybe the year 2000 if you use the normalization you suggest here.

2.) The normalization you are considering is designed for spotted arrays, where each spot measures transcript from two different samples. Because of that fact, the data are usually reported as a ratio (e.g., cy3/cy5). For these data, equal amounts of transcript would be expected to give a ratio of 1 (e.g., equal amounts of cy3 and cy5 fluorescence). If you then take logs, equivalence will be equal to zero. In that case, taking the mean and subtracting (centering on the mean) is a reasonable but naive thing to do. However, in your case, the data range from approximately 2^6 to 2^14 or so. If you take log_2 of these data, they will then range from about 6 to 14. Because they aren't ratios, and they aren't really symmetrically distributed, there isn't a compelling reason to normalize to zero.

If you still want to progress with this idea, note that pretty much all of the summarization methods have a normalize argument, so you can simply set normalize = FALSE, and you will then get unnormalized, summarized data. See e.g., ?rma

Best,

Jim

ADD REPLY • link 15.1 years ago James W. MacDonald 68k
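
The two points in the reply above can be sketched roughly as follows. This is only an illustration, not code from the thread: the intensities are simulated rather than read from CEL files, and `unnormalized_rma` and its `cel_dir` argument are hypothetical names introduced here. The wrapper is defined but not run, since it assumes the affy package is installed and that CEL files exist in the given directory.

```r
# Simulated raw intensities (NOT real CEL data): 1000 probes x 5 arrays,
# drawn so that values fall between 2^6 and 2^14.
set.seed(1)
raw <- matrix(2^runif(5000, min = 6, max = 14), nrow = 1000, ncol = 5,
              dimnames = list(NULL, paste0("array", 1:5)))

# On the log2 scale the values sit between 6 and 14, nowhere near zero,
# which is why centering on zero is not a natural choice for these data.
log_raw <- log2(raw)
range(log_raw)

# Hypothetical wrapper around the affy calls mentioned in the thread.
# Only defined here, not called: it needs affy plus CEL files in cel_dir.
unnormalized_rma <- function(cel_dir = ".") {
  dat  <- affy::ReadAffy(celfile.path = cel_dir)
  # normalize = FALSE skips the normalization step; background correction
  # and probeset summarization still happen.
  eset <- affy::rma(dat, normalize = FALSE)
  # Matrix of log2 values, one per probeset per array.
  Biobase::exprs(eset)
}
```

If the wrapper were called on a directory of CEL files, the result would be one summarized value per probeset per array, which appears to be what the question was ultimately after.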

Thanks James!!!! The paper that I referred to was a recent one (2010), so I thought it would be easier to follow. I think, as you said, it might be better to choose another method.

ADD REPLY • link 15.1 years ago viritha kaza ▴ 580
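
For readers who want to see what the paper's procedure discussed in this thread actually does (divide each probe by its average across arrays, take log2, then subtract each array's mean), here is a minimal base R sketch. The data are simulated and the variable names are made up for illustration; nothing here comes from the paper's own code. One array is deliberately given a wider spread so that the limitation raised in the thread, that centering only shifts the means, becomes visible.

```r
# Simulated probe intensities (NOT real data): 200 probes x 3 arrays,
# with array 3 deliberately given a wider spread than the other two.
set.seed(42)
base   <- runif(200, min = 6, max = 14)
spread <- c(1, 1, 1.5)
raw <- sapply(spread, function(s) 2^(10 + s * (base - 10) + rnorm(200, sd = 0.1)))
colnames(raw) <- paste0("array", 1:3)

# Step 1: divide each probe by its mean across arrays, then take log2.
ratio <- log2(raw / rowMeans(raw))

# Step 2: subtract each array's mean so every array is centered on zero.
centered <- sweep(ratio, 2, colMeans(ratio), FUN = "-")

colMeans(centered)      # numerically zero for every array
apply(centered, 2, sd)  # the spreads still differ between arrays
```

The column means line up after centering, but the difference in spread between arrays is left untouched, which is the kind of technical artifact that methods such as the normalization in rma() or vsn are meant to address.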
