request for simple usage of probe level normalisations


Ido M. Tamir ▴ 320

@ido-m-tamir-1268


Dear All,

a) I don't know whether sequence-based models like GCRMA (whose "GC", I have read, actually stands for "GeneChip (tm)", not GC content) can be extended to other platforms. I am currently looking at single-color Agilent chips, and there is a GC content bias. Regressing log2(intensity) on GC percentage gives:

    Coefficients:
                 Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)  1.023391  0.105189     9.729   <2e-16 ***
    gcp          0.121826  0.002503    48.666   <2e-16 ***

b) I know one should not ask open source developers for something, but: if GCRMA is applicable to other platforms, it would be nice if it could be used in a simple way with them, since new platforms for oligo chips are getting more and more common.

I read the documentation of the oligo package and of makePDpackage, which seems set to be superseded by pdInfoBuilder.

Would it be possible to make this simpler somehow? I don't know exactly what information the downstream analysis with GCRMA actually needs, but wouldn't it be sufficient if creating a new environment required just 2 simple tab-delimited text files*? Then one could simply write a script that converts one's own format (which is not .ndf or .cdf) to this _simple_ tab-delimited format, whose specification would be clearly outlined in the package vignette.

Maybe I am underestimating the complexity (I am ignoring spatial information on the chip), or it's already there (yes, cdf etc. files can be faked).

thank you very much,
ido

*e.g.:
file1: oligo name, sequence, gene name (for grouping multiple oligos)
file2: annotation: gene or oligo name (if not grouped), annotations....
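The regression in (a) can be reproduced in outline as follows. This is only a minimal sketch: the probe sequences and intensities are made-up values (not the Agilent data from the post), and `gc_percent` and `fit_line` are hypothetical helper names. It computes GC percentage per probe and fits log2(intensity) = intercept + slope * GC% by ordinary least squares.

```python
import math


def gc_percent(seq):
    """Return the percentage of G and C bases in a probe sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical probes (sequence, raw intensity); illustrative values only.
probes = [
    ("ATATATATAT", 64.0),
    ("ATGCATATAT", 128.0),
    ("ATGCATGCAT", 256.0),
    ("GCGCATGCAT", 512.0),
]

gc = [gc_percent(seq) for seq, _ in probes]
log_intensity = [math.log2(value) for _, value in probes]
intercept, slope = fit_line(gc, log_intensity)
print(f"intercept={intercept:.3f} slope per GC percent={slope:.3f}")
```

A positive slope, as in the table above, means that probes with higher GC content tend to report higher intensities regardless of the target abundance.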

cdf gcrma oligo pdInfoBuilder • 1.1k views

updated 18.1 years ago by Kasper Daniel Hansen ★ 6.5k • written 18.1 years ago by Ido M. Tamir ▴ 320


Kasper Daniel Hansen ★ 6.5k

@kasper-daniel-hansen-2979


United States

On Oct 8, 2007, at 2:30 AM, Ido M. Tamir wrote:

> Dear All,
>
> a) I don't know, if sequence based models like GCRMA, which I read stands
> actually for "GeneChip (tm)" not GC content, can be extended to other platforms.
> I am just looking at single color agilent chips and there is a gc content bias:
> log2(intensity)~gc percentage:
> Coefficients:
>              Estimate  Std. Error  t value  Pr(>|t|)
> (Intercept)  1.023391  0.105189     9.729   <2e-16 ***
> gcp          0.121826  0.002503    48.666   <2e-16 ***

As you may know, GCRMA consists of three steps: background correction, normalization and summarization. The last two steps are the same as in RMA, and the third step (summarization) requires the concept of a probeset, i.e. several probes targeting the same gene (or transcript, or whatever you are trying to measure). It is not clear to me that Agilent arrays have probesets, although that of course depends on the design.

The background correction is the step where GCRMA uses probe sequence information. What the authors of GCRMA have done is estimate some parameters related to 25-mer oligos in a reference experiment (to be precise, I remember it as a large pool of many experiments). These parameters are then _postulated_ to be relevant for all Affy chips (with some justification). At a minimum, if you want to use GCRMA on another platform you would need to re-estimate these parameters, especially if the other platform uses oligos of a different length, as Agilent does (although I guess you could get 25-mer arrays from Agilent). You would then need some kind of spike-in experiment to show that it really helps on this other platform. With such reference data it would not be hard to use the GCRMA algorithm for another chip, at least the background correction part.
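To make the role of probesets concrete, here is a rough sketch of the two steps that the reply says GCRMA shares with RMA. It assumes the usual description of RMA (quantile normalization across chips, then median-polish summarization within a probeset); it is not the gcrma package's code, it omits the sequence-based background correction entirely, and the function names and toy numbers are hypothetical.

```python
from statistics import median


def quantile_normalize(chips):
    """Give every chip the same intensity distribution.

    chips: list of equal-length lists of log2 intensities, one list per chip.
    Ties are broken by position, which is good enough for a sketch.
    """
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean of the k-th smallest value over all chips.
    reference = [sum(s[k] for s in sorted_chips) / len(chips) for k in range(n)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


def median_polish(matrix, iterations=10):
    """Summarize one probeset (rows = probes, columns = chips).

    Fits value = overall + probe effect + chip effect by alternately
    sweeping out row and column medians, and returns one expression
    estimate per chip (overall + chip effect).
    """
    resid = [list(row) for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(iterations):
        for i in range(n_rows):  # sweep out probe (row) medians
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        for j in range(n_cols):  # sweep out chip (column) medians
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return [overall + c for c in col_eff]


# Toy data: two chips with three probes each (log2 scale, made-up values).
normalized = quantile_normalize([[2.0, 4.0, 6.0], [5.0, 3.0, 7.0]])

# Toy probeset: three probes (rows) measured on two chips (columns).
estimates = median_polish([[5.0, 7.0], [6.0, 8.0], [7.0, 9.0]])
print(normalized, estimates)
```

The summarization step only makes sense when several probes share a target, which is why the question of whether a given Agilent design has probesets matters. With one probe per gene the median polish has nothing to average over.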
> b) I know one should not request/ask open source developers for something
>
> but:
> if GCRMA is applicable to other platforms, then it would be nice if it could
> be used in a simple way with these other platforms, and new platforms for
> oligo chips are getting more and more common.
>
> I read the information from the oligo package and of the makePDpackage
> which seems to be superseded in the future by the pdInfoBuilder.
>
> Would it be possible to make this simpler somehow? I don't know exactly what
> information is actually needed by the downstream analysis with GCRMA, but
> wouldn't it be sufficient that for the creation of a new environment I would
> need just 2 simple tab delimited text files*. Then one could simply make a
> script that converts one's own format (which are not .ndf or .cdf) to this
> _simple_ tab delimited format whose specification is clearly outlined in the
> package vignette.
>
> Maybe I am underestimating the complexity (ignoring spatial information on
> chip) or it's already there (yes, cdf etc.. files can be faked).

The pdInfo path taken by oligo has (as far as I know) only been developed for Affy and Nimblegen arrays. The designers of that package have taken a comprehensive approach: they construct data structures holding a ton of information about the chip. In principle you are right: most of the info (I am not certain about all of it, because I am not fully up to date) could be constructed as you say, but since they are really only designing for Affy and Nimblegen, I assume they are using the standard files from these manufacturers.

Kasper

> thank you very much,
> ido
>
> *eg.:
> file1:
> oligo name, sequence, gene name (for grouping multiple oligos)
> file2: annotation
> gene or oligo name (if not grouped), annotations....
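For illustration, the "file1" layout proposed in the quoted footnote could be read roughly as below. This is a sketch of the proposal only, not a format that oligo or pdInfoBuilder is known to accept. The column order, the sample rows, the function name `read_probesets`, and the rule that an oligo with an empty gene column forms its own group are all assumptions.

```python
import csv
import io
from collections import defaultdict

# Hypothetical contents of "file1": oligo name, sequence, gene name.
FILE1 = (
    "oligo_1\tACGTACGTAC\tgeneA\n"
    "oligo_2\tGGGCCCATAT\tgeneA\n"
    "oligo_3\tTTTTAAAACC\tgeneB\n"
    "oligo_4\tACACACGTGT\t\n"
)


def read_probesets(handle):
    """Group oligos by gene name; ungrouped oligos are keyed by their own name."""
    probesets = defaultdict(list)
    reader = csv.reader(handle, delimiter="\t")
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        oligo, sequence, gene = (field.strip() for field in row)
        probesets[gene or oligo].append((oligo, sequence.upper()))
    return dict(probesets)


probesets = read_probesets(io.StringIO(FILE1))
for name, oligos in probesets.items():
    print(name, [oligo for oligo, _ in oligos])
```

A mapping like this supplies the two things the discussion above depends on: the probe sequences (for any sequence-based background model) and the probeset grouping (for summarization).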

18.1 years ago • Kasper Daniel Hansen ★ 6.5k
