General question about library files

Arkady

Hi everyone,

This is more of a general question. I'm fairly new to array analysis (jumping right into the deep end here, looking at whole-genome tiling arrays), and I'm having trouble sorting out in my head exactly what data is stored in each Affy file type.

It seems obvious that the CEL files contain the raw intensities from the arrays themselves. Still, I'm not sure how these CELs are organized: is it one CEL per chip? How do I know which metadata files match a specific CEL?

I also see that the BPMAP files contain design information for the arrays. What I'm less clear on is why these have genome builds in their names. For example, I got NCBIv36 BPMAPs from Harvard, but Affy makes an earlier build available (v34, I think). The probes are, of course, the same (right?). So does it matter to Bioconductor which build I'm using?

I'm much less clear on CIFs and CDFs. How do these differ, and what information do they contain? Affymetrix provides only very vague descriptions on its website: "The CDF file describes the layout for an Affymetrix GeneChip array." Gee, thanks. How does that differ from a BPMAP? Why do the makePdInfoBuilder code samples use CIFs instead of CDFs?

I've been looking for a good resource to help me get a handle on this stuff. I see lots of tutorials for analyzing microarrays, but little for tiling arrays (yay cutting edge). Anyone have any pointers?

Thanks so much for the help.

Cheers,
John Woods

Written 17.3 years ago by Arkady

Henrik Bengtsson

Hi,

On Fri, Aug 15, 2008 at 1:12 PM, John O. Woods wrote:
> Hi everyone,
>
> This is more of a general question. I'm fairly new to array analysis
> (jumping right into the deep end here, looking at whole-genome tiling
> arrays), and I'm having trouble sorting out in my head exactly what
> data is stored in each Affy filetype.

The document 'Affymetrix Data File Formats' by Affymetrix may be useful (it is technical, but it leaves nothing out):
http://www.affymetrix.com/support/developer/powertools/changelog/gcos-agcc/

/Henrik

Posted 17.3 years ago by Henrik Bengtsson

James W. MacDonald

Hi John,

John O. Woods wrote:
> It seems obvious that the CEL files contain the raw intensities from
> the arrays themselves. Still, I'm not sure how these CELs are
> organized--is it one CEL per chip? How do I know which metadata files
> match with a specific CEL?

Yes, one CEL file contains the data from one chip. You can get the header information from a CEL file using readCelHeader() in the affxparser package:

    library(affxparser)
    headerinfo <- readCelHeader(celfilename)

> I also see that the BPMAP files contain design information for the
> arrays. What I'm less clear on is why these have genome builds in the
> names. For example, I got NCBIv36 bpmaps from Harvard, but Affy makes
> an earlier build available (v34, I think). The probes are, of course,
> the same (right?). Thus, does it matter to Bioconductor which build
> I'm using?

Well, the probes are mapped to the genome based on whatever build you are using. Since the genome is still pretty fluid, the mapping from probe to genome location may change from build to build.

> I'm much less clear on CIFs and CDFs. How do these differ, and what
> information do they contain? Affymetrix provides only very vague
> descriptions on its website: "The CDF file describes the layout for an
> Affymetrix GeneChip array." Gee, thanks. How does that differ from a
> BPMAP? Why do the makePdInfoBuilder code samples use CIFs instead of
> CDFs?

The CDF is an older file format that Affy appears to be migrating away from. It really only gave mappings from (x, y) coordinates to probeset ID, whereas the BPMAP and CLF files contain more information. Since Affy doesn't support the CDF file format for a lot of the new chips, makePdInfoBuilder uses the supported format.

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
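
Jim's point that the probe-to-genome mapping can shift between builds is easy to check once two BPMAP-derived annotations are loaded as tables. The sketch below uses made-up toy rows (not real Affymetrix coordinates) and base R's merge(); the column names probe, chr and start and the probe keys are hypothetical. With real files you would first read each BPMAP (affxparser provides readBpmap() for this) and join on something both builds share, such as the probe's (x, y) position on the chip; check what your reader actually returns before relying on that.

```r
# Toy illustration with made-up rows, NOT real Affymetrix coordinates.
# Each table stands in for a probe-level annotation derived from one BPMAP:
# a (hypothetical) probe key plus chromosome and start position in that build.
old_build <- data.frame(
  probe = c("p1", "p2", "p3", "p4"),
  chr   = c("chr1", "chr1", "chr2", "chr2"),
  start = c(1000L, 1035L, 500L, 535L),
  stringsAsFactors = FALSE
)
new_build <- data.frame(
  probe = c("p1", "p2", "p3", "p4"),
  chr   = c("chr1", "chr1", "chr2", "chr3"),
  start = c(1210L, 1245L, 500L, 88000L),
  stringsAsFactors = FALSE
)

# Join the two builds on the probe key and flag probes whose location changed
compare_builds <- function(a, b) {
  m <- merge(a, b, by = "probe", suffixes = c(".old", ".new"))
  m$moved <- m$chr.old != m$chr.new | m$start.old != m$start.new
  m
}

cmp <- compare_builds(old_build, new_build)
print(cmp)
print(sum(cmp$moved))   # number of probes whose position differs
```

In this toy, three of the four probes move, one of them to a different chromosome, which is why the build of the BPMAP should match the build of any annotation you overlay on the results.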

Posted 17.3 years ago by James W. MacDonald
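
To make Jim's description of the CDF concrete: conceptually it is a lookup from (x, y) cell positions on the chip to probeset IDs, while the CEL file supplies one intensity per cell. The base-R toy below uses a hypothetical 3 x 4 chip, made-up probeset names and random intensities. It converts 0-based (x, y) into a 1-based cell index assuming cells are stored row by row, i.e. index = y * ncol + x + 1, which I believe matches affxparser's own cell-index convention (its xy2indices() help page is the place to confirm). The per-probeset median of log2 intensities is only a crude stand-in for proper summarization methods such as RMA.

```r
# Toy chip: 3 rows x 4 columns = 12 cells. Dimensions are hypothetical; a real
# CEL header (e.g. from affxparser::readCelHeader) reports its own rows and cols.
n_rows <- 3L
n_cols <- 4L

# One intensity per cell, in CEL storage order (random made-up values)
set.seed(1)
intensity <- round(runif(n_rows * n_cols, min = 50, max = 5000))

# Assumed convention: cells stored row by row, 0-based (x, y), 1-based index
xy_to_index <- function(x, y, n_cols) y * n_cols + x + 1L

# What a CDF boils down to: which (x, y) cells belong to which probeset.
# Probeset IDs and cell assignments are made up for illustration.
cdf_like <- data.frame(
  x        = c(0L, 1L, 2L, 3L, 0L, 1L),
  y        = c(0L, 0L, 0L, 1L, 2L, 2L),
  probeset = c("ps_A", "ps_A", "ps_A", "ps_B", "ps_B", "ps_B"),
  stringsAsFactors = FALSE
)
cdf_like$index     <- xy_to_index(cdf_like$x, cdf_like$y, n_cols)
cdf_like$intensity <- intensity[cdf_like$index]

# Crude per-probeset summary: median log2 intensity (not RMA or any real method)
ps_summary <- tapply(log2(cdf_like$intensity), cdf_like$probeset, median)
print(cdf_like)
print(ps_summary)
```

A BPMAP plays the analogous role for tiling arrays, except that each probe is tied to a genomic sequence and position rather than to a probeset ID.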

Steve Lianoglou

Hi John,

As the previous two replies from James and Henrik seem to have addressed most of your questions, I'll just go to the one that wasn't:

On Aug 15, 2008, at 4:12 PM, John O. Woods wrote:
> I've been looking for a good resource to help me get a handle on this
> stuff. I see lots of tutorials and stuff for analyzing microarrays,
> but little for tiling arrays (yay cutting edge). Anyone have any
> pointers?

The thing about tiling arrays is that there aren't many tutorials readily available that you can apply out of the box.

You didn't mention what you are using the tiling array for. I'll assume you're using it for something like transcript mapping, just because that's what I was trying to do. If you're using it for ChIP-chip, the Ringo and oligo packages are (I've been told) a good place to start.

(1) The Bioconductor tilingArray and davidTiling packages could be a good place to start:
http://www.bioconductor.org/packages/2.2/bioc/html/tilingArray.html
http://www.bioconductor.org/packages/release/data/experiment/html/davidTiling.html

You'll most likely find it hard to apply the tilingArray package directly to your problem, so the publications that describe the technique are also quite helpful for getting you moving in the right direction:

Transcript mapping with high-density oligonucleotide tiling arrays [method]
http://bioinformatics.oxfordjournals.org/cgi/content/abstract/22/16/1963

A high-resolution map of transcription in the yeast genome [application]
http://www.pnas.org/content/103/14/5320.abstract

It's definitely a good read to start with. One issue is that their normalization method requires experiments in which genomic DNA is hybridized to the chip. Since this isn't "standard," it's quite likely (and unfortunate) that you won't be able to use it. Still, you should read it :-) If you can't use their normalization, you may still be able to use the tilingArray package for its segmentation algorithm.

(2) If you can't go with (1), you can look at:

At-TAX: a whole genome tiling array resource for developmental expression analysis and transcript identification in Arabidopsis thaliana [application/method]
http://genomebiology.com/2008/9/7/R112/abstract

Transcript Normalization and Segmentation of Tiling Array Data
http://www.fml.mpg.de/raetsch/projects/PSBTiling

The latter is a thorough write-up of the technique used in the former. Their normalization code is also available through the second link. It is written in MATLAB, however, so to use it you will have to annotate your probe data and shoehorn it into their pipeline with the appropriate tiling array data.

Hope that helps,
-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Cornell Medical College of Cornell University
http://cbio.mskcc.org/~lianos

Posted 17.3 years ago by Steve Lianoglou
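
Steve mentions using tilingArray for its segmentation algorithm. As I understand the Huber et al. method paper he cites, the idea is to fit a piecewise-constant curve to probe intensities ordered along the genome, choosing the breakpoints that minimize the residual sum of squares by dynamic programming. The base-R function below is a from-scratch sketch of that idea for a fixed number of segments K; the name segment_ls() and the toy data are hypothetical, and this is not the tilingArray implementation, which is built for full-size data.

```r
# Least-squares segmentation of a numeric vector into exactly K constant pieces,
# found by dynamic programming. Sketch only; not the tilingArray implementation.
segment_ls <- function(y, K) {
  n <- length(y)
  stopifnot(K >= 1, K <= n)
  cs  <- c(0, cumsum(y))      # prefix sums of y
  cs2 <- c(0, cumsum(y^2))    # prefix sums of y^2
  # Residual sum of squares when y[i..j] (1-based, inclusive) is fit by its mean
  rss <- function(i, j) {
    s <- cs[j + 1] - cs[i]
    (cs2[j + 1] - cs2[i]) - s^2 / (j - i + 1)
  }
  cost <- matrix(Inf, K, n)           # cost[k, j]: best RSS of y[1..j] in k pieces
  back <- matrix(NA_integer_, K, n)   # back[k, j]: start of the last piece
  for (j in 1:n) {
    cost[1, j] <- rss(1, j)
    back[1, j] <- 1L
  }
  if (K >= 2) {
    for (k in 2:K) {
      for (j in k:n) {
        for (i in k:j) {              # last piece is y[i..j]
          total <- cost[k - 1, i - 1] + rss(i, j)
          if (total < cost[k, j]) {
            cost[k, j] <- total
            back[k, j] <- i
          }
        }
      }
    }
  }
  # Walk the back-pointers from the end to recover where each piece starts
  starts <- integer(K)
  j <- n
  for (k in K:1) {
    starts[k] <- back[k, j]
    j <- starts[k] - 1L
  }
  ends  <- c(starts[-1] - 1L, n)
  means <- mapply(function(a, b) mean(y[a:b]), starts, ends)
  list(starts = starts, ends = ends, means = means, rss = cost[K, n])
}

# Toy "intensities along the genome": three levels with a little made-up noise
y <- c(0.1, -0.1, 0.0, 0.2, -0.2,
       3.1,  2.9, 3.0, 3.2,  2.8,
       1.1,  0.9, 1.0, 1.2,  0.8)
res <- segment_ls(y, K = 3)
print(res)
```

The prefix sums make each piece's cost O(1), but the triple loop is still O(K n^2), which is fine for a toy vector and far too slow in plain R for a whole chromosome.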
