General question about library files
Arkady ▴ 60
@arkady-2936
Hi everyone,

This is more of a general question. I'm fairly new to array analysis (jumping right into the deep end here, looking at whole-genome tiling arrays), and I'm having trouble sorting out in my head exactly what data is stored in each Affy file type.

It seems obvious that the CEL files contain the raw intensities from the arrays themselves. Still, I'm not sure how these CELs are organized--is it one CEL per chip? How do I know which metadata files match a specific CEL?

I also see that the BPMAP files contain design information for the arrays. What I'm less clear on is why these have genome builds in their names. For example, I got NCBIv36 BPMAPs from Harvard, but Affy makes an earlier build available (v34, I think). The probes are, of course, the same (right?). So does it matter to Bioconductor which build I'm using?

I'm much less clear on CLFs and CDFs. How do these differ, and what information do they contain? Affymetrix provides only very vague descriptions on its website: "The CDF file describes the layout for an Affymetrix GeneChip array." Gee, thanks. How does that differ from a BPMAP? Why do the makePdInfoBuilder code samples use CLFs instead of CDFs?

I've been looking for a good resource to help me get a handle on this stuff. I see lots of tutorials for analyzing microarrays, but little for tiling arrays (yay cutting edge). Anyone have any pointers?

Thanks so much for the help.

Cheers,
John Woods
@henrik-bengtsson-4333
Hi,

On Fri, Aug 15, 2008 at 1:12 PM, John O. Woods wrote:
> This is more of a general question. I'm fairly new to array analysis
> (jumping right into the deep end here, looking at whole-genome tiling
> arrays), and I'm having trouble sorting out in my head exactly what
> data is stored in each Affy filetype.

The document 'Affymetrix Data File Formats' by Affymetrix may be useful (it is technical, but it leaves nothing out):

http://www.affymetrix.com/support/developer/powertools/changelog/gcos-agcc/

/Henrik
@james-w-macdonald-5106
Hi John,

John O. Woods wrote:
> It seems obvious that the CEL files contain the raw intensities from
> the arrays themselves. Still, I'm not sure how these CELs are
> organized--is it one CEL per chip? How do I know which metadata files
> match with a specific CEL?

Yes, one CEL file contains the data from one chip. You can get header information from a CEL file using readCelHeader() in the affxparser package:

    library(affxparser)
    headerinfo <- readCelHeader(celfilename)

The header records the chip type, which tells you which library files go with that CEL file.

> I also see that the BPMAP files contain design information for the
> arrays. What I'm less clear on is why these have genome builds in the
> names. For example, I got NCBIv36 bpmaps from Harvard, but Affy makes
> an earlier build available (v34, I think). The probes are, of course,
> the same (right?). Thus, does it matter to Bioconductor which build
> I'm using?

Well, the probes are mapped to the genome based on whatever build you are using. Since the genome is still pretty fluid, the mapping from probe to genome location may change from build to build.

> I'm much less clear on CLFs and CDFs. How do these differ, and what
> information do they contain? Affymetrix provides only very vague
> descriptions on its website: "The CDF file describes the layout for an
> Affymetrix GeneChip array." Gee, thanks. How does that differ from a
> BPMAP? Why do the makePdInfoBuilder code samples use CLFs instead of
> CDFs?

The CDF is an older file format that Affy appears to be migrating away from. It really only gives mappings from (x, y) coordinates to probeset ID, whereas the BPMAP and CLF files contain more information. Since Affy doesn't support the CDF format for a lot of the new chips, makePdInfoBuilder uses the supported formats.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
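To make the "which metadata files match which CEL" point concrete, here is a minimal sketch of sorting CEL files by the chip type found in their headers, so each group can be paired with its own library files. It assumes each header is a list with `filename` and `chiptype` elements, which is what readCelHeader() is expected to return (check names(headerinfo) on your own files). The helper name group_cels_by_chiptype and the mock headers are hypothetical and only there so the example runs without any CEL files.

```r
# Group CEL file names by the chip type recorded in their headers.
# `headers` is a list of header lists, each with `filename` and `chiptype`
# elements (field names are an assumption about readCelHeader() output).
group_cels_by_chiptype <- function(headers) {
  filenames <- vapply(headers, function(h) h$filename, character(1))
  chiptypes <- vapply(headers, function(h) h$chiptype, character(1))
  # split() returns one character vector of file names per chip type
  split(filenames, chiptypes)
}

# Mock headers standing in for lapply(celfiles, affxparser::readCelHeader);
# the file names and chip types are made up for illustration.
mock_headers <- list(
  list(filename = "a.CEL", chiptype = "ChipTypeA"),
  list(filename = "b.CEL", chiptype = "ChipTypeB"),
  list(filename = "c.CEL", chiptype = "ChipTypeA")
)

groups <- group_cels_by_chiptype(mock_headers)
print(groups)
```

With real data, the mock list would be replaced by the headers read from your CEL files, and each resulting group should share one set of library files.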
@steve-lianoglou-2771
Hi John,

As the previous two replies from James and Henrik seem to have addressed most of your questions, I'll just go to the one that wasn't:

On Aug 15, 2008, at 4:12 PM, John O. Woods wrote:

[snip]
> I've been looking for a good resource to help me get a handle on this
> stuff. I see lots of tutorials and stuff for analyzing microarrays,
> but little for tiling arrays (yay cutting edge). Anyone have any
> pointers?

The thing about tiling arrays is that there aren't that many tutorials readily available for you to apply out of the box.

You didn't mention what you are using the tiling array for. I'll assume you're using it for something like transcript mapping, just because that's what I was trying to do. If you're using it for ChIP-chip, the Ringo and oligo packages (I've been told) are a good place to start.

(1) The BioC::tilingArray and BioC::davidTiling packages could be a good place to start:

http://www.bioconductor.org/packages/2.2/bioc/html/tilingArray.html
http://www.bioconductor.org/packages/release/data/experiment/html/davidTiling.html

You'll most likely find it hard to apply the tilingArray package directly to your problem at hand, so the publications that describe the technique are also quite helpful to get you moving in the right direction:

Transcript mapping with high-density oligonucleotide tiling arrays [method]
http://bioinformatics.oxfordjournals.org/cgi/content/abstract/22/16/1963

A high-resolution map of transcription in the yeast genome [application]
http://www.pnas.org/content/103/14/5320.abstract

These are definitely good reads to start with. One issue is that their normalization method requires experiments that use genomic DNA hybridized to the chip. Since this isn't "standard," it's quite likely (and unfortunate) that you won't be able to use it. Still, you should read them :-)

If you can't use their normalization, you may still be able to use the tilingArray package for its segmentation algorithm.

(2) If you can't go with (1), you can look at:

At-TAX: a whole genome tiling array resource for developmental expression analysis and transcript identification in Arabidopsis thaliana [application/method]
http://genomebiology.com/2008/9/7/R112/abstract

Transcript Normalization and Segmentation of Tiling Array Data
http://www.fml.mpg.de/raetsch/projects/PSBTiling

The latter is a thorough write-up of the technique used in the former. Their normalization code is made available through the second link as well. It is written in MATLAB, however. In order to use it, you will have to annotate your probe data and shoe-horn it into their pipeline with the appropriate tiling array data.

Hope that helps,
-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Cornell Medical College of Cornell University
http://cbio.mskcc.org/~lianos
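Whichever pipeline is used, annotating the probe data comes down to joining each probe's (x, y) position on the chip with its intensity and its genomic start position. The following is a minimal base R sketch of that join, under stated assumptions: (x, y) coordinates are zero-based, and cell intensities are stored row by row with one-based indices, so index = y * ncols + x + 1 (verify this convention against the affxparser documentation before relying on it). The function names, column names and mock data are hypothetical. In practice the coordinates and start positions would come from the BPMAP file and the intensities from the CEL file.

```r
# Convert zero-based (x, y) cell coordinates to one-based cell indices,
# counting row by row (assumed convention: index = y * ncols + x + 1).
xy_to_cell_index <- function(x, y, ncols) {
  y * ncols + x + 1
}

# Attach an intensity to each mapped probe and sort by genomic position.
# `probes`: data frame with columns seq, x, y, start (hypothetical layout).
# `intensities`: numeric vector of all cell intensities in cell-index order.
annotate_probes <- function(probes, intensities, ncols) {
  idx <- xy_to_cell_index(probes$x, probes$y, ncols)
  probes$intensity <- intensities[idx]
  probes[order(probes$seq, probes$start), ]
}

# Mock 3 x 3 array in which cell i has intensity 100 * i.
intensities <- 100 * (1:9)
probes <- data.frame(
  seq   = c("chr1", "chr1", "chr2"),
  x     = c(2, 0, 1),
  y     = c(0, 1, 2),
  start = c(500, 100, 50),
  stringsAsFactors = FALSE
)

annotated <- annotate_probes(probes, intensities, ncols = 3)
print(annotated)
```

The result has one row per probe, ordered by sequence name and start position, which is the shape most segmentation and normalization code expects as input.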