Help with Oligo Package: Extracting and Graphing


Noah Dowell ▴ 410

@noah-dowell-3791

Last seen 9.6 years ago

Hello All,

I am completely new to Bioconductor and R, so please excuse the simplistic questions. I have combed through this message board and the general web for help over the past several weeks before posting, but I may have missed the answers to my inquiries.

My experimental system:
I have done a chromatin IP experiment and hybridized the DNA to the Saccharomyces cerevisiae Affymetrix 1.0R Tiling Array. I also hybridized genomic DNA to another chip as a control that I plan to use to determine ChIP-enriched regions. I have biological replicates for both experiments (input control and protein IP).

Most of the packages on the Bioconductor site appear focused on extraction and analysis of Affymetrix mRNA expression data, including the tilingArray package for transcriptome mapping. I tried using the Starr package but had problems creating and using a GFF annotation file for the whole genome (maybe a topic for another post). Now I am trying to use the oligo package.

What I have done:
I was able to use the pdInfoBuilder package vignette to make an annotation package: pd.sc03b.mr.v04. I even got this package installed and was able to read in my 6 CEL files (4 experimental and 2 control). I now have a "TilingFeatureSet" that I have explored a little with simple R commands. I have been able to extract the PM probes using:

> pm(data)          # grabbing all PM probe intensities
> pmChr(data)
> pmPosition(data)

I then used the preprocessCore functions to do the following:

> normPM_int <- normalize.quantiles(pm_int_data)
# I think this is correct, but maybe I should have normalized on both the PM and MM probes.
# It appears that I should not be using the popular rma normalization, because the tiling array lacks the "probe sets" whose signal would need to be analyzed as a whole and then summarized by the median.
# I am not sure if a rank percentile type normalization should be used, and if I can access that from the oligo package?
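A minimal sketch of what quantile normalization does to a PM intensity matrix, written in base R so that it runs without CEL files. The matrix `pm_toy` and the helper `quantile_normalize_sketch()` are hypothetical names invented for illustration; in the workflow above the real call is `normalize.quantiles()` from preprocessCore on the matrix returned by `pm()`, and its handling of ties may differ from this sketch.

```r
# Toy stand-in for the matrix returned by pm(): rows are probes, columns are arrays.
# The column names and intensities are invented for illustration.
set.seed(1)
pm_toy <- matrix(rexp(40, rate = 1 / 200), nrow = 10,
                 dimnames = list(NULL, c("IP_1", "IP_2", "Input_1", "Input_2")))

# Hypothetical helper: force every column to share the same empirical distribution.
quantile_normalize_sketch <- function(x) {
  sorted <- apply(x, 2, sort)                        # sort each array independently
  target <- rowMeans(sorted)                         # mean of the k-th smallest values
  ranks <- apply(x, 2, rank, ties.method = "first")  # rank of each probe within its array
  normalized <- apply(ranks, 2, function(r) target[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

norm_toy <- quantile_normalize_sketch(pm_toy)
```

After this step every column of `norm_toy` contains the same set of values, while each probe keeps its rank within its own array. Comparing `boxplot(log2(pm_toy))` with `boxplot(log2(norm_toy))` shows the effect.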
I also tried some simple subsetting using R commands:

> norm_Input <- normPM_int[, 3:4]            # pm() put all of the experimental data into one matrix, so I just took out the relevant columns.
> avg_nrmInput <- rowMeans(norm_Input)       # Decided to average the replicates.
> ExptRatio <- avg_nrm_Expt / avg_nrmInput   # Took the ratio of the ChIP experimental samples over the input samples to get an "enrichment over genomic DNA."

# Then I put this together with the chromosome location and genomic position info:

> ExptRatioMatrix <- as.matrix(ExptRatio)
> ExptChrmPos <- as.data.frame(cbind(ExptRatioMatrix[, 1], chrmMatrix[, 1], positionMatrix[, 1]))

Questions I have:

1. Is my workflow completely wrong or unnecessary? I think the oligo package created an ExpressionSet after reading in my CEL files. Should I simply be focusing my efforts on using functions to manipulate ExpressionSets? It is not clear if I have to load another library to call ExpressionSet functions. It appears that the output of oligo is different from that of affy, so I CANNOT use the functions in affy that operate on that ExpressionSet.

2. Are there multiple structures for ExpressionSets, so that they have common elements but are put together differently, making them mutually exclusive and requiring distinct functions?

3. How do I graph/visualize the output of the oligo package? I have been able to plot histograms and boxplots of the unnormalized and normalized data, which has been helpful for the primary analysis of data quality and for seeing what the statistical processing is doing. I know there are packages for plotting along chromosomes or against genomic features (the transcription start site would be awesome), but they all seem to be written for affy package outputs, so it is unclear to me how I can cross over to using the extensive graphics created for those applications now that I have gone down the oligo road.
Any help on this would be greatly appreciated so I can move into looking at the biological implications of the data.

4. The position vector in the feature set is one number corresponding to a probe's genomic position. Is it the center or the 3' (or 5') end of the 25-nucleotide probe? I thought it should have a start and stop point in the genome, but I must be missing something.

5. Do I need to create a file to blend my "Intensity, Chromosome, Position" file with yeast genomic features like ARS, genes, telomeres, etc.? I am not sure if this info is already in my TilingFeatureSet. I want this info to graph my ChIP-enriched regions with respect to genes, telomeres, etc. This is where Starr broke down for me, so I might be making large systemic errors.

Thank you for any input and assistance.

Noah

> sessionInfo()
R version 2.10.0 (2009-10-26)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] pd.sc03b.mr.v04_0.0.1 RSQLite_0.7-3 DBI_0.2-4 oligo_1.10.0
[5] preprocessCore_1.8.0 oligoClasses_1.8.0 Biobase_2.6.0

loaded via a namespace (and not attached):
[1] affxparser_1.18.0 affyio_1.14.0 Biostrings_2.14.0 IRanges_1.4.0 splines_2.10.0
[6] tools_2.10.0
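A hedged sketch of the replicate averaging and enrichment-ratio steps described in this post, using toy data in base R. The column layout (IP in columns 1 and 2, input in columns 3 and 4), the chromosome names, and the positions are all assumptions made for illustration; with real data they would come from `pm()`, `pmChr()`, and `pmPosition()`. The sketch swaps in a log2 ratio for the raw ratio, so that enrichment and depletion are symmetric around zero, and builds the table with `data.frame()` because `cbind()` on mixed numeric and character vectors coerces every column to character.

```r
# Toy normalized PM matrix; the column layout is an assumption, not taken from the post.
set.seed(2)
norm_pm <- matrix(runif(24, min = 50, max = 5000), nrow = 6,
                  dimnames = list(NULL, c("IP_1", "IP_2", "Input_1", "Input_2")))
probe_chr <- c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2")  # stand-in for pmChr(data)
probe_pos <- c(100L, 105L, 110L, 40L, 45L, 50L)                  # stand-in for pmPosition(data)

# Average the biological replicates of each condition, probe by probe.
avg_ip    <- rowMeans(norm_pm[, c("IP_1", "IP_2")])
avg_input <- rowMeans(norm_pm[, c("Input_1", "Input_2")])

# log2 ratio of IP over input: positive values indicate enrichment.
log2_ratio <- log2(avg_ip / avg_input)

# data.frame() keeps each column's own type (character, integer, numeric).
enrichment <- data.frame(chr = probe_chr, position = probe_pos,
                         log2_ratio = log2_ratio, stringsAsFactors = FALSE)

# Sort by chromosome and position so the table is ready for plotting along the genome.
enrichment <- enrichment[order(enrichment$chr, enrichment$position), ]
```

From here, a simple view along one chromosome could be drawn with something like `with(enrichment[enrichment$chr == "chr1", ], plot(position, log2_ratio, type = "h"))`.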

Transcription Normalization Yeast probe affy graph tilingArray oligo pdInfoBuilder Starr • 1.2k views

Updated 14.5 years ago by Tobias Straub ▴ 430 • written 14.5 years ago by Noah Dowell ▴ 410


Tobias Straub ▴ 430

@tobias-straub-2182

Last seen 9.6 years ago

Hi Noah,

Did you contact Benedikt, the author of the Starr package? The department he works in performs tiling array analysis in S. cerevisiae (most likely on exactly the same array as yours), and I am pretty sure they can provide not only the proper pd file but also a 'working' GFF file. I guess that the Starr package would provide most of the functions you are asking for.

best
t.

On Nov 10, 2009, at 8:39 PM, Noah Dowell wrote:
> Hello All,
> [...]

----------------------------------------------------------------------
Dr. Tobias Straub ++4989218075439
Adolf-Butenandt-Institute, München D
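To illustrate how a GFF file of genomic features can be combined with a per-probe table of chromosome, position, and enrichment, here is a minimal base R sketch. Every feature name, coordinate, and enrichment value below is invented, and treating the single probe position as a point (rather than a 25-nucleotide interval) is a simplifying assumption; it is not how the Starr package does it. Only the nine-column GFF layout is standard.

```r
# Toy GFF-style lines (tab-separated); names and coordinates are invented for illustration.
gff_text <- c(
  "chr1\ttoy\tgene\t90\t108\t.\t+\t.\tID=geneA",
  "chr1\ttoy\tARS\t200\t260\t.\t.\t.\tID=arsX",
  "chr2\ttoy\tgene\t30\t47\t.\t-\t.\tID=geneB"
)
features <- read.delim(text = gff_text, header = FALSE, stringsAsFactors = FALSE,
                       col.names = c("seqid", "source", "type", "start", "end",
                                     "score", "strand", "phase", "attributes"))

# Hypothetical per-probe table of chromosome, position, and log2 enrichment.
probes <- data.frame(chr = c("chr1", "chr1", "chr2", "chr2"),
                     position = c(100L, 150L, 40L, 50L),
                     log2_ratio = c(1.8, 0.1, 2.3, -0.2),
                     stringsAsFactors = FALSE)

# Label each probe with the first feature that contains its position, or NA if none does.
probes$feature <- vapply(seq_len(nrow(probes)), function(i) {
  hit <- features$seqid == probes$chr[i] &
    features$start <= probes$position[i] &
    features$end >= probes$position[i]
  if (any(hit)) features$attributes[which(hit)[1]] else NA_character_
}, character(1))
```

The loop is fine for a toy table but slow for millions of probes; interval-overlap tools in Bioconductor, such as `findOverlaps()`, are the usual choice at genome scale.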

Written 14.5 years ago by Tobias Straub ▴ 430
