                    Noah Dowell
        
    
    Hello All,
I am completely new to using Bioconductor and R so please excuse the
simplistic questions.  I have combed through this message board and
the general web for help over the past several weeks before posting,
but I may have missed the answers to my inquiries.
My experimental system:
I have done a chromatin IP experiment and hybridized the DNA to the
Saccharomyces cerevisiae Affymetrix 1.0R Tiling Array.  I also
hybridized genomic DNA to another chip as a control that I plan to use
to determine ChIP enriched regions.  I have biological replicates for
both experiments (Input control and protein IP).
Most of the packages on the Bioconductor site appear focused on
extraction and analysis on Affymetrix mRNA Expression data, including
the tilingArray package for transcriptome mapping.  I tried using the
Starr package but had problems creating and using a GFF annotation
file for the whole genome (maybe a topic for another post).  Now I am
trying to use the oligo package.
What I have done:
I was able to use the pdInfoBuilder package vignette to make an
annotation file: pd.sc03b.mr.v04.  I even got this file installed and
was able to read in my 6 CEL files (4 experimental and 2 control).  I
now have a "TilingFeatureSet" that I have explored a little with
simple R commands.  I have been able to extract the PM probes using:
 > pm_int_data <- pm(data)  # grabbing all PM probe intensities
 > pmChr(data)
 > pmPosition(data)
I then used the preprocessCore functions to do the following:
 > normPM_int <- normalize.quantiles(pm_int_data)
# I think this is correct, but maybe I should have normalized on both
# the PM and MM probes.
# It appears that I should not be using the popular rma normalization,
# because the tiling array lacks the "probe sets" that would need to
# have their signal analyzed as a whole and then the median determined.
# I am not sure whether a rank percentile type normalization should be
# used instead, or whether I can access that from the oligo package.
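To make the normalization step concrete, here is a minimal base R sketch of the idea behind quantile normalization on a toy matrix. The function name `quantile_normalize` and the toy intensities are hypothetical, and the sketch assumes no tied values within an array; `normalize.quantiles` in preprocessCore is the function used above and may treat ties and missing values differently.

```r
# Toy PM intensity matrix: 5 probes (rows) x 3 arrays (columns).
# These numbers are made up for illustration only.
pm_toy <- matrix(c( 5,  2,  3,  4,  1,
                   40, 10, 45, 20, 80,
                    3,  4,  6,  8,  9), nrow = 5)

# Hypothetical helper, not part of oligo or preprocessCore.
quantile_normalize <- function(x) {
  # Rank of each probe within its own array (ties assumed absent).
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean across arrays of the sorted intensities.
  reference <- rowMeans(apply(x, 2, sort))
  # Replace every intensity by the reference value at its rank.
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

pm_toy_norm <- quantile_normalize(pm_toy)
```

After this step every column of `pm_toy_norm` contains the same set of values, only in a different probe order, which is why boxplots of quantile-normalized arrays look identical.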
I also tried some simple subsetting using R commands:
 > norm_Input <- normPM_int[,3:4]  # The pm function put all of the
# experimental data into a matrix, so I just took out the relevant
# columns.
 > avg_nrmInput <- rowMeans(norm_Input)  # Decided to average the
# replicates.
 > ExptRatio <- avg_nrm_Expt/avg_nrmInput  # Took the ratio of the ChIP
# experimental samples over the Input samples to get an "enrichment over
# genomic DNA."
# Then I put this together with the chromosome location and genomic
# position info:
 > ExptRatioMatrix <- as.matrix(ExptRatio)
 > ExptChrmPos <- as.data.frame(cbind(ExptRatioMatrix[,1], chrmMatrix[,1],
     positionMatrix[,1]))
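The averaging, ratio, and table-building steps above can be sketched on toy data as follows. All values, the column layout (columns 1-2 ChIP, columns 3-4 input), and the object names are assumptions made for illustration. Two choices differ from the commands above: the ratio is taken on the log2 scale, a common convention for enrichment, and the table is built with `data.frame()` directly, because `cbind()` on a mix of numbers and chromosome names coerces every column to character.

```r
# Hypothetical normalized PM intensities for 6 probes:
# columns 1-2 are ChIP replicates, columns 3-4 are input replicates.
norm_pm <- matrix(c(200, 800, 150, 900, 120, 300,
                    220, 760, 170, 880, 110, 280,
                    100, 100, 160, 110, 115, 150,
                    110,  90, 150, 100, 125, 140), nrow = 6)
chr <- c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2")
pos <- c(101L, 136L, 171L, 55L, 90L, 125L)

# Average the replicates, then take the log2 ChIP/input ratio per probe.
avg_chip   <- rowMeans(norm_pm[, 1:2])
avg_input  <- rowMeans(norm_pm[, 3:4])
log2_ratio <- log2(avg_chip / avg_input)

# data.frame() keeps each column's own type (character, integer, numeric).
enrichment <- data.frame(chromosome = chr,
                         position   = pos,
                         log2_ratio = log2_ratio,
                         stringsAsFactors = FALSE)

# For comparison: cbind() turns the numeric ratios into strings.
coerced <- cbind(log2_ratio, chr)
```

With the table in this shape, probes can be sorted or subset by chromosome and position without first converting strings back to numbers.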
Questions I have:
1.  Is my workflow completely wrong or unnecessary?  I think the oligo
package created an ExpressionSet after reading in my CEL files.
Should I simply be focusing my efforts on using functions to
manipulate ExpressionSets?  It is not clear if I have to load
another library to call ExpressionSet functions.  It appears that
the output of oligo is different from that of affy, so I CANNOT use the
functions in affy that operate on that ExpressionSet.
2.  Are there multiple structures for ExpressionSets, so that they have
common elements but are put together differently, thus making them
mutually exclusive and requiring distinct functions?
3.  How do I graph/visualize the output of the oligo package?  I have
been able to plot the histogram and boxplots of the unnormalized and
normalized data which has been helpful for the primary analysis of
data quality and what the statistical processing is doing.  I know
there are packages for plotting along chromosomes or against genomic
features (the Transcription Start Site would be awesome) but they all
seem to be written for affy package outputs so it is unclear to me how
I can cross over to using the extensive graphics created for those
applications now that I have gone down the oligo road.  Any help on
this would be greatly appreciated so I can move into looking at the
biological implications of the data.
4.  The position vector in the feature set is one number corresponding
to a probe's genomic position.  Is it the center or the 3' (or 5') end
of the 25 nucleotide long probe?  I thought it should have a start and
stop point in the genome, but I must be missing something.
5.  Do I need to create a file to blend my "Intensity, Chromosome,
Position" file with yeast genomic features like ARS, genes, telomeres,
etc?  I am not sure if this info is already in my TilingFeatureSet.  I
want this info to graph my ChIP enriched regions with respect to genes
or telomeres, etc.  This is where Starr broke down for me so I might
be making large systemic errors.
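As an illustration of what blending the "Intensity, Chromosome, Position" table with genomic features (question 5) could look like, here is a minimal base R sketch. The probe table, the feature table, and the helper `annotate_probes` are all hypothetical, and each probe is treated as a single coordinate because it is not yet clear whether the stored position is the probe's start or center (question 4). Bioconductor range packages such as IRanges provide overlap functions intended for this kind of task on real data.

```r
# Hypothetical probe-level table (chromosome, position, enrichment).
probes <- data.frame(chromosome = c("chr1", "chr1", "chr2"),
                     position   = c(120L, 900L, 60L),
                     log2_ratio = c(1.2, 0.1, 2.0),
                     stringsAsFactors = FALSE)

# Hypothetical feature table; real coordinates would come from a
# genome annotation file such as a GFF.
features <- data.frame(chromosome = c("chr1", "chr2"),
                       start      = c(100L, 50L),
                       end        = c(500L, 400L),
                       name       = c("geneA", "geneB"),
                       stringsAsFactors = FALSE)

# Label each probe with the feature whose interval contains it.
# If features overlap, the last matching feature wins in this sketch.
annotate_probes <- function(probes, features) {
  probes$feature <- NA_character_
  for (i in seq_len(nrow(features))) {
    hit <- probes$chromosome == features$chromosome[i] &
      probes$position >= features$start[i] &
      probes$position <= features$end[i]
    probes$feature[hit] <- features$name[i]
  }
  probes
}

annotated <- annotate_probes(probes, features)
```

Probes that fall outside every feature keep `NA` in the `feature` column, so intergenic signal can be separated from signal inside annotated regions.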
Thank you for any input and assistance.
Noah
 > sessionInfo()
R version 2.10.0 (2009-10-26)
i386-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
[1] pd.sc03b.mr.v04_0.0.1 RSQLite_0.7-3         DBI_0.2-4             oligo_1.10.0
[5] preprocessCore_1.8.0  oligoClasses_1.8.0    Biobase_2.6.0
loaded via a namespace (and not attached):
[1] affxparser_1.18.0 affyio_1.14.0     Biostrings_2.14.0 IRanges_1.4.0     splines_2.10.0
[6] tools_2.10.0