Hi Xian,

I would suggest having a look at the flowClust package. It clusters
(gates) flow data in one or more dimensions via mixture modelling, and
should be well suited to estimating the proportions of cell populations
in different phases of the cell cycle. flowClust returns a model object
that contains the proportion of each component in the model (the
fraction of total cells represented by that component), as well as its
mean (location), standard deviation, and other model parameters. The
proportions and means are probably all that you are looking for. If you
need to match components across multiple samples, use the estimated
component means. A very basic example is below. See the package
vignette for further details.

I don't usually work with the type of data you describe, so there are
probably domain-specific subtleties I'm not familiar with. If you run
into problems, please let me know; I'll be glad to tweak the package
to make it more effective / useful for such an application. At the
least, I'll update the vignette to provide further examples.

Hope this helps,
Greg.

#Simple example - flowClust in 1D fitting 3 components.
require(flowClust)
require(flowViz)
#Some artificial data, Sample X, 3 components, real proportions are
#33.3%, 16.7%, and 50%
X<-as.matrix(c(rnorm(1000,mean=10,sd=sqrt(2)),
               rnorm(500,mean=20,sd=sqrt(2)),
               rnorm(1500,mean=30,sd=sqrt(2))))
colnames(X)<-"A"
X<-round(X)
X<-flowFrame(X)
#Sample Y, 3 components, real proportions are 25%, 25%, and 50%,
#peaks are shifted slightly.
Y<-as.matrix(c(rnorm(750,mean=11,sd=sqrt(2)),
               rnorm(750,mean=23,sd=sqrt(2)),
               rnorm(1500,mean=29,sd=sqrt(2))))
colnames(Y)<-"A"
Y<-round(Y)
Y<-flowFrame(Y)
par(mfrow=c(1,2))
plot(X,breaks=256)
plot(Y,breaks=256)
#If we know the data has 3 components:
f1<-flowClust(X,K=3,varNames=c("A"))
f2<-flowClust(Y,K=3,varNames=c("A"))
#plot the result
par(mfrow=c(1,2))
hist(f1,data=X)
hist(f2,data=Y)
#The order of components may differ between the two fits above. Use the
#estimated means to reorder them.
f1@w[order(f1@mu)] #Proportions ordered by increasing component mean, model 1
f2@w[order(f2@mu)] #Proportions ordered by increasing component mean, model 2
#If you don't know the number of components, you would use the BIC to
#estimate the best fit:
f1<-flowClust(X,varNames="A",K=1:5) #Fit multiple numbers of clusters;
f2<-flowClust(Y,varNames="A",K=1:5)
par(mfrow=c(1,2))
plot(1:5,BIC(f1),type="o",xlab="K",ylab="BIC");
plot(1:5,BIC(f2),type="o",xlab="K",ylab="BIC");
which.max(BIC(f1)) #The maximum should be at 3.
which.max(BIC(f2)) #The maximum should be at 3.
f1<-f1[[which.max(BIC(f1))]] #Extract the best fitting model
f2<-f2[[which.max(BIC(f2))]] #Extract the best fitting model
f1@w[order(f1@mu)] #Proportions ordered by increasing component mean, model 1
f2@w[order(f2@mu)] #Proportions ordered by increasing component mean, model 2
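
A rough sketch of the reordering step in plain R, in case it helps to
see it without flowClust loaded. The numbers are made-up stand-ins for
the mu and w slots of two fitted models, and match_by_mean and the
column labels are hypothetical names chosen for illustration; neither
is part of the package.

```r
# Made-up stand-ins for f1@mu, f1@w, f2@mu, f2@w (not real fit output).
# The component order deliberately differs between the two samples.
mu1 <- c(30.1, 10.0, 19.9); w1 <- c(0.50, 0.33, 0.17)
mu2 <- c(11.2, 28.9, 23.1); w2 <- c(0.25, 0.50, 0.25)

# Hypothetical helper: return proportions sorted by increasing
# component mean, so position i refers to the same peak in each sample.
match_by_mean <- function(mu, w) w[order(mu)]

# One row per sample, one column per peak (labels are assumptions).
props <- rbind(X = match_by_mean(mu1, w1),
               Y = match_by_mean(mu2, w2))
colnames(props) <- c("lowest", "middle", "highest")
print(props)
```

Matching by rank of the mean assumes every sample has the same number
of components and that the peaks never swap order between samples.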
On 2011-02-23, at 5:20 AM, Xian Zhang wrote:
> Dear Bioconductor users,
>
> We have a univariate readout (DNA content) to study cell cycle
> subpopulations. The data looks like this, with around 3000 cells per
> sample.
>
>
> cell1 cell2 cell3 ...
> sample1 28 26 30
> sample2 25 27 15
> sample3 30 40 45
> ...
>
>
> Based on which, one should be able to calculate fractions of cell cycle
> subpopulations (G1, S, G2+M). However, the data needs to be first normalized
> (scaling and peak alignment etc), before gating the cells into
> subpopulations.
>
> The flowCore and related packages offer similar functions, but seem to be an
> overkill for a univariate readout. I wonder if there are other
> methods/packages available.
>
> Thanks a lot in advance!
>
> Xian
Greg Finak, PhD
Post-doctoral Research Associate
PS Statistics, Vaccine and Infectious Disease Division.
Fred Hutchinson Cancer Research Center
Seattle, WA
(206)667-3116
gfinak at fhcrc.org