Cell DNA content data normalization and gating
Xian Zhang ▴ 10
@xian-zhang-4503
Last seen 10.2 years ago
Dear Bioconductor users,

We have a univariate readout (DNA content) to study cell cycle subpopulations. The data look like this, with around 3000 cells per sample:

             cell1  cell2  cell3  ...
    sample1     28     26     30
    sample2     25     27     15
    sample3     30     40     45
    ...

From these data, one should be able to calculate the fractions of cell cycle subpopulations (G1, S, G2+M). However, the data first need to be normalized (scaling, peak alignment, etc.) before the cells are gated into subpopulations.

The flowCore and related packages offer similar functions, but seem to be overkill for a univariate readout. I wonder if there are other methods/packages available.

Thanks a lot in advance!

Xian
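For readers following along, the scale-then-gate workflow described in the question can be sketched in base R alone. This is only a minimal illustration on simulated data, not a recommended method: the helper names (`g1_peak`, `normalize_dna`, `gate_phases`), the simulated peak positions, and the cut points 1.25 and 1.75 are all assumptions made for the example. It assumes the G1 peak is the tallest mode of the histogram and that G2+M sits at roughly twice the G1 DNA content.

```r
set.seed(1)

# Simulated DNA-content readout for one sample (hypothetical values):
# a G1 peak near 28, a G2+M peak near 56, and S-phase cells in between.
x <- c(rnorm(1800, mean = 28, sd = 2),
       runif(600, min = 32, max = 52),
       rnorm(600, mean = 56, sd = 3))

# Estimate the G1 peak as the highest mode of a kernel density estimate.
# Assumption: G1 is the dominant population in the sample.
g1_peak <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}

# Scale the sample so its G1 peak sits at 1; G2+M then falls near 2.
normalize_dna <- function(x) x / g1_peak(x)

# Gate on the normalized scale with fixed cut points (assumed thresholds).
gate_phases <- function(z, cuts = c(1.25, 1.75)) {
  phase <- cut(z, breaks = c(-Inf, cuts, Inf),
               labels = c("G1", "S", "G2+M"))
  prop.table(table(phase))
}

z <- normalize_dna(x)
fractions <- gate_phases(z)
print(round(fractions, 3))
```

Applying `normalize_dna` to each sample separately puts all G1 peaks at the same position, so one set of cut points can be reused across samples. Fixed cut points misassign S-phase cells that overlap the peaks, which is one reason a model-based approach may be preferable.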
Alignment flowCore • 1.7k views
Greg Finak ▴ 240
@greg-finak-4299
Last seen 7.8 years ago
United States
Hi Xian,

I would suggest having a look at the flowClust package. It clusters / gates flow data in one or more dimensions via mixture modelling, and should be well suited to estimating the proportions of cells in different phases of the cell cycle.

flowClust outputs a model object that contains the proportion of each component in the model (the fraction of total cells represented by that component), as well as the mean (location), standard deviation, and other model parameters. The proportions and means are probably all that you are looking for. If you need to match components across multiple samples, use the estimated component means.

A very basic example is below. See the package vignette for further details.

I don't usually work with the type of data you describe, so there are probably domain-specific subtleties I'm not familiar with. If you run into problems, please let me know; I'll be glad to tweak the package to make it more effective / useful for such an application. At the least I'll update the vignette to provide further examples.

Hope this helps,
Greg.

    # Simple example - flowClust in 1D, fitting 3 components.
    require(flowClust)
    require(flowViz)

    # Some artificial data. Sample X, 3 components;
    # real proportions are 33.3%, 16.7%, and 50%.
    X <- as.matrix(c(rnorm(1000, mean=10, sd=sqrt(2)),
                     rnorm(500,  mean=20, sd=sqrt(2)),
                     rnorm(1500, mean=30, sd=sqrt(2))))
    colnames(X) <- "A"
    X <- round(X)
    X <- flowFrame(X)

    # Sample Y, 3 components; real proportions are 25%, 25%, and 50%.
    # Peaks are shifted slightly.
    Y <- as.matrix(c(rnorm(750,  mean=11, sd=sqrt(2)),
                     rnorm(750,  mean=23, sd=sqrt(2)),
                     rnorm(1500, mean=29, sd=sqrt(2))))
    colnames(Y) <- "A"
    Y <- round(Y)
    Y <- flowFrame(Y)

    par(mfrow=c(1,2))
    plot(X, breaks=256)
    plot(Y, breaks=256)

    # If we know the data has 3 components:
    f1 <- flowClust(X, K=3, varNames=c("A"))
    f2 <- flowClust(Y, K=3, varNames=c("A"))

    # Plot the result
    par(mfrow=c(1,2))
    hist(f1, data=X)
    hist(f2, data=Y)

    # The order of components may be different above.
    # Use the estimated means to reorder them.
    f1@w[order(f1@mu)]  # Proportions ordered by increasing component mean, model 1
    f2@w[order(f2@mu)]  # Proportions ordered by increasing component mean, model 2

    # If you don't know the number of components, you would use the BIC
    # to estimate the best fit:
    f1 <- flowClust(X, varNames="A", K=1:5)  # Fit multiple numbers of clusters
    f2 <- flowClust(Y, varNames="A", K=1:5)
    par(mfrow=c(1,2))
    plot(1:5, BIC(f1), type="o", xlab="K", ylab="BIC")
    plot(1:5, BIC(f2), type="o", xlab="K", ylab="BIC")
    which.max(BIC(f1))  # The maximum should be at 3.
    which.max(BIC(f2))  # The maximum should be at 3.
    f1 <- f1[[which.max(BIC(f1))]]  # Extract the best-fitting model
    f2 <- f2[[which.max(BIC(f2))]]  # Extract the best-fitting model
    f1@w[order(f1@mu)]  # Proportions ordered by increasing component mean, model 1
    f2@w[order(f2@mu)]  # Proportions ordered by increasing component mean, model 2

On 2011-02-23, at 5:20 AM, Xian Zhang wrote:
> We have a univariate readout (DNA content) to study cell cycle
> subpopulations. [...]

Greg Finak, PhD
Post-doctoral Research Associate
PS Statistics, Vaccine and Infectious Disease Division
Fred Hutchinson Cancer Research Center
Seattle, WA
(206)667-3116
gfinak at fhcrc.org
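The advice above, to match components across samples by their estimated means, can be illustrated with a small base R sketch that does not need flowClust installed. The numbers in `fits` are made-up placeholders standing in for the `mu` and `w` slots of fitted models, and `label_by_mean` is a hypothetical helper. It also assumes that, for DNA content, the component with the lowest mean is G1, the middle one S, and the highest G2+M.

```r
# Hypothetical fitted values, standing in for f1@mu / f1@w and f2@mu / f2@w.
# Components are deliberately listed in a different order in each sample.
fits <- list(
  sample1 = list(mu = c(30, 10, 20), w = c(0.50, 0.33, 0.17)),
  sample2 = list(mu = c(23, 29, 11), w = c(0.25, 0.50, 0.25))
)

# Assumed labelling: components sorted by increasing mean DNA content.
phases <- c("G1", "S", "G2+M")

# Reorder one sample's proportions by increasing component mean
# and attach the phase labels.
label_by_mean <- function(fit) {
  ord <- order(fit$mu)
  setNames(fit$w[ord], phases)
}

# One row per sample, one column per phase.
fractions <- t(sapply(fits, label_by_mean))
print(fractions)
```

Sorting by the mean makes the columns comparable across samples even when the model returns its components in a different order for each fit. This only works when every sample is fitted with the same number of components.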