Cell DNA content data normalization and gating


Xian Zhang ▴ 10

@xian-zhang-4503

Last seen 9.6 years ago

Dear Bioconductor users,

We have a univariate readout (DNA content) to study cell cycle subpopulations. The data look like this, with around 3000 cells per sample:

             cell1  cell2  cell3  ...
    sample1     28     26     30
    sample2     25     27     15
    sample3     30     40     45
    ...

From these data, one should be able to calculate the fractions of cell cycle subpopulations (G1, S, G2+M). However, the data first need to be normalized (scaling, peak alignment, etc.) before the cells are gated into subpopulations.

The flowCore and related packages offer similar functions, but seem to be overkill for a univariate readout. I wonder if there are other methods/packages available.

Thanks a lot in advance!

Xian

Alignment flowCore • 1.6k views

updated 13.2 years ago by Greg Finak ▴ 240 • written 13.2 years ago by Xian Zhang ▴ 10


Greg Finak ▴ 240

@greg-finak-4299

Last seen 7.3 years ago

United States

Hi Xian,

I would suggest having a look at the flowClust package. It clusters / gates flow data in one or more dimensions via mixture modelling and should be well suited to estimating the proportions of cell populations in different phases of the cell cycle.

flowClust outputs a model object that contains the proportion of each component in the model (the fraction of total cells represented by that component), as well as the mean (location), standard deviation, and other model parameters. The proportions and means are probably all that you are looking for. If you need to match components across multiple samples, use the estimated component means.

A very basic example is below. See the package vignette for further details.

I don't usually work with the type of data you describe, so there are probably domain-specific subtleties I'm not familiar with. If you run into problems, please let me know; I'll be glad to tweak the package to make it more effective / useful for such an application. At the least I'll update the vignette to provide further examples.

Hope this helps,
Greg.

    # Simple example: flowClust in 1D, fitting 3 components.
    library(flowClust)
    library(flowViz)

    # Some artificial data. Sample X: 3 components,
    # real proportions are 33.3%, 16.7%, and 50%.
    X <- as.matrix(c(rnorm(1000, mean = 10, sd = sqrt(2)),
                     rnorm(500,  mean = 20, sd = sqrt(2)),
                     rnorm(1500, mean = 30, sd = sqrt(2))))
    colnames(X) <- "A"
    X <- round(X)
    X <- flowFrame(X)

    # Sample Y: 3 components, real proportions are 25%, 25%, and 50%;
    # the peaks are shifted slightly.
    Y <- as.matrix(c(rnorm(750,  mean = 11, sd = sqrt(2)),
                     rnorm(750,  mean = 23, sd = sqrt(2)),
                     rnorm(1500, mean = 29, sd = sqrt(2))))
    colnames(Y) <- "A"
    Y <- round(Y)
    Y <- flowFrame(Y)

    par(mfrow = c(1, 2))
    plot(X, breaks = 256)
    plot(Y, breaks = 256)

    # If we know the data has 3 components:
    f1 <- flowClust(X, K = 3, varNames = c("A"))
    f2 <- flowClust(Y, K = 3, varNames = c("A"))

    # Plot the result
    par(mfrow = c(1, 2))
    hist(f1, data = X)
    hist(f2, data = Y)

    # The order of components may be different above.
    # Use the estimated means to reorder them.
    f1@w[order(f1@mu)]  # Proportions ordered by increasing component mean, model 1
    f2@w[order(f2@mu)]  # Proportions ordered by increasing component mean, model 2

    # If you don't know the number of components, you would use the BIC
    # to estimate the best fit:
    f1 <- flowClust(X, varNames = "A", K = 1:5)  # Fit multiple numbers of clusters
    f2 <- flowClust(Y, varNames = "A", K = 1:5)

    par(mfrow = c(1, 2))
    plot(1:5, BIC(f1), type = "o", xlab = "K", ylab = "BIC")
    plot(1:5, BIC(f2), type = "o", xlab = "K", ylab = "BIC")

    which.max(BIC(f1))  # The maximum should be at 3.
    which.max(BIC(f2))  # The maximum should be at 3.

    f1 <- f1[[which.max(BIC(f1))]]  # Extract the best fitting model
    f2 <- f2[[which.max(BIC(f2))]]  # Extract the best fitting model

    f1@w[order(f1@mu)]  # Proportions ordered by increasing component mean, model 1
    f2@w[order(f2@mu)]  # Proportions ordered by increasing component mean, model 2

On 2011-02-23, at 5:20 AM, Xian Zhang wrote:
> [...]

Greg Finak, PhD
Post-doctoral Research Associate
PS Statistics, Vaccine and Infectious Disease Division.
Fred Hutchinson Cancer Research Center
Seattle, WA
(206)667-3116
gfinak at fhcrc.org
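For the scaling and peak-alignment step raised in the question, a minimal base-R sketch is given here. It is not part of flowClust or flowCore and is not the method proposed in the reply above. It assumes that the tallest density peak in each sample is G1, that dividing by that peak location is an adequate normalization (so G2+M lands near 2), and that fixed cut-offs at 1.25 and 1.75 separate the phases. The helpers `g1_peak`, `gate_cell_cycle`, and `sim`, the cut-off values, and the simulated data are all hypothetical and chosen only for illustration.

```r
# Hypothetical helper: location of the highest kernel-density peak,
# taken here (as an assumption) to be the G1 peak.
g1_peak <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}

# Hypothetical helper: rescale one sample so its G1 peak sits at 1,
# then gate with fixed cut-offs (illustrative values, not calibrated).
gate_cell_cycle <- function(x, s_lower = 1.25, s_upper = 1.75) {
  scaled <- x / g1_peak(x)
  phase <- cut(scaled,
               breaks = c(-Inf, s_lower, s_upper, Inf),
               labels = c("G1", "S", "G2M"))
  prop.table(table(phase))
}

set.seed(1)

# Simulated samples with the same composition (60% G1, 15% S, 25% G2+M)
# but different overall staining intensity.
sim <- function(scale) {
  scale * c(rnorm(1800, mean = 1, sd = 0.05),
            runif(450, min = 1.3, max = 1.7),
            rnorm(750, mean = 2, sd = 0.08))
}
samples <- list(sample1 = sim(25), sample2 = sim(40))

# One row per sample, one column per phase.
fractions <- t(sapply(samples, gate_cell_cycle))
print(round(fractions, 3))
```

Because each sample is divided by its own G1 peak, the two simulated samples should give similar fractions despite their different intensity scales. On real data the fixed cut-offs would need checking against the histograms, and a mixture model such as the flowClust fit above avoids hard thresholds altogether.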

written 13.2 years ago by Greg Finak ▴ 240
