initializing DirichletMultinomial::dmn
Charles Berry ▴ 290
@charles-berry-5754
United States
I'd like to be able to specify the starting 'centers' for dmn().

Details:

IIUC DirichletMultinomial::dmn(count, k) will initialize the EM algorithm using a kmeans heuristic for selecting the starting point. Replicate runs on the same data can yield stark differences in the result.

I have a dataset in which it seems that naively chosen random starting centers rarely minimize a goodness-of-fit criterion.

The release version of dmn() does not currently allow for specification of starting values. I wonder if there are plans to extend it in this manner?

Best,

Chuck
@martin-morgan-1513
United States
On 07/10/2014 02:45 PM, Charles Berry wrote:
> I'd like to be able to specify the starting 'centers' for dmn().
>
> Details:
>
> IIUC DirichletMultinomial::dmn(count, k) will initialize the EM algorithm
> using a kmeans heuristic for selecting the starting point. Replicate runs on
> the same data can yield stark differences in the result.
>
> I have a dataset in which it seems that naively chosen random starting
> centers rarely minimize a goodness-of-fit criterion.
>
> The release version of dmn() does not currently allow for specification of
> starting values. I wonder if there are plans to extend it in this manner?

I'll look into this, thanks for the suggestion. Is there a more general issue that makes the random centers choice a poor one? And presumably setting the random number seed allows for replication (I think that's a 'this is the way it should work' rather than a statement of fact...).

Martin

> Best,
>
> Chuck
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
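The two points raised so far (replicate runs that differ, and a seed that should make a run repeatable) can be illustrated with a generic multi-start pattern. The sketch below is a toy 1-D k-means in plain Python, not dmn() and not its initialization code; `kmeans_1d`, `best_of`, and the data are hypothetical names and values chosen only for illustration. It assumes that all randomness flows from one seed, so the same seed reproduces the same fit, and that the fit with the lowest loss across several seeds is the one to keep.

```python
import random

def kmeans_1d(data, k, seed, iters=50):
    """Toy 1-D k-means whose starting centers are k randomly sampled points."""
    rng = random.Random(seed)        # every random choice depends only on the seed
    centers = rng.sample(data, k)    # random starting centers
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: (x - centers[c]) ** 2)
            groups[nearest].append(x)
        # keep the previous center when a cluster ends up empty
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    # within-cluster sum of squares, used here as the goodness-of-fit criterion
    loss = sum(min((x - c) ** 2 for c in centers) for x in data)
    return loss, sorted(centers)

def best_of(data, k, seeds):
    """Fit once per seed and report the seed with the smallest loss."""
    fits = {s: kmeans_1d(data, k, s) for s in seeds}
    best = min(fits, key=lambda s: fits[s][0])
    return best, fits

data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
best_seed, fits = best_of(data, k=3, seeds=range(20))
print(best_seed, fits[best_seed])
```

Recording the winning seed makes the selected fit repeatable, but it does not replace user-supplied starting values: the search is still limited to whatever starts the random draw happens to produce.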
On Fri, 11 Jul 2014, Martin Morgan wrote:
> On 07/10/2014 02:45 PM, Charles Berry wrote:
>> I'd like to be able to specify the starting 'centers' for dmn().
>> [snip]
>
> I'll look into this, thanks for the suggestion. Is there a more general issue
> that makes the random centers choice a poor one? And presumably setting the
> random number seed allows for replication (I think that's a 'this is the way
> it should work' rather than a statement of fact...).

Thanks, Martin.

There is another issue. The data may have distinct samples that are duplicates. In my case, there are thousands of sparse multinomial samples (even thousands with N==1) and loads of duplicate rows in 'count'.

If the random centers are a sample of the rows, then that sample may contain duplicates and some values of p_j that are zero. So sampling from the rows will fail.

I don't know if problems will arise with centers that are randomly chosen from the space of the multinomial parameter pi, but if something is known about the structure there might be a smart way to choose starting values that is based on the data.

If one is particularly interested in knowing whether the multinomial parameter concentrates near certain edges or vertices of pi, then setting starting centers near them might be indicated, to be sure that that part of the space has been given a try.

So I was thinking that having the flexibility to set one's own initial values might be useful, as long as one does not make a pathological choice.

Best,

Chuck

Charles C. Berry
Dept of Family/Preventive Medicine
cberry at ucsd edu
UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/
La Jolla, CA 92093-0901
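The failure mode described above (row-sampled centers that repeat and contain zero proportions) can be seen on a small made-up count matrix. The sketch below is plain Python and makes no claim about how dmn() actually chooses its starting point; `sample_row_centers`, `smoothed_unique_centers`, and the pseudocount of 0.5 are all assumptions introduced for illustration. The second function shows one possible data-based alternative: draw from the distinct rows only, then smooth so that no component is exactly zero.

```python
import random

def sample_row_centers(counts, k, seed):
    """Naive start: pick k rows at random and normalize each to proportions."""
    rng = random.Random(seed)
    rows = rng.sample(counts, k)
    return [[x / sum(r) for x in r] for r in rows]

def smoothed_unique_centers(counts, k, seed, pseudocount=0.5):
    """Assumed alternative: sample distinct rows, then add a pseudocount."""
    rng = random.Random(seed)
    unique_rows = sorted(set(map(tuple, counts)))   # drop duplicate rows
    if len(unique_rows) < k:
        raise ValueError("fewer distinct rows than requested centers")
    rows = rng.sample(unique_rows, k)
    return [[(x + pseudocount) / (sum(r) + pseudocount * len(r)) for x in r]
            for r in rows]

# Made-up sparse data: every sample has N == 1 and most rows are duplicates.
counts = [[1, 0, 0]] * 6 + [[0, 1, 0]] * 3 + [[0, 0, 1]]

naive = sample_row_centers(counts, k=3, seed=1)
smooth = smoothed_unique_centers(counts, k=3, seed=1)
print(naive)
print(smooth)
```

With data like this, every naive center has zero components, and repeated centers are likely because six of the ten rows are identical. The smoothed version avoids both problems here, but whether such a start suits a particular dataset is exactly the kind of judgment that user-specified initial values would leave to the analyst.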