Bootstrapping paired samples and tiny groups
Benjamin Otto ▴ 830
@benjamin-otto-1519
Last seen 8.2 years ago
Hi guys,

in principle the problem is how to compute a statistic for ultra-tiny group sizes with paired samples.

Here is the model:
-------------------------

Assumption 1:
A data set of microarrays consists of four classes describing the disease phenotype: type 1, type 2, type 3 and a control group. As the type 2 and type 3 phenotypes of the disease are extremely rare, there are only two samples in each of these two groups. The data set consists of

control: 8 samples
type 1: 7 samples
type 2: 2 samples
type 3: 2 samples

Assumption 2:
We assume that gender and age might have an influence on the phenotype. Therefore the samples in the control group were selected so that their age and gender match the samples in the other three groups. Unfortunately, as the disease is so rare, the age and gender of the patients within a group are not all the same. So we end up with some kind of semi-paired comparison: "paired" because for each type 1/2/3 sample we pick a control sample defined by age and gender, and "semi" because the control sample does not really come from the same patient.

We suppose (but that IS an assumption) that differences between type 1/2/3 samples and controls with non-matching age and gender might naturally exhibit bigger (disease-unrelated) variance, so the selection of the control-disease pairs is targeted.

In the end, each of the type 1/2/3 groups shall be compared with the control group. As group 1 has 7 samples, a paired analysis is possible. The problem lies with groups 2 and 3.

Here is a suggested analysis approach:
------------------------------------------------------

As there is no real statistical test that can be applied to groups of size 2, one thought would be to introduce a bootstrapping approach in which, for each gene, no test statistic but only the fold change is computed. The disease samples are repeatedly re-paired with controls, the fold change is computed for each re-pairing, and the location of the native fold change (e.g. the mean fold change for the correct pairs) within the resulting distribution is used as the significance statistic.
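The re-pairing idea above can be sketched for a single gene as follows. This is only a minimal illustration under stated assumptions, not a recommended analysis: the function names `repairing_distribution` and `native_position`, the use of log2 intensities, and all numbers are hypothetical. Strictly speaking, enumerating re-pairings like this is a permutation-style procedure rather than a bootstrap, since nothing is resampled with replacement.

```python
from itertools import permutations
from statistics import mean


def repairing_distribution(disease, controls):
    """Mean paired log2 fold change for every assignment of distinct
    controls to the disease samples (ordered, without replacement)."""
    k = len(disease)
    return [
        mean(d - controls[j] for d, j in zip(disease, idx))
        for idx in permutations(range(len(controls)), k)
    ]


def native_position(disease, controls, matched):
    """Observed mean paired difference for the matched pairs, and the
    fraction of all re-pairings that are at least as extreme."""
    observed = mean(d - controls[j] for d, j in zip(disease, matched))
    dist = repairing_distribution(disease, controls)
    # Small tolerance so that ties with the observed value are counted.
    extreme = sum(abs(v) >= abs(observed) - 1e-12 for v in dist)
    return observed, extreme / len(dist)


# Made-up log2 intensities for one gene (illustrative only, not real data).
disease = [9.1, 8.7]                                   # two type 2 samples
controls = [7.0, 7.4, 6.9, 7.2, 7.1, 7.3, 6.8, 7.5]    # eight controls
matched = (0, 1)   # assumed: controls 0 and 1 are the age/gender matches

observed, position = native_position(disease, controls, matched)
n_repairings = len(repairing_distribution(disease, controls))
```

With 2 disease samples and 8 controls there are 8 x 7 = 56 ordered re-pairings. Because the mean paired difference equals mean(disease) minus the mean of the selected controls, swapping the two controls gives the same value, so the fraction can never fall below 2/56 (about 0.036) for any gene, before any adjustment for testing many genes.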
Now here are the questions:
--------------------------------------

1) As the samples are "paired", is it at all convincing to break up the pairings in order to perform a bootstrapping? Is such a bootstrapping the correct approach for "paired" samples in such a case anyway?

2) Should the samples of group 2/3 "only" be randomly remapped to control samples other than the initial ones? Or does it make more sense to randomly assign the control and type 2/3 samples to the groups?

3) If the samples were randomly assigned to the groups, does at least one "disease" sample always have to remain in the type 2/3 group? Or would it be legitimate in this case to use a permutation in which two control samples are compared to two other control samples?

4) Any preferable idea how to calculate a statistic here?

Thanks and best regards,

Benjamin

___________________________________________
Benjamin Otto, PhD
University Medical Center Hamburg-Eppendorf
Institute For Clinical Chemistry / Central Laboratories
Campus Forschung N27
Martinistr. 52,
D-20246 Hamburg

Tel.: +49 40 7410 51908
Fax.: +49 40 7410 54971
___________________________________________

--
Mandatory information pursuant to the German Act on Electronic Commercial Registers, Cooperative Registers and the Company Register (EHUG):

Universitätsklinikum Hamburg-Eppendorf
Public-law corporation
Place of jurisdiction: Hamburg

Members of the board:
Prof. Dr. Jörg F. Debatin (Chairman)
Dr. Alexander Kirstein
Joachim Prölß
Prof. Dr. Dr. Uwe Koch-Gromus
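For questions 2 and 3, it may help to see what full relabelling looks like for one gene. The sketch below ignores the pairing entirely and enumerates every way of labelling 2 of the 10 pooled samples as "disease", which is how a standard two-group permutation test proceeds; relabellings in which both "disease" labels fall on control samples are included. The function name `label_permutation_pvalue` and the numbers are hypothetical, and whether dropping the pairing is acceptable here is exactly the open question.

```python
from itertools import combinations
from statistics import mean


def label_permutation_pvalue(disease, controls):
    """Two-sided permutation p-value for the difference in group means,
    enumerating every relabelling of the pooled samples."""
    pooled = list(disease) + list(controls)
    k = len(disease)
    observed = mean(disease) - mean(controls)
    stats = []
    for chosen in combinations(range(len(pooled)), k):
        group = [pooled[i] for i in chosen]
        rest = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        stats.append(mean(group) - mean(rest))
    # Small tolerance so that the observed labelling itself is counted.
    extreme = sum(abs(s) >= abs(observed) - 1e-12 for s in stats)
    return observed, extreme / len(stats), len(stats)


# Made-up log2 intensities for one gene (illustrative only, not real data).
disease = [9.1, 8.7]
controls = [7.0, 7.4, 6.9, 7.2, 7.1, 7.3, 6.8, 7.5]

observed, p_value, n_labellings = label_permutation_pvalue(disease, controls)
```

There are only C(10, 2) = 45 labellings, so even when the true labelling is the most extreme one, as in this made-up example, the p-value cannot be smaller than 1/45 (about 0.022).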
Naomi Altman ★ 6.0k
@naomi-altman-380
Last seen 19 months ago
United States
I really think this problem is beyond what the mailing list can do. You need to chat with a statistician.

Naomi Altman

At 05:46 AM 9/9/2010, Benjamin Otto wrote:
>Hi guys,
>
>in principle the problem is how to compute a
>statistic for ultra-tiny group sizes with paired samples.

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor