variancePartition - testing for individuals
rina ▴ 20

Hi all!

I am analyzing the sources of variance in TCGA expression data using variancePartition. Among other things, I want to check how much of the variance is explained by the individual, but when I specify it in the formula, I get the following errors:

> form <- ~  submitter_id
> varPart <- fitExtractVarPartModel(Filt_EXP1, form, clin)
Error in checkModelStatus(fit, showWarnings = showWarnings, colinearityCutoff) : 
  Colinear score = 1 > 0.999 
Covariates in the formula are so strongly correlated that the
parameter estimates from this model are not meaningful.
Dropping one or more of the covariates will fix this problem

> form <- ~ (1|submitter_id)
> varPart <- fitExtractVarPartModel(Filt_EXP1, form, clin)
Error: number of levels of each grouping factor must be < number of observations

> form <- ~ (0|submitter_id)
> varPart <- fitExtractVarPartModel(Filt_EXP1, form, clin)
Error in (function (cl, name, valueClass)  : 
  assignment of an object of class “numeric” is not valid for @‘Dim’ in an object of class “dgTMatrix”; is(value, "integer") is not TRUE

I can see why the above return errors, but in the vignette the effect of individuals is tested. Is there a specific way I should specify it?

Thanks in advance.



Dear Rina,

To estimate the proportion of variance due to between-individual differences, the individual identifier has to be available as a column in the metadata (clin in your case).

Your second version of the formula, ~ (1|submitter_id), is the correct way to define it (the first formula models submitter_id as a fixed effect rather than a random effect). Also make sure that the variable submitter_id is defined as a factor.
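As a minimal sketch of that setup, assuming a small made-up metadata table (clin_toy and its values are hypothetical stand-ins for clin, with two samples per individual):

```r
# Toy metadata standing in for `clin`; all values are invented for illustration.
clin_toy <- data.frame(
  submitter_id = c("P1", "P1", "P2", "P2", "P3", "P3"),
  tumor_stage  = c("stage i", "stage i", "stage ii", "stage ii", "stage i", "stage i"),
  stringsAsFactors = FALSE
)

# Make sure the individual identifier is a factor, not a character vector.
clin_toy$submitter_id <- factor(clin_toy$submitter_id)

# Random-intercept term for the individual.
form <- ~ (1 | submitter_id)

# The formula and metadata would then be passed on as in the original post, e.g.
# varPart <- fitExtractVarPartModel(expr, form, clin_toy)
# where `expr` is a hypothetical expression matrix with one column per row of clin_toy.
```

Note that this toy table has repeated samples per individual, which is what a random intercept for the individual needs in order to be estimable.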



Hi Mikhael,

Thank you for your response. I have defined submitter_id as a factor, but I keep getting the same error: number of levels of each grouping factor must be < number of observations.

Here is the structure of my data, in case it helps.

 > glimpse(clin[1:3,])
    Observations: 3
    Variables: 42
    $ submitter_id                      <fct> TCGA-3L-AA1B, TCGA-5M-AATE, TCGA-A6-2677
    $ classification_of_tumor           <chr> "not reported", "not reported", "not reported"
    $ last_known_disease_status         <chr> "not reported", "not reported", "not reported"
    $ updated_datetime                  <chr> "2018-09-06T16:20:48.972378-05:00", "2018-09-06T16:20:48.972378-05:00", "2018-09-06T16:20:...
    $ primary_diagnosis                 <chr> "Adenocarcinoma, NOS", "Adenocarcinoma, NOS", "Adenocarcinoma, NOS"
    $ tumor_stage                       <chr> "stage i", "stage iia", "stage iiic"
    $ age_at_diagnosis                  <int> 22379, 27870, 25143
    $ vital_status                      <chr> "alive", "alive", "dead"
    $ morphology                        <chr> "8140/3", "8140/3", "8140/3"
    $ days_to_death                     <dbl> NA, NA, 740
    $ days_to_last_known_disease_status <lgl> NA, NA, NA
    $ created_datetime                  <lgl> NA, NA, NA
    $ state                             <chr> "released", "released", "released"
    $ days_to_recurrence                <lgl> NA, NA, NA
    $ diagnosis_id                      <chr> "6eb0d5b6-cb00-519f-838e-119b548ac582", "77253688-4400-5836-886d-80d5758d41c6", "1b22285c-...
    $ tumor_grade                       <chr> "not reported", "not reported", "not reported"
    $ tissue_or_organ_of_origin         <chr> "Cecum", "Ascending colon", "Colon, NOS"
    $ days_to_birth                     <dbl> -22379, -27870, -25143
    $ progression_or_recurrence         <chr> "not reported", "not reported", "not reported"
    $ prior_malignancy                  <chr> "not reported", "not reported", "not reported"
    $ site_of_resection_or_biopsy       <chr> "Cecum", "Ascending colon", "Cecum"
    $ days_to_last_follow_up            <dbl> 475, 1200, 541
    $ cigarettes_per_day                <lgl> NA, NA, NA
    $ weight                            <dbl> 63.3, 75.4, 55.2
    $ alcohol_history                   <lgl> NA, NA, NA
    $ alcohol_intensity                 <lgl> NA, NA, NA
    $ bmi                               <dbl> 21.15006, 24.06716, 21.56250
    $ years_smoked                      <lgl> NA, NA, NA
    $ exposure_id                       <chr> "44b839cb-c3d7-5a99-9dea-90b839882b9a", "74316476-27f2-5d5f-b3fb-20f69e8a8960", "59056466-...
    $ height                            <dbl> 173, 177, 160
    $ gender                            <chr> "female", "male", "female"
    $ year_of_birth                     <int> 1952, 1935, 1941
    $ race                              <chr> "black or african american", "black or african american", "white"
    $ demographic_id                    <chr> "2a3b1379-9507-580d-9628-4b502a720cc4", "d2b2e4d7-419d-5370-94f3-2adeb4606b07", "0fa0b722-...
    $ ethnicity                         <chr> "not hispanic or latino", "hispanic or latino", "not hispanic or latino"
    $ year_of_death                     <int> NA, NA, NA
    $ treatment_id                      <chr> "08bfaf92-3b30-5724-ac84-dac862df44bc", "44ff7b56-834d-5e65-a072-0f8ca2e577fd", "9e66ff79-...
    $ therapeutic_agents                <lgl> NA, NA, NA
    $ treatment_intent_type             <lgl> NA, NA, NA
    $ treatment_or_therapy              <lgl> NA, NA, NA
    $ bcr_patient_barcode               <chr> "TCGA-3L-AA1B", "TCGA-5M-AATE", "TCGA-A6-2677"
    $ disease                           <chr> "COAD", "COAD", "COAD"

    > clin$submitter_id

    137 Levels: TCGA-3L-AA1B TCGA-5M-AATE TCGA-A6-2677 TCGA-A6-2680 TCGA-A6-2683 TCGA-A6-4107 TCGA-A6-5656 TCGA-A6-5659 ... TCGA-T9-A92H

    > glimpse(Filt_EXP1[1:5,1:3])
    Observations: 5
    Variables: 3
    $ `TCGA-G4-6317` <dbl> 10349, 30, 3331, 403, 333
    $ `TCGA-CM-6164` <dbl> 9263, 51, 3263, 364, 315
    $ `TCGA-CM-4750` <dbl> 11120, 457, 2937, 483, 265

Hope that helps!

Thank you.

How many samples do you have for each submitter_id? I suspect the error arises because the number of unique submitter_id values equals the number of observations.
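A quick way to check this, sketched with a hypothetical toy table (clin_toy is made up and has one sample per individual, like the situation suspected here):

```r
# Toy metadata with one sample per individual; values are invented for illustration.
clin_toy <- data.frame(
  submitter_id = factor(c("P1", "P2", "P3", "P4"))
)

# Number of samples per individual.
samples_per_id <- table(droplevels(clin_toy$submitter_id))

# The error message requires: number of levels < number of observations.
n_levels <- nlevels(droplevels(clin_toy$submitter_id))
n_obs    <- nrow(clin_toy)
can_fit_random_intercept <- n_levels < n_obs

print(samples_per_id)
print(can_fit_random_intercept)
```

If every individual contributes exactly one sample, the between-individual variance cannot be separated from the residual variance, so a (1|submitter_id) term cannot be estimated from that data alone.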

Exactly. It's one sample per patient. I thought this was the problem too, but I don't know how to solve it.

What if you include other variables as random effects (such as tumor_stage)?

With form <- ~ (1|tumor_stage) the analysis runs as normal.

But with form <- ~ (1|tumor_stage) + (1|submitter_id) I get the same error: number of levels of each grouping factor must be < number of observations


