Question: variancePartition - testing for individuals
rina wrote (5 months ago):

Hi all!

I am analyzing the sources of variance in TCGA expression data using variancePartition. Among other things, I want to check how much of the variance is explained by individuals, but when I specify this in the formula, I get the following errors:

    > form <- ~ submitter_id
    > varPart <- fitExtractVarPartModel(Filt_EXP1, form, clin)
    Error in checkModelStatus(fit, showWarnings = showWarnings, colinearityCutoff) : 
      Colinear score = 1 > 0.999 
    Covariates in the formula are so strongly correlated that the
    parameter estimates from this model are not meaningful.
    Dropping one or more of the covariates will fix this problem

    > form <- ~ (1|submitter_id)
    > varPart <- fitExtractVarPartModel(Filt_EXP1, form, clin)
    Error: number of levels of each grouping factor must be < number of observations

    > form <- ~ (0|submitter_id)
    > varPart <- fitExtractVarPartModel(Filt_EXP1, form, clin)
    Error in (function (cl, name, valueClass)  : 
      assignment of an object of class "numeric" is not valid for @'Dim' in an object of class "dgTMatrix"; is(value, "integer") is not TRUE

I can see why the above return errors, but the vignette shows that the effect of individuals can be tested. Is there a specific way I should specify it?

Thanks in advance.

R.

Tags: expression, formula, variance
(written 5 months ago by rina; modified 5 months ago by mikhael.manurung)
Answer: variancePartition - testing for individuals
mikhael.manurung wrote (5 months ago):

Dear Rina,

To identify the proportion of variance due to between-individual differences, you need to include the individual identifier as a column in the metadata.

Your second version of the formula is the correct way to define it (the first formula models submitter_id as a fixed effect rather than as a random effect). Also keep in mind that the variable submitter_id should be defined as a factor.

Best,

Mikhael
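A minimal sketch of the setup described in this answer, using a hypothetical toy clin table (two made-up individuals with two samples each) rather than the real TCGA metadata. The fitExtractVarPartModel() call is left commented out because it needs the variancePartition package and the actual expression matrix.

```r
# Toy metadata: hypothetical sample and individual IDs, two samples per individual
clin <- data.frame(
  sample_id    = c("s1", "s2", "s3", "s4"),
  submitter_id = c("ind1", "ind1", "ind2", "ind2"),
  stringsAsFactors = FALSE
)

# Store the individual identifier as a factor (categorical), not as character
clin$submitter_id <- factor(clin$submitter_id)

# Model individual as a random effect: (1 | submitter_id) gives each
# individual its own random intercept
form <- ~ (1 | submitter_id)

# With variancePartition loaded and one column of Filt_EXP1 per row of clin:
# varPart <- fitExtractVarPartModel(Filt_EXP1, form, clin)

print(levels(clin$submitter_id))
```

In this toy design each individual contributes more than one sample, which is what allows between-individual variance to be estimated at all.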


Hi Mikhael,

Thank you for your response. I have defined submitter_id as a factor, but I keep getting the error "Error: number of levels of each grouping factor must be < number of observations".

Here is the structure of my data, in case it helps.

    > glimpse(clin[1:3,])
    Observations: 3
    Variables: 42
    $ submitter_id                      <fct> TCGA-3L-AA1B, TCGA-5M-AATE, TCGA-A6-2677
    $ classification_of_tumor           <chr> "not reported", "not reported", "not reported"
    $ last_known_disease_status         <chr> "not reported", "not reported", "not reported"
    $ updated_datetime                  <chr> "2018-09-06T16:20:48.972378-05:00", "2018-09-06T16:20:48.972378-05:00", "2018-09-06T16:20:...
    $ primary_diagnosis                 <chr> "Adenocarcinoma, NOS", "Adenocarcinoma, NOS", "Adenocarcinoma, NOS"
    $ tumor_stage                       <chr> "stage i", "stage iia", "stage iiic"
    $ age_at_diagnosis                  <int> 22379, 27870, 25143
    $ vital_status                      <chr> "alive", "alive", "dead"
    $ morphology                        <chr> "8140/3", "8140/3", "8140/3"
    $ days_to_death                     <dbl> NA, NA, 740
    $ days_to_last_known_disease_status <lgl> NA, NA, NA
    $ created_datetime                  <lgl> NA, NA, NA
    $ state                             <chr> "released", "released", "released"
    $ days_to_recurrence                <lgl> NA, NA, NA
    $ diagnosis_id                      <chr> "6eb0d5b6-cb00-519f-838e-119b548ac582", "77253688-4400-5836-886d-80d5758d41c6", "1b22285c-...
    $ tumor_grade                       <chr> "not reported", "not reported", "not reported"
    $ tissue_or_organ_of_origin         <chr> "Cecum", "Ascending colon", "Colon, NOS"
    $ days_to_birth                     <dbl> -22379, -27870, -25143
    $ progression_or_recurrence         <chr> "not reported", "not reported", "not reported"
    $ prior_malignancy                  <chr> "not reported", "not reported", "not reported"
    $ site_of_resection_or_biopsy       <chr> "Cecum", "Ascending colon", "Cecum"
    $ days_to_last_follow_up            <dbl> 475, 1200, 541
    $ cigarettes_per_day                <lgl> NA, NA, NA
    $ weight                            <dbl> 63.3, 75.4, 55.2
    $ alcohol_history                   <lgl> NA, NA, NA
    $ alcohol_intensity                 <lgl> NA, NA, NA
    $ bmi                               <dbl> 21.15006, 24.06716, 21.56250
    $ years_smoked                      <lgl> NA, NA, NA
    $ exposure_id                       <chr> "44b839cb-c3d7-5a99-9dea-90b839882b9a", "74316476-27f2-5d5f-b3fb-20f69e8a8960", "59056466-...
    $ height                            <dbl> 173, 177, 160
    $ gender                            <chr> "female", "male", "female"
    $ year_of_birth                     <int> 1952, 1935, 1941
    $ race                              <chr> "black or african american", "black or african american", "white"
    $ demographic_id                    <chr> "2a3b1379-9507-580d-9628-4b502a720cc4", "d2b2e4d7-419d-5370-94f3-2adeb4606b07", "0fa0b722-...
    $ ethnicity                         <chr> "not hispanic or latino", "hispanic or latino", "not hispanic or latino"
    $ year_of_death                     <int> NA, NA, NA
    $ treatment_id                      <chr> "08bfaf92-3b30-5724-ac84-dac862df44bc", "44ff7b56-834d-5e65-a072-0f8ca2e577fd", "9e66ff79-...
    $ therapeutic_agents                <lgl> NA, NA, NA
    $ treatment_intent_type             <lgl> NA, NA, NA
    $ treatment_or_therapy              <lgl> NA, NA, NA
    $ bcr_patient_barcode               <chr> "TCGA-3L-AA1B", "TCGA-5M-AATE", "TCGA-A6-2677"
    $ disease                           <chr> "COAD", "COAD", "COAD"

    > clin$submitter_id
    137 Levels: TCGA-3L-AA1B TCGA-5M-AATE TCGA-A6-2677 TCGA-A6-2680 TCGA-A6-2683 TCGA-A6-4107 TCGA-A6-5656 TCGA-A6-5659 ... TCGA-T9-A92H

    > glimpse(Filt_EXP1[1:5,1:3])
    Observations: 5
    Variables: 3
    $ `TCGA-G4-6317` <dbl> 10349, 30, 3331, 403, 333
    $ `TCGA-CM-6164` <dbl> 9263, 51, 3263, 364, 315
    $ `TCGA-CM-4750` <dbl> 11120, 457, 2937, 483, 265

Hope that helps!

Thank you.

(written 4 months ago by rina)

How many samples do you have for each submitter_id? I suspect the error occurs because the number of observations equals the number of unique submitter_id values.

(written 4 months ago by mikhael.manurung)
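One quick way to answer this question is to tabulate how many samples each submitter_id contributes. The sketch below uses a hypothetical three-row clin table mirroring the barcodes shown in the glimpse output above (one sample per patient). When every count is 1, a per-individual random intercept cannot be separated from the residual variance, so between-individual variance is not estimable from this design.

```r
# Hypothetical metadata mirroring the thread: one sample per patient
clin <- data.frame(
  submitter_id = factor(c("TCGA-3L-AA1B", "TCGA-5M-AATE", "TCGA-A6-2677")),
  tumor_stage  = c("stage i", "stage iia", "stage iiic")
)

# Number of samples contributed by each individual
n_per_id <- table(clin$submitter_id)
print(n_per_id)

# Individuals measured more than once; only these carry information that
# separates between-individual variance from residual variance
repeated <- names(n_per_id)[n_per_id > 1]

if (length(repeated) == 0) {
  message("Every submitter_id has exactly one sample: ",
          "individual is confounded with the residual.")
}
```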

Exactly, it's one sample per patient. I suspected that was the problem too, but I don't know how to solve it.

(written 4 months ago by rina)

What if you include other variables as random effects (such as tumor_stage)?

(written 4 months ago by mikhael.manurung)

With form <- ~ (1|tumor_stage), the analysis runs normally.

But with form <- ~ (1|tumor_stage) + (1|submitter_id), I get "Error: number of levels of each grouping factor must be < number of observations".

(written 4 months ago by rina; modified 4 months ago)
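The difference between the two formulas can be checked before fitting. Below is a hypothetical helper, check_grouping_levels() (not part of variancePartition or lme4), that compares each candidate random-effect variable's number of levels with the number of observations, mirroring the condition in the error message that lme4 (used by variancePartition for random effects) raises. The toy clin table is made up: four patients with one sample each, where tumor_stage passes because several patients share a stage and submitter_id fails because every patient is its own level. Passing this check does not by itself guarantee a well-behaved fit.

```r
# Hypothetical helper: compare number of levels with number of observations
# for each candidate grouping variable (the condition named in the lme4 error)
check_grouping_levels <- function(data, vars) {
  n_obs <- nrow(data)
  n_levels <- vapply(vars, function(v) nlevels(factor(data[[v]])), integer(1))
  data.frame(
    variable  = vars,
    n_levels  = n_levels,
    n_obs     = n_obs,
    ok        = n_levels < n_obs,  # TRUE if usable as a grouping factor
    row.names = NULL
  )
}

# Toy metadata (hypothetical): four patients, one sample each
clin <- data.frame(
  submitter_id = c("P1", "P2", "P3", "P4"),
  tumor_stage  = c("stage i", "stage i", "stage iia", "stage iiic")
)

res <- check_grouping_levels(clin, c("tumor_stage", "submitter_id"))
print(res)
```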
Powered by Biostar version 16.09