Question: DESeq2 design model for treatment analysis controlling for age and sex
0
gravatar for newmanss
4.4 years ago by
newmanss0
United States
newmanss0 wrote:

Hello,

I am an inexperienced user and in my searching haven't found a clear example to this basic question.  My experiment design is comprised of male and female subjects of varying ages grouped by treatment response, flexible and inflexible factors.  The mean age is significantly different between the flexible and inflexible groups.  I'd like to set up my design model to control for sex and age and find the differential gene expression between the flexible and inflexible groups.  How do I write the code to control for two batch effects, sex and age?

I also plan to analyze each group, male and female, separately controlling for age.

DESeqTable<- DESeqDataSetFromMatrix(countData=InputData, colData=ExpDesign,
                                    design = ~Age + Phenotype)

Here is my targets.txt table:

SampleName

Gender Phenotype Group Age
FI_051811 Female inflexible FI 63
FI_038376 Female inflexible FI 23
FI_017339 Female inflexible FI 50
FI_002069 Female inflexible FI 55
FI_047603 Female inflexible FI 46
FI_042542 Female inflexible FI 29
FI_023519 Female inflexible FI 36
FI_019217 Female inflexible FI 29
MI_044212 Male inflexible MI 29
MI_044557 Male inflexible MI 21
MI_046607 Male inflexible MI 23
MI_005822 Male inflexible MI 26
MI_047381 Male inflexible MI 23
MI_044365 Male inflexible MI 35
MI_035916 Male inflexible MI 29
MI_034076 Male inflexible MI 30
FF_058850 Female flexible FF 45
FF_049968 Female flexible FF 36
FF_014541 Female flexible FF 24
FF_026011 Female flexible FF 23
FF_025841 Female flexible FF 24
FF_013967 Female flexible FF 24
FF_045155 Female flexible FF 27
FF_044863 Female flexible FF 26.13
MF_037916 Male flexible MF 22
MF_040441 Male flexible MF 22
MF_048347 Male flexible MF 20
MF_045140 Male flexible MF 26
MF_011612 Male flexible MF 28
MF_049731 Male flexible MF 22
MF_014343 Male flexible MF 28
MF_022981 Male flexible MF 25
ADD COMMENTlink modified 4.4 years ago • written 4.4 years ago by newmanss0
Answer: DESeq2 design model for treatment analysis controlling for age and sex
1
gravatar for Michael Love
4.4 years ago by
Michael Love23k
United States
Michael Love23k wrote:

Usually by "batch effects", we mean effects on the measured variables due to the batches in which the samples were processed. But you're right that covariates like age and sex enter the model in the same way as batch effects.

I'd recommend splitting the age variable into a factor with two groups using the cut function: cut(Age, breaks=c(0, x, Inf)), where you have to define a value 'x' which represents a meaningful split in your data. It looks like, while the phenotype and the age are correlated, there is enough overlap so that they are not perfectly confounded.

You would then use a design:

design(dds) = ~ Age + Sex + Phenotype + Sex:Phenotype

To investigate the sex-specific phenotype effects controlling for age. It seems to make sense to make inflexible the base level

dds$Phenotype = relevel(dds$Phenotype, "inflexible")

then run the model,

dds = DESeq(dds) 

and,

resultsNames(dds)

to see the effects which were fit.

The base level phenotype effect (the effect for Female, if the levels are set alphabetically) is obtained, e.g. with:

results(dds, contrast=list("Phenotype_flexible_vs_inflexible"))

The phenotype effect for Male would be:

results(dds, contrast=list(c("Phenotype_flexible_vs_inflexible","Phenotypeflexible.SexMale")))
ADD COMMENTlink written 4.4 years ago by Michael Love23k
Answer: DESeq2 design model for treatment analysis controlling for age and sex
0
gravatar for newmanss
4.4 years ago by
newmanss0
United States
newmanss0 wrote:

Thank you for a very clear outline and explanation of the process steps.  For the age cut, can you provide more detail as to selecting the cut point since there is a lot of overlap between the phenotype groups.  Do I just disregard phenotype and let it cut between young and old or should I look at the ages of each phenotype group and do my best to select a point which best includes most of each group within a selected low and high age range?

ADD COMMENTlink written 4.4 years ago by newmanss0

I'd aim for a biologically meaningful cut.

ADD REPLYlink written 4.4 years ago by Michael Love23k
Answer: DESeq2 design model for treatment analysis controlling for age and sex
0
gravatar for newmanss
4.4 years ago by
newmanss0
United States
newmanss0 wrote:

Dear Michael,

I selected an Age cutoff, median of the overlapping ages for the groups male and female.  I used cut() successfully but when the DESeq() analysis was running, it halted with this error:

Error in optim(betaRow, objectiveFn, method = "L-BFGS-B", lower = -large, :
L-BFGS-B needs finite values of 'fn'

I checked my DESeq2 version 1.4.5. 

R version 3.1.1 (2014-07-10)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.4.5              RcppArmadillo_0.4.450.1.0 Rcpp_0.11.3              
 [4] GenomicRanges_1.14.4      XVector_0.2.0             IRanges_1.20.7           
 [7] BiocGenerics_0.8.0        genefilter_1.44.0         lattice_0.20-29          
[10] gtools_3.4.1             

loaded via a namespace (and not attached):
 [1] annotate_1.40.1      AnnotationDbi_1.24.0 Biobase_2.22.0       BiocStyle_1.0.0     
 [5] DBI_0.3.1            geneplotter_1.40.0   grid_3.1.1           locfit_1.5-9.1      
 [9] RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.1.1        stats4_3.1.1        
[13] survival_2.37-7      tools_3.1.1          XML_3.98-1.1         xtable_1.7-4       

I worked around the error by converting Age to two factors, O and Y and proceeded successfully with DESeq()

My current snag is extracting the correct results from the analyzed dataset.  First as per your instruction,

> resultsNames(DE_Analysis)
[1] "Intercept"                        "Age_Y_vs_O"                      
[3] "Gender_Male_vs_Female"            "Phenotype_flexible_vs_inflexible"
[5] "GenderMale.Phenotypeflexible" 
> results(DE_Analysis, contrast=list("Phenotype_flexible_vs_inflexible"))
Error in results(DE_Analysis, contrast = list("Phenotype_flexible_vs_inflexible")) : 
  'contrast', as a list, should have length 2,
see the manual page of ?results for more information
> results(DE_Analysis, contrast=list(c("Phenotype_flexible_vs_inflexible","GenderMale.Phenotypeflexible")))
Error in results(DE_Analysis, contrast = list(c("Phenotype_flexible_vs_inflexible",  : 
  'contrast', as a list, should have length 2,
see the manual page of ?results for more information

I am successful with:

> results(DE_Analysis, contrast=c("Phenotype","flexible", "inflexible"))
log2 fold change (MAP): Phenotype flexible vs inflexible 
Wald test p-value: Phenotype flexible vs inflexible 
DataFrame with 30553 rows and 6 columns
                        baseMean log2FoldChange     lfcSE       stat    pvalue      padj
                       <numeric>      <numeric> <numeric>  <numeric> <numeric> <numeric>
A*01:01:01:01 allele   0.0000000             NA        NA         NA        NA        NA
A*03:01:0:01 allele    0.0000000             NA        NA         NA        NA        NA
A1BG                   0.3813450      1.9276013  1.608610  1.1983025 0.2307993        NA
A1BG-AS1               0.5606999      0.1980567  1.235204  0.1603433 0.8726107        NA
A1CF                   1.6435664      0.9704531  0.911482  1.0646980 0.2870126        NA

and

> results(DE_Analysis, name="GenderMale.Phenotypeflexible")
log2 fold change (MAP): GenderMale.Phenotypeflexible 
Wald test p-value: GenderMale.Phenotypeflexible 
DataFrame with 30553 rows and 6 columns
                        baseMean log2FoldChange     lfcSE        stat     pvalue      padj
                       <numeric>      <numeric> <numeric>   <numeric>  <numeric> <numeric>
A*01:01:01:01 allele   0.0000000             NA        NA          NA         NA        NA
A*03:01:0:01 allele    0.0000000             NA        NA          NA         NA        NA
A1BG                   0.3813450     0.97909416 0.8398631  1.16577824  0.2437041  0.999993
A1BG-AS1               0.5606999    -0.08720169 0.9308521 -0.09367943  0.9253638  0.999993
A1CF                   1.6435664    -1.21334676 0.9052759 -1.34030598  0.1801459  0.999993

and the results when I use results(DE_Analysis)

log2 fold change (MAP): GenderMale.Phenotypeflexible 
Wald test p-value: GenderMale.Phenotypeflexible 
DataFrame with 30553 rows and 6 columns
                        baseMean log2FoldChange     lfcSE        stat     pvalue
                       <numeric>      <numeric> <numeric>   <numeric>  <numeric>
A*01:01:01:01 allele   0.0000000             NA        NA          NA         NA
A*03:01:0:01 allele    0.0000000             NA        NA          NA         NA
A1BG                   0.3813450     0.97909416 0.8398631  1.16577824  0.2437041
A1BG-AS1               0.5606999    -0.08720169 0.9308521 -0.09367943  0.9253638
A1CF                   1.6435664    -1.21334676 0.9052759 -1.34030598  0.1801459
...                          ...            ...       ...         ...        ...

 

but not successful for female

> results(DE_Analysis, name="GenderFemale.Phenotypeflexible")
Error in results(DE_Analysis, name = "GenderFemale.Phenotypeflexible") : 
  cannot find appropriate results in the DESeqDataSet.
possibly nbinomWaldTest or nbinomLRT has not yet been run.

How would I code to obtain this contrast?

 

ADD COMMENTlink written 4.4 years ago by newmanss0
Thanks for the note on cut. My code is for the current release which is v1.6. If you can update to the current Bioconductor release, that would be best. See bioconductor.org/install
ADD REPLYlink written 4.4 years ago by Michael Love23k

If you cannot update for some reason (don't have control over the version of R on the cluster, etc), you can add an empty character vector for the second list item (this is not necessary in the latest version of DESeq2):

results(dds, contrast=list(c("Phenotype_flexible_vs_inflexible","GenderMale.Phenotypeflexible"),character()))

or use a numeric contrast:

# a numeric vector corresponding to resultsNames
results(dds, contrast=c(0,0,0,1,1))

Remember, as female is the base level, the base phenotype flexible vs inflexible effect is for female. The phenotype effect plus the interaction (this code directly above) is the effect for male.

ADD REPLYlink written 4.4 years ago by Michael Love23k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 176 users visited in the last hour