Testing effect of multiple continuous variables using DESeq2
zrf1 • 0
@zrf1-24022

I am having trouble running a DESeq2 analysis on data from a phyloseq object. The goal is to test the association of dietary intake with the microbiome (the effect of diet on the microbiome).

The phyloseq object ps contains otu_table, tax_table, sam_data, and phy_tree. The variables I want to test are dietary intakes (carb, prot, and fiber), which are continuous and appear in colnames(ps@sam_data). I want to test the association of each of them with the microbiome.

Code so far is:

`dds <- phyloseq_to_deseq2(ps, design = ~ carb + prot + fiber)`
`gm_mean <- function(x, na.rm=TRUE){`
`    exp(sum(log(x[x > 0]), na.rm=na.rm)/length(x))}`
`geoMeans <- apply(counts(dds), 1, gm_mean) `
`dds <- estimateSizeFactors(dds, geoMeans = geoMeans)`
`dds <- DESeq(dds)`

but this gives the error:

`# converting counts to integer mode`
`# some variables in design formula are characters, converting to factors`
`# Error in checkFullRank(modelMatrix) : `
`#   the model matrix is not full rank, so the model cannot be fit as specified.`
`#   One or more variables or interaction terms in the design formula are linear`
`#   combinations of the others and must be removed.`

`#   Please read the vignette section 'Model matrix not full rank':`

`#   vignette('DESeq2')`

I tried using only

`dds <- phyloseq_to_deseq2(ps, design = ~ carb)`

and it was able to run that line (but I stopped the run as it was taking hours). Not sure how to go about the design model.

Is it also possible to adjust for other factors such as age and sex while testing for the effect of diet (carb, prot, fiber)?

deseq2 • 1.2k views
@mikelove

Have you read the vignette section about why that error occurs?


The vignette says it can be due either to linear combinations (i.e. perfect confounding) or to individuals nested within groups. Since the dietary intakes are continuous variables, I don't think linear combinations can arise among the columns? Also, no individuals have one or more observations across condition (unless the carb, prot, and fiber values for each individual are considered the conditions in "across condition")?

I am not sure how to proceed.


Sorry - I just noticed the messages above are pointing you to the problem. You are providing DESeq2 with variables that are characters. You should check the class of each column you provide to DESeq2: if you want them to be factors, convert them to factors, and if you want them to be numeric, convert them to numeric. It's best not to guess what will happen to characters in various programs, and instead convert them manually to what you want them to be.
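A minimal sketch of that check in base R, using a small made-up data frame `meta` as a stand-in for the sample data (the name `meta` and all values are hypothetical). It also illustrates why character columns can trigger the rank error: each distinct value becomes its own factor level, so the model matrix ends up with more columns than there are samples.

```r
# Toy stand-in for the sample data; values are invented for illustration.
meta <- data.frame(
  carb  = c("210.5", "180.2", "250.0", "199.9", "230.1", "175.4"),
  prot  = c("80.1", "95.3", "70.2", "88.8", "91.0", "77.7"),
  fiber = c("20.3", "25.1", "18.7", "30.2", "22.4", "27.9"),
  stringsAsFactors = FALSE
)
diet_vars <- c("carb", "prot", "fiber")

# Step 1: inspect the class of every variable used in the design.
classes_before <- sapply(meta[diet_vars], class)
print(classes_before)

# Character columns are treated as factors, one level per distinct value,
# so the model matrix has more columns than rows and cannot be full rank.
mm_chr <- model.matrix(~ carb + prot + fiber, data = meta)
cat("character design:", nrow(mm_chr), "rows,", ncol(mm_chr), "columns\n")

# Step 2: convert explicitly to numeric (as.character() guards against
# columns that were already turned into factors).
meta[diet_vars] <- lapply(meta[diet_vars],
                          function(x) as.numeric(as.character(x)))

# Numeric columns contribute one coefficient each.
mm_num <- model.matrix(~ carb + prot + fiber, data = meta)
cat("numeric design:", nrow(mm_num), "rows,", ncol(mm_num), "columns\n")
```

For a phyloseq object, the same conversion would presumably be applied to the sample data before calling `phyloseq_to_deseq2`, for example `sample_data(ps)$carb <- as.numeric(as.character(sample_data(ps)$carb))`; this assumes the columns hold values that parse cleanly as numbers.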


Thank you! I converted them to numeric, but my columns contain NAs (no data for some samples), so this error occurred:

`# converting counts to integer mode`
`# Error in DESeqDataSet(se, design = design, ignoreRank) : `
`#   variables in design formula cannot contain NA: carb`

I can't convert the NAs into zeroes, as that would not be representative of the samples that lack diet data. It would make sense, however, to remove the rows with NAs. I am not sure how to do this, as my sam_data also contains other columns that have NAs but are not important for this analysis.


How to deal with missing data in the covariates is up to you. We don't have any part of the package that would help with this, and the right approach depends on the experiment and the questions.
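If dropping samples with missing diet data turns out to be the chosen approach, one possible sketch is to look for NAs only in the columns used in the design and ignore the rest. The data frame `meta` below is a made-up stand-in for the sample data, and `other` is a hypothetical column that is not part of the analysis.

```r
# Toy stand-in for the sample data; values are invented for illustration.
meta <- data.frame(
  carb  = c(210.5, NA, 250.0, 199.9, 230.1),
  prot  = c(80.1, 95.3, 70.2, NA, 91.0),
  fiber = c(20.3, 25.1, 18.7, 30.2, 22.4),
  other = c(NA, 1, NA, 2, 3),   # has NAs, but is not in the design
  row.names = paste0("S", 1:5)
)

# Only the variables that appear in the design formula matter here.
design_vars <- c("carb", "prot", "fiber")

# TRUE for samples with no NA in any design variable.
keep <- complete.cases(meta[, design_vars])

# Keep those samples; NAs in unrelated columns are left alone.
meta_cc <- meta[keep, ]
print(rownames(meta_cc))
```

With a phyloseq object, the same logical vector could be computed from `data.frame(sample_data(ps))` and passed to `prune_samples(keep, ps)` before building the DESeq2 object. If age and sex are added to the design, they would go into `design_vars` as well. Whether a complete-case analysis is appropriate still depends on the experiment, as noted in the reply above.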
