Multi-factor Comparison in DESeq 2 Issue
2
0
Entering edit mode
locon833 • 0
@locon833-7230
Last seen 7.5 years ago
United States

Hi all,

I'm trying to normalize my Illumina MiSeq data using Phyloseq and DESeq2. I want to look at the interactions between my site location and by date. I attempted to use the following command:

Laurendds = phyloseq_to_deseq2(Lauren_data_merged, design = ~Location + Date + Location:Date)

But am getting this error:

converting counts to integer mode
Error in DESeqDataSet(se, design = design, ignoreRank) :
the model matrix is not full rank, so the model cannot be fit as specified.
one or more variables or interaction terms in the design formula
are linear combinations of the others and must be removed

Doing some research into this issue I have seen that others have used the DESeqDataSetFromMatrix command (as in this post https://support.bioconductor.org/p/63134/),  but my metadata (mapping file) and my count table(.biom file) are already merged into my Lauren_data_merged object. I'm not sure where my variables are interacting to cause this error. I'll post an example of my metadata file below:

Sample Data: [6 samples by 11 sample variables]:
X.SampleID BarcodeSequence LinkerPrimerSequence InputFileName MonthSampleTaken Location Salinity WaterTemperature
PE12  NA NA PE12.fasta July M9 26.0 30
PE15 PE15 NA NA PE15.fasta July M9 14.5 30
PE17 PE17 NA NA PE17.fasta July BB 17.0 30
PE18 PE18 NA NA PE18.fasta July M9 17.0 30
PE20 PE20 NA NA PE20.fasta August BB 22.5 30
PE21 PE21 NA NA PE21.fasta August M9 22.5 30
Season Weather Date
Wet Rain 7/10/13
Wet Rain 7/19/13
Wet Rain 7/23/13
Wet Rain 7/23/13
Wet Rain 8/3/13
Wet Rain 8/3/13

deseq2 phyloseq • 1.9k views
0
Entering edit mode
@mikelove
Last seen 1 hour ago
United States

hi,

Edited: It looks like you don't have replicates for fitting an interaction term here. To simplify, given the head of the colData which you have show, you have 2 locations and 4 dates:

loc date
M9    1
M9    2
BB    3
M9    3
BB    4
M9    4

You only observe once each combination of location and date, but you are trying to fit a model which has a different estimated mean for each combination, so there would be no way to estimate the dispersion.

You should remove the interaction term, and thereby control for base level location and date effects, or generate more replicates.

0
Entering edit mode

I have a total of 82 samples (I only posted the first 6 as an example). I do have replicates as I took samples weekly for an entire year at the same two locations just on different dates, so essentially it is a time series data set but I don't know what my interaction term would be. Are you aware of how I could go about setting this up in DESeq2 so that I can get variance stabilized data for further downstream analysis?

0
Entering edit mode
for variance stabilization, just use a design of ~1, for normalization which is not influenced by the experimental design
0
Entering edit mode

Hi Michael,

This might be a very simple question, but since I'm very new to R and coding in general I don't quite understand everything yet, when you say just use a design of ~1 for normalization, do you mean that when I execute this command: Laurendds = phyloseq_to_deseq2(Lauren_data_merged, design = ~Location + Date + Location:Date) I could use just one of my variables: Location or Date  like this:  Laurendds = phyloseq_to_deseq2(Lauren_data_merged, design = ~Location) and it would variance stabilize my data?

I also was thinking about looking at my samples by month rather than by date, that way i would have multiple replicates of each site from the same month for further downstream analysis.

0
Entering edit mode

Read over the vignette section 2.1.1 "Blind dispersion estimation" to see if you want to use blind dispersion estimation.

If you use blind=FALSE, then the design will be used by the transformation functions. If you use blind=TRUE, then it doesn't matter what you put in design, because ~1 will be substituted within the transformation functions.

I'd recommend using ~Location + Date, and blind=FALSE.

(The details are described in the vignette, but summarized: this is still almost unsupervised, because the design is only used to estimate the dispersion, and only the global trend of dispersion~mean is passed on to the transformations.)

0
Entering edit mode
@ryan-c-thompson-5618
Last seen 2.1 years ago
Scripps Research, La Jolla, CA

Do you have multiple samples for each combination of date and location? If not, then you can't fit an interaction term, because you will have as many coefficients as samples. You can reduce the degrees of freedom of the time variable using "ns" from the "splines" package. The limma User's Guide has a section on doing this: Section 9.6.2 "Many time points". I'm not familiar with how to use a custom design matrix in DESeq2, but you can certainly analyze your data in this way using either edgeR or limma-voom.

1
Entering edit mode
You can just put ns() in the design, or in version >= 1.7, provide the matrix to the 'full' argument of DESeq()
0
Entering edit mode

Thanks, much for the suggestions. I will try tomorrow.

0
Entering edit mode

Since time-of-year data is cyclical, is there a way to tell the ns function about that so that it imposes the right boundary conditions to make it periodic?

0
Entering edit mode

hmm don't know this one.