DESeq2 model for two genome system
Sean

Hello,

I am looking for guidance on how best to set up and carry out hypothesis testing in a system where a phage infects a bacterium. We have three genotypes, four time points for infected samples, and two time points for uninfected samples.

I have a count matrix generated with featureCounts after aligning reads with bowtie2. The matrix includes reads mapping to the bacterial genome as well as the phage genome.

My situation is similar to the post "DESeq2: multiple genotypes time course" in that we would like to know which genes are differentially expressed between genotypes at each time point. My understanding from that post and from the vignette is that, rather than including an interaction term and setting up a more complex model, I can (or should) create a "group" factor with one level for each relevant combination of genotype, time point, and treatment, and then use the contrast argument of the results function to test differences at individual time points. The only difference from the linked situation is that I also have infected/uninfected (treated/untreated) samples.

e.g. t10_wtvabik <- results(dds2, contrast = c("group", "wt_infected_t10", "mut1_infected_t10"))
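For readers following along, the group-factor approach described above can be sketched as follows. This is only an illustration under stated assumptions: the counts are simulated, and the replicate number, the third genotype name (mut2), and the time labels other than t10 are placeholders that do not come from the original data.

```r
library(DESeq2)

set.seed(1)

# Hypothetical sample sheet (all names are placeholders): 3 genotypes,
# infected at 4 time points, uninfected at 2, with 3 replicates each.
make_samples <- function(times, condition) {
  expand.grid(rep = 1:3, genotype = c("wt", "mut1", "mut2"),
              time = times, condition = condition,
              stringsAsFactors = FALSE)
}
coldata <- rbind(make_samples(c("t0", "t10", "t20", "t30"), "infected"),
                 make_samples(c("t0", "t30"), "uninfected"))

# One factor level per genotype/treatment/time combination.
coldata$group <- factor(paste(coldata$genotype, coldata$condition,
                              coldata$time, sep = "_"))
rownames(coldata) <- paste0(coldata$group, "_rep", coldata$rep)

# Simulated negative binomial counts standing in for the featureCounts matrix.
n_genes <- 200
mu <- 2^runif(n_genes, 4, 10)   # per-gene mean
disp <- 0.1 + 4 / mu            # per-gene dispersion
counts <- matrix(rnbinom(n_genes * nrow(coldata), mu = mu, size = 1 / disp),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 rownames(coldata)))

dds2 <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                               design = ~ group)
dds2 <- DESeq(dds2)

# Pairwise comparison of two group levels at a single time point.
t10_wt_vs_mut1 <- results(dds2,
                          contrast = c("group", "wt_infected_t10",
                                       "mut1_infected_t10"))
summary(t10_wt_vs_mut1)
```

With this layout every pairwise comparison is a separate call to results() with a different pair of group levels, so the number of contrasts grows with the number of levels of interest, not with the model itself.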

Is this approach still correct? It feels to me that the inclusion of infected/uninfected results in a larger number of contrasts and that the overall model might be thrown by masses of zeros for the phage genes in the uninfected samples. I initially tried an LRT with a full model that includes an interaction term, ~condition + genotype + time + genotype:condition, and a reduced model of ~condition + genotype + time, but I'm not sure whether this is a valid approach.
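One part of this that can be checked mechanically is whether both designs are estimable given the unbalanced layout (uninfected samples at only two time points). The base R sketch below builds a hypothetical sample sheet (replicate number, the mut2 genotype, and the time labels are assumptions) and inspects the model matrices. It says nothing about whether the test answers the biological question, only whether the full and reduced models can be fit and which coefficients the LRT would compare.

```r
# Hypothetical sample sheet mirroring the design described above.
make_samples <- function(times, condition) {
  expand.grid(rep = 1:3, genotype = c("wt", "mut1", "mut2"),
              time = times, condition = condition,
              stringsAsFactors = FALSE)
}
coldata <- rbind(make_samples(c("t0", "t10", "t20", "t30"), "infected"),
                 make_samples(c("t0", "t30"), "uninfected"))

# Set reference levels explicitly instead of relying on alphabetical order.
coldata$genotype  <- factor(coldata$genotype, levels = c("wt", "mut1", "mut2"))
coldata$time      <- factor(coldata$time, levels = c("t0", "t10", "t20", "t30"))
coldata$condition <- factor(coldata$condition,
                            levels = c("uninfected", "infected"))

full    <- model.matrix(~ condition + genotype + time + genotype:condition,
                        coldata)
reduced <- model.matrix(~ condition + genotype + time, coldata)

# A design can only be fit if its model matrix has full column rank.
is_full_rank <- function(m) qr(m)$rank == ncol(m)
is_full_rank(full)
is_full_rank(reduced)

# Coefficients present in the full model but absent from the reduced one;
# these are what the LRT would test jointly.
setdiff(colnames(full), colnames(reduced))
```

In DESeq2 the same two formulas would be supplied as the design and as the reduced argument of DESeq(dds, test = "LRT", reduced = ...).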

For determining which genes, specifically the viral genes, are differentially expressed over time, I think the vignette and ?results are clear enough, along with Michael's answer in the linked question. I will just need to exclude the uninfected samples because (a) they have zeros across the board for phage genes and (b) we are only interested in how things change between genotypes over time.
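A minimal sketch of that plan, assuming simulated counts and placeholder names (replicate number, the mut2 genotype, time labels, and the host/phage row names are all hypothetical): subset to the infected samples before building the dataset, then run a time-course style LRT with a genotype:time interaction.

```r
library(DESeq2)

set.seed(1)

# Hypothetical sample sheet: infected at 4 time points, uninfected at 2.
make_samples <- function(times, condition) {
  expand.grid(rep = 1:3, genotype = c("wt", "mut1", "mut2"),
              time = times, condition = condition,
              stringsAsFactors = FALSE)
}
coldata <- rbind(make_samples(c("t0", "t10", "t20", "t30"), "infected"),
                 make_samples(c("t0", "t30"), "uninfected"))
coldata$genotype  <- factor(coldata$genotype, levels = c("wt", "mut1", "mut2"))
coldata$time      <- factor(coldata$time, levels = c("t0", "t10", "t20", "t30"))
coldata$condition <- factor(coldata$condition,
                            levels = c("uninfected", "infected"))
rownames(coldata) <- paste(coldata$genotype, coldata$condition,
                           coldata$time, coldata$rep, sep = "_")

# Simulated counts: 180 host genes and 20 phage genes.
n_genes <- 200
mu <- 2^runif(n_genes, 4, 10)
disp <- 0.1 + 4 / mu
counts <- matrix(rnbinom(n_genes * nrow(coldata), mu = mu, size = 1 / disp),
                 nrow = n_genes,
                 dimnames = list(c(paste0("host", 1:180), paste0("phage", 1:20)),
                                 rownames(coldata)))
phage <- grepl("^phage", rownames(counts))
counts[phage, coldata$condition == "uninfected"] <- 0L  # no phage reads

# Keep infected samples only and drop factor levels that are no longer used.
keep <- coldata$condition == "infected"
dds_inf <- DESeqDataSetFromMatrix(countData = counts[, keep],
                                  colData = droplevels(coldata[keep, ]),
                                  design = ~ genotype + time + genotype:time)

# LRT against a model without the interaction: small p-values point to genes
# whose time profile differs between genotypes at one or more time points.
dds_inf <- DESeq(dds_inf, test = "LRT", reduced = ~ genotype + time)
res_time <- results(dds_inf)
head(res_time[phage, ])
```

Subsetting before constructing the dataset means the all-zero phage columns from uninfected samples never enter dispersion estimation for this analysis.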

Any help would be much appreciated.

RNASeq DESeq2 • 430 views
@mikelove

You may want to work this statistical design question out with a local statistician or someone familiar with linear models in R.

Unfortunately, I have to limit my answers on the support site to software related questions.

"It feels to me that the inclusion of infected/uninfected results in a larger number of contrasts and that the overall model might be thrown by masses of zeros for the phage genes in the uninfected samples."

There is an FAQ in the vignette about including many groups in one big dataset, or subsetting to pairs.
