Search
Question: DESeq2 interaction term
0
9 months ago by
United States

Hi,

I have samples from 5 genotypes and 2 conditions. I am reading the 'Extract results from a DESeq analysis'

dds$genotype <- factor(rep(rep(c("I","II","III"),each=3),2)). But do not understand the 'each=3),2)' in this line, as I need to modify it for my 5 genotypes. Also, I am interested in the effect that my treatment has in each genotype and don't have a reference genotype. Would it be better for me to use the group/factor rather than the interaction formula? Thanks, Catalina ADD COMMENTlink modified 9 months ago • written 9 months ago by Catalina Aguilar Hurtado50 Thanks Michael. If I may as you further. I want to get the effect of my treatment to 1 genotype to then compare with the others. I have 2 conditions (Co1 and T1) and 5 genotypes. It seems to me from the manual that I should be getting the results per gene per condition (IIIB and IIIA) while I am getting the results based on the intercept. Do you know what is my mistake? Manual: # the condition effect for genotypeIII results(dds, contrast=c("group", "IIIB", "IIIA")) My data: dds <- DESeqDataSetFromMatrix(countData = countData, colData = coldata, design = ~ geno + cond) dds$group <- factor(paste0(dds$geno, dds$cond))

design(dds) <- ~ group

dds <- DESeq(dds)

> resultsNames(dds)
[1] "Intercept"          "group_AT1_vs_ACo1"  "group_BCo1_vs_ACo1" "group_BT1_vs_ACo1"  "group_CCo1_vs_ACo1" "group_CT1_vs_ACo1"  "group_DCo1_vs_ACo1"
[8] "group_DT1_vs_ACo1"  "group_FCo1_vs_ACo1" "group_FT1_vs_ACo1" 

coldata

ID cond geno

cond_A1_genA1 Co1 A

cond_A1_genA2 Co1 A

cond_A1_genA3 Co1 A

cond_A1_genF1 Co1 F

cond_A1_genB1 Co1 B

cond_A1_genF2 Co1 F

cond_A1_genC1 Co1 C

cond_A1_genC2 Co1 C

cond_A1_genC3 Co1 C

cond_A1_genD1 Co1 D

cond_A1_genD2 Co1 D

cond_A1_genD3 Co1 D

cond_B1_genA2 T1 A

cond_B1_genA1 T1 A

cond_B1_genA3 T1 A

cond_B1_genB1 T1 B

cond_B1_genF1 T1 F

cond_B1_genF2 T1 F

cond_B1_genC1 T1 C

cond_B1_genC2 T1 C

cond_B1_genC3 T1 C

cond_B1_genD1 T1 D

cond_B1_genD2 T1 D

cond_B1_genD3 T1 D

My recommendation of ~group was for your original post, Which didn’t mention this:

I want to get the effect of my treatment to 1 genotype to then compare with the others”

There’s not a specific way to do this / this is not a fully specified analysis. You mean compare the effect in one genotype to the average effect for the other genotypes?

Sorry Michael, I didn't explain properly. Not the effect of genotype 1 in particular, just the effects of any of the genotypes to the treatment. First I want to know what is the response of genotype X to the treatment. Second it would be interesting to know how genotypes X compares with the others in response to the treatment. Overall I want to understand why is genotype X more or less resistant to the treatment. Hope is makes sense.

Again, there’s no one way to do this “genotypes X compares with the others”. You can do all pairs of genotypes, one vs the average of others, etc. Rather than testing, you could also plot the treatment effects in each group against the other groups in a “pairs” plot.

When I refer to the others I mean geno A vs geno C and so on. Then I should use:

design(dds) <- ~ geno + cond + geno:cond

But this will always refer to a reference level when I want to effect of genotype X? it would compare genotype X to reference genotype?

I will try to plots as well, thanks.

If you want to compare pairs of genotypes in their treatment effect you can do:

~geno + geno:condition

And then use contrast=list(..., ...) to compare the effects after. You will see them listed when you use resultsNames(dds). You can use:

results(dds, name=...)

To extract treatment effects for each genotype. This is equivalent to my first suggestion with ~group, but also allows the contrasts, which you didn't mention in your first post, so I didn't recommend the interaction design then.

Thanks Michael!

Just to check when I write as below, I get treatment effect for genoA without reference level

results <-results(dds, name="genoA.condT1")

Then compare the pairs genoA vs genoB:

results <-results(dds, contrast=list(c("genoA.condT1","genoB.condT1")))

yes

Hi Michael, just a quick question. I have 1 genotype that doesn't have a biological replicate (1 sample in control and 1 in treatment). Should I discard it when getting the effects of each genotype to the treatment?

Thanks again

You can use this genotype as well. This is an advantage of having all the samples in one model, as the information about variability among replicates is shared among genotypes.

That is good know. I was getting just a few diff. expressed genes for that genotype and wasn't sure if it was because I don't have a replicate sample.

Hi Michael, I was reading the manual about running a group of samples together or by pairs. I am getting more diff. expressed genes when I ran e.g. genotype A on its own (~condition) than with the other genotypes (~ geno + geno:condition). The genes in common between the two designs have very similar values, does it means I am getting more false positives when I ran it separated? I understand the data will be more normalized through the group but as my transcriptome is poorly annotated more diff. expressed genes would be better. In my samples the response (PCA) groups by genotype rather than by treatment.

Thanks

I think this is answered in the vignette, right? It's possible to get more DEG when looking at a pair if there is less variability in that genotype. When you run with all the samples together, if the other genotypes have more variability, it increases the per-gene dispersion estimate. It's not more FP in the small dataset but less sensitivity for the full dataset. This is all speculative, but that's what I have in the vignette.

Thanks, I see what you mean. Now that I checked all the genotypes I get 1 genotype with less DEG when it is run on its own, which seems be the one with more variability (also strongest physiological response to treatment). I am now thinking if it would be accepted to present results from different models or if it would be bad practices to run a group model to get the data from the most variable genotype.

There's nothing wrong with running the pairs, especially if you think one genotype is behaving differently.

0
9 months ago by
Michael Love20k
United States
Michael Love20k wrote:

Yes go ahead and use ~group.

The code above with factor() and rep() is to quickly mock up some sample information for demonstration.

For an actual analysis we recommend to read in a sample table saved in a CSV or TSV format, where rows are samples and columns are variables.