Dear community members, I am currently using DESeq2 to analyze RNASeq data from plant roots that were colonized by a fungus. But I am facing some problems regarding contrast with a continuous variable. If you have any suggestions or comments it would be great if you can let me know. Thank you in advance!
The purpose of the project is to identify downstream targets of a few plant transcription factors (let’s call them gene A,B,C,D,E,F) that are essential for colonization of a fungus. As experimental set-up, wildtype (WT) plants and mutants of those transcription factors (let’s call them mutant_A, mutant_B, mutant_C, mutant_D, mutant_E, mutant_F) were treated without or with fungal inoculation (let’s call them either mock or inoculated). Then root samples were harvested and used for transcriptome analyses. Either 3 or 4 replicates were used for each genotype x treatment.
However, fungal colonization itself induces a big change of plant transcriptome, that means thousands of genes that are essential for colonization are induced in colonizaed root parts, while genes that are inhibiting colonization are suppressed. As mutants used in this research missing genes that are essential for colonization, I got a quantitative difference in fungal colonization level in my mutants compared to wildtype. This can be reflected by the proportion of fungal reads in each sample. Please see table below. In general, I have ~12-18% of fungal reads in my WT, but only ~6-9% in mutant F, ~2-4% in mutant E, ~0.2-0.5% in mutant C and D..etc.
Sample % Fungal reads WT_Mock1 0.00682819 WT_Mock2 0.008894 WT_Mock3 0.00818782 WT_Mock4 0.00665274 WT_Inoculated1 12.3860566 WT_Inoculated2 18.1908555 WT_Inoculated3 18.4706838 WT_Inoculated4 16.5686164 Mutant_A_Mock1 0.01233013 Mutant_A_Mock2 0.01424303 Mutant_A_Mock3 0.0073198 Mutant_A_Mock4 0.00603092 Mutant_A_Inoculated1 0.01972266 Mutant_A_Inoculated2 0.01211499 Mutant_A_Inoculated3 0.0165109 Mutant_A_Inoculated4 0.01508397 Mutant_B_Mock1 0.00791082 Mutant_B_Mock2 0.00667513 Mutant_B_Mock3 0.0069918 Mutant_B_Mock4 0.00672951 Mutant_B_Inoculated1 0.57389944 Mutant_B_Inoculated2 0.37813846 Mutant_B_Inoculated3 0.02073746 Mutant_B_Inoculated4 0.06448313 Mutant_C_Mock1 0.00470295 Mutant_C_Mock2 0.00802484 Mutant_C_Mock3 0.00217021 Mutant_C_Inoculated1 0.1567094 Mutant_C_Inoculated2 0.46984465 Mutant_C_Inoculated3 0.32510805 Mutant_D_Mock1 0.0035514 Mutant_D_Mock2 0.06521256 Mutant_D_Mock3 0.00407177 Mutant_D_Mock4 0.00368014 Mutant_D_Inoculated1 0.50073689 Mutant_D_Inoculated2 0.27329334 Mutant_D_Inoculated3 0.32021375 Mutant_D_Inoculated4 0.20501901 Mutant_E_Mock1 0.00529136 Mutant_E_Mock2 0.00808311 Mutant_E_Mock3 0.0069625 Mutant_E_Mock4 0.00199783 Mutant_E_Inoculated1 1.90555662 Mutant_E_Inoculated2 2.60344722 Mutant_E_Inoculated3 3.84246065 Mutant_E_Inoculated4 2.48234016 Mutant_F_Mock1 0.00801506 Mutant_F_Mock2 0.00715424 Mutant_F_Mock3 0.00829953 Mutant_F_Mock4 0.00894405 Mutant_F_Inoculated1 8.36066198 Mutant_F_Inoculated2 6.37187557 Mutant_F_Inoculated3 9.0976988 Mutant_F_Inoculated4 7.07773639
I am interested in the plant genes differentlly regulated in the mutants vs WT. However, due to the different colonization level, all colonization-associated genes are differientially expressed if I simply compare my samples by define them as either Inoculated or Mock. To (kind of) correct for the effect of different fungal colonization level on gene expression, I integrated % of fungal reads into my design formula, instead of using categorical variable (e.g. Inoculated and Mock).
My design is:
design= ~ Fungal_biomass + Mutation + Mutation:Fungal_biomass
Mutation: define the genotype, for example WT or Mutant_A. Fungal_biomass: percentage of fungal reads after scaling, I took care that they are numeric.
I got a first question here: After running the model, I got an error message here(please see below). I found that some people had same problem before. Then I followed their solutions. But either filter out low expressed genes or using large maxit such as 50000 didn’t solve this problem.
## 12099 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
I continued anyway to inspect my data. Then I tried to isolate some contrasts. I got my second question here: I am a bit confused with isolating contrast of a continuous variable. For example, by using code below, do I isolate fungal colonization-associated plant genes in my reference level (here WT) or in all samples?
res <- results(dds,name = "Fungal_biomass") summary(res$padj < 0.05 & res$log2FoldChange >1) ## Mode FALSE TRUE ## logical 22596 410
I thought I isolated colonization-induced genes in WT because it's the reference level, but 410 is much less than the number I expected. If I directly compare WT_Inoculated vs WT_Mock (set a categorical variable as either Inoculated or Mock), I can identify ~1400 genes using the same cut-off.
I tried another strategy to isolate contrast. In my metatable, I have a column named “Fungal_treatment” defining my samples are Inoculated or mock (although this column was not used in my design formula). But I managed to isolate contrast in this way:
WT_Inoculated <- colMeans(mod_mat[dds$Mutation == "WT" & dds$Fungal_treatment == "Inoculated",]) WT_mock <- colMeans(mod_mat[dds$Mutation == "WT" & dds$Fungal_treatment == "Mock",]) res_WT_Inoculated_vs_mock <- results(dds, contrast = WT_Inoculated - WT_mock) summary(res_WT_Inoculated_vs_mock <- $padj < 0.05 & res_WT_Inoculated_vs_mock <- $log2FoldChange >1) ## Mode FALSE TRUE ## logical 21801 1205
In this way, I got a number that is expected (although slightly less than 1400). Here I have my third question: did I isolate the same contrast usign two methods? I thought so but the numbers are so different so I was confused.
Then the Last question: If I want to identify genes that were not induced by fungal colonization in mutant_F but were induced in WT while controling for different fungal colonization level in mutant F and WT (so those genes might be downstream target of gene F, but I want to rule out genes that were affected by the ~50% lower colonization level in mutant F compared to WT), shall I call results(dds,name = "Fungal_biomass.Mutation_Mutant_F" ), or results(dds,name = c(list (“Fungal_biomass”), list("Fungal_biomass.Mutation_Mutant_F" ))?
Below shows my resultnames:
resultsNames(dds) ##  "Intercept" ##  "Fungal_biomass" ##  "Mutation_Mutant_A_vs_WT" ##  "Mutation_Mutant_B_vs_WT" ##  "Mutation_Mutant_C_vs_WT" ##  "Mutation_Mutant_D_vs_WT" ##  "Mutation_Mutant_E_vs_WT" ##  "Mutation_Mutant_F_vs_WT" ##  "Fungal_biomass.Mutation_Mutant_A" ##  "Fungal_biomass.Mutation_Mutant_B" ##  "Fungal_biomass.Mutation_Mutant_C" ##  "Fungal_biomass.Mutation_Mutant_D" ##  "Fungal_biomass.Mutation_Mutant_E" ##  "Fungal_biomass.Mutation_Mutant_F"
Any answers are appreciated. Thanks a lot!