I have performed a RNA-seq experiment that has quite a complex design. I have 40 bee individuals belonging to 4 treatment groups (1 control group and 3 different treatments, each of which I would like to compare to the control). Each treatment has 10 biological replicates (10 individual bees). For each individual the tissue of interest was dissected in three regions, so I have 120 RNA-seq samples. The 40 individuals were collected from ten hives (for each hive I have 4 individuals, one per treatment group), and the samples were always processed in batches by hive (RNA, library prep, sequencing lane), so now I have just one batch effect of these ten batches/hives. Finally, through other methods I have realized that while performing the experiment the individuals were transitioning between two behavioural states (these two behavioural groups are not perfectly balanced across treatments and hives/batches but it is not too bad), and I would expect some differential expression between them (which does seem to show up in exploratory ordination plots). This is how my design matrix looks like
Sample Sample_ID Treatment Region Hive Behaviour 1 1 H1B1_6 MD OL H1 nurse 2 2 H1B1_6 MD MB H1 nurse 3 3 H1B1_6 MD AL H1 nurse 4 4 H1B3_5 Bi OL H1 nurse 5 5 H1B3_5 Bi MB H1 nurse 6 6 H1B3_5 Bi AL H1 nurse 7 7 H1B5_3 CLA OL H1 nurse 8 8 H1B5_3 CLA MB H1 nurse 9 9 H1B5_3 CLA AL H1 nurse 10 10 H1B6_4 CLH OL H1 nurse 11 11 H1B6_4 CLH MB H1 nurse 12 12 H1B6_4 CLH AL H1 nurse 13 13 H2B1_4 Bi OL H2 nurse 14 14 H2B1_4 Bi MB H2 nurse 15 15 H2B1_4 Bi AL H2 nurse 16 16 H2B3_3 CLH OL H2 nurse 17 17 H2B3_3 CLH MB H2 nurse 18 18 H2B3_3 CLH AL H2 nurse 19 19 H2B4_3 CLA OL H2 nurse 20 20 H2B4_3 CLA MB H2 nurse 21 21 H2B4_3 CLA AL H2 nurse 22 22 H2B5_1 MD OL H2 forager 23 23 H2B5_1 MD MB H2 forager 24 24 H2B5_1 MD AL H2 forager 25 25 H3B1_3 CLH OL H3 nurse 26 26 H3B1_3 CLH MB H3 nurse 27 27 H3B1_3 CLH AL H3 nurse 28 28 H3B3_4 Bi OL H3 forager 29 29 H3B3_4 Bi MB H3 forager 30 30 H3B3_4 Bi AL H3 forager 31 31 H3B5_2 CLA OL H3 nurse 32 32 H3B5_2 CLA MB H3 nurse 33 33 H3B5_2 CLA AL H3 nurse 34 34 H3B6_2 MD OL H3 forager 35 35 H3B6_2 MD MB H3 forager 36 36 H3B6_2 MD AL H3 forager 37 37 H4B3_4 CLA OL H4 nurse 38 38 H4B3_4 CLA MB H4 nurse 39 39 H4B3_4 CLA AL H4 nurse 40 40 H4B4_7 MD OL H4 nurse 41 41 H4B4_7 MD MB H4 nurse 42 42 H4B4_7 MD AL H4 nurse 43 43 H4B5_2 Bi OL H4 forager 44 44 H4B5_2 Bi MB H4 forager 45 45 H4B5_2 Bi AL H4 forager 46 46 H4B6_2 CLH OL H4 nurse 47 47 H4B6_2 CLH MB H4 nurse 48 48 H4B6_2 CLH AL H4 nurse 49 49 H5B1_3 CLA OL H5 nurse 50 50 H5B1_3 CLA MB H5 nurse 51 51 H5B1_3 CLA AL H5 nurse 52 52 H5B2_3 Bi OL H5 nurse 53 53 H5B2_3 Bi MB H5 nurse 54 54 H5B2_3 Bi AL H5 nurse 55 55 H5B4_4 CLH OL H5 nurse 56 56 H5B4_4 CLH MB H5 nurse 57 57 H5B4_4 CLH AL H5 nurse 58 58 H5B5_6 MD OL H5 forager 59 59 H5B5_6 MD MB H5 forager 60 60 H5B5_6 MD AL H5 forager 61 61 H6B2_5 CLA OL H6 forager 62 62 H6B2_5 CLA MB H6 forager 63 63 H6B2_5 CLA AL H6 forager 64 64 H6B4_3 CLH OL H6 forager 65 65 H6B4_3 CLH MB H6 forager 66 66 H6B4_3 CLH AL H6 forager 67 67 H6B5_4 Bi OL H6 nurse 68 68 H6B5_4 Bi MB H6 nurse 69 69 H6B5_4 Bi AL H6 nurse 70 70 H6B6_2 MD OL H6 nurse 71 71 H6B6_2 MD MB H6 nurse 72 72 H6B6_2 MD AL H6 nurse 73 73 H7B1_3 CLA OL H7 forager 74 74 H7B1_3 CLA MB H7 forager 75 75 H7B1_3 CLA AL H7 forager 76 76 H7B2_3 CLH OL H7 nurse 77 77 H7B2_3 CLH MB H7 nurse 78 78 H7B2_3 CLH AL H7 nurse 79 79 H7B3_4 MD OL H7 forager 80 80 H7B3_4 MD MB H7 forager 81 81 H7B3_4 MD AL H7 forager 82 82 H7B4_9 Bi OL H7 nurse 83 83 H7B4_9 Bi MB H7 nurse 84 84 H7B4_9 Bi AL H7 nurse 85 85 H8B5_3 MD OL H8 nurse 86 86 H8B5_3 MD MB H8 nurse 87 87 H8B5_3 MD AL H8 nurse 88 88 H8B1_3 Bi OL H8 nurse 89 89 H8B1_3 Bi MB H8 nurse 90 90 H8B1_3 Bi AL H8 nurse 91 91 H8B2_3 CLH OL H8 nurse 92 92 H8B2_3 CLH MB H8 nurse 93 93 H8B2_3 CLH AL H8 nurse 94 94 H8B3_3 CLA OL H8 forager 95 95 H8B3_3 CLA MB H8 forager 96 96 H8B3_3 CLA AL H8 forager 97 97 H9B1_3 CLA OL H9 forager 98 98 H9B1_3 CLA MB H9 forager 99 99 H9B1_3 CLA AL H9 forager 100 100 H9B3_5 CLH OL H9 forager 101 101 H9B3_5 CLH MB H9 forager 102 102 H9B3_5 CLH AL H9 forager 103 103 H9B4_7 Bi OL H9 forager 104 104 H9B4_7 Bi MB H9 forager 105 105 H9B4_7 Bi AL H9 forager 106 106 H9B5_3 MD OL H9 forager 107 107 H9B5_3 MD MB H9 forager 108 108 H9B5_3 MD AL H9 forager 109 109 H10B1_6 CLH OL H10 forager 110 110 H10B1_6 CLH MB H10 forager 111 111 H10B1_6 CLH AL H10 forager 112 112 H10B2_4 MD OL H10 nurse 113 113 H10B2_4 MD MB H10 nurse 114 114 H10B2_4 MD AL H10 nurse 115 115 H10B5_5 Bi OL H10 nurse 116 116 H10B5_5 Bi MB H10 nurse 117 117 H10B5_5 Bi AL H10 nurse 118 118 H10B6_3 CLA OL H10 nurse 119 119 H10B6_3 CLA MB H10 nurse 120 120 H10B6_3 CLA AL H10 nurse
Because the project is about analysing the effects of some specific treatments on a more peripheral tissue than where the treatment was applied, I do not expect the treatment to have had a very strong effect, but I would like to identify these minor effects in a good detail if they are there.
Ideally I would like to know:
1) What genes are DEGs between the control and each treatment overall (across all three tissue regions). 2) Are there any region specific effects of treatment, so the same pairwise comparisons as before, but by specific region (interaction between treatment and region?). 3) Are there any DEGs dependent on behavioural state (again both overall and region-specific). 4) Is there an interaction between treatment and behavioural group, i.e. specific genes may differ between treatments and control only for one of the two behavioural groups. And possibly also by region (interaction between region, behaviour and treatment?)
I first tried to analyze my data by splitting the dataset by region (so 40 samples each in three separate analyses), but like this I do not seem to get any DEG in any pair-wise comparison between treatments. So I then tried to model the whole dataset of 120 samples in one go, and like this I seem to find a few genes of interest, but I am unsure if what I have been doing is indeed correct and recommended, and how to then include and get results for specific comparisons for interaction terms. This is my code so far:
library(edgeR) y=DGEList(counts=count.table) Hive=factor(Design$Hive) Region=factor(Design$Region) Behav=factor(Design$Behaviour) Treatment <- relevel(Design$Treatment, "MD") Treat=factor(Treatment) design=model.matrix(~0+Treat+Hive+Region+Behav) #Filter low expressed genes keep <- filterByExpr(y, design, min.count = 10, min.total.count = 20) y <- y[keep,,keep.lib.sizes=FALSE] dim(y) #Now normalize counts y=calcNormFactors(y, method = "TMM") colnames(design)<-make.names(colnames(design)) v=voom(y,design,plot=TRUE) cor=duplicateCorrelation(v,design,block=Design$Sample_ID) fit=lmFit(v,design,block=Design$Sample_ID, correlation=cor$consensus) head(coef(fit)) cm=makeContrasts(MDvsCLH= TreatCLH-TreatMD, MDvsCLA= TreatCLA-TreatMD, MDvsBi= TreatBi-TreatMD, ForagervsNurse =Behavnurse, levels=make.names(colnames(coef(fit)))) # To test treatment and behaviour effect fit2=contrasts.fit(fit,cm) fit2=eBayes(fit2, robust=TRUE) summary(decideTests(fit2)) MDvsCLH_table=topTable(fit2,coef="MDvsCLH",n=Inf, sort="p", p=0.05) MDvsCLA_table=topTable(fit2,coef="MDvsCLA",n=Inf, sort="p", p=0.05) MDvsBi_table=topTable(fit2,coef="MDvsBi",n=Inf, sort="p", p=0.05) ForagervsNurse_table=topTable(fit2,coef="ForagervsNurse",n=Inf, sort="p", p=0.05) MDvsCLH_table MDvsCLA_table MDvsBi_table ForagervsNurse_table
In essence, I have tried to include Treatment, Region, Behaviour and Hive (the technical batches) as Fixed effects and the individuals from which the three regions came from as a Random effect. I would have included also Hive as a random effect instead of a fixed effect but I am not sure if it is possible to include two different random effects?
Like this I do get some DEGs for pairwise comparisons between treatments and also for the behavioural group comparison (more so than treatment, so, as I was expecting, that has a stronger effect) and plotting the lcpm counts of individual DEGs shows that at least a few are consistently and convincingly up-regulated in treatments compared to the control group (MD) across the three regions.
Would this approach be valid and therefore I should trust that I am not getting false positive results? And if I then want to include interactions and get treatment comparisons for all the specific regions and behaviours how could I do that?
Should I approach the different questions with separate models or should I do that all at once somehow?
Thank you very much for your kind help with this.