Dear Bioconductor support team,
This question is a follow up of the same I asked before, and then I carried out some analysis, but need help:-
I have samples from two time points. The samples belong to two categories: Treatment and Placebo as follows: 005v1 Placebo 005v4 Placebo 008v1 Treatment 008v4 Treatment
The ID number 005 etc represent the same sample at two time points. At V1 time-point there is no treatment given, its just that the samples are assigned to be given a treatment (treatment) or not (placebo). At V4 time-point there would be difference in placebo and treatment sample.
Now, if the question is to identify the changes caused due to the treatment, how should I define the design? Should V1 time-point be taken into consideration or not, given that the treatment intervention was given only after the assignment of samples at V1.
It seems like a paired analysis, where the blocking factor is the IDs of the samples, so using the edgeR guide (3.4), I tried to block on the ID (called as Cid in the Rcode) in the following way:-
Read in and annotate
counts <- read.delim("all-samples.txt", row.names="Symbol", header = T) metrics <- read.delim("metric-data.txt", row.names = 1, header = T)
Categ <-factor(metrics$Group, levels = c("placebo", "treatment")) Cid <-factor(metrics$ID) Visit <-factor(metrics$Visit)
Group <- factor(paste(Categ, Visit, sep=".")) raw <- DGEList(counts=counts, group=Group)
plot (table (rowSums(cpm(raw) >= 1) ) ) keep <- (rowSums(cpm(raw) >= 1) >= 17) raw <- raw[keep, , keep.lib.sizes=FALSE]
norm <- calcNormFactors(raw)
design <- model.matrix(~0+Cid+Group) colnames(design) design <- design[,-19]
ed <- estimateDisp(norm, design) glmfit <- glmFit(ed, design) cont <- makeContrasts(Grouptreatment.V4-Groupplacebo.V4, levels=design) gltr <- glmTreat(glmfit, contrast=cont)#no fc cut-off
out <- topTags(gltr, n = Inf, adjust.method="BH") keep <- out$table$PValue <= 0.05
Mainly, I combined Category (Placebo/Treatment) and Visit (V1/V4) into one factor Group. I then created the design as follows: design <- model.matrix(~0+Cid+Group)
Q1: Is this sufficient to block the effect of IDs?
The design looks like this:
 "Cid5" "Cid8" "Cid9" "Cid22" "Cid28" "Cid38" "Cid43"
 "Cid48" "Cid56" "Cid58" "Cid60" "Cid61" "Cid62" "Cid63"
 "Cid153" "Cid180" "Cid190" "Groupplacebo.V4" "Grouptreatment.V1" "Grouptreatment.V4"
When I tried to estimate dispersion on this design, I get a message:
"Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05, : Design matrix not of full rank. The following coefficients not estimable: Grouptreatment.V4"
So I removed Grouptreatment.V1 using design <- design[,-19] and then it runs.
Q2: What should I do if I also want to compare Grouptreatment.V1 vs Groupplacebo.V1 and perhaps other combinations as well?
Appreciate your help and support!
Thanks & Regards, Candida Vaz