Hi there edgeR users,
I am new to edgeR so I need to get some help from the experts.
We have these paired-end RNA-seq datasets between mock (control) and fungus-inoculated wheat samples collected at 0, 6, 12 and 24 hai (hours after infection), each with three reps. That is, 2 treatments (control and treatment) x 4 time-points x 3 reps.
Unfortunately, samples at 0 hai which is supposed to be pre-inoculated samples were collected after inoculation so there is already infection activities at this baseline time-point. Doing pairwise differential expression to each time-point gives a thousands of differentially expressed genes at 0 hai which is unrealistic.
What we did, using edgeR, is to subtract out the expression difference of the baseline time-point from the latter time-point expression differences. That is, example (not complete contrast matrix):
my.contrasts <- makeContrasts (a = (infected.6hai – control.6hai) – (infected.0hai–control.0hai))
Is this a right approach to correct an "erroneous" baseline/reference time-point?
Additionally, to find genes that have responded differently due to treatment (control vs infected) between two time-points, additional comparisons were appended to the contrast matrix. Example, to find genes differentially expressed between 12 and 6 hai time-points:
my.contrasts <- makeContrasts (b = (infected.12hai – infected.6hai) – (control.12hai – control.6hai))
Again, is this right?
The complete code would be:
library("edgeR") #loading a data frame and combining experimental factors targets<-read.delim("targets.txt") Group <- factor(paste(targets$Treat,targets$Time,sep=".")) cbind(targets,Group=Group) #Load the data matrix x <- read.csv("sample.csv",row.names="Name",header=T) y<-DGEList(counts=x,group=Group) #filtering out lowly expressed isoforms and normalization keep <- rowSums(cpm(y)>1) >= 3 y<-y[keep,,keep.lib.sizes=FALSE] y<-calcNormFactors(y) #design matrix design<-model.matrix(~0+Group) #estimating dispersion y<-estimateDisp(y, design, robust=TRUE) colnames(design)<-levels(Group) fit<-glmQLFit(y,design, robust=TRUE) #incomplete or sample contrast matrix my.contrasts <- makeContrasts(a=(i.6-m.6)-(i.0-m.0),b=(i.12-i.6)-(m.12-m.6), ,levels=design) qlf <- glmQLFTest(fit, contrast=my.contrasts[,"a"]) table = topTags(qlf, n=Inf)
Enzo Castillejos
Is "innoculated" a synonym here for "infected", or do these terms mean different things?
Were the control.0hai and infected.0hai samples both innoculated? For how long? If they were extracted 1 hour after infection (for example), could they be treated as 1 hour infected samples?
What about control.6hai? Were they also innoculated, or are they correct controls?
Yes, "inoculated" and "infected" are synonymous.
Were the control.0hai and infected.0hai samples both innoculated? = No. Control.0hai was not inoculated; inoculated.0hai was. All controls (control.0hai, control.6hai, control.12hai, control.24hai) were not infected or inoculated. I don't have any idea of the lag time between inoculation and collection as it was performed by collaborators outside our institute. If we are to assign the 0 hai baseline samples a time like 1 hr (i.e. control.1hai vs infected.1hai) and perform pairwise differential expression, results would give thousands of genes differentially expressed which is not realistic as, theoretically, there should be zero to modest number of genes differentially expressed at this early (baseline/reference) timepoint.
What about control.6hai? Were they also innoculated, or are they correct controls? = These are correct controls without infection or inoculation.
Thanks for the comments/quesions, by the way.