Question

How to perform time-course differential expression with "erroneous" baseline in edgeR?

0

Entering edit mode

Enzo Castillejos • 0

@enzo-castillejos-14814

Last seen 7.1 years ago

Hi there edgeR users,

I am new to edgeR so I need to get some help from the experts.

We have these paired-end RNA-seq datasets between mock (control) and fungus-inoculated wheat samples collected at 0, 6, 12 and 24 hai (hours after infection), each with three reps. That is, 2 treatments (control and treatment) x 4 time-points x 3 reps.

Unfortunately, samples at 0 hai which is supposed to be pre-inoculated samples were collected after inoculation so there is already infection activities at this baseline time-point. Doing pairwise differential expression to each time-point gives a thousands of differentially expressed genes at 0 hai which is unrealistic.

What we did, using edgeR, is to subtract out the expression difference of the baseline time-point from the latter time-point expression differences. That is, example (not complete contrast matrix):

my.contrasts <- makeContrasts (a = (infected.6hai – control.6hai) – (infected.0hai–control.0hai))

Is this a right approach to correct an "erroneous" baseline/reference time-point?

Additionally, to find genes that have responded differently due to treatment (control vs infected) between two time-points, additional comparisons were appended to the contrast matrix. Example, to find genes differentially expressed between 12 and 6 hai time-points:

my.contrasts <- makeContrasts (b = (infected.12hai – infected.6hai) – (control.12hai – control.6hai))

Again, is this right?

The complete code would be:

library("edgeR")

#loading a data frame and combining experimental factors
targets<-read.delim("targets.txt")
Group <- factor(paste(targets$Treat,targets$Time,sep="."))
cbind(targets,Group=Group)

#Load the data matrix
x <- read.csv("sample.csv",row.names="Name",header=T)
y<-DGEList(counts=x,group=Group)

#filtering out lowly expressed isoforms and normalization
keep <- rowSums(cpm(y)>1) >= 3
y<-y[keep,,keep.lib.sizes=FALSE]
y<-calcNormFactors(y)
#design matrix
design<-model.matrix(~0+Group)

#estimating dispersion
y<-estimateDisp(y, design, robust=TRUE)
colnames(design)<-levels(Group)
fit<-glmQLFit(y,design, robust=TRUE)
#incomplete or sample contrast matrix
my.contrasts <- makeContrasts(a=(i.6-m.6)-(i.0-m.0),b=(i.12-i.6)-(m.12-m.6), ,levels=design)
qlf <- glmQLFTest(fit, contrast=my.contrasts[,"a"])
table = topTags(qlf, n=Inf)

Enzo Castillejos

edger differential expression timecourse • 2.6k views

ADD COMMENT • link updated 7.1 years ago by Gordon Smyth 52k • written 7.1 years ago by Enzo Castillejos • 0

0

Entering edit mode

Is "innoculated" a synonym here for "infected", or do these terms mean different things?

Were the control.0hai and infected.0hai samples both innoculated? For how long? If they were extracted 1 hour after infection (for example), could they be treated as 1 hour infected samples?

What about control.6hai? Were they also innoculated, or are they correct controls?

ADD REPLY • link 7.1 years ago Gordon Smyth 52k

0

Entering edit mode

Yes, "inoculated" and "infected" are synonymous.

Were the control.0hai and infected.0hai samples both innoculated? = No. Control.0hai was not inoculated; inoculated.0hai was. All controls (control.0hai, control.6hai, control.12hai, control.24hai) were not infected or inoculated. I don't have any idea of the lag time between inoculation and collection as it was performed by collaborators outside our institute. If we are to assign the 0 hai baseline samples a time like 1 hr (i.e. control.1hai vs infected.1hai) and perform pairwise differential expression, results would give thousands of genes differentially expressed which is not realistic as, theoretically, there should be zero to modest number of genes differentially expressed at this early (baseline/reference) timepoint.

What about control.6hai? Were they also innoculated, or are they correct controls? = These are correct controls without infection or inoculation.

Thanks for the comments/quesions, by the way.

ADD REPLY • link 7.1 years ago Enzo Castillejos • 0

score 1 · Answer 1 · 2018-02-17

That certainly seems sensible to me. It is generally appropriate to correct each infection for the appropriate control, which is what your contrast seems to be doing.

Just a disclaimer though: this site is really intended to advise you on correct use of the software rather than on scientific analysis questions, such as whether the contrasts you have computed answer the appropriate scientific questions. You seem to be using edgeR correctly, so no worries there. While, your analysis also seems scientifically sensible from what you've said, I wouldn't like to take responsibility for that because I can't know all the scientific background of your experiment.