Search
Question: How to perform time-course differential expression with "erroneous" baseline in edgeR?
0
gravatar for Enzo Castillejos
7 months ago by
Enzo Castillejos0 wrote:

Hi there edgeR users,

I am new to edgeR so I need to get some help from the experts.

We have these paired-end RNA-seq datasets between mock (control) and fungus-inoculated wheat samples collected at 0, 6, 12 and 24 hai (hours after infection), each with three reps. That is, 2 treatments (control and treatment) x 4 time-points x 3 reps.

Unfortunately, samples at 0 hai which is supposed to be pre-inoculated samples were collected after inoculation so there is already infection activities at this baseline time-point. Doing pairwise differential expression to each time-point gives a thousands of differentially expressed genes at 0 hai which is unrealistic.

What we did, using edgeR, is to subtract out the expression difference of the baseline time-point from the latter time-point expression differences. That is, example (not complete contrast matrix):

my.contrasts <- makeContrasts (a = (infected.6hai – control.6hai) – (infected.0hai–control.0hai))

Is this a right approach to correct an "erroneous" baseline/reference time-point?

Additionally, to find genes that have responded differently due to treatment (control vs infected) between two time-points, additional comparisons were appended to the contrast matrix. Example, to find genes differentially expressed between 12 and 6 hai time-points:

my.contrasts <- makeContrasts (b = (infected.12hai – infected.6hai) – (control.12hai – control.6hai))

Again, is this right? 

The complete code would be:

library("edgeR")

#loading a data frame and combining experimental factors
targets<-read.delim("targets.txt")
Group <- factor(paste(targets$Treat,targets$Time,sep="."))
cbind(targets,Group=Group)

#Load the data matrix
x <- read.csv("sample.csv",row.names="Name",header=T)
y<-DGEList(counts=x,group=Group)

#filtering out lowly expressed isoforms and normalization
keep <- rowSums(cpm(y)>1) >= 3
y<-y[keep,,keep.lib.sizes=FALSE]
y<-calcNormFactors(y)
#design matrix
design<-model.matrix(~0+Group)

#estimating dispersion
y<-estimateDisp(y, design, robust=TRUE)
colnames(design)<-levels(Group)
fit<-glmQLFit(y,design, robust=TRUE)
#incomplete or sample contrast matrix
my.contrasts <- makeContrasts(a=(i.6-m.6)-(i.0-m.0),b=(i.12-i.6)-(m.12-m.6), ,levels=design)
qlf <- glmQLFTest(fit, contrast=my.contrasts[,"a"])
table = topTags(qlf, n=Inf)

 

Enzo Castillejos

 

 

ADD COMMENTlink modified 7 months ago by Gordon Smyth35k • written 7 months ago by Enzo Castillejos0

Is "innoculated" a synonym here for "infected", or do these terms mean different things?

Were the control.0hai and infected.0hai samples both innoculated? For how long? If they were extracted 1 hour after infection (for example), could they be treated as 1 hour infected samples?

What about control.6hai? Were they also innoculated, or are they correct controls?

ADD REPLYlink modified 7 months ago • written 7 months ago by Gordon Smyth35k

Yes, "inoculated" and "infected" are synonymous.

Were the control.0hai and infected.0hai samples both innoculated? = No. Control.0hai was not inoculated; inoculated.0hai was. All controls (control.0hai, control.6hai, control.12hai, control.24hai) were not infected or inoculated. I don't have any idea of the lag time between inoculation and collection as it was performed by collaborators outside our institute. If we are to assign the 0 hai baseline samples a time like 1 hr (i.e. control.1hai vs infected.1hai) and perform pairwise differential expression, results would give thousands of genes differentially expressed which is not realistic as, theoretically, there should be zero to modest number of genes differentially expressed at this early (baseline/reference) timepoint.

What about control.6hai? Were they also innoculated, or are they correct controls? = These are correct controls without infection or inoculation.

Thanks for the comments/quesions, by the way.

 

ADD REPLYlink written 7 months ago by Enzo Castillejos0
1
gravatar for Gordon Smyth
7 months ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:

That certainly seems sensible to me. It is generally appropriate to correct each infection for the appropriate control, which is what your contrast seems to be doing.

Just a disclaimer though: this site is really intended to advise you on correct use of the software rather than on scientific analysis questions, such as whether the contrasts you have computed answer the appropriate scientific questions. You seem to be using edgeR correctly, so no worries there. While, your analysis also seems scientifically sensible from what you've said, I wouldn't like to take responsibility for that because I can't know all the scientific background of your experiment.

 

ADD COMMENTlink modified 7 months ago • written 7 months ago by Gordon Smyth35k

Thanks for the comments.

Enzo

ADD REPLYlink written 7 months ago by Enzo Castillejos0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 316 users visited in the last hour