Question

Design matrix and contrast formation in edgeR for DE analysis

0

Entering edit mode

govindrajuk17 • 0

@govindrajuk17-13067

Last seen 6.9 years ago

Hi everyone,

I am beginner in edgeR and RNA-seq data analysis, so I need your guidance to set up proper design matrix and contrast for my research questions. I have 64 RNA-seq samples (all biologically independent) from four different populations of same species, which were collected from two different experiments, that is, Acute experiment(acute control + acute stress) and chronic experiment(chronic control + chronic stress) with four biological replicates for each of these conditions.

I want know:

i) If expression levels differ between acute and chronic treatments (within a species)

ii) If expression levels differ between populations within species.

To test above hypothesis, I have followed following steps in edgeR:

> pv_acute_chronic_rawcounts = read.table("pv_acute_chronic_genes_inorder.counts.matrix", header=T, row.names=1, com='')

> pv_acute_chronic_description <- read.table("pv_AC_LC_description_tab_delimited.csv", sep="\t", header=TRUE)

> pv_acute_chronic_group <- factor(paste0(pv_acute_chronic_description$Population, ".", pv_acute_chronic_description$Treatment))

> pv_acute_chronic_dgelist <- DGEList(counts=pv_acute_chronic_rawcounts, group=pv_acute_chronic_group)

> pv_acute_chronic_keep <- rowSums(cpm(pv_acute_chronic_dgelist) > 1) >= 4

> pv_acute_chronic_dgelist <- pv_acute_chronic_dgelist[pv_acute_chronic_keep, , keep.lib.sizes=FALSE]

> pv_acute_chronic_dgelist <- calcNormFactors(pv_acute_chronic_dgelist)

> pv_acute_chronic_design <- model.matrix(~0+pv_acute_chronic_group, data=pv_acute_chronic_dgelist$samples)
> colnames(pv_acute_chronic_design) <- levels(pv_acute_chronic_group)

> pv_acute_chronic_dgelist <- estimateDisp(pv_acute_chronic_dgelist, pv_acute_chronic_design, robust=TRUE)

> pv_acute_chronic_fit <- glmQLFit(pv_acute_chronic_dgelist, pv_acute_chronic_design, robust=TRUE)

##Contrasts to get DE genes between treatment and control for each populations
> contrasts1 <- makeContrasts(NATvsNAC = Population1.Acute.Treatment-Population1.Acute.Control, NCTvsNCC = Population1.Chronic.Treatment-Population1.Chronic.Control, IATvsIAC = Population2.Acute.Treatment-Population2.Acute.Control, ICTvsICC = Population2.Chronic.Treatment-Population2.Chronic.Control, SATvsSAC = Population3.Acute.Treatment-Population3.Acute.Control, SCTvsSCC = Population3.Chronic.Treatment-Population3.Chronic.Control, AATvsAAC = Population4.Acute.Treatment-Population4.Acute.Control, ACTvsACC = Population4.Chronic.Treatment-Population4.Chronic.Control, levels=pv_acute_chronic_design)

##Contrasts to get DE genes between populations for acute experiment:
> contrasts2 <- makeContrasts(Population1vsPopulation2.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population2.Acute.Treatment-Population2.Acute.Control), Population1vsPopulation3.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population3.Acute.Treatment-Population3.Acute.Control), Population1vsPopulation4.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), Population2vsPopulation3.Acute = (Population2.Acute.Treatment-Population2.Acute.Control)-(Population3.Acute.Treatment-Population3.Acute.Control), Population2vsPopulation4.Acute = (Population2.Acute.Treatment-Population2.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), Population3vsPopulation4.Acute = (Population3.Acute.Treatment-Population3.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), levels=pv_acute_chronic_design)

##Contrasts to get DE genes between populations for chronic experiment:
> contrasts3 <- makeContrasts(Population1vsPopulation2.Chronic = (Population1.Chronic.Treatment-Population1.Chronic.Control)-(Population2.Chronic.Treatment-Population2.Chronic.Control), Population1vsPopulation3.Chronic = (Population1.Chronic.Treatment-Population1.Chronic.Control)-(Population3.Chronic.Treatment-Population3.Chronic.Control), Population1vsPopulation4.Chronic = (Population1.Chronic.Treatment-Population1.Chronic.Control)-(Population4.Chronic.Treatment-Population4.Chronic.Control), Population2vsPopulation3.Chronic = (Population2.Chronic.Treatment-Population2.Chronic.Control)-(Population3.Chronic.Treatment-Population3.Chronic.Control), Population2vsPopulation4.Chronic = (Population2.Chronic.Treatment-Population2.Chronic.Control)-(Population4.Chronic.Treatment-Population4.Chronic.Control), Population3vsPopulation4.Chronic = (Population3.Chronic.Treatment-Population3.Chronic.Control)-(Population4.Chronic.Treatment-Population4.Chronic.Control), levels=pv_acute_chronic_design)

##Contrasts to get DE genes between acute and chronic experiments per population:
> contrasts4 <- makeContrasts(Population1.ChronicvsPopulation1.Acute = (Population1.Chronic.Treatment-Population1.Chronic.Control)-(Population1.Acute.Treatment-Population1.Acute.Control), Population2.ChronicvsPopulation2.Acute = (Population2.Chronic.Treatment-Population2.Chronic.Control)-(Population2.Acute.Treatment-Population2.Acute.Control), Population3.ChronicvsPopulation3.Acute = (Population3.Chronic.Treatment-Population3.Chronic.Control)-(Population3.Acute.Treatment-Population3.Acute.Control), Population4.ChronicvsPopulation4.Acute = (Population4.Chronic.Treatment-Population4.Chronic.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), levels=pv_acute_chronic_design)

So, please can anyone shed some light on this matter and let me know if I am doing it right or not. Especially, regarding setting contrasts and design matrix?

Thank you in advance.

Regards,

Govindraj Chavan

edger • 1.1k views

ADD COMMENT • link updated 6.9 years ago by Aaron Lun ★ 28k • written 6.9 years ago by govindrajuk17 • 0

score 0 · Answer 1 · 2017-05-17

0

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 19 hours ago

The city by the bay

Looks fine to me.

ADD COMMENT • link 6.9 years ago Aaron Lun ★ 28k

0

Entering edit mode

Dear Aaron,

Thank you for quick response and confirming my strategy for setting up design matrix and contrast parameters.

Regards,

Govindraj Chavan

ADD REPLY • link 6.9 years ago govindrajuk17 • 0

0

Entering edit mode

Dear Aaron,

Sorry to bother you again. I have one more query regarding same experimental setup described in previous question. In brief, I have three factors as described below:

Factor1: Populations(levels=4 (pop1, pop2, pop3 & pop4)).

Factor2: Acute stress experiment (levels=2 (control & stress)).

Factor3: Chronic stress experiment (levels=2 (control & stress)).

In total 16 levels will be formed by combination of these factors.

And I have used following steps:

> pv_acute_chronic_group <- factor(paste0(pv_acute_chronic_description$Population, ".", pv_acute_chronic_description$Treatment))

> pv_acute_chronic_design <- model.matrix(~0+pv_acute_chronic_group, data=pv_acute_chronic_dgelist$samples)

> pv_acute_chronic_dgelist <- estimateDisp(pv_acute_chronic_dgelist, pv_acute_chronic_design, robust=TRUE)

> pv_acute_chronic_fit <- glmQLFit(pv_acute_chronic_dgelist, pv_acute_chronic_design, robust=TRUE)

##Contrasts to get DE genes between populations for acute experiment:
> contrasts2 <- makeContrasts(Population1vsPopulation2.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population2.Acute.Treatment-Population2.Acute.Control), Population1vsPopulation3.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population3.Acute.Treatment-Population3.Acute.Control), Population1vsPopulation4.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), Population2vsPopulation3.Acute = (Population2.Acute.Treatment-Population2.Acute.Control)-(Population3.Acute.Treatment-Population3.Acute.Control), Population2vsPopulation4.Acute = (Population2.Acute.Treatment-Population2.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), Population3vsPopulation4.Acute = (Population3.Acute.Treatment-Population3.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), levels=pv_acute_chronic_design)

I want to get common set of genes which differentially expressed among four populations. I tried following command to achieve above objective:

> pc_de_btw_pops_anov <- glmQLFTest(pc_acute_chronic_fit, contrast=contrasts2)

> topTags(pc_de_btw_pops_anov)

Coefficient: LR test of 3 contrasts

logFC.Population1vsPopulation2.Acute   logFC.Population1vsPopulation3.Acute logFC.Population1vsPopulation4.Acute   logCPM   F   PValue   FDR
TRINITY_DN277195_c2_g2   0.266962268   3.833956231   3.973849235   5.805308914   78.64811033   7.83E-22   7.99E-17
TRINITY_DN281190_c3_g3   -0.22663792   2.533908963   2.990723426   4.972112813   49.6939098   5.72E-17   2.92E-12
TRINITY_DN283707_c3_g2   0.02528481   -2.97083914   -5.85401231   3.109560074   43.03080562   1.42E-15   4.83E-11
TRINITY_DN248566_c0_g1   -0.20162506   -3.70984053   -3.10586704   2.965669951   41.96948664   2.44E-15   6.23E-11
TRINITY_DN268448_c0_g1   0.434988646   6.151765969   0.963942064   1.35252841   39.87257452   7.32E-15   1.49E-10

Why does not above test produce results for last three contrasts from contrast matrix (contrasts2)? Is something missing in design matrix? Am I doing it in wrong way?

Please can you share your thoughts on this issue.

Thank you.

Regards,

Govindraj Chavan

ADD REPLY • link 6.9 years ago govindrajuk17 • 0

0

Entering edit mode

Make sure you're using the latest version of edgeR. The behaviour of glmLRT (on which glmQLFTest relies) changed in the last release. Previously, log-fold changes for redundant contrasts were not reported, which is why some of your contrasts are missing. (For example, if you tested A=B and B=C, you would not need to specify A=C, as this would be redundant and ignored.) Now they should be reported, as illustrated in the following example:

a <- matrix(rpois(60, lambda=5), ncol=6)
g <- factor(rep(LETTERS[1:3], each=2))
design <- model.matrix(~0 + g)
fit <- glmFit(a, design, dispersion=0, offset=0)
res <- glmLRT(fit, contrast=makeContrasts(AvB=gA-gB,
    AvC=gA-gC, BvC=gB-gC, levels=design))
topTags(res) # gives me log-fold changes for all three contrasts.

ADD REPLY • link 6.9 years ago Aaron Lun ★ 28k

0

Entering edit mode

Dear Aaron,

Thanks a ton for quick feedback. I am using edgeR version 3.16.5 as you can see below in session info output:

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-slackware-linux-gnu (64-bit)
Running under: Slackware 14.2

locale:
[1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US
[4] LC_COLLATE=C         LC_MONETARY=en_US    LC_MESSAGES=en_US
[7] LC_PAPER=en_US       LC_NAME=C            LC_ADDRESS=C
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] edgeR_3.16.5 limma_3.30.8

loaded via a namespace (and not attached):
[1] tools_3.3.1 grid_3.3.1 locfit_1.5-9.1 lattice_0.20-33

So, do you think this older version of edgeR is causing this problem? the latest version of edgeR is 3.18.1 as per Bioconductor website.

There is no information in edgeR user guide regarding these new parameters.

How do I set these two parameters for my experiment:

a <- matrix(rpois(60, lambda=5), ncol=6)
g <- factor(rep(LETTERS[1:3], each=2)).

So please can you clarify my doubts.

Thank you.

Regards,

Govindraj Chavan

ADD REPLY • link 6.9 years ago govindrajuk17 • 0

0

Entering edit mode

Just upgrade to the latest version of edgeR. The code in my last post was just a simple example to demonstrate the current behaviour of glmLRT; its application to your situation should be obvious.

ADD REPLY • link 6.9 years ago Aaron Lun ★ 28k

0

Entering edit mode

Dear Aaron,

Thank you very much for clarifying my doubts. Will update my edgeR to latest version and repeat the analysis with similar commands mentioned above. Hope it works this time..:)

Regards,

Govindraj Chavan

ADD REPLY • link 6.9 years ago govindrajuk17 • 0