Design matrix and contrast formation in edgeR for DE analysis
1
0
Entering edit mode
@govindrajuk17-13067
Last seen 6.9 years ago

Hi everyone,

I am beginner in edgeR and RNA-seq data analysis, so I need your guidance to set up proper design matrix and contrast for my research questions. I have 64 RNA-seq samples (all biologically independent) from four different populations of same species, which were collected from two different experiments, that is, Acute experiment(acute control + acute stress) and chronic experiment(chronic control + chronic stress) with four biological replicates for each of these conditions.

I want know:

i) If expression levels differ between acute and chronic treatments (within a species)

ii) If expression levels differ between populations within species.

To test above hypothesis, I have followed following steps in edgeR:

> pv_acute_chronic_rawcounts = read.table("pv_acute_chronic_genes_inorder.counts.matrix", header=T, row.names=1, com='')

> pv_acute_chronic_description <- read.table("pv_AC_LC_description_tab_delimited.csv", sep="\t", header=TRUE)

> pv_acute_chronic_group <- factor(paste0(pv_acute_chronic_description$Population, ".", pv_acute_chronic_description$Treatment))

> pv_acute_chronic_dgelist <- DGEList(counts=pv_acute_chronic_rawcounts, group=pv_acute_chronic_group)

> pv_acute_chronic_keep <- rowSums(cpm(pv_acute_chronic_dgelist) > 1) >= 4

> pv_acute_chronic_dgelist <- pv_acute_chronic_dgelist[pv_acute_chronic_keep, , keep.lib.sizes=FALSE]

> pv_acute_chronic_dgelist <- calcNormFactors(pv_acute_chronic_dgelist)

> pv_acute_chronic_design <- model.matrix(~0+pv_acute_chronic_group, data=pv_acute_chronic_dgelist$samples)
> colnames(pv_acute_chronic_design) <- levels(pv_acute_chronic_group)

> pv_acute_chronic_dgelist <- estimateDisp(pv_acute_chronic_dgelist, pv_acute_chronic_design, robust=TRUE)

> pv_acute_chronic_fit <- glmQLFit(pv_acute_chronic_dgelist, pv_acute_chronic_design, robust=TRUE)

##Contrasts to get DE genes between treatment and control for each populations
> contrasts1 <- makeContrasts(NATvsNAC = Population1.Acute.Treatment-Population1.Acute.Control, NCTvsNCC = Population1.Chronic.Treatment-Population1.Chronic.Control, IATvsIAC = Population2.Acute.Treatment-Population2.Acute.Control, ICTvsICC = Population2.Chronic.Treatment-Population2.Chronic.Control, SATvsSAC = Population3.Acute.Treatment-Population3.Acute.Control, SCTvsSCC = Population3.Chronic.Treatment-Population3.Chronic.Control, AATvsAAC = Population4.Acute.Treatment-Population4.Acute.Control, ACTvsACC = Population4.Chronic.Treatment-Population4.Chronic.Control, levels=pv_acute_chronic_design)

##Contrasts to get DE genes between populations for acute experiment:
> contrasts2 <- makeContrasts(Population1vsPopulation2.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population2.Acute.Treatment-Population2.Acute.Control), Population1vsPopulation3.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population3.Acute.Treatment-Population3.Acute.Control), Population1vsPopulation4.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), Population2vsPopulation3.Acute = (Population2.Acute.Treatment-Population2.Acute.Control)-(Population3.Acute.Treatment-Population3.Acute.Control), Population2vsPopulation4.Acute = (Population2.Acute.Treatment-Population2.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), Population3vsPopulation4.Acute = (Population3.Acute.Treatment-Population3.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), levels=pv_acute_chronic_design)

##Contrasts to get DE genes between populations for chronic experiment:
> contrasts3 <- makeContrasts(Population1vsPopulation2.Chronic = (Population1.Chronic.Treatment-Population1.Chronic.Control)-(Population2.Chronic.Treatment-Population2.Chronic.Control), Population1vsPopulation3.Chronic = (Population1.Chronic.Treatment-Population1.Chronic.Control)-(Population3.Chronic.Treatment-Population3.Chronic.Control), Population1vsPopulation4.Chronic = (Population1.Chronic.Treatment-Population1.Chronic.Control)-(Population4.Chronic.Treatment-Population4.Chronic.Control), Population2vsPopulation3.Chronic = (Population2.Chronic.Treatment-Population2.Chronic.Control)-(Population3.Chronic.Treatment-Population3.Chronic.Control), Population2vsPopulation4.Chronic = (Population2.Chronic.Treatment-Population2.Chronic.Control)-(Population4.Chronic.Treatment-Population4.Chronic.Control), Population3vsPopulation4.Chronic = (Population3.Chronic.Treatment-Population3.Chronic.Control)-(Population4.Chronic.Treatment-Population4.Chronic.Control), levels=pv_acute_chronic_design)

##Contrasts to get DE genes between acute and chronic experiments per population:
> contrasts4 <- makeContrasts(Population1.ChronicvsPopulation1.Acute = (Population1.Chronic.Treatment-Population1.Chronic.Control)-(Population1.Acute.Treatment-Population1.Acute.Control), Population2.ChronicvsPopulation2.Acute = (Population2.Chronic.Treatment-Population2.Chronic.Control)-(Population2.Acute.Treatment-Population2.Acute.Control), Population3.ChronicvsPopulation3.Acute = (Population3.Chronic.Treatment-Population3.Chronic.Control)-(Population3.Acute.Treatment-Population3.Acute.Control), Population4.ChronicvsPopulation4.Acute = (Population4.Chronic.Treatment-Population4.Chronic.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), levels=pv_acute_chronic_design)

So, please can anyone shed some light on this matter and let me know if I am doing it right or not. Especially, regarding setting contrasts and design matrix?

Thank you in advance.

Regards,

Govindraj Chavan

edger • 1.1k views
ADD COMMENT
0
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 19 hours ago
The city by the bay

Looks fine to me.

ADD COMMENT
0
Entering edit mode

Dear Aaron,

Thank you for quick response and confirming my strategy for setting up design matrix  and contrast parameters.

Regards,

Govindraj Chavan

ADD REPLY
0
Entering edit mode

Dear Aaron,

Sorry to bother you again. I have one more query regarding same experimental setup described in previous question. In brief, I have three factors as described below:

Factor1: Populations(levels=4 (pop1, pop2, pop3 & pop4)).

Factor2: Acute stress experiment (levels=2 (control & stress)).

Factor3: Chronic stress experiment (levels=2 (control & stress)).

In total 16 levels will be formed by combination of these factors.

And I have used following  steps:

> pv_acute_chronic_group <- factor(paste0(pv_acute_chronic_description$Population, ".", pv_acute_chronic_description$Treatment))

> pv_acute_chronic_design <- model.matrix(~0+pv_acute_chronic_group, data=pv_acute_chronic_dgelist$samples)

> pv_acute_chronic_dgelist <- estimateDisp(pv_acute_chronic_dgelist, pv_acute_chronic_design, robust=TRUE)

> pv_acute_chronic_fit <- glmQLFit(pv_acute_chronic_dgelist, pv_acute_chronic_design, robust=TRUE)

##Contrasts to get DE genes between populations for acute experiment:
> contrasts2 <- makeContrasts(Population1vsPopulation2.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population2.Acute.Treatment-Population2.Acute.Control), Population1vsPopulation3.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population3.Acute.Treatment-Population3.Acute.Control), Population1vsPopulation4.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), Population2vsPopulation3.Acute = (Population2.Acute.Treatment-Population2.Acute.Control)-(Population3.Acute.Treatment-Population3.Acute.Control), Population2vsPopulation4.Acute = (Population2.Acute.Treatment-Population2.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), Population3vsPopulation4.Acute = (Population3.Acute.Treatment-Population3.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), levels=pv_acute_chronic_design)

I want to get common set of genes which differentially expressed among four populations. I tried following command to achieve above objective:

> pc_de_btw_pops_anov <- glmQLFTest(pc_acute_chronic_fit, contrast=contrasts2)

> topTags(pc_de_btw_pops_anov)

Coefficient:  LR test of 3 contrasts

logFC.Population1vsPopulation2.Acute    logFC.Population1vsPopulation3.Acute logFC.Population1vsPopulation4.Acute    logCPM    F    PValue    FDR
TRINITY_DN277195_c2_g2    0.266962268    3.833956231    3.973849235    5.805308914    78.64811033    7.83E-22    7.99E-17
TRINITY_DN281190_c3_g3    -0.22663792    2.533908963    2.990723426    4.972112813    49.6939098    5.72E-17    2.92E-12
TRINITY_DN283707_c3_g2    0.02528481    -2.97083914    -5.85401231    3.109560074    43.03080562    1.42E-15    4.83E-11
TRINITY_DN248566_c0_g1    -0.20162506    -3.70984053    -3.10586704    2.965669951    41.96948664    2.44E-15    6.23E-11
TRINITY_DN268448_c0_g1    0.434988646    6.151765969    0.963942064    1.35252841    39.87257452    7.32E-15    1.49E-10

Why does not above test produce results for last three contrasts from contrast matrix (contrasts2)? Is something missing in design matrix? Am I doing it in wrong way?

Please can you share your thoughts on this issue.

Thank you.

Regards,

Govindraj Chavan

 

ADD REPLY
0
Entering edit mode

Make sure you're using the latest version of edgeR. The behaviour of glmLRT (on which glmQLFTest relies) changed in the last release. Previously, log-fold changes for redundant contrasts were not reported, which is why some of your contrasts are missing. (For example, if you tested A=B and B=C, you would not need to specify A=C, as this would be redundant and ignored.) Now they should be reported, as illustrated in the following example:

a <- matrix(rpois(60, lambda=5), ncol=6)
g <- factor(rep(LETTERS[1:3], each=2))
design <- model.matrix(~0 + g)
fit <- glmFit(a, design, dispersion=0, offset=0)
res <- glmLRT(fit, contrast=makeContrasts(AvB=gA-gB,
    AvC=gA-gC, BvC=gB-gC, levels=design))
topTags(res) # gives me log-fold changes for all three contrasts.
ADD REPLY
0
Entering edit mode

Dear Aaron,

Thanks a ton for quick feedback. I am using edgeR version 3.16.5 as you can see below in session info output:

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-slackware-linux-gnu (64-bit)
Running under: Slackware 14.2

locale:
 [1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US       
 [4] LC_COLLATE=C         LC_MONETARY=en_US    LC_MESSAGES=en_US   
 [7] LC_PAPER=en_US       LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.16.5 limma_3.30.8

loaded via a namespace (and not attached):
[1] tools_3.3.1     grid_3.3.1      locfit_1.5-9.1  lattice_0.20-33

So, do you think this older  version of edgeR  is causing this problem? the latest version of edgeR  is 3.18.1 as per Bioconductor website.

There is no information in edgeR user guide regarding these new parameters. 

How do I set these two parameters for my experiment:

a <- matrix(rpois(60, lambda=5), ncol=6)
g <- factor(rep(LETTERS[1:3], each=2)).

So please can you clarify my doubts.

Thank you.

Regards,

Govindraj Chavan

ADD REPLY
0
Entering edit mode

Just upgrade to the latest version of edgeR. The code in my last post was just a simple example to demonstrate the current behaviour of glmLRT; its application to your situation should be obvious.

ADD REPLY
0
Entering edit mode

Dear Aaron,

Thank you very much for clarifying my doubts. Will update my edgeR to latest version and repeat the analysis with similar commands mentioned above. Hope it works this time..:)

Regards,

Govindraj Chavan

 

ADD REPLY

Login before adding your answer.

Traffic: 766 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6