Hi everyone,
I am beginner in edgeR and RNA-seq data analysis, so I need your guidance to set up proper design matrix and contrast for my research questions. I have 64 RNA-seq samples (all biologically independent) from four different populations of same species, which were collected from two different experiments, that is, Acute experiment(acute control + acute stress) and chronic experiment(chronic control + chronic stress) with four biological replicates for each of these conditions.
I want know:
i) If expression levels differ between acute and chronic treatments (within a species)
ii) If expression levels differ between populations within species.
To test above hypothesis, I have followed following steps in edgeR:
> pv_acute_chronic_rawcounts = read.table("pv_acute_chronic_genes_inorder.counts.matrix", header=T, row.names=1, com='')
> pv_acute_chronic_description <- read.table("pv_AC_LC_description_tab_delimited.csv", sep="\t", header=TRUE)
> pv_acute_chronic_group <- factor(paste0(pv_acute_chronic_description$Population, ".", pv_acute_chronic_description$Treatment))
> pv_acute_chronic_dgelist <- DGEList(counts=pv_acute_chronic_rawcounts, group=pv_acute_chronic_group)
> pv_acute_chronic_keep <- rowSums(cpm(pv_acute_chronic_dgelist) > 1) >= 4
> pv_acute_chronic_dgelist <- pv_acute_chronic_dgelist[pv_acute_chronic_keep, , keep.lib.sizes=FALSE]
> pv_acute_chronic_dgelist <- calcNormFactors(pv_acute_chronic_dgelist)
> pv_acute_chronic_design <- model.matrix(~0+pv_acute_chronic_group, data=pv_acute_chronic_dgelist$samples)
> colnames(pv_acute_chronic_design) <- levels(pv_acute_chronic_group)
> pv_acute_chronic_dgelist <- estimateDisp(pv_acute_chronic_dgelist, pv_acute_chronic_design, robust=TRUE)
> pv_acute_chronic_fit <- glmQLFit(pv_acute_chronic_dgelist, pv_acute_chronic_design, robust=TRUE)
##Contrasts to get DE genes between treatment and control for each populations
> contrasts1 <- makeContrasts(NATvsNAC = Population1.Acute.Treatment-Population1.Acute.Control, NCTvsNCC = Population1.Chronic.Treatment-Population1.Chronic.Control, IATvsIAC = Population2.Acute.Treatment-Population2.Acute.Control, ICTvsICC = Population2.Chronic.Treatment-Population2.Chronic.Control, SATvsSAC = Population3.Acute.Treatment-Population3.Acute.Control, SCTvsSCC = Population3.Chronic.Treatment-Population3.Chronic.Control, AATvsAAC = Population4.Acute.Treatment-Population4.Acute.Control, ACTvsACC = Population4.Chronic.Treatment-Population4.Chronic.Control, levels=pv_acute_chronic_design)
##Contrasts to get DE genes between populations for acute experiment:
> contrasts2 <- makeContrasts(Population1vsPopulation2.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population2.Acute.Treatment-Population2.Acute.Control), Population1vsPopulation3.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population3.Acute.Treatment-Population3.Acute.Control), Population1vsPopulation4.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), Population2vsPopulation3.Acute = (Population2.Acute.Treatment-Population2.Acute.Control)-(Population3.Acute.Treatment-Population3.Acute.Control), Population2vsPopulation4.Acute = (Population2.Acute.Treatment-Population2.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), Population3vsPopulation4.Acute = (Population3.Acute.Treatment-Population3.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), levels=pv_acute_chronic_design)
##Contrasts to get DE genes between populations for chronic experiment:
> contrasts3 <- makeContrasts(Population1vsPopulation2.Chronic = (Population1.Chronic.Treatment-Population1.Chronic.Control)-(Population2.Chronic.Treatment-Population2.Chronic.Control), Population1vsPopulation3.Chronic = (Population1.Chronic.Treatment-Population1.Chronic.Control)-(Population3.Chronic.Treatment-Population3.Chronic.Control), Population1vsPopulation4.Chronic = (Population1.Chronic.Treatment-Population1.Chronic.Control)-(Population4.Chronic.Treatment-Population4.Chronic.Control), Population2vsPopulation3.Chronic = (Population2.Chronic.Treatment-Population2.Chronic.Control)-(Population3.Chronic.Treatment-Population3.Chronic.Control), Population2vsPopulation4.Chronic = (Population2.Chronic.Treatment-Population2.Chronic.Control)-(Population4.Chronic.Treatment-Population4.Chronic.Control), Population3vsPopulation4.Chronic = (Population3.Chronic.Treatment-Population3.Chronic.Control)-(Population4.Chronic.Treatment-Population4.Chronic.Control), levels=pv_acute_chronic_design)
##Contrasts to get DE genes between acute and chronic experiments per population:
> contrasts4 <- makeContrasts(Population1.ChronicvsPopulation1.Acute = (Population1.Chronic.Treatment-Population1.Chronic.Control)-(Population1.Acute.Treatment-Population1.Acute.Control), Population2.ChronicvsPopulation2.Acute = (Population2.Chronic.Treatment-Population2.Chronic.Control)-(Population2.Acute.Treatment-Population2.Acute.Control), Population3.ChronicvsPopulation3.Acute = (Population3.Chronic.Treatment-Population3.Chronic.Control)-(Population3.Acute.Treatment-Population3.Acute.Control), Population4.ChronicvsPopulation4.Acute = (Population4.Chronic.Treatment-Population4.Chronic.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), levels=pv_acute_chronic_design)
So, please can anyone shed some light on this matter and let me know if I am doing it right or not. Especially, regarding setting contrasts and design matrix?
Thank you in advance.
Regards,
Govindraj Chavan
Dear Aaron,
Thank you for quick response and confirming my strategy for setting up design matrix and contrast parameters.
Regards,
Govindraj Chavan
Dear Aaron,
Sorry to bother you again. I have one more query regarding same experimental setup described in previous question. In brief, I have three factors as described below:
Factor1: Populations(levels=4 (pop1, pop2, pop3 & pop4)).
Factor2: Acute stress experiment (levels=2 (control & stress)).
Factor3: Chronic stress experiment (levels=2 (control & stress)).
In total 16 levels will be formed by combination of these factors.
And I have used following steps:
> pv_acute_chronic_group <- factor(paste0(pv_acute_chronic_description$Population, ".", pv_acute_chronic_description$Treatment))
> pv_acute_chronic_design <- model.matrix(~0+pv_acute_chronic_group, data=pv_acute_chronic_dgelist$samples)
> pv_acute_chronic_dgelist <- estimateDisp(pv_acute_chronic_dgelist, pv_acute_chronic_design, robust=TRUE)
> pv_acute_chronic_fit <- glmQLFit(pv_acute_chronic_dgelist, pv_acute_chronic_design, robust=TRUE)
##Contrasts to get DE genes between populations for acute experiment:
> contrasts2 <- makeContrasts(Population1vsPopulation2.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population2.Acute.Treatment-Population2.Acute.Control), Population1vsPopulation3.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population3.Acute.Treatment-Population3.Acute.Control), Population1vsPopulation4.Acute = (Population1.Acute.Treatment-Population1.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), Population2vsPopulation3.Acute = (Population2.Acute.Treatment-Population2.Acute.Control)-(Population3.Acute.Treatment-Population3.Acute.Control), Population2vsPopulation4.Acute = (Population2.Acute.Treatment-Population2.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), Population3vsPopulation4.Acute = (Population3.Acute.Treatment-Population3.Acute.Control)-(Population4.Acute.Treatment-Population4.Acute.Control), levels=pv_acute_chronic_design)
I want to get common set of genes which differentially expressed among four populations. I tried following command to achieve above objective:
> pc_de_btw_pops_anov <- glmQLFTest(pc_acute_chronic_fit, contrast=contrasts2)
> topTags(pc_de_btw_pops_anov)
Coefficient: LR test of 3 contrasts
logFC.Population1vsPopulation2.Acute logFC.Population1vsPopulation3.Acute logFC.Population1vsPopulation4.Acute logCPM F PValue FDR
TRINITY_DN277195_c2_g2 0.266962268 3.833956231 3.973849235 5.805308914 78.64811033 7.83E-22 7.99E-17
TRINITY_DN281190_c3_g3 -0.22663792 2.533908963 2.990723426 4.972112813 49.6939098 5.72E-17 2.92E-12
TRINITY_DN283707_c3_g2 0.02528481 -2.97083914 -5.85401231 3.109560074 43.03080562 1.42E-15 4.83E-11
TRINITY_DN248566_c0_g1 -0.20162506 -3.70984053 -3.10586704 2.965669951 41.96948664 2.44E-15 6.23E-11
TRINITY_DN268448_c0_g1 0.434988646 6.151765969 0.963942064 1.35252841 39.87257452 7.32E-15 1.49E-10
Why does not above test produce results for last three contrasts from contrast matrix (contrasts2)? Is something missing in design matrix? Am I doing it in wrong way?
Please can you share your thoughts on this issue.
Thank you.
Regards,
Govindraj Chavan
Make sure you're using the latest version of edgeR. The behaviour of
glmLRT
(on whichglmQLFTest
relies) changed in the last release. Previously, log-fold changes for redundant contrasts were not reported, which is why some of your contrasts are missing. (For example, if you testedA=B
andB=C
, you would not need to specifyA=C
, as this would be redundant and ignored.) Now they should be reported, as illustrated in the following example:Dear Aaron,
Thanks a ton for quick feedback. I am using edgeR version 3.16.5 as you can see below in session info output:
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-slackware-linux-gnu (64-bit)
Running under: Slackware 14.2
locale:
[1] LC_CTYPE=en_US LC_NUMERIC=C LC_TIME=en_US
[4] LC_COLLATE=C LC_MONETARY=en_US LC_MESSAGES=en_US
[7] LC_PAPER=en_US LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.16.5 limma_3.30.8
loaded via a namespace (and not attached):
[1] tools_3.3.1 grid_3.3.1 locfit_1.5-9.1 lattice_0.20-33
So, do you think this older version of edgeR is causing this problem? the latest version of edgeR is 3.18.1 as per Bioconductor website.
There is no information in edgeR user guide regarding these new parameters.
How do I set these two parameters for my experiment:
So please can you clarify my doubts.
Thank you.
Regards,
Govindraj Chavan
Just upgrade to the latest version of edgeR. The code in my last post was just a simple example to demonstrate the current behaviour of
glmLRT
; its application to your situation should be obvious.Dear Aaron,
Thank you very much for clarifying my doubts. Will update my edgeR to latest version and repeat the analysis with similar commands mentioned above. Hope it works this time..:)
Regards,
Govindraj Chavan