Advice on designing an EdgeR object in complex experimental setup ?
1
0
Entering edit mode
Mohamed • 0
@aa1ae679
Last seen 2 days ago
United Kingdom

Dear All, I have a tricky question regarding the use of EdgeR in R to run differential expression analyses of the following experiment: it is a multifactorial experiment with around 55 sample designed as follow: Conditions = 2 (treatment & Control) Breed = 4 breeds Sex of animals = 2 (male and Female)

Within each breed, I have samples from 5 individuals (A, B, C, D and E)

I have two main questions:

I know that I can adjust for additional variables upon DE analyses, but When running an nMDS looking at how treated subjects would seprats from control, I can not really visualize this, while adjusting for differences in gender or breed using the nMDS approach ... Are there any idea or technique so that when visualizing the clustering of treatment vs Ctr samples to adjust for other covariate like sex or breed in here ? I need to do this because if I see clustering, I am afriad that this could be due to sex-specific effect, and not necessarily a treatment effect.

If I am going to calculate DE genes between treatmeent & Ctr, I think I need to adjust for 1) base line differences between individuals (as this is a paired design because the same animals were blood treated and as a Control), 2) sex differences 3)Breed.

Could you advice, which of the designs I should follow from the updated guidlines of edgeR (https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf) that could match my experiment ? I though about the one in section 3.4.1, but with more factors and adjusting for them.

Thanks a lot for any comments. .. Code should be placed in three backticks as shown below

rRDPData r • 516 views
ADD COMMENT
0
Entering edit mode

Cross-posted to Biostars https://www.biostars.org/p/9520241/

ADD REPLY
0
Entering edit mode

The analysis depends on what questions you want to ask.

• Do you want to test Treated vs Control for each breed separately or only overall?
• Do you want to test Treated vs Control for each sex separately or only overall?
• Do you need to make baseline comparisons between the breeds, for example Breed 1 Control vs Breed 2 Control?
ADD REPLY
0
Entering edit mode

Let us assume I want to test treatement Vs control Overall : so using nMDS for that (the first question), how can I adjust for sex/breed variation. Because I assume if one see sepration (or even not), this could be due to gende difference or breed differences between control or treatment subjects rather than the treatment itself

If I am going to run DE looking at differences between treatment/control, I need also to adjust for that.

Could you advice on which example given in the tutorial (link is above) fits the case when I need to compare between treatment and control overall ? taking a note that this is a paired design, so many of the treated subjects were the same used as controls. Thanks

ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

Given that you simply want to make an overall comparison between treatment and control, not specific to particular breeds or genders, then the whole experiment is just a simple paired comparison (Section 3.4.1).

First, you need to make unique labels for all the individual animals, because currently the IDs are being recycled within breeds:

Animal <- factor(paste(Individual,Breed,sep="."))


The design matrix for the DE analysis will be:

design <- model.matrix(~Animal+Status)


To see the treatment vs control effect in an MDS plot you can remove the animal effects. Assuming that y is the DGEList object (after filtering and calcNormFactors),

logCPM <- cpm(y, log=TRUE)
design.Status <- model.matrix(~Status)
logCPM.corrected <- removeBatchEffect(logCPM, batch=Animal, design=design.Status)
plotMDS(logCPM.corrected, label=Status)


You do not need to correct for Breed or Sex because correcting for Animal already handles that.

ADD COMMENT
0
Entering edit mode

Thanks Sorry fpr bothering again : but wjy do you repeat the design again for MDS plotting ? and how combining individual and breed adjusts for sex ? Actually, I am still wondering if I compare between treatment vs control, there are impalance between male and female.

Could you also comment on why did not you perform normaliztion TMM at the beginning ? before designing the matrix ? Also, is the DE analyses is done on TMM counts or CPM counts of do I nned to run both normalization before designing the matrix and DE analyses ?

ADD REPLY
0
Entering edit mode

Gordon Smyth : I am sorry for bothering: but could you explain it more for me why did you made a nse design when attempting to remove batch effect ? I saw it only ~ Status !

Also: I could not get it, you first combine individual and breed into a new factor(Animal), so I understand that this removes effects from both variables, but how creating this Animal factor could also remove effects of different sex ?

ADD REPLY
0
Entering edit mode

In my code, y is the DGEList object after filtering and TMM. Your question was only about the experimental design, so I assume that you know how to undertake other aspects of the analysis.

In the paired analysis, Treated is compared to Control for the same animal only. Therefore the comparison is always made between observations for the same sex and the same breed. It is unnecessary (and impossible) to make any further adjustment. Imbalance of sexes is irrelevant because baseline differences between the sexes have been completed removed from the comparison.

This is a very standard analysis. Please read the documentation, for example help("removeBatchEffect").

ADD REPLY
0
Entering edit mode

Thanks for that was really helpful

Gordon Smyth : I have tried the previous codes, it worked nice for the plotMDS, and I can clearly see difference. I just still have some questions:

First: Am I right that the object 'design' (that you suggested for DE analyses, which adjust for Animal) is the same one as the 'design status' that you suggested for MDS after removing the batch effect of Animal?

Second: For DE analyses: I will be using the TMM_filtered count (i.e. not the CPM or logs as this is accounted for anyhow in the DE), and will use the design as you suggested

design <- model.matrix(~Animal+Status)

So my code will looks like this, comparing Stimulated vs unstimulated assumping y is the DGElist

design <- model.matrix(~Animal+Status)

fit <- glmQLFit(y,design)

tr <- glmQLFTest(fit)

Am I right in the previous codes ?

Actually I am confused somehow on using coef to tell the program whihc comparson to make as I only have two condition ! if you could emphasize it more.

Thanks

ADD REPLY
0
Entering edit mode

Glad that it has all worked correctly. I do not understand your new questions however.

The two design matrices design and design.status are obviously different and were used for different purposes. I gave explict code for both so there was no room any ambiguity. How could you think they are the same? The batch removal is only for the MDS plot and is competely independent of the DE analysis.

You specify coef the same as in any edgeR analysis. In the glmQLFTest code you show above, edgeR has assumed that you want to test the last coefficient in your model, which happens to be the correct one here. If you want you could specify it explicitly by coef="StatusTreated". Type columns(fit) to see the possible values for coef.

Cross posting

I do not have time to answer the same question from the same poster on multiple forums. Generally speaking, if I see that someone has posted the same question in multiple forums, without waiting for an answer in either, then I will not answer the question at all. I see that you are regularly updating your question Biostars, but I will not respond to that. Please do not "ping" me on Biostars.

ADD REPLY
0
Entering edit mode

Thanks a lot Gordon, I understood the point of coeff. I just do have a question: When I create the design, does this differs if the coef. is named as (statusTreated) or (StatusControl) ? I mean from the point of view of getting the DE genes. In my analyses, when making the design it is named as (control), but assigning 0 and 1 correctly to each subject as appeared in the last column. Am I right ?

thanks

ADD REPLY
0
Entering edit mode

Gordon Smyth : Could you also advise me on what is the best method to determine the if treatment has an effect compared to control in particular breed, using the RNA seq data ? I actually thought about measuring the euclidean distance between treatment and control animals, and look for average distance between them. Here I am going to use the count used for the MDS analyses that you told in the post (filtered CPM, TMM count) ? Am I right ? or shall I use the TMM filtered counts used for the DE analyses ?

Thanks a lot and sorry for bothering with many questions.

ADD REPLY

Login before adding your answer.

Traffic: 194 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6