How to reduce the impact of one varaible in Deseq2 or edgeR for multivariate value analysis?
3
0
Entering edit mode
yuet • 0
@a6741209
Last seen 8 days ago
France

Hello, everyone,

I'm recently meeting this problem with my analysis, which i've done a lots of research and asked people around but their answers are quite confusing, so if I can get more opinions, that'd be terrific and thanks at advance.

So I'm doing an analysis of DEGs using Deseq2 and edgeR, the problem here is that I get those two variable, each has two conditions: grippe virus and no, Carcinoma and other. But we really want to figure out is, the impact of virus, but the histological type is quite important as well. So I'm wondering if there is any way that we can skip ou reduce the impact of histological type by designing our module.

It's not an interaction condition, so I built my design code like this:

Code should be placed in three backticks as shown below


DESeq.ds <- DESeqDataSetFromTximport(txi.rsem,sampleTable , ~ condition+Histological_type)

resultsNames(DESeq_new)
[1] "Intercept"                                 "condition_NEG_vs_VIRUS"
[3] "Histological_type_Other_vs_Carcinoma"



So I calculated two conditions, now I've to decide which one should I keep it, Some biologist suggest to find the commun part and get rid of it because what we interested in is virus part. Some of bioinformatist suggest that I need to just build one condition in the first place, like that:

DESeq.ds <- DESeqDataSetFromTximport(txi.rsem,sampleTable , ~ condition)


but in this case I have a lots of outliers cells.

compare to my design with two variables:

I also done the analysis with edgeR, but the part of construct the model makes me less sure, because I don't quite understand what I'm doing, if someone could clarify this part for me would be a great help.

> desigN <- model.matrix(~0+histology_type_edger+Groups_edgeR)
> desigN
histology_type_edgerCarcinoma  histology_type_edgerOther  Groups_edgeRVIRUS
1                                  1                         0               1
2                                  1                         0               1
3                                  1                         0               1
4                                  1                         0               1
5                                  0                         1               1
6                                  0                         1               1
7                                  1                         0               0
8                                  1                         0               0
9                                  1                         0               0
10                                 1                         0               0
11                                 0                         1               0
12                                 1                         0               0
13                                 1                         0               0
14                                 1                         0               0
15                                 0                         1               0
16                                 1                         0               0


Thanks for your help!!!

MultipleComparison edgeR DESeq2 • 729 views
1
Entering edit mode
@mikelove
Last seen 8 hours ago
United States

As covered in the DESeq2 vignette, this is a clear case for the multi-factor design -- you want to control for one while assessing the impact of the other. I don't know any statisticians who would recommend not including histological type in the design, if it impacts expression.

0
Entering edit mode

Thanks for your reply, that was what I saw.

But I didn't actually get a very explicable result, so some biologists suggest, that I get rid of the common DEGs of DGE.results_histo and DGE.results_virus from the DEGs of DGE.results_virus, what do you think of this idea?

0
Entering edit mode

I don't have much more to add than what I said above.

0
Entering edit mode

Thank you for the suggestion, Michael!!

1
Entering edit mode
@gordon-smyth
Last seen 59 minutes ago
WEHI, Melbourne, Australia

Linear model construction is the same in edgeR and DESeq2 so I don't understand why you are using completely different variables for edgeR.

Our clear advice in the edgeR guide for most datasets like this is to make a single Group variable with four levels, corresponding to the four combinations of virus state and histology. Then any comparison you want to make becomes quite simple. In particular, you can test for the virus effect for each histological condition separately, which allows you to separate the virus effect from histology completely and transparently.

Trying to fit an additive model, as you have, is bound to give poor results if virus and carcinoma have interacting effects. In my opinion, your idea of subtracting common DEGs in some way is neither necessary nor useful.

0
Entering edit mode

You mean like what guide write for the multifactors 3.3.1? I thought it was just for the situation of time series. So if I understand correctly, what I shall do after the design is:

my.contrasts <- makeContrasts( NEG-car=NEG.Carcinoma, NEG-other=NEG.other, VIRUS-car=VIRUS.Carcinoma, VIRUS-other=VIRUS.other, level=design)

qlf <- glmQLFTest(fit, contrast=c(0,0,1,1))


If I want the result with virus.

But I do have a question about the first example you guys elaborate at the case study, I thought it's more close to my situation, does it? Could you please expain about the different desgin between what you suggest me to do and this study case?

Xin

0
Entering edit mode

Yes, I mean like Section 3.3.1. It works for any factors, not just time series.

No, the contrasts you have suggested are not right; what you've written isn't even syntactically correct. You need to form VIRUS.Carcinoma - NEG.Carcinoma to see the effect of the virus for carcinomas and VIRUS.other - NEG.other to see the effect of the virus for non-carcinomas.

The case study is a paired design with one factor. You have two factors, which might be interacting.

0
Entering edit mode

Hello, Gordon Thanks again for helping me,

So if I want to see the effect of virus and no-virus( because main factor will be virus), I'll make contrast for Carcinoma.VIRUS - Other.VIRUS and Carcinoma.NEG - other.NEG.

But I get an error which I've no idea how to fix it, could you help me with it, please? so my codes are:

contrasts.matrix<-makeContrasts(VirusCarVSother=VIRUS.Carcinoma-VIRUS.Other,
NegCarVSother=NEG.Carcinoma-NEG.Other,levels=design)

qlf<-glmQLFTest(fit,contrast=contrasts.matrix[,"VirusCarVSother"])


the error is :

Error in glmLRT(glmfit, coef = coef, contrast = contrast) : contrast vector of wrong length, should be equal to number of coefficients in the linear model.

my design is :

       VIRUS.Carcinoma VIRUS.Other NEG.Carcinoma NEG.Other
1                   1         0                  0         0
2                   1         0                  0         0
3                   1         0                  0         0
4                   1         0                  0         0
5                   0         1                  0         0
6                   0         1                  0         0
7                   0         0                  1         0
8                   0         0                  1         0
9                   0         0                  1         0
10                  0         0                  1         0
11                  0         0                  0         1
12                  0         0                  1         0
13                  0         0                  1         0
14                  0         0                  1         0
15                  0         0                  0         1
16                  0         0                  1         0
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$Des_Group [1] "contr.treatment"  And I'm wondering if the order of contrast would matter? like is it the same between VIRUS.Carcinoma and Carcinoma.VIRUS when I make contrast? I assume no, but just for make sure. Thanks again, Xin ADD REPLY 0 Entering edit mode The error message is pretty clear, is it not? The two lines of code you show look fine but you have made a programming mistake somewhere leading up to this. So you have to do some debugging. What do you see from colnames(fit) and dim(contrasts.matrix)? ADD REPLY 0 Entering edit mode Thanks, that was my thought, but no idea why was happening. But after rechecking today, I realized because I put an intermediate step, which is xxx=estimDispRest, and glmQLFit(xxx...) got wrong. So all is good now, but just out of curiosity, why my intermediate step can't take everything in my value with the dispersion estimate step, please? ADD REPLY 0 Entering edit mode There is no such function as estimDispRest. It is hardly surprising that inserting non-existent functions into your code may destroy objects and affect downstream results. ADD REPLY 0 Entering edit mode Excuse me for not being very clear and thanks for the great help . Actually estimDispRest is the result of my function: estimDispRest <- estimateDisp(y, design,robust=T), so what I was doing is to give hime another name like xxx, but when function glmQLFit(xxx...) finished, the length is not correct, but when I'm doing glmQLFit(estimDispRest ...) it works. So i was wondering the reason. ADD REPLY 0 Entering edit mode The error cannot possibly have been caused by estimateDisp. It is more likely that you failed to specify the correct design matrix to glmQLFit. Anyway, you need to do your own debugging. ADD REPLY 0 Entering edit mode Thanks for replying and suggestions!!! Have a good day ADD REPLY 0 Entering edit mode yuet • 0 @a6741209 Last seen 8 days ago France Hello again Mike and Gordon, Thanks very much for your previous help for the construction of my analysis model, and sorry for bothering you again because my result is quite stranger compared to other bibliographies I read, please kindly help me again. So I make a ggplot to see the correlation of two log fold-change result. My objectif is to observe the impact of virus in different histological type of cancer in some patients, here I separated into Carcinoma and Other as histological type, the infection condition is Virus, which has this type of virus, Neg which haven't. In this circumstance, I made Neg and Carcinoma as reference group My code for Deseq2 is:  DESeq.ds <- DESeqDataSetFromTximport(txi.rsem,sampleTable , ~ infection+Histological_type) DESeq_new <- DESeq(DESeq.ds) DGE.results=results(DESeq_new,pAdjustMethod = "BH",contrast=c("infection","VIRUS","Neg"))  My edgeR code is :  y <- DGEList(normCts_final) list_calculate <- scaleOffset(y, matrix_normM) #construction of model Group <- factor(paste(sampleTable$infection,sampleTable\$Histological_type,sep="."))
desigN <- model.matrix(~0+Group)
colnames(desigN) <- levels(Group)

keep <- filterByExpr(list_calculate,desigN)
list_pret <- list_calculate[keep, ]

#estimate the dispersion of the data
estimeDisp <- estimateDisp(list_pret, design=desigN,robust=T)
fit <- glmQLFit(estimeDisp, contrast=desigN,robust=T)

contrasts.matrix<-makeContrasts(VIRUScarVSother=VIRUS.Other-VIRUS.carcinome,
NegcarVSother=Neg.Other-Neg.carcinome,levels=desigN)
qlf<-glmQLFTest(fit,contrast=contrasts.matrix[,"VIRUScarVSother"])


When I change only my code Deseq2 to :

DGE.results=results(DESeq_new,pAdjustMethod = "BH",contrast=c("Histological_type","Other","carcinome"))


I got this, which seems like what we want, but It would not be correct to explain, because I want to know the imapct of virus rather than histological type. Any thoughts about this?

I did not change my code edger because there is no result otherwise, I tried the estimateGLMRobustDisp, did not work as well.

I send this question for both of you, because I can't be sure for any of those my code, so maybe I did't fully understand your previous suggestion. Thanks a lot for your comments about my result and solutions maybe?

Xin

0
Entering edit mode

You have fitted very different models in DESeq2 and edgeR. This has nothing to do with differences between DESeq2 and edgeR, just that you have chosen to do different analyses in the two packages. Trying to compare them makes little sense.

I don't have time to give more advice about your specific dataset. I feel I have given you clear advice already.