Question

edgeR for SDE analysis

0


vikycairoli • 0

@df306e2b

Last seen 3.0 years ago

Argentina

Hello! I am very new at this, so I have probably made a lot of mistakes. I want to analyze RNA-seq data for differential expression (DE) of miRNAs in two groups of patients (HCV and HCV/HIV) with different stages of fibrosis (significant vs. non-significant). The idea is to look for miRNAs involved in the fibrogenesis process and to see whether there are differences between the two groups of patients. I have my samples in a data.frame with their characteristics and my counts in a data.frame called input_counts. To normalize the counts, I first created a DGEList object and a design matrix:

dge <- DGEList(counts=input_counts, group=GRUPO)
design <- model.matrix(~0+GRUPO+FIBROSIS)

Then I filtered the data using:

keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes=FALSE]

Then I computed the normalization factors and estimated the dispersions:

dge <- calcNormFactors(dge)
dge <- estimateGLMCommonDisp(dge, design)
dge <- estimateGLMTagwiseDisp(dge, design)

Finally, I calculated the CPM values:

exo_mirna_norm <- cpm(dge, normalized.lib.sizes=FALSE)

After normalization, I wanted to set up the different contrasts to answer my biological question, but I am not sure which design fits best for what I want to do. I also don't know whether it is better to use one design for all contrasts or a separate design for each question. To study differences between fibrosis stages, I tried:

design2 <- model.matrix(~0+FIBROSIS)
fit <- glmQLFit(dge, design2)
qlf <- glmQLFTest(fit, contrast=c(1,-1))
topTags(qlf)
toptagsFIBROSIS <- topTags(qlf, n=nrow(dge))
summary(decideTests(qlf))

toptagsFIBROSIS_SDE <- toptagsFIBROSIS %>%
  as.data.frame %>%
  subset(FDR <= 0.05 & abs(logFC) >=log2(1.5))

But I think I can do the same by:

design3 <- model.matrix(~0+GRUPO:FIBROSIS)
fit <- glmQLFit(dge, design3)
qlf <- glmQLFTest(fit, contrast=c(1,1,-1,-1))
topTags(qlf)
toptagsFIBROSIS_2 <- topTags(qlf, n=nrow(dge))
summary(decideTests(qlf))

toptagsFIBROSIS_2_SDE <- toptagsFIBROSIS_2 %>%
  as.data.frame %>%
  subset(FDR <= 0.05 & abs(logFC) >=log2(1.5))

Anyway, the thing is that I don't quite understand which design is better: the interaction GRUPO:FIBROSIS, since it takes both variables into account, or a first design with only FIBROSIS to compare fibrosis stages, followed by another design with only the GRUPO information to compare both groups, and so on.

Thank you very much to this community!! Vicky, from Argentina

RNASeqRData edgeR RNASeqR RNASeqData • 1.1k views

ADD COMMENT • link updated 3.1 years ago by Gordon Smyth 52k • written 3.1 years ago by vikycairoli • 0

score 1 · Answer 1 · 2022-02-24

1


Gordon Smyth 52k

@gordon-smyth

Last seen 6 hours ago

WEHI, Melbourne, Australia

I don't follow the details of your experiment, but it is always better to use one comprehensive design matrix for all questions and contrasts. Using simpler design matrices that ignore genuine characteristics of the data, like interactions, will not give good results.
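As a rough illustration of the "one comprehensive design" idea, here is a minimal sketch that combines GRUPO and FIBROSIS into a single factor and defines every comparison as a contrast on one fit. The sample sheet, the level names (HCV, HCV_HIV, NoSig, Sig), the simulated counts and the contrast names are all assumptions made up for this example, not the poster's real data. The sketch also uses estimateDisp() in place of the two separate dispersion calls in the question.

```r
library(edgeR)  # also attaches limma, which provides makeContrasts()

set.seed(1)
# Hypothetical sample sheet: 4 samples per GRUPO x FIBROSIS combination.
samples <- data.frame(
  GRUPO    = rep(c("HCV", "HCV_HIV"), each = 8),
  FIBROSIS = rep(rep(c("NoSig", "Sig"), each = 4), times = 2)
)
# Simulated counts for 500 miRNAs (a placeholder for input_counts).
input_counts <- matrix(rnbinom(500 * 16, mu = 200, size = 5), nrow = 500,
                       dimnames = list(paste0("miR", 1:500), paste0("S", 1:16)))

# One factor with four levels, one per GRUPO/FIBROSIS combination.
combo <- factor(paste(samples$GRUPO, samples$FIBROSIS, sep = "."))
design <- model.matrix(~0 + combo)
colnames(design) <- levels(combo)

# Filter, normalize, estimate dispersions and fit once with the full design.
dge <- DGEList(counts = input_counts, group = combo)
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge)
dge <- estimateDisp(dge, design)
fit <- glmQLFit(dge, design)

# Every biological question becomes a contrast on the same fit.
contrasts <- makeContrasts(
  FibrosisInHCV    = HCV.Sig - HCV.NoSig,
  FibrosisInHCVHIV = HCV_HIV.Sig - HCV_HIV.NoSig,
  FibrosisAverage  = (HCV.Sig + HCV_HIV.Sig) / 2 - (HCV.NoSig + HCV_HIV.NoSig) / 2,
  Interaction      = (HCV_HIV.Sig - HCV_HIV.NoSig) - (HCV.Sig - HCV.NoSig),
  levels = design
)

# Test one contrast at a time, e.g. the fibrosis effect averaged over groups.
qlf <- glmQLFTest(fit, contrast = contrasts[, "FibrosisAverage"])
topTags(qlf)
```

Naming the design columns lets makeContrasts() build the contrast vectors, which avoids having to guess the column order when writing something like c(1,1,-1,-1) by hand. Note also that a contrast that sums two groups rather than averaging them reports the sum of the two log fold changes, which is why the averaged contrast above divides by 2.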

ADD COMMENT • link 3.1 years ago Gordon Smyth 52k