edgeR for SDE analysis
@df306e2b
Last seen 4 months ago
Argentina

Hello! I am very new to this, so I have probably made a lot of mistakes. I want to analyze RNA-seq data for differential expression (DE) of miRNAs in two groups of patients (HCV and HCV/HIV) with different stages of fibrosis (significant vs. non-significant). The idea is to look for miRNAs involved in the fibrogenesis process and to see whether they differ between the two groups of patients. I have my sample characteristics in a data.frame and my counts in a data.frame called input_counts. I started by creating a DGEList object and a design matrix:

dge <- DGEList(counts=input_counts, group=GRUPO)
design <- model.matrix (~0+GRUPO+FIBROSIS)


I then filtered the data using:

keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes=FALSE]


normalized and estimated the dispersions:

dge <- calcNormFactors(dge)
dge <- estimateGLMCommonDisp (dge, design)
dge <- estimateGLMTagwiseDisp (dge, design)


and calculated the counts per million (CPM):

exo_mirna_norm <- cpm(dge, normalized.lib.sizes=FALSE)


After normalization, I wanted to set up the contrasts that answer my biological question, but I am not sure which design fits best. I also don't know whether it is better to fit one design and take all contrasts from it, or whether you can use a separate design for each question. To study differences between fibrosis stages I tried:

design2 <- model.matrix(~0+FIBROSIS)
fit <- glmQLFit(dge, design2)
qlf <-  glmQLFTest(fit, contrast=c(1,-1))
topTags(qlf)
toptagsFIBROSIS <- topTags(qlf, n=nrow(dge))
summary(decideTests((qlf)))

toptagsFIBROSIS_SDE <- toptagsFIBROSIS %>%
  as.data.frame %>%
  subset(FDR <= 0.05 & abs(logFC) >= log2(1.5))


But I think I can do the same with:

design3 <- model.matrix(~0+GRUPO:FIBROSIS)
fit <- glmQLFit(dge, design3)
qlf <-  glmQLFTest(fit, contrast=c(1,1,-1,-1))
topTags(qlf)
toptagsFIBROSIS_2 <- topTags(qlf, n=nrow(dge))
summary(decideTests((qlf)))

toptagsFIBROSIS_2_SDE <- toptagsFIBROSIS_2 %>%
  as.data.frame %>%
  subset(FDR <= 0.05 & abs(logFC) >= log2(1.5))


Anyway, the thing is that I don't quite understand which design is better: the interaction GRUPO:FIBROSIS, since it takes both variables into account, or a first design with only FIBROSIS to compare the fibrosis stages, then another design with only the GRUPO information to compare both groups, and so on.

Thank you very much to this community!! Vicky, from Argentina

RNASeqRData edgeR RNASeqR RNASeqData • 195 views
@gordon-smyth
Last seen 5 hours ago
WEHI, Melbourne, Australia

I don't follow the details of your experiment, but it is always better to use one comprehensive design matrix for all questions and contrasts. Using simpler design matrices that ignore genuine characteristics of the data, like interactions, will not give good results.
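
As an illustration of the single-design approach (this is a sketch, not code from the original answer), one common pattern is to combine the two factors into one factor with a level per GRUPO x FIBROSIS combination, fit the model once, and then extract every question as a contrast from that one fit. The factor levels (HCV, HCV_HIV, NoSig, Sig), the sample layout, and the simulated counts below are all assumptions made so the example runs on its own; they stand in for the real sample table and input_counts.

```r
library(edgeR)  # also loads limma, which provides makeContrasts()

set.seed(1)
# Hypothetical sample annotation: 4 samples per GRUPO x FIBROSIS combination
GRUPO    <- factor(rep(c("HCV", "HCV_HIV"), each = 8))
FIBROSIS <- factor(rep(rep(c("NoSig", "Sig"), each = 4), times = 2))

# Simulated counts standing in for input_counts (200 miRNAs x 16 samples)
input_counts <- matrix(rnbinom(200 * 16, mu = 100, size = 5), nrow = 200,
                       dimnames = list(paste0("miR", 1:200), paste0("S", 1:16)))

# One combined factor with a level for each GRUPO x FIBROSIS combination
Cell <- factor(paste(GRUPO, FIBROSIS, sep = "."))
design <- model.matrix(~0 + Cell)
colnames(design) <- levels(Cell)

# Same pipeline as in the question, but with the single comprehensive design
dge <- DGEList(counts = input_counts, group = Cell)
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge)
dge <- estimateDisp(dge, design)
fit <- glmQLFit(dge, design)

# All questions expressed as contrasts of the same fitted model
contrasts <- makeContrasts(
  FibrosisInHCV    = HCV.Sig - HCV.NoSig,
  FibrosisInHCVHIV = HCV_HIV.Sig - HCV_HIV.NoSig,
  FibrosisAverage  = (HCV.Sig + HCV_HIV.Sig)/2 - (HCV.NoSig + HCV_HIV.NoSig)/2,
  Interaction      = (HCV_HIV.Sig - HCV_HIV.NoSig) - (HCV.Sig - HCV.NoSig),
  levels = design)

# Test one contrast at a time, e.g. whether the fibrosis effect differs by group
qlf <- glmQLFTest(fit, contrast = contrasts[, "Interaction"])
topTags(qlf)
```

Each of the other questions can be tested the same way by selecting a different column of contrasts. Note that the averaged contrast uses weights of 1/2, so the reported logFC is the average of the two within-group log fold changes rather than their sum.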