Hello! I am very new to this, so I have probably made a lot of mistakes. I want to analyze RNA-seq data for differential expression (DE) of miRNAs in two groups of patients (HCV and HCV/HIV) with different stages of fibrosis (significant vs. non-significant). The idea is to look for miRNAs involved in fibrogenesis and to see whether they differ between the two groups of patients. I have my samples in a data.frame with their characteristics and my counts in a data.frame called input_counts. I started by creating a DGEList object and a design matrix:
dge <- DGEList(counts=input_counts, group=GRUPO)
design <- model.matrix(~0+GRUPO+FIBROSIS)
Then I filtered the data with:
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes=FALSE]
Then I calculated the normalization factors and estimated the dispersions:
dge <- calcNormFactors(dge)
dge <- estimateGLMCommonDisp(dge, design)
dge <- estimateGLMTagwiseDisp(dge, design)
And I calculated the CPM:
exo_mirna_norm <- cpm(dge, normalized.lib.sizes=FALSE)
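As far as I understand, normalized.lib.sizes=FALSE makes cpm() divide by the raw library sizes and ignore the factors from calcNormFactors(). A minimal base-R sketch of that arithmetic, using a made-up count matrix and made-up normalization factors (toy_counts and toy_norm_factors are hypothetical, not my data):

```r
# Toy data: 3 miRNAs x 2 samples (hypothetical values, for illustration only)
toy_counts <- matrix(c(10, 20, 70,
                       30, 30, 140),
                     nrow = 3,
                     dimnames = list(c("miR_a", "miR_b", "miR_c"),
                                     c("S1", "S2")))
toy_norm_factors <- c(S1 = 0.8, S2 = 1.25)  # made-up scaling factors

lib_size <- colSums(toy_counts)  # raw library sizes per sample

# CPM on raw library sizes (what normalized.lib.sizes=FALSE corresponds to)
cpm_raw <- t(t(toy_counts) / lib_size) * 1e6

# CPM on effective library sizes (raw size x normalization factor)
cpm_eff <- t(t(toy_counts) / (lib_size * toy_norm_factors)) * 1e6
```

So if the TMM factors are meant to be used in the CPM values, normalized.lib.sizes=TRUE would seem to be the setting that applies them.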
After normalization I wanted to set up the contrasts that answer my biological question, but I am not sure which design fits best. I also don't know whether it is better to build one design and run all contrasts from it, or whether a separate design can be used for each question. To study differences between fibrosis stages I tried:
design2 <- model.matrix(~0+FIBROSIS)
fit <- glmQLFit(dge, design2)
qlf <- glmQLFTest(fit, contrast=c(1,-1))
topTags(qlf)
toptagsFIBROSIS <- topTags(qlf, n=nrow(dge))
summary(decideTests(qlf))
toptagsFIBROSIS_SDE <- toptagsFIBROSIS %>%
  as.data.frame() %>%
  subset(FDR <= 0.05 & abs(logFC) >= log2(1.5))
But I think I can do the same with:
design3 <- model.matrix(~0+GRUPO:FIBROSIS)
fit <- glmQLFit(dge, design3)
qlf <- glmQLFTest(fit, contrast=c(1,1,-1,-1))
topTags(qlf)
toptagsFIBROSIS_2 <- topTags(qlf, n=nrow(dge))
summary(decideTests(qlf))
toptagsFIBROSIS_2_SDE <- toptagsFIBROSIS_2 %>%
  as.data.frame() %>%
  subset(FDR <= 0.05 & abs(logFC) >= log2(1.5))
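One thing I wanted to check is which coefficients the contrast c(1,1,-1,-1) refers to, since that depends on the column order of the design matrix. A small base-R sketch with made-up labels (the level names HCV, HCV_HIV, NS and S are placeholders, not necessarily how my factors are coded, and design3_toy is not my real design):

```r
# Hypothetical sample annotation: 2 samples per GRUPO x FIBROSIS combination
GRUPO    <- factor(rep(c("HCV", "HCV_HIV"), each = 4))
FIBROSIS <- factor(rep(rep(c("NS", "S"), each = 2), times = 2))

# Same formula as design3, built on the toy annotation
design3_toy <- model.matrix(~0 + GRUPO:FIBROSIS)
colnames(design3_toy)  # one column per GRUPO x FIBROSIS combination

# Name the contrast weights so it is visible which columns get +1 and -1
contrast_toy <- setNames(c(1, 1, -1, -1), colnames(design3_toy))
contrast_toy
```

If I understand correctly, weights of 1 and -1 compare the sums of two coefficients, whereas 0.5 and -0.5 would compare their averages, so the logFC from the first version is twice as large as from the second.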
Anyway, the thing is that I don't quite understand which design is better: the interaction GRUPO:FIBROSIS, since it takes both variables into account, or a first design with only FIBROSIS to compare patients by fibrosis stage, then another design with only GRUPO to compare both groups of patients, and so on.
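One option I have seen described in the edgeR User's Guide for experiments with more than one factor is to merge both variables into a single factor and write every comparison as a contrast on one design. A sketch with made-up labels, assuming two samples per combination; COMBO, make_contrast and the contrast names are hypothetical, and the contrast vectors are built by hand in base R:

```r
# Hypothetical sample annotation (placeholders, not my real data)
GRUPO    <- factor(rep(c("HCV", "HCV_HIV"), each = 4))
FIBROSIS <- factor(rep(rep(c("NS", "S"), each = 2), times = 2))

# One level per GRUPO x FIBROSIS combination
COMBO <- factor(paste(GRUPO, FIBROSIS, sep = "."))
design_combo <- model.matrix(~0 + COMBO)
colnames(design_combo) <- levels(COMBO)

# Helper: numeric contrast vector with the given named weights, zeros elsewhere
make_contrast <- function(weights, columns = colnames(design_combo)) {
  stopifnot(all(names(weights) %in% columns))
  out <- setNames(numeric(length(columns)), columns)
  out[names(weights)] <- weights
  out
}

contrasts_toy <- list(
  # fibrosis effect within each group of patients
  fibrosis_in_HCV     = make_contrast(c(HCV.S = 1, HCV.NS = -1)),
  fibrosis_in_HCV_HIV = make_contrast(c(HCV_HIV.S = 1, HCV_HIV.NS = -1)),
  # fibrosis effect averaged over both groups
  fibrosis_average    = make_contrast(c(HCV.S = 0.5, HCV_HIV.S = 0.5,
                                        HCV.NS = -0.5, HCV_HIV.NS = -0.5)),
  # difference of the fibrosis effect between groups (interaction)
  fibrosis_by_group   = make_contrast(c(HCV_HIV.S = 1, HCV_HIV.NS = -1,
                                        HCV.S = -1, HCV.NS = 1))
)
```

Each vector could then be passed as contrast= to glmQLFTest() after fitting with glmQLFit(dge, design_combo), so all questions would be answered from a single fit.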
Thank you very much to this community! Vicky, from Argentina