Entering edit mode
Hi everyone,I am new to R and would like to ask for help to checking for the code that I wrote to see if it is correct. I am doing DE analysis in two groups ,Treatment and Control groups, I want to find out the DE gene list in Treatment group compared to the Controls.I also want to perform correction for age and sex when do the analysis. All suggestions are welcomed ,Thank you guys! Code should be placed in three backticks as shown below
#control group data
counts34 <- cbind(row3, row4)
# To put two groups in one file: row2=Treatment group, counts34=control group
all_counts2vs34 <- cbind(row2, counts34)
print(dim(all_counts2vs34))
# group factors
group <- factor(c(rep("Treatment", ncol(row2)), rep("Control", ncol(counts34))))
print(group)
# age and gender information
age <- c(age_clust2, age_clust34)
gender <- c(gender_clust2, gender_clust34)
# DEG list and adding age and gender information to the list
y <- DGEList(counts=all_counts2vs34, group=group)
y$samples$age <- age
y$samples$gender <- gender
print(y$samples)
print(levels(y$samples$group))
# Filteration
keep <- filterByExpr(y, group=group)
table(keep)
y <- y[keep, , keep.lib.sizes=FALSE]
# desing the matrix and correcting for age, gender
design <- model.matrix(~ gender + age + group)
# Normalization
y <- calcNormFactors(y)
# common dispersion and tagwise dispersion
y <- estimateDisp(y, design)
plotBCV(y)
# differential expression analysis
fit <- glmQLFit(y, design)
results <- glmQLFTest(fit, coef="groupTreatment")
> counts34 <- cbind(row3, row4)
> all_counts2vs34 <- cbind(row2, counts34)
> print(dim(all_counts2vs34))
[1] 60683 123
> group <- factor(c(rep("Treatment", ncol(row2)), rep("Control", ncol(counts34))))
> print(group)
[1] Treatment Treatment Treatment Treatment Treatment Treatment Treatment Treatment
[9] Treatment Treatment Treatment Treatment Treatment Treatment Treatment Control
[17] Control Control Control Control Control Control Control Control
[25] Control Control Control Control Control Control Control Control
[33] Control Control Control Control Control Control Control Control
[41] Control Control Control Control Control Control Control Control
[49] Control Control Control Control Control Control Control Control
[57] Control Control Control Control Control Control Control Control
[65] Control Control Control Control Control Control Control Control
[73] Control Control Control Control Control Control Control Control
[81] Control Control Control Control Control Control Control Control
[89] Control Control Control Control Control Control Control Control
[97] Control Control Control Control Control Control Control Control
[105] Control Control Control Control Control Control Control Control
[113] Control Control Control Control Control Control Control Control
[121] Control Control Control
Levels: Control Treatment
> age <- c(age_clust2, age_clust34)
> gender <- c(gender_clust2, gender_clust34)
> y <- DGEList(counts=all_counts2vs34, group=group)
> y$samples$age <- age
> y$samples$gender <- gender
> print(levels(y$samples$group))
[1] "Control" "Treatment"
> keep <- filterByExpr(y, group=group)
> table(keep)
keep
FALSE TRUE
37622 23061
> y <- y[keep, , keep.lib.sizes=FALSE]
> design <- model.matrix(~ gender + age + group)
> y <- calcNormFactors(y)
> y <- estimateDisp(y, design)
> fit <- glmQLFit(y, design)
> results <- glmQLFTest(fit, coef="groupTreatment")
> topTags(results, n=Inf)
Coefficient: groupTreatment
logFC logCPM F PValue FDR
ENSG00000112297 0.8286224 7.20317222 147.67559 8.599867e-23 1.547711e-18
ENSG00000147394 1.1635487 5.36685490 145.73345 1.342276e-22 1.547711e-18
ENSG00000126549 4.2687167 0.66288401 141.46915 3.609022e-22 2.774255e-18
ENSG00000170961 4.0530816 -0.56015697 139.13043 6.251185e-22 3.603965e-18
Thank you! Because I am quite new to this and I was worried that if something got wrong when I show the results to the crew. It also would effect my next project in grouping samples which might lead to mistakes in further experiments. Again, THANK YOU very much!
For any analysis, it is also good practice to check the standard diagnostic plots for any possible problems as shown in the edgeR workflow examples.