I am analyzing an mRNA-Seq dataset with the edgeR package and testing different ways of filtering low-count genes, including filterByExpr and rowSums thresholds that decide which genes to keep. I have a question about interpreting the histogram of average log2 CPM in edgeR.
I tested filtering in 4 different ways and would like to know how to interpret the resulting plots. Basically, filterByExpr looks good; however, I am also interested in creating a model.matrix on other variables such as treatment, severity, etc., for comparisons. How do I decide the cut-off for rowSums? What do the negative values on the x-axis signify? Should the histogram look like a bell-shaped distribution?
Thank you in advance.
Best Regards, Toufiq
library(edgeR)

dge <- DGEList(counts = Counts, remove.zeros = TRUE)
dge$samples

# Either:
## filterByExpr
## Pairing and blocking is essential for comparison, as different cells are extracted from the same subjects
keep <- filterByExpr(dge, design)
table(keep)

## (OR)
## Filtering to remove low counts
keep <- rowSums(dge$counts) >= 10

## (OR)
## Filtering to remove low counts
keep <- rowSums(dge$counts) >= 50

# Apply whichever filter was chosen above and recompute library sizes
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge$counts
dim(dge$counts)

# Histogram of average log2 CPM for the retained genes
AveLogCPM <- aveLogCPM(dge)
hist(AveLogCPM)
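
To compare the filters side by side rather than one at a time, something like the following sketch could be used. It is not my real data: the counts are simulated, and the two-group `group` factor and `design` are assumptions made only so the example runs on its own. The dashed line marks 0 on the log2 scale, which corresponds to an average of 1 CPM.

```r
library(edgeR)

set.seed(1)

# Simulated counts (assumption, not the real dataset):
# 2000 genes x 6 samples, two hypothetical groups of three samples each
n_genes <- 2000
mu <- rexp(n_genes, rate = 1 / 50)  # gene-wise mean counts
Counts <- matrix(rnbinom(n_genes * 6, mu = rep(mu, 6), size = 5),
                 nrow = n_genes)
rownames(Counts) <- paste0("gene", seq_len(n_genes))
colnames(Counts) <- paste0("sample", 1:6)

group <- factor(rep(c("A", "B"), each = 3))  # hypothetical grouping
design <- model.matrix(~ group)

dge <- DGEList(counts = Counts, remove.zeros = TRUE)

# One logical vector per filtering strategy
filters <- list(
  filterByExpr = filterByExpr(dge, design),
  rowSums10    = rowSums(dge$counts) >= 10,
  rowSums50    = rowSums(dge$counts) >= 50
)

# One histogram of average log2 CPM per filter
par(mfrow = c(1, 3))
for (nm in names(filters)) {
  filtered <- dge[filters[[nm]], , keep.lib.sizes = FALSE]
  hist(aveLogCPM(filtered),
       main = sprintf("%s (%d genes)", nm, nrow(filtered)),
       xlab = "Average log2 CPM")
  abline(v = 0, lty = 2)  # log2 CPM of 0 corresponds to 1 CPM
}
```

With the real data, `Counts` and `design` would be replaced by the actual count matrix and the design matrix for the comparison of interest.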