Hi all experts,
I am a biology student who has just started learning R and NGS analysis, and I have some basic questions, so please be patient with me. Regarding differential gene expression analysis of RNA-seq data: as far as I have read, edgeR accepts raw counts and normalizes them with the TMM method. Is that right? However, I read a paper that used edgeR for differential expression analysis and calculated gene fold change as log2(FPKM treatment / FPKM control). I am confused about why the authors refer to "FPKM". Could someone please explain where the FPKM values come from?
For statistical analysis, we need to ensure that all samples are comparable. If a box plot shows that the samples are not normally distributed and, in fact, one of the samples stands out from the rest, should we normalize the data before running the edgeR analysis?
Thank you in advance
Thank you very much for your complete reply. Regarding the boxplot, I meant a boxplot of the raw count values across genes for each sample. Is the MDS plot in the R package similar to a boxplot, and can it be used to evaluate the variance between samples before doing the differential expression analysis?
No. If you're referring to plotMDS, this constructs a multidimensional scaling plot, not a boxplot. The MDS plot serves the same function as a PCA plot, i.e., similar samples should cluster together while dissimilar samples should be far apart. This allows you to tell whether the replicates are consistent; whether the treatment conditions have any noticeable effect; and whether there are outlier samples. The diagnostic information that you get from an MDS plot is far more valuable than that from a boxplot of counts for each sample - the latter doesn't really tell you if there are systematic differences in gene expression between samples (as that is confounded by differences in library sizes between samples).
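To make the library-size point concrete, here is a minimal base-R sketch on simulated data. Everything in it is an assumption made for illustration: the gene count, the sequencing depths, the negative binomial parameters and the variable names are all made up, and the classical MDS from cmdscale is only a stand-in for the idea, since plotMDS uses its own distance measure. No gene is truly differentially expressed, but sample S4 is sequenced four times deeper than the others.

```r
# Hypothetical simulation: 1000 genes, 4 samples, no true differential expression.
set.seed(1)
n_genes   <- 1000
lib_scale <- c(1, 1, 1, 4)                      # assumed: S4 sequenced 4x deeper
mu        <- rgamma(n_genes, shape = 2, rate = 0.02)  # per-gene baseline mean

# Negative binomial counts; each sample's mean is scaled by its depth.
counts <- sapply(lib_scale, function(s) rnbinom(n_genes, mu = mu * s, size = 10))
colnames(counts) <- paste0("S", 1:4)

# log2 of raw counts: S4 looks shifted, purely because of sequencing depth.
log_raw <- log2(counts + 0.5)

# log2 counts per million: dividing by library size removes the depth effect.
lib_size <- colSums(counts)
log_cpm  <- log2(t(t(counts + 0.5) / (lib_size + 1)) * 1e6)

med_raw <- apply(log_raw, 2, median)
med_cpm <- apply(log_cpm, 2, median)
print(round(med_raw, 2))
print(round(med_cpm, 2))

# Uncomment to compare the two boxplots side by side:
# par(mfrow = c(1, 2)); boxplot(log_raw, main = "log2 raw"); boxplot(log_cpm, main = "log2 CPM")

# Classical MDS on between-sample distances (illustrative stand-in for plotMDS).
mds <- cmdscale(dist(t(log_cpm)), k = 2)
# plot(mds, type = "n"); text(mds, labels = rownames(mds))
```

In the raw-count boxplot S4 appears to stand out, yet the difference disappears once counts are scaled by library size, so the boxplot alone says nothing about real expression differences.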