I'm working on a genomics platform and I'm in charge of analysing biological data. A team of researchers asked me to compare two different methods for calling differentially expressed genes in a given dataset of Illumina human RNA-seq data, aligned with TopHat2.
My dataset is composed of 12 samples, divided into 4 groups A, B, C and D (each group containing 3 samples).
The first approach consists of using Cufflinks to compute FPKM values, calculating the mean FPKM value for each group, and then comparing the groups with a t-test (only if the coverage is > 1 in all 3 members of at least one of the 2 compared groups). So for each gene in each pairwise comparison, I get a p-value and a fold change corresponding to the ratio of the group means. A cutoff of p-value < 0.05 and FC > 1.5 was then applied.
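To make the first approach concrete, here is a minimal Python sketch of the filter, fold-change and cutoff steps for a single gene. It is only an illustration under assumptions: the function names are hypothetical, the p-value is taken as an input (it would come from a t-test in a statistics package), and the FC cutoff is treated as symmetric (FC > 1.5 or FC < 1/1.5), which the description above does not state explicitly.

```python
import math
from statistics import mean


def passes_coverage_filter(cov_a, cov_b, min_cov=1.0):
    """Return True if every sample of at least one group has coverage > min_cov."""
    return all(v > min_cov for v in cov_a) or all(v > min_cov for v in cov_b)


def fold_change(fpkm_a, fpkm_b):
    """Ratio of the mean FPKM values, group B over group A."""
    mean_a, mean_b = mean(fpkm_a), mean(fpkm_b)
    if mean_a == 0:
        # Ratio is undefined when the reference group has no expression.
        return math.inf if mean_b > 0 else math.nan
    return mean_b / mean_a


def is_called_de(p_value, fc, p_cutoff=0.05, fc_cutoff=1.5):
    """Apply the p-value and fold-change cutoffs.

    Assumption: the FC cutoff is symmetric, so a 1.5-fold decrease
    (fc < 1 / 1.5) also counts as a change.
    """
    return p_value < p_cutoff and (fc > fc_cutoff or fc < 1.0 / fc_cutoff)


# Made-up FPKM values for one gene in groups A and B (3 samples each).
group_a = [2.0, 4.0, 6.0]
group_b = [8.0, 8.0, 8.0]
fc = fold_change(group_a, group_b)  # mean 8.0 / mean 4.0
```

Note that a fold change computed this way is a plain ratio of means on the linear FPKM scale, with no log transform and no model-based adjustment.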
The second approach consists of using the RUVSeq method (http://www.bioconductor.org/packages/release/bioc/html/RUVSeq.html), which is based on a GLM approach. I'm using RUVg with 11 housekeeping genes and I used the following model:
where set_raw_counts is the SeqExpressionSet of the raw counts and controls contains housekeeping genes.
Then, to compute the DE genes in A vs B:
But I'm not sure I understand the output in lrt$table correctly. I get 3 columns: logFC, logCPM and p-value.
I'm sorry, as this is a recurrent question, but how are the logFC and the logCPM calculated? Is it possible to get the details of the calculation? How can I make the logFC comparable with the FC I got from the first approach (with FPKM)? When I tried to convert the logFC to FC, I got FC values whose orders of magnitude were very different from the FPKM-based ones.
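For reference, the conversion in question can be sketched as follows in Python. This assumes that lrt comes from edgeR's glmLRT (the call is not shown above), in which case logFC is a log2 fold change, so the conversion must use base 2 and not the natural exponential. The function names are hypothetical.

```python
import math


def log2fc_to_fc(log_fc):
    """Convert a log2 fold change to a linear fold change (a ratio)."""
    return 2.0 ** log_fc


def fc_to_log2fc(fc):
    """Convert a linear fold change (ratio of means) to the log2 scale."""
    return math.log2(fc)


def signed_fold_change(log_fc):
    """Express a log2 fold change as a signed fold change.

    A decrease is reported as a negative number, so log_fc = -1
    (ratio 0.5) becomes -2.0 instead of 0.5.
    """
    fc = 2.0 ** log_fc
    return fc if log_fc >= 0 else -1.0 / fc


# A logFC of 1 is a 2-fold increase; a logFC of -1 is a ratio of 0.5.
up = log2fc_to_fc(1.0)
down = log2fc_to_fc(-1.0)
# The FC cutoff of 1.5 from the first approach, on the log2 scale.
cutoff_log2 = fc_to_log2fc(1.5)
```

Even after converting to the same scale, the two fold changes are not expected to match exactly: one is a ratio of mean FPKM values, the other is estimated by a GLM fitted to counts with normalisation and the RUV factors in the model.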
Thank you very much for your time and help.