Hi, thanks in advance. I want to ask when I should use voom and when I should use trend, because I cannot decide whether the library sizes are quite variable between samples. Can you show me the code to detect this?
You once said that FPKM differential analysis should use trend=TRUE, robust=TRUE, so should I compute CPM from FPKM? If so, how should I prefilter the FPKM values? Is there a recommendation?
Also, I heard someone say that limma gives more false positives than edgeR and DESeq2. Is that true?
Here is the detailed code for trend and voom:
If the sequencing depth is reasonably consistent across the RNA samples, then the simplest and most robust approach to differential expression is to use limma-trend. This approach will usually work well if the ratio of the largest library size to the smallest is not more than about 3-fold. In the limma-trend approach, the counts are converted to logCPM values using edgeR's cpm function:
> logCPM <- cpm(dge, log=TRUE, prior.count=3)
The prior count is used here to damp down the variances of logarithms of low counts. The logCPM values can then be used in any standard limma pipeline, using the trend=TRUE argument when running eBayes or treat. For example:
> fit <- lmFit(logCPM, design)
> fit <- eBayes(fit, trend=TRUE)
> topTable(fit, coef=ncol(design))
Or, to give more weight to fold-changes in the gene ranking, one might use:
> fit <- lmFit(logCPM, design)
> fit <- treat(fit, lfc=log2(1.2), trend=TRUE)
> topTreat(fit, coef=ncol(design))
Differential expression: voom
When the library sizes are quite variable between samples, then the voom approach is theoretically more powerful than limma-trend. In this approach, the voom transformation is applied to the normalized and filtered DGEList object:
> v <- voom(dge, design, plot=TRUE)
The voom transformation uses the experiment design matrix, and produces an EList object. It is also possible to give a matrix of counts directly to voom without TMM normalization, by
> v <- voom(counts, design, plot=TRUE)
If the data are very noisy, one can apply the same between-array normalization methods as would be used for microarrays, for example:
> v <- voom(counts, design, plot=TRUE, normalize="quantile")
After this, the usual limma pipelines for differential expression can be applied, for example:
> fit <- lmFit(v, design)
> fit <- eBayes(fit)
> topTable(fit, coef=ncol(design))
Or, to give more weight to fold-changes in the ranking, one could use say:
> fit <- treat(fit, lfc=log2(1.2))
> topTreat(fit, coef=ncol(design))
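The 3-fold rule of thumb quoted above can be checked directly from the count matrix. This is only a minimal sketch in base R: the toy matrix `counts` and the names `lib_size`, `lib_ratio` and `suggested` are made up for illustration, and the cutoff of 3 is the approximate guideline from the quoted text, not a hard threshold.

```r
# Hypothetical toy count matrix: rows are genes, columns are samples.
counts <- matrix(
  c(100, 250,  900,
     50, 120,  400,
    350, 630, 2700),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3"))
)

# Library size = total counts per sample.
# For an edgeR DGEList `dge`, the same totals are stored in dge$samples$lib.size.
lib_size <- colSums(counts)

# Ratio of the largest library size to the smallest.
lib_ratio <- max(lib_size) / min(lib_size)

print(lib_size)
print(lib_ratio)

# Approximate guideline from the quoted text:
# limma-trend if the ratio is not more than about 3-fold, otherwise voom.
suggested <- if (lib_ratio > 3) "voom" else "limma-trend"
print(suggested)
```

With real data, replacing the toy matrix with your own counts (or using `dge$samples$lib.size`) gives the same ratio, and `barplot(lib_size)` is a quick way to see the spread across samples.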
"Also, I heard someone say that limma gives more false positives than edgeR and DESeq2. Is that true?" - As an aside, where did you hear or read this?
Also note that FPKM expression units cannot be used as input to DESeq2, edgeR, or limma - I trust that you are not doing this. Because no cross-sample normalisation is applied when deriving them, FPKM expression units are not suitable for any type of differential expression analysis. FPKMs (and RPKMs) are a primitive form of RNA-seq normalisation, dating from when people were generating cDNA libraries for sequencing on just single samples.
Thanks a lot, you are right. I am not using FPKM for analysis with DESeq2, but many papers published in recent years, even in 2020, still use limma to analyse FPKM values and appear in SCI journals (IF >= 5), so it may be suitable to some extent. I also wonder whether FPKM values can be analysed with a Wilcoxon test for differential analysis, and whether the result would be more appropriate than using limma.
My other important question is about trend and voom: how do I show the library size difference?
Yes, I have also seen published manuscripts whose data are [in part] based on FPKM expression units. If you are prepared to use these FPKM units and cannot obtain any raw counts, then transform them as log2(FPKM + 0.1) and adopt the limma-trend pipeline: https://bioinformatics-core-shared-training.github.io/cruk-autumn-school-2017/DifferentialExpression/rna-seq-de.nb.html#limma-trend
A previous answer here from Gordon: https://support.bioconductor.org/p/56275/#56299
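As a rough illustration of the log2(FPKM + 0.1) suggestion above, something like the following could be used. It is a sketch under stated assumptions: the FPKM matrix is simulated, the two-group design and all object names (`fpkm`, `group`, `log_fpkm`) are hypothetical, and the limma step only runs if limma is installed.

```r
# Simulated FPKM matrix (hypothetical data): 200 genes x 6 samples.
set.seed(1)
fpkm <- matrix(rexp(200 * 6, rate = 0.1), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
fpkm[1, 1] <- 0  # a zero value, to show why the 0.1 offset is needed

# Hypothetical two-group design, three samples per group.
group  <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

# Log-transform with the small offset suggested in the reply above.
log_fpkm <- log2(fpkm + 0.1)

# limma-trend on the log-transformed values (skipped if limma is not installed).
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(log_fpkm, design)
  fit <- limma::eBayes(fit, trend = TRUE)
  print(limma::topTable(fit, coef = ncol(design)))
}
```

The 0.1 offset keeps zero FPKM values finite after the log transform; trend=TRUE then lets the prior variance depend on the average log-expression of each gene.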
thanks a lot
Yes, I have read the link you posted and what it suggested. I want to know the difference between the Wilcoxon test and limma for FPKM data, and whether the Wilcoxon test is better (for example, for TCGA data).
I also want to know this: for count data, limma has both voom and trend, unlike DESeq2 and edgeR, which each have just one method. voom and trend are concerned with the sample size difference, but I cannot distinguish between them.
Kevin asked you where you heard that limma gives more false positives but you have refused to answer him. It is a strange thing to claim, and it may be that you have misunderstood what you heard or read.