How to convert TMM RNA-seq file (generated by edgeR) to list of differentially expressed genes
@hamidrezarazzaghian-9208

Hi,

I have a TMM (trimmed mean of M values) CSV file of whole RNA sequencing data (generated by the edgeR package) with two groups (group A with 27 samples and group B with 18 samples). Each sample is in one column and each gene is in one row, and all of the columns have headers. The file includes the list of genes (in one column) plus the values for each gene in each sample. Using this file in R, I want to get the list of genes that are differentially expressed in group B compared to group A, with correction for multiple testing by the Benjamini-Hochberg method (FDR < 0.05). The output file should have the log fold change, p-value and adjusted p-value for each differentially expressed gene.

Could anyone please share the code for this procedure with me?

Here is what the data look like (samples 1 to 27 belong to group A and samples 28 to 45 belong to group B):

gene  | sample1 | sample2 | sample3 | ... | sample27 | sample28 | ... | sample45 |
TSPAN | -2.994  | -0.651  | 0.274   | ... | 2.352    | 1.523    | ... | -2.486   |
LAP3  | 3.545   | -1.545  | 2.450   | ... | 1.298    | -1.476   | ... | 1.987    |
ALS2  | -1.910  | -2.224  | -1.720  | ... | -1.758   | 1.368    | ... | 2.154    |
edgeR DifferentialExpression TMM RNASeq

Is this normalized data not suitable for use with edgeR?

@gordon-smyth
WEHI, Melbourne, Australia

I'm the senior author of the edgeR package but I don't know what you mean by a "TMM csv file". TMM normalizes the library sizes rather than individual expression values. Do you perhaps mean that you have used edgeR's cpm function to generate log-CPM values?

Anyway, edgeR analyses read counts rather than cpm values. Just follow one of the sample workflows, for example:

or else follow the edgeR User's Guide.
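
For orientation, here is a minimal sketch of what a count-based edgeR analysis of a two-group design like the one in the question might look like. It is not taken from any particular workflow: the counts are simulated so that the example runs on its own, and the object names (`counts`, `group`, `de`) and the simulated effect sizes are assumptions for illustration. With real data, `counts` would be the matrix of raw read counts (genes in rows, samples in columns).

```r
library(edgeR)

# Simulated raw counts standing in for real data (assumption):
# 500 genes, 27 group A samples followed by 18 group B samples.
set.seed(1)
n_genes <- 500
counts <- matrix(rnbinom(n_genes * 45, mu = 100, size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("sample", 1:45)))
# Make the first 20 genes higher in group B so there is something to find.
counts[1:20, 28:45] <- rnbinom(20 * 18, mu = 400, size = 10)

group <- factor(rep(c("A", "B"), times = c(27, 18)), levels = c("A", "B"))

y <- DGEList(counts = counts, group = group)

# Drop genes with too few reads to be informative.
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes = FALSE]

# TMM normalization: this rescales the library sizes, not the counts.
y <- calcNormFactors(y)

# Quasi-likelihood test for group B versus group A.
design <- model.matrix(~ group)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit, coef = 2)

# Full results table with Benjamini-Hochberg adjusted p-values (FDR column).
tab <- topTags(qlf, n = Inf, adjust.method = "BH")$table
de <- tab[tab$FDR < 0.05, c("logFC", "PValue", "FDR")]
head(de)
```

Here `logFC` is the log2 fold change of group B relative to group A, because A is the first factor level and therefore the reference.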

Personally I use Rsubread::align followed by Rsubread::featureCounts to generate counts as input to edgeR.

If you really only have log-CPM values and not the original counts, then the limma-trend pipeline could be used instead of edgeR to get DE genes.
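
A minimal sketch of the limma-trend approach follows. It assumes the values in the file are log2-CPM, which the question does not confirm. The data are simulated so that the code runs as is. The commented-out file names (`logcpm.csv`, `DE_genes_B_vs_A.csv`) and the object names are hypothetical.

```r
library(limma)

# With real data, something like the following would read the file
# (hypothetical file name; gene names in the first column):
# logcpm <- as.matrix(read.csv("logcpm.csv", row.names = 1))

# Simulated log-CPM values standing in for the real file (assumption):
# 200 genes, samples 1-27 in group A and samples 28-45 in group B.
set.seed(1)
n_genes <- 200
logcpm <- matrix(rnorm(n_genes * 45), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("sample", 1:45)))
# Shift the first 10 genes upwards in group B so there is something to find.
logcpm[1:10, 28:45] <- logcpm[1:10, 28:45] + 3

group <- factor(rep(c("A", "B"), times = c(27, 18)), levels = c("A", "B"))
design <- model.matrix(~ group)   # columns: (Intercept), groupB

# limma-trend: linear model per gene, then empirical Bayes moderation
# with a mean-variance trend.
fit <- lmFit(logcpm, design)
fit <- eBayes(fit, trend = TRUE)

# All genes, with Benjamini-Hochberg adjusted p-values.
res <- topTable(fit, coef = "groupB", number = Inf, adjust.method = "BH")

# Keep genes with FDR < 0.05 and the three requested columns.
de <- res[res$adj.P.Val < 0.05, c("logFC", "P.Value", "adj.P.Val")]
head(de)

# write.csv(de, "DE_genes_B_vs_A.csv")   # hypothetical output file name
```

The `groupB` coefficient is the log2 fold change of group B relative to group A, so positive `logFC` values indicate higher expression in group B.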