Question

How to convert TMM RNA-seq file (generated by edgeR) to list of differentially expressed genes

hamidrezarazzaghian • 0

Hi,

I have a CSV file of TMM (trimmed mean of M values) normalized values from whole RNA sequencing (generated by the edgeR package) with two groups (group A with 27 samples and group B with 18 samples). Each sample is in one column and each gene is in one row, and all of the columns have headers. The file includes the list of genes (in one column) plus the values for each gene in each sample. Using this file in R, I want to get the list of genes that are differentially expressed in group B compared to group A, with correction for multiple testing by the Benjamini-Hochberg method (FDR < 0.05). The output file should contain the log fold change, p-value and adjusted p-value for each differentially expressed gene.

Could anyone please share code for this procedure?

Here is what the data look like (samples 1 to 27 belong to group A and samples 28 to 45 belong to group B):

gene  | sample1 | sample2 | sample3 | ... | sample27 | sample28 | ... | sample45 |
TSPAN | -2.994  | -0.651  | 0.274   | ... | 2.352    | 1.523    | ... | -2.486   |
LAP3  | 3.545   | -1.545  | 2.450   | ... | 1.298    | -1.476   | ... | 1.987    |
ALS2  | -1.910  | -2.224  | -1.720  | ... | -1.758   | 1.368    | ... | 2.154    |

edgeR DifferentialExpression TMM RNASeq • 929 views

updated 2.5 years ago by Gordon Smyth • written 2.5 years ago by hamidrezarazzaghian

Is this normalized data not suitable for use with edgeR?

swbarnes2 • 2.5 years ago

score 0 · Answer 1 · 2021-10-09

I'm the senior author of the edgeR package but I don't know what you mean by a "TMM csv file". TMM normalizes the library sizes rather than individual expression values. Do you perhaps mean that you have used edgeR's cpm function to generate log-CPM values?
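To illustrate the distinction, here is a minimal sketch on simulated counts (the matrix and the object names are made up for illustration, not taken from the question). It shows that TMM yields one scaling factor per sample, stored alongside the library sizes, and that log-CPM values are a separate quantity computed from the counts using those factors.

```r
library(edgeR)

# Simulated counts: 100 genes x 6 samples (purely illustrative data)
set.seed(1)
counts <- matrix(rnbinom(100 * 6, mu = 50, size = 5), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("sample", 1:6)))

y <- DGEList(counts = counts)

# TMM produces one normalization factor per sample (library),
# not a normalized value per gene
y <- calcNormFactors(y, method = "TMM")
y$samples[, c("lib.size", "norm.factors")]

# log2-CPM values are derived from the counts and the effective library sizes
logcpm <- cpm(y, log = TRUE, prior.count = 2)
dim(logcpm)
```

If the CSV file in question was produced by something like the last step, it holds log-CPM values rather than "TMM values".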

Anyway, edgeR analyses read counts rather than cpm values. Just follow one of the sample workflows, for example:

Chen Y, Lun ATL, Smyth GK (2016). From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline. F1000Research 5, 1438.
RnaSeqGeneEdgeRQL Workflow

or else follow the edgeR User's Guide.

Personally I use Rsubread::align followed by Rsubread::featureCounts to generate counts as input to edgeR.
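For orientation, a count-based quasi-likelihood analysis along the lines of the workflow cited above might look roughly like the following. This is only a sketch under stated assumptions: the counts are simulated here (in practice they would come from something like featureCounts), the group sizes of 27 and 18 are taken from the question, and all object names are illustrative.

```r
library(edgeR)

# Simulated counts: 300 genes x 45 samples (assumption: stands in for real counts)
set.seed(1)
ngenes <- 300
mu <- matrix(100, nrow = ngenes, ncol = 45)
mu[1:20, 28:45] <- 400   # first 20 genes are up-regulated in group B
counts <- matrix(rnbinom(ngenes * 45, mu = mu, size = 10), nrow = ngenes,
                 dimnames = list(paste0("gene", 1:ngenes), paste0("sample", 1:45)))

# Samples 1-27 are group A, samples 28-45 are group B
group <- factor(c(rep("A", 27), rep("B", 18)), levels = c("A", "B"))

y <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y)                  # drop genes with very low counts
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)                  # TMM normalization of library sizes

design <- model.matrix(~ group)          # second coefficient is B versus A
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit, coef = 2)

# Full results table with Benjamini-Hochberg adjusted p-values (FDR column)
res <- topTags(qlf, n = Inf, adjust.method = "BH")$table
de <- res[res$FDR < 0.05, c("logFC", "PValue", "FDR")]
head(de)
```

The `de` table holds the log fold change, p-value and adjusted p-value for genes with FDR < 0.05, and could be saved with `write.csv(de, ...)`.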

If you really only have log-CPM values and not the original counts, then the limma-trend pipeline could be used instead of edgeR to get DE genes.
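A minimal sketch of what a limma-trend analysis could look like is given below, assuming the values in the file really are log2-CPM (this has not been confirmed in the thread). The matrix is simulated so that the example runs on its own; with the real file, one would presumably read it with something like `as.matrix(read.csv(file, row.names = 1))`, where the file name is whatever the CSV is called. All object names are illustrative.

```r
library(limma)

# Simulated log-expression matrix: 200 genes x 45 samples
# (assumption: stands in for the log-CPM values read from the CSV file)
set.seed(1)
logcpm <- matrix(rnorm(200 * 45), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("sample", 1:45)))
logcpm[1:10, 28:45] <- logcpm[1:10, 28:45] + 3   # first 10 genes higher in group B

# Samples 1-27 are group A, samples 28-45 are group B
group <- factor(c(rep("A", 27), rep("B", 18)), levels = c("A", "B"))
design <- model.matrix(~ group)   # coefficient "groupB" is B versus A

fit <- lmFit(logcpm, design)
fit <- eBayes(fit, trend = TRUE)  # limma-trend: prior variance depends on abundance

# All genes, with Benjamini-Hochberg adjusted p-values
res <- topTable(fit, coef = "groupB", number = Inf, adjust.method = "BH")

# Keep genes with FDR < 0.05 and the three requested columns
de <- res[res$adj.P.Val < 0.05, c("logFC", "P.Value", "adj.P.Val")]
head(de)
# write.csv(de, "DE_genes_B_vs_A.csv")   # hypothetical output file name
```

Here `logFC` is the log2 fold change of group B relative to group A, `P.Value` is the raw p-value and `adj.P.Val` is the Benjamini-Hochberg adjusted p-value.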