How to find Top 10 expressed genes in DESeq2
2
0
Entering edit mode
melkile26 • 0
@melkile26-16646
Last seen 5.8 years ago

I am a newbie to RNAseq analysis. Briefly, I mapped the RNA-seq reads to the reference genome using STAR and I used featureCounts to quantify the reads mapped to the reference genes and found the gene counts. I removed rows that have no counts, or only a single count across all samples using the code below as it is mentioned in RNA-seq workflow

dds <- dds[ rowSums(counts(dds)) > 1, ]

Then I run the function DESeq on the raw counts.

dds <- DESeq(dds)

I would like to extract top 10 expressed genes and then I sorted the result based on baseMean and found the following genes. Is it the right approach to extrct the top expressed genes?

baseMean    log2FoldChange  lfcSE   stat    pvalue  padj LUKE00000033088    24608.95    2.492   0.481   5.180   2.22038382101822E-07    5.88213544449403E-05 LUKE00000005348    23004.02    -1.559  0.366   -4.265  1.9950729451741E-05 0.001090627809135 LUKE00000038686   17687.10    -2.207  0.368   -5.993  2.06555823272055E-09    2.01779219858888E-06 LUKE010978756  9213.70 -2.064  0.354   -5.826  5.66036709013022E-09    4.21293036279692E-06 LUKE00000004836    8685.09 -1.576  0.378   -4.171  3.03873604272807E-05    0.001294153796944 LUKE00000002600   8268.26 -1.522  0.427   -3.565  0.000363547763205   0.005474230769653 LUKE010993945 7758.28 -1.873  0.495   -3.786  0.000153046441589   0.003240726698392 LUKE00000032342   7452.12 1.966   0.480   4.098   4.16481516975401E-05    0.001557322035963 LUKE00000003460   6717.35 -1.686  0.392   -4.304  1.68102625403878E-05    0.000991488315118 LUKE00000018131   6508.93 -1.722  0.402   -4.280  1.86954821598158E-05    0.00106646126335

Is it the right approach to extract the top expressed genes?

deseq2 baseMean differential gene expression rnaseqgene • 8.6k views
ADD COMMENT
2
Entering edit mode
@mikelove
Last seen 1 day ago
United States

We have some code in the workflow:

https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html

Basically:

head(res[order(res$padj),], 10)
ADD COMMENT
0
Entering edit mode

Dear Michael,

Thanks for your replay,

Are you recommending to extract the top expressed genes based on adjusted p-value rather than sorting baseMean?

Best regards,

Mel

ADD REPLY
0
Entering edit mode

Oh, I missed that you want to rank by expression. You should really use TPM for this, rather than counts. Counts are not proportional to expression across genes.

ADD REPLY
0
Entering edit mode

Is this because baseMean doesn't take into account the differences in library sizes between samples

ADD REPLY
1
Entering edit mode
swbarnes2 ★ 1.4k
@swbarnes2-14086
Last seen 1 day ago
San Diego

This will give you the genes with the highest number of counts, but for some library preps, count numbers for a gene can be related to the length of the transcript, so you would want to correct for that if it was a factor with your library prep.

ADD COMMENT

Login before adding your answer.

Traffic: 371 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6