Differential expression of RNA seq data at gene level (for Beginner)
Entering edit mode
Last seen 5.1 years ago


At first, I must mention that I am a beginner in bioinformatics. I am really sorry since my questions may seem really basic.

I am trying to conduct the differential gene expression analysis of my RNAseq data. I am conducting my experiment on Transcriptome analysis of early phase of chlorosis in transgenic tobacco (N. tabaccum) plants. We want to analyze the transcriptional changes early after Dex-treatment (dexamethasone - an inducible promoter) in tobaccos and compare with other transcriptome analyses in virus-infected plants. In this case, I am planning to use DESeq2.

DESeq2 requires count data in the form of a rectangular table of integer values. Now, how can I prepare this table from my RNAseq data? I have read your article "Differential expression of RNA-Seq data at the gene level {the DESeq package)". There you have suggested to use the summarizeOverlaps function of Bioconductor software. I have tried to read the summarizeOverlaps.pdf file; but could not understand it properly. 

I would really appreciate any suggestions regarding how to start using DESeq2  for analyzing my RNAseq data.

Thank You.


deseq2 differential gene expression bioconductor • 916 views
Entering edit mode
Ram ▴ 20
Last seen 5.1 years ago

After obtaining the mapped files you can calculate the counts either by using featureCounts or htseq-count by help of DESeqDataSetFromHTSeq. Later you can prepare count files on basis of condition (treated and untreated).


directory <- "/directory-path/"
sampleFiles <- grep("treated",list.files(directory),value=TRUE)
sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design= ~ condition)
colData(ddsHTSeq)$condition <- factor(colData(ddsHTSeq)$condition, levels=c("untreated","treated"))
rld = rlogTransformation(dds)
sum( abs(res$log2FoldChange) >= 0.5 & res$pvalue < 0.05, na.rm=TRUE )
Entering edit mode
arfranco ▴ 130
Last seen 20 hours ago
European Union

You can do two similar approaches

1. Map your RNA reads to a reference genome or transcriptome to eventually get a BAM file. This can be accomplished with TopHat, STAR, HISAT ant the like, and need its time and computer power. With this BAM file and with the help of a gtf or gff annotation file, along with programs like htseq-count or featurecounts (under R), you can get easily the tabular data you need

2. Do the mapping using the a reference cds fasta file using a pseudoaligner such as KALLISTO, SALMON, and the like. This has same adventages, like that the mapping is accomplished in minutes and using a regular computer without big resources. You obtain three files after each of the mapping that can be introduced into the DESeq2 pipeline with the help of the tximport R program. You have enough information into the DESeq2 vignette or help file to accomplish this

Some published papers see advantageous the using of the second approach 


Login before adding your answer.

Traffic: 349 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6