Question

Differential expression of RNA seq data at gene level (for Beginner)

0

Entering edit mode

islamshaikhul2014 • 0

@islamshaikhul2014-14699

Last seen 6.3 years ago

Hello,

At first, I must mention that I am a beginner in bioinformatics. I am really sorry since my questions may seem really basic.

I am trying to conduct the differential gene expression analysis of my RNAseq data. I am conducting my experiment on Transcriptome analysis of early phase of chlorosis in transgenic tobacco (N. tabaccum) plants. We want to analyze the transcriptional changes early after Dex-treatment (dexamethasone - an inducible promoter) in tobaccos and compare with other transcriptome analyses in virus-infected plants. In this case, I am planning to use DESeq2.

DESeq2 requires count data in the form of a rectangular table of integer values. Now, how can I prepare this table from my RNAseq data? I have read your article "Differential expression of RNA-Seq data at the gene level {the DESeq package)". There you have suggested to use the summarizeOverlaps function of Bioconductor software. I have tried to read the summarizeOverlaps.pdf file; but could not understand it properly.

I would really appreciate any suggestions regarding how to start using DESeq2 for analyzing my RNAseq data.

Thank You.

Shaikhul.

deseq2 differential gene expression bioconductor • 1.2k views

ADD COMMENT • link updated 6.3 years ago by arfranco ▴ 130 • written 6.3 years ago by islamshaikhul2014 • 0

score 1 · Answer 1 · 2017-12-31

After obtaining the mapped files you can calculate the counts either by using featureCounts or htseq-count by help of DESeqDataSetFromHTSeq. Later you can prepare count files on basis of condition (treated and untreated).

Example:

library(DESeq2)
directory <- "/directory-path/"
sampleFiles <- grep("treated",list.files(directory),value=TRUE)
sampleCondition<-c("treated","treated","untreated","untreated")
sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design= ~ condition)
colData(ddsHTSeq)$condition <- factor(colData(ddsHTSeq)$condition, levels=c("untreated","treated"))
dds<-DESeq(ddsHTSeq)
res<-results(dds)
rld = rlogTransformation(dds)
sum( abs(res$log2FoldChange) >= 0.5 & res$pvalue < 0.05, na.rm=TRUE )

score 0 · Answer 2 · 2017-12-31

You can do two similar approaches

1. Map your RNA reads to a reference genome or transcriptome to eventually get a BAM file. This can be accomplished with TopHat, STAR, HISAT ant the like, and need its time and computer power. With this BAM file and with the help of a gtf or gff annotation file, along with programs like htseq-count or featurecounts (under R), you can get easily the tabular data you need

2. Do the mapping using the a reference cds fasta file using a pseudoaligner such as KALLISTO, SALMON, and the like. This has same adventages, like that the mapping is accomplished in minutes and using a regular computer without big resources. You obtain three files after each of the mapping that can be introduced into the DESeq2 pipeline with the help of the tximport R program. You have enough information into the DESeq2 vignette or help file to accomplish this

Some published papers see advantageous the using of the second approach