Hi, I am trying to do comparative RNA-seq analysis with DESeq2.
My goals are:
1. combine transcripts into genes
2. detect gene expression differences between conditions
3. obtain a summarized gene expression table
- Combine transcripts into genes
I basically followed the commands in the tximport vignette: https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#kallisto
    library(tximport)

    # Sample sheet and transcript-to-gene map
    samples <- read.table(file.path("/Users", "samplelist.csv"), header = TRUE)
    tx2gene1 <- read.csv(file.path("/Users", "transcript_gene.csv"), header = TRUE)

    # One kallisto abundance file per sample; check that they all exist
    files <- file.path("/Users", "kallisto", samples$sample, "abundance.tsv")
    names(files) <- samples$sample
    all(file.exists(files))

    # Summarize transcript-level estimates to gene level
    txi <- tximport(files, type = "kallisto", tx2gene = tx2gene1)
    names(txi)
Is this a valid approach? I found that the total gene expression per sample (the column sum) varies widely between samples.
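To spell out what I expect this step to do to the counts, here is a minimal base-R sketch that sums transcript rows within each gene. The matrix `tx_counts` and the map `tx2gene_toy` are made-up toy data, not my real files, and this only mimics the plain count summation, not everything tximport does (it also handles abundances and lengths).

```r
# Hypothetical transcript-level estimated counts for two samples.
tx_counts <- matrix(
  c(5, 7,
    3, 1,
    10, 12),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("tx1", "tx2", "tx3"), c("A1", "B1"))
)

# Hypothetical transcript-to-gene map (transcript ID first, gene ID second).
tx2gene_toy <- data.frame(
  tx = c("tx1", "tx2", "tx3"),
  gene = c("geneX", "geneX", "geneY")
)

# Look up each transcript's gene, then sum the rows that share a gene.
gene_ids <- tx2gene_toy$gene[match(rownames(tx_counts), tx2gene_toy$tx)]
gene_counts <- rowsum(tx_counts, group = gene_ids)
print(gene_counts)
```

If my real `txi$counts` behaves like `gene_counts` here, then differing column sums would just reflect differing numbers of mapped reads per sample.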
- Detect gene expression differences between conditions
I followed this: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-un-normalized-counts
Since it states that "As input, the DESeq2 package expects count data as obtained, e.g., from RNA-seq or another high-throughput sequencing experiment, in the form of a matrix of integer values", I used the "txi" object from above as input, without normalization.
    dds <- DESeqDataSetFromMatrix(countData = txi, colData = colData, design = ~condition)
    result <- results(dds, contrast = c("condition", "conditionA", "conditionB"))
Is that a valid way to do it?
- Obtain a summarized gene expression table

    dds <- estimateSizeFactors(dds)
    normalized <- counts(dds, normalized = TRUE)
However, I found that the total gene expression per sample still varies widely between samples. The table looks something like this (two samples each for conditions A and B):
sample   A1   A2   B1   B2
gene1    10   11    1    1
gene2    20   19    2    1
gene3    30   32    3    4
gene4    40   38    4    4
This looks odd to me. How can I obtain a gene expression table for all samples that is normalized for both gene length and between-sample bias? Or is there genuinely an overall depletion of gene expression in samples B1 and B2?
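To make the question concrete, here is a base-R sketch of the median-of-ratios size factors as I understand them from the DESeq2 vignette, applied to the toy numbers above. Treating those numbers as raw counts is an assumption made only for illustration, and this is my own re-implementation, not a call into DESeq2.

```r
# Toy matrix from the table above, treated here as raw counts (assumption).
counts <- matrix(
  c(10, 11, 1, 1,
    20, 19, 2, 1,
    30, 32, 3, 4,
    40, 38, 4, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("A1", "A2", "B1", "B2"))
)

# Median-of-ratios: use each gene's geometric mean across samples as a
# pseudo-reference, then take the per-sample median of count / reference.
log_geo_means <- rowMeans(log(counts))
size_factors <- apply(counts, 2, function(col) {
  exp(median(log(col) - log_geo_means))
})

# Divide each column by its size factor.
normalized <- sweep(counts, 2, size_factors, "/")
print(round(size_factors, 2))
print(round(colSums(normalized), 1))
```

If I read this correctly, a uniform roughly tenfold drop in B1 and B2 ends up in the size factors, so the normalized column sums come out nearly equal. That makes me unsure whether an overall depletion could be told apart from lower sequencing depth with this kind of normalization.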
Thank you for your help.