Hi,
I have RSEM output `genes.results` files containing the `gene_id`, `effective_length`, `transcript_id`, `expected_count`, `TPM`, and `FPKM` columns. My understanding is that `DESeq2` can work with the expected counts output by RSEM, normalize them, and perform differential gene expression analysis. Would it work to create a single CSV file of expected counts only (all 18 samples together), round the expected counts, and import it with `DESeqDataSetFromMatrix`? Or should I import the files with `tximport`, as I tried below?

    library("DESeq2")
    library("tximport")
    library("readr")
    library("tximportData")
    Data <- "/Users/Documents/Projects/"
    txi <- tximport(files = Data, type = "rsem", txIn = FALSE, txOut = FALSE)
    ddsTxi <- DESeqDataSetFromTximport(txi,
                                       colData = samples,
                                       design = ~ condition)
    #> reading in files with read_tsv
    #> 1
    ### Fails to run; there are 18 samples

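
One likely cause of the failure above is that `files` points at a directory, whereas `tximport()` expects a character vector with one file path per sample. The following is a minimal sketch of that idea, not a confirmed fix. The helper name `import_rsem_genes` and its `suffix` argument are hypothetical; the file naming is assumed from the `Sample_1.genes.results.txt` path used in this post. Setting zero lengths to 1 is an assumption about the data: RSEM can report an effective length of 0, and `DESeqDataSetFromTximport()` requires positive lengths.

```r
library(tximport)

# Hypothetical helper (not part of tximport): import every RSEM gene-level
# result file found in `data_dir` in a single tximport() call.
import_rsem_genes <- function(data_dir, suffix = ".genes.results.txt") {
  # tximport() needs one file path per sample, not a directory
  files <- list.files(data_dir, full.names = TRUE)
  files <- files[endsWith(files, suffix)]
  stopifnot(length(files) > 0)

  # Name each path by its sample, e.g. "Sample_1"
  names(files) <- sub(suffix, "", basename(files), fixed = TRUE)

  txi <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)

  # Assumption: replace effective lengths of 0 so that
  # DESeqDataSetFromTximport() does not reject the object
  txi$length[txi$length == 0] <- 1
  txi
}

# Usage (paths and objects as in the post):
# txi <- import_rsem_genes("/Users/Documents/Projects/")
# ddsTxi <- DESeq2::DESeqDataSetFromTximport(txi, colData = samples,
#                                            design = ~ condition)
```

`list.files()` returns paths in alphabetical order (`Sample_1`, `Sample_10`, ...), so the rows of `samples` would need to be in the same order as the columns of `txi$counts`.
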
With the `tximport` package and the 18 individual sample files, I was unsuccessful. However, I could import a single sample file with the call below:

    txi_TEST <- tximport(files = "/Users/Documents/Projects/Sample_1.genes.results.txt",
                         type = "rsem",
                         txIn = FALSE,
                         txOut = FALSE,
                         countsFromAbundance = "scaledTPM")
    #> reading in files with read_tsv
    #> 1
    #> Warning message:
    #> In computeRsemGeneLevel(files, importer, geneIdCol, abundanceCol, :
    #>   countsFromAbundance other than 'no' requires transcript-level estimates

The 18 sample files are named as follows:

    Sample_1.genes.results.txt
    Sample_2.genes.results.txt
    Sample_3.genes.results.txt
    Sample_4.genes.results.txt
    Sample_5.genes.results.txt
    etc.

Here is an example of one sample (first six rows):

    structure(list(gene_id = c("ENSG00000000003", "ENSG00000000005",
    "ENSG00000000419", "ENSG00000000457", "ENSG00000000460", "ENSG00000000938"
    ), transcript_id.s. = c("ENST00000373020,ENST00000494424,ENST00000496771,ENST00000612152,ENST00000614008",
    "ENST00000373031,ENST00000485971", "ENST00000371582,ENST00000371584,ENST00000371588,ENST00000413082,ENST00000466152,ENST00000494752",
    "ENST00000367770,ENST00000367771,ENST00000367772,ENST00000423670,ENST00000470238",
    "ENST00000286031,ENST00000359326,ENST00000413811,ENST00000459772,ENST00000466580,ENST00000472795,ENST00000481744,ENST00000496973,ENST00000498289",
    "ENST00000374003,ENST00000374004,ENST00000374005,ENST00000399173,ENST00000457296,ENST00000468038,ENST00000475472"
    ), length = c(2211.19, 940.5, 1071.41, 4571.09, 3321.38, 2238.22
    ), effective_length = c(2045.08, 774.39, 905.3, 4404.98, 3155.27,
    2072.11), expected_count = c(615.84, 0, 1712, 455.05, 224.03,
    1446), TPM = c(9.81, 0, 61.63, 3.37, 2.31, 22.74), FPKM = c(9.62,
    0, 60.42, 3.3, 2.27, 22.29)), row.names = c(NA, 6L), class = "data.frame")
    #>           gene_id
    #> 1 ENSG00000000003
    #> 2 ENSG00000000005
    #> 3 ENSG00000000419
    #> 4 ENSG00000000457
    #> 5 ENSG00000000460
    #> 6 ENSG00000000938
    #> transcript_id.s.
    #> 1 ENST00000373020,ENST00000494424,ENST00000496771,ENST00000612152,ENST00000614008
    #> 2 ENST00000373031,ENST00000485971
    #> 3 ENST00000371582,ENST00000371584,ENST00000371588,ENST00000413082,ENST00000466152,ENST00000494752
    #> 4 ENST00000367770,ENST00000367771,ENST00000367772,ENST00000423670,ENST00000470238
    #> 5 ENST00000286031,ENST00000359326,ENST00000413811,ENST00000459772,ENST00000466580,ENST00000472795,ENST00000481744,ENST00000496973,ENST00000498289
    #> 6 ENST00000374003,ENST00000374004,ENST00000374005,ENST00000399173,ENST00000457296,ENST00000468038,ENST00000475472
    #>    length effective_length expected_count   TPM  FPKM
    #> 1 2211.19          2045.08         615.84  9.81  9.62
    #> 2  940.50           774.39           0.00  0.00  0.00
    #> 3 1071.41           905.30        1712.00 61.63 60.42
    #> 4 4571.09          4404.98         455.05  3.37  3.30
    #> 5 3321.38          3155.27         224.03  2.31  2.27
    #> 6 2238.22          2072.11        1446.00 22.74 22.29

Created on 2022-11-30 with [reprex v2.0.2](https://reprex.tidyverse.org)
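
For comparison, the combined-and-rounded matrix route mentioned at the top of this post could be sketched roughly as follows. This is only an illustration under assumptions: `build_count_matrix` is a hypothetical helper, every file is assumed to be tab-separated with the `gene_id` and `expected_count` columns shown in the example, and all files are assumed to list the same genes in the same order.

```r
# Hypothetical helper: build a genes x samples integer matrix of rounded
# RSEM expected counts from a named vector of file paths.
build_count_matrix <- function(files) {
  stopifnot(length(files) > 0, !is.null(names(files)))

  # One named numeric vector of expected counts per sample
  per_sample <- lapply(files, function(f) {
    tab <- read.delim(f, stringsAsFactors = FALSE)
    setNames(tab$expected_count, tab$gene_id)
  })

  # Assumption: every file lists identical genes in identical order
  genes <- names(per_sample[[1]])
  same_genes <- vapply(per_sample,
                       function(x) identical(names(x), genes),
                       logical(1))
  stopifnot(all(same_genes))

  # Columns take the sample names, rows take the gene IDs
  counts <- round(do.call(cbind, per_sample))
  storage.mode(counts) <- "integer"
  counts
}

# Usage (objects as in the post):
# counts <- build_count_matrix(files)
# dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
#                                       colData = samples,
#                                       design = ~ condition)
```

Note that this route keeps only the counts. The per-gene length information that `tximport` carries along is dropped, which is one reason the `tximport` route may be preferable for RSEM output.
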
Thank you in advance for your help.
Thank you,
Toufiq
Dear ATpoint and @james-w-macdonald-5106
Thank you. I used this setting because RSEM `sample.genes.results` files can be imported by setting `type = "rsem", txIn = FALSE, txOut = FALSE`, as pointed out in "RSEM via tximport".

I have RSEM output genes result files containing the `gene_id`, `effective_length`, `transcript_id`, `expected_count`, `TPM`, and `FPKM` columns. My understanding was that `DESeq2` can work with the output of RSEM, normalize it, and perform differential gene expression analysis. Initially, I thought of creating a combined expected counts file (all samples together), rounding the values, and importing it with `DESeqDataSetFromMatrix`. However, I learnt that `tximport` supports output files from `RSEM`, so I decided to follow that library. After overcoming a couple of challenges, I could run the import with either `tximport` or `tximeta`, using the two approaches in the steps below. I think I will go with the first approach.