Hi
I'm working with a non-model organism for which no information is available in the public domain. After de novo assembly with Trinity for all 32 samples, I calculated transcript abundance with Salmon and then used the Trinity abundance matrix scripts (https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Transcript-Quantification) to generate a gene count matrix (gene.counts.matrix) for each dataset. My question is: how do I combine all 32 matrices into one matrix that can be fed into the DESeq2/edgeR packages as input? From the web I found that tximport does this job. To avoid confusion, I put all my quant.sf files in a salmon directory, following the layout of the tximportData package:
dir <- system.file("extdata",package="tximportData")
list.files(dir)
[1] "cufflinks" "ERR188021" "ERR188088"
[4] "ERR188288" "ERR188297" "ERR188329"
[7] "ERR188356" "kallisto" "rsem"
[10] "sailfish" "salmon" "samples_extended.txt"
[13] "samples.txt" "tx2gene.csv"
# read the sample table (columns: run, treatment, replicate)
samples <- read.table(file.path(dir, "samples.txt"), header = TRUE)
# build one path per sample, using the run column as the subdirectory name
files <- file.path(dir, "salmon", samples$run, "quant.sf.gz")
names(files) <- paste0("sample", 1:32)
samples
run treatment replicate
1 9_S5 Model 1
2 10_S9 Model 2
3 11_S13 Model 3
4 12_S17 Model 4
5 13_S21 Model 5
6 14_S26 Model 6
7 15_S30 Model 7
8 16_S2 Model 8
9 17_S6 Salmon 1
10 18_S10 Salmon 2
11 19_S14 Salmon 3
12 20_S18 Salmon 4
13 21_S22 Salmon 5
14 22_S27 Salmon 6
15 23_S31 Salmon 7
16 24_S23 Salmon 8
17 25_S28 Control 1
18 26_S32 Control 2
19 27_S3 Control 3
20 28_S7 Control 4
21 29_S11 Control 5
22 30_S15 Control 6
23 31_S19 Control 7
24 32_S24 Control 8
25 1_S4 Odor 1
26 2_S8 Odor 2
27 3_S12 Odor 3
28 4_S16 Odor 4
29 5_S20 Odor 5
30 6_S25 Odor 6
31 7_S29 Odor 7
32 8_S1 Odor 8
head quant.sf
Name Length EffectiveLength TPM NumReads
TRINITY_DN30552_c0_g1_i1 772 523.000 2.871728 133.000
TRINITY_DN30585_c0_g1_i1 572 323.000 0.769154 22.000
TRINITY_DN30563_c0_g1_i1 516 267.000 444.724836 10515.000
TRINITY_DN30577_c0_g1_i1 1130 881.000 1.358699 106.000
TRINITY_DN30527_c1_g1_i1 366 117.000 3.084007 31.953
TRINITY_DN30527_c0_g1_i1 446 197.000 4.413853 77.000
TRINITY_DN30562_c0_g1_i1 236 16.266 4.165573 6.000
TRINITY_DN30526_c0_g1_i1 384 135.000 1.422029 17.000
TRINITY_DN30543_c0_g1_i1 1384 1135.000 1.094436 110.000
Now, how do I create the tx2gene data frame? No genome or GTF is available for the non-model organism I'm working with, so how can I generate tx2gene with the GenomicFeatures package? Any suggestions would be appreciated. Here are the gene.counts.matrix files I generated using the Trinity abundance scripts.
==> 8_S1.gene_trans_map_salmon.gene.counts.matrix <==
GeneID 8_S1
TRINITY_DN0_c0_g1 9630.41
TRINITY_DN0_c1_g1 173.96
TRINITY_DN100000_c0_g1 76.79
TRINITY_DN100001_c0_g1 71.34
TRINITY_DN100002_c0_g1 73.23
TRINITY_DN100003_c0_g1 87.40
TRINITY_DN100004_c0_g1 31.01
TRINITY_DN100005_c0_g1 70.68
TRINITY_DN100006_c0_g1 13.12
==> 9_S5.gene_trans_map_salmon.gene.counts.matrix <==
GeneID 9_S5
TRINITY_DN0_c0_g1 537.85
TRINITY_DN100000_c0_g1 802.50
TRINITY_DN100001_c0_g1 41.39
TRINITY_DN100002_c0_g1 64.61
TRINITY_DN100003_c0_g1 83.42
TRINITY_DN100004_c0_g1 22.55
TRINITY_DN100005_c0_g1 30.28
TRINITY_DN100006_c0_g1 39.07
TRINITY_DN100007_c0_g1 186.55
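Per-sample matrices like the two above could, in principle, be combined with a full outer join on GeneID in base R. The sketch below is only an illustration under stated assumptions: the helper name merge_count_matrices is made up, the two small data frames stand in for files read from disk, and filling a gene that is missing from a sample with 0 is an assumption rather than something the Trinity scripts prescribe.

```r
# Hypothetical helper: full outer join of per-sample count tables on GeneID.
merge_count_matrices <- function(tables) {
  merged <- Reduce(function(x, y) merge(x, y, by = "GeneID", all = TRUE), tables)
  rownames(merged) <- as.character(merged$GeneID)
  counts <- as.matrix(merged[, setdiff(names(merged), "GeneID"), drop = FALSE])
  counts[is.na(counts)] <- 0  # assumption: a gene absent from a sample has 0 counts
  counts
}

# Toy stand-ins for two per-sample files; real files could be read with
# read.delim(path, check.names = FALSE) so that "8_S1" survives as a column name.
s1 <- data.frame(GeneID = c("TRINITY_DN0_c0_g1", "TRINITY_DN100006_c0_g1"),
                 `8_S1` = c(9630.41, 13.12),
                 check.names = FALSE, stringsAsFactors = FALSE)
s2 <- data.frame(GeneID = c("TRINITY_DN0_c0_g1", "TRINITY_DN100007_c0_g1"),
                 `9_S5` = c(537.85, 186.55),
                 check.names = FALSE, stringsAsFactors = FALSE)

counts <- merge_count_matrices(list(s1, s2))
print(counts)
```

The same call should work on a list of 32 tables. Note that DESeq2 expects integer counts when given a plain matrix, so estimated counts like these would need rounding first; importing the quant.sf files through tximport avoids that step.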
How do I generate the tx2gene information? And if I did have a GTF file, would it be possible to generate tx2gene from it using the GenomicFeatures package? Is there an easy way?
Have you read any of the documentation? This is discussed in the function man page and the vignette. Before posting to the support site, it’s assumed that you’ve spent time looking up the documentation (or else why should we bother writing help pages and vignettes)?
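For a Trinity assembly specifically, a GTF should not be needed: the transcript IDs in quant.sf already embed the gene ID (TRINITY_DN30552_c0_g1_i1 belongs to gene TRINITY_DN30552_c0_g1), so a tx2gene table can be derived by stripping the trailing _i<number> isoform suffix. A minimal sketch, assuming that naming scheme holds for every transcript in the assembly; make_tx2gene is a made-up helper name, and the tximport call is left as a comment because it needs the tximport package and the real quant.sf paths.

```r
# Hypothetical helper: derive a two-column tx2gene table from Trinity IDs.
# Assumes every transcript ID ends in "_i<number>" (Trinity's isoform suffix).
make_tx2gene <- function(tx_ids) {
  gene_ids <- sub("_i[0-9]+$", "", tx_ids)
  # tximport expects transcript IDs in column 1 and gene IDs in column 2
  data.frame(TXNAME = tx_ids, GENEID = gene_ids, stringsAsFactors = FALSE)
}

# Transcript IDs copied from the quant.sf excerpt in the question; in practice
# they could come from the Name column of any one sample's quant.sf.
tx_ids <- c("TRINITY_DN30552_c0_g1_i1",
            "TRINITY_DN30527_c1_g1_i1",
            "TRINITY_DN30527_c0_g1_i1")
tx2gene <- make_tx2gene(tx_ids)
print(tx2gene)

# With tximport installed and `files` a named vector of quant.sf paths,
# the table would then be passed along these lines (not run here):
# txi <- tximport::tximport(files, type = "salmon", tx2gene = tx2gene)
```

Trinity also writes a gene-to-transcript map next to the assembly (Trinity.fasta.gene_trans_map, gene ID first, transcript ID second); reading that file and swapping the two columns should give the same table without any pattern matching.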