Hello,
I'm analyzing an RNA-seq data set with three different outcomes (`favorable`, `intermediate` and `poor`; no control). I am specifically interested in the expression of certain transcripts (e.g. stat3 transcripts).
I would like to show that the expression (= read counts) of stat3a relative to stat3b differs significantly between two outcomes.
After trying DEXSeq and Cuffdiff, which only compare a specific transcript with itself between two conditions, I decided to try a t-test on the results from the `salmon` quantification run.
I used `salmon` in quasi-mapping mode to quantify my data and extracted the results for my two transcripts.
I then read them into R and ran a t-test to see whether the difference is significant.
    library(readr)  # provides read_tsv()

    salmon.counts <- read_tsv("stat3.samples.Counts.txt")
    # per-sample ratio of the two transcripts
    salmon.counts$ratio <- salmon.counts$ENST00000264657 / salmon.counts$ENST00000585517
    t.test(subset(salmon.counts, condition == 'Favorable')$ratio,
           subset(salmon.counts, condition == 'Poor')$ratio)
The results I get for this test show significance:
        Welch Two Sample t-test

    data:  subset(tst.pilot, outcome == "Poor")$ratio and subset(tst.pilot, outcome == "Intermediate")$ratio
    t = -0.85552, df = 5.1434, p-value = 0.4303
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -0.2329766  0.1158939
    sample estimates:
    mean of x mean of y
    0.1984490 0.2569904
I was wondering whether this approach is statistically robust. If not, is there a better way of analyzing the data?
Thanks in advance for any comment or suggestion.
Assa
The quantified table from the salmon output:

    sampleID  condition     ENST00000264657  ENST00000585517
    1         Favorable     2505.73          373.75
    2         Favorable     2687.13          324.901
    3         Favorable     3026.95          533.415
    4         Favorable     2381.98          325.676
    5         Favorable     2967.1           547.158
    6         Favorable     2354.14          443.844
    7         Favorable     2836.7           575.74
    8         Favorable     2995.65          331.224
    9         Favorable     2821             477.53
    10        Favorable     3155.98          443.947
    11        Intermediate  1779.66          267.906
    12        Intermediate  2071.64          190.962
    13        Intermediate  2107.06          574.362
    14        Intermediate  4554.63          76.4624
    15        Intermediate  2885.07          236.034
    16        Intermediate  4400.48          69.2131
    17        Intermediate  3128.83          421.91
    18        Intermediate  2117.58          494.947
    19        Intermediate  2197.06          623.131
    20        Intermediate  2214.11          681.548
    21        Poor          4064.86          231.687
    22        Poor          3089.12          309.805
    23        Poor          2309.83          553.167
    24        Poor          3132.55          238.842
    25        Poor          2804             282.656
    26        Poor          2719.42          714.62
    27        Poor          4029.91          277.442
    28        Poor          3562.57          238.041
    29        Poor          3688.88          256.918
    30        Poor          3881.81          379.808
What are the values in your table, and how were they calculated?
The values are the results of the salmon analysis for each of the two transcripts (= counts, TPM).
I would use TPM values from RSEM or eXpress and just do a t-test. I haven't seen many comparisons of two different transcripts before, so there may be better methods out there; I would search for them just in case.
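To make the suggestion concrete, here is a minimal sketch in base R. It is only an illustration under a few assumptions: it uses the salmon values posted in the question (Favorable and Poor samples only) rather than RSEM or eXpress TPMs, and the log2 transform of the ratio and the Wilcoxon test are additions of this sketch, not something stated in the thread. The names `stat3`, `log2_ratio`, `welch` and `wilcox` are made up for the example.

```r
# Salmon estimates for the two transcripts, copied from the table in the
# question (Favorable and Poor samples only).
stat3 <- data.frame(
  condition = rep(c("Favorable", "Poor"), each = 10),
  ENST00000264657 = c(2505.73, 2687.13, 3026.95, 2381.98, 2967.1,
                      2354.14, 2836.7, 2995.65, 2821, 3155.98,
                      4064.86, 3089.12, 2309.83, 3132.55, 2804,
                      2719.42, 4029.91, 3562.57, 3688.88, 3881.81),
  ENST00000585517 = c(373.75, 324.901, 533.415, 325.676, 547.158,
                      443.844, 575.74, 331.224, 477.53, 443.947,
                      231.687, 309.805, 553.167, 238.842, 282.656,
                      714.62, 277.442, 238.041, 256.918, 379.808)
)

# Per-sample log2 ratio of the two transcripts. Assumption of this sketch:
# working on the log scale, so that a 2-fold increase and a 2-fold decrease
# are equally far from 0.
stat3$log2_ratio <- log2(stat3$ENST00000264657 / stat3$ENST00000585517)

# Welch two-sample t-test on the log2 ratios (stats::t.test, unequal variances).
welch <- t.test(log2_ratio ~ condition, data = stat3)
print(welch)

# Rank-based alternative that does not assume normally distributed ratios.
wilcox <- wilcox.test(log2_ratio ~ condition, data = stat3)
print(wilcox)
```

If the two tests disagree noticeably, that is usually a hint that a few samples with extreme ratios are driving the t-test, so it is worth plotting `log2_ratio` per condition before trusting either p-value.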