Question: null model and DEXSeqDataSet object in DEXSeq
6 weeks ago by
gv40
United States
gv40 wrote:

Hi Alejandro and Michael,

Thanks a lot for helping me out with the DEXSeq package. I have some questions about it:

Ques1. I was reading DEXSeq.pdf and its explanation of the null model (∼ sample + exon) vs. the alternative model (∼ sample + exon + condition:exon).

Under the null model, what is the hypothesis? Is it that exon/counting bin counts do not depend on condition? What does the alternative hypothesis mean here? I do not understand the following part; if possible, could you explain it to me:

The two models described by these formulae are fit for each counting bin, where the data supplied to the fit comprise two read count values for each sample, corresponding to the two levels of the exon factor: the number of reads mapping to the bin in question (level this), and the sum of the read counts from all other bins of the same gene (level others).


Also, is it at this step (testForDEU) that exonic counts are adjusted for changes in gene expression?

Ques2. Using the STAR aligner, I have generated a count matrix of splice junction reads for 2 treatment conditions, control and knockout. This is how it looks:

gene    c1  c2  kd1 kd2
g1_chrVIII_33663_33698_2    0   0   0   0
g2_chrVIII_326943_327029_2  0   0   0   0
g3_chrVIII_129529_129644_1  0   3   0   0
g3_chrVIII_129529_129647_1  123 139 148 217
g4_chrVIII_400482_400648_2  0   0   0   0
g4_chrVIII_400482_400850_2  0   0   0   0
g5_chrVIII_432447_432483_1  0   0   0   0
g6_chrVIII_428459_428647_2  0   0   0   0
g7_chrVIII_119009_119035_2  0   0   0   0
g8_chrVIII_185267_185575_2  0   0   0   0
g9_chrVIII_148317_148666_2  0   0   0   0
g10_chrVIII_251156_251270_1 0   0   0   0
g10_chrVIII_251156_251258_1 5   10  3   10
g10_chrVIII_251156_251458_1 0   1   2   1
g10_chrVIII_251156_251248_1 186 189 223 233
g10_chrVIII_251156_251224_1 4   0   2   0


I want to look for differential splice junction usage between these 2 conditions.

a. Can I use DESeq2 directly on this count matrix? I think that if I do, I am not taking into account changes in gene expression between the 2 conditions.

b. I would like to use DEXSeq on the splice junction count matrix, but how do I make the DEXSeqDataSet object? With 2 conditions and 2 replicates, as in the example above, columns 5, 6, 7 and 8 should hold the total gene counts. These are normally added when I build a DEXSeqDataSet from my exon count matrix, but here I only have a splice junction count matrix. In other words, I want to create a DEXSeqDataSet object that has, in addition to the 4 columns above, 4 more columns holding the gene counts for the junction in question in samples c1, c2, kd1 and kd2. I have the gene counts for every sample in another file. How can I then make a DEXSeqDataSet object?

Hope to hear from you guys and thanks for all the help.

Answer: null model and DEXSeqDataSet object in DEXSeq
6 weeks ago by
Alejandro Reyes
Dana-Farber Cancer Institute, Boston, USA
Alejandro Reyes wrote:

Under the null model, what is the hypothesis? Is it that exon/counting bin counts do not depend on condition?

Yes, exactly.

What does the alternative hypothesis mean here?

That the counts for an exon depend on the condition.

I do not understand the following part; if possible, could you explain it to me

Given an exon, each sample contributes two counts: one is the count for that exon (level 'this'), and the other is the sum of the counts of the other exons of the same gene (level 'others'). You can think of it in terms of ratios: does the this/others ratio change between conditions?
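To make the this/others idea concrete, here is a minimal Python sketch (not DEXSeq itself, which is an R package) that computes the two counts per sample for one junction, using the g10 rows from the question. The function name `this_and_others` is made up for illustration, and 'others' is taken to be the sum over the remaining junctions of the same gene, as described above.

```python
# Junction counts for gene g10, copied from the question
# (columns: c1, c2, kd1, kd2).
g10 = {
    "g10_chrVIII_251156_251270_1": [0, 0, 0, 0],
    "g10_chrVIII_251156_251258_1": [5, 10, 3, 10],
    "g10_chrVIII_251156_251458_1": [0, 1, 2, 1],
    "g10_chrVIII_251156_251248_1": [186, 189, 223, 233],
    "g10_chrVIII_251156_251224_1": [4, 0, 2, 0],
}


def this_and_others(gene_rows, junction):
    """Return the (this, others) count vectors for one junction of a gene.

    Hypothetical helper for illustration only; it is not part of DEXSeq.
    """
    this = gene_rows[junction]
    # Per-sample total over all junctions of the gene.
    totals = [sum(column) for column in zip(*gene_rows.values())]
    # 'others' is everything in the gene except the junction itself.
    others = [total - own for total, own in zip(totals, this)]
    return this, others


this, others = this_and_others(g10, "g10_chrVIII_251156_251258_1")
print("this:  ", this)    # [5, 10, 3, 10]
print("others:", others)  # [190, 190, 227, 234]
```

The test then asks whether the relation between 'this' and 'others' differs between the control and knockout samples; the sketch only shows how the two counts per sample are formed, not the model fit.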

Also, is it at this step (testForDEU) that exonic counts are adjusted for changes in gene expression?

Note that the model has 'sample' as a predictor, which accounts for sample-specific contributions, including gene expression differences between samples.

Can I use DESeq2 directly on this count matrix? I think that if I do, I am not taking into account changes in gene expression between the 2 conditions.

If you are interested in differential splicing, I don't recommend this. If a gene is differentially expressed, all its junctions will be detected as differential. I'm not sure this is what you want.

How can I then make a DEXSeqDataSet object?

You can supply the gene count data through the 'alternativeCountData' parameter of the DEXSeqDataSet function.

Thanks a lot, Alejandro, for making it clear. Regarding the DEXSeqDataSet: should my alternativeCountData, which in my case is a count matrix of GENES, have the same number of rows as my initial count matrix (the junction counts)? And should the row names be the same in the 2 count matrices?

You need two matrices, countData and alternativeCountData.

Let's say that a gene has 5 junctions and you have 4 samples. countData would look like this:

1 0 2 3
2 0 1 0
0 9 4 2
2 2 3 3
2 0 1 0


alternativeCountData should look like this:

20 30 10 20
20 30 10 20
20 30 10 20
20 30 10 20
20 30 10 20


where the gene counts are repeated for each junction.
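As a rough sketch of how such a repeated matrix could be prepared outside R, the Python snippet below builds one row of gene counts per junction row. It assumes the gene ID is the part of the junction name before the first underscore (as in the question's row names, e.g. `g10_chrVIII_...`); the helper names and the gene count values, which reuse the 20 30 10 20 example above, are illustrative only.

```python
def gene_of(junction_id):
    """Extract the gene ID, assumed to be the text before the first '_'."""
    return junction_id.split("_", 1)[0]


def build_alternative_counts(junction_ids, gene_counts):
    """Repeat each gene's per-sample counts once per junction row.

    Hypothetical helper for illustration; the output has the same number
    of rows, in the same order, as the junction count matrix.
    """
    return [list(gene_counts[gene_of(j)]) for j in junction_ids]


# Row names of the junction count matrix (taken from the question).
junction_ids = [
    "g10_chrVIII_251156_251270_1",
    "g10_chrVIII_251156_251258_1",
    "g10_chrVIII_251156_251458_1",
    "g10_chrVIII_251156_251248_1",
    "g10_chrVIII_251156_251224_1",
]

# Illustrative gene-level counts per sample (c1, c2, kd1, kd2);
# real values would come from the separate gene count file.
gene_counts = {"g10": [20, 30, 10, 20]}

alt = build_alternative_counts(junction_ids, gene_counts)
for junction, row in zip(junction_ids, alt):
    print(junction, *row, sep="\t")
```

The resulting table could then be written to a file, read into R as a matrix, and passed as alternativeCountData alongside the junction matrix as countData.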