For comparimg 4 or more tissue types using DRIMSeq
1
0
Entering edit mode
@rohitsatyam102-24390
Last seen 47 minutes ago
India

Hi Everyone!!

I am analyzing 4 tissue types obtained from the 4 different layers of a plant, sampled at 4 different time points ( days after flowering (daf). Each tissue has 2-3 biological replicate. The samples are grouped as follows

sample_id group
cellArep116daf A16daf
cellArep216daf A16daf
cellArep316daf A16daf
cellBrep116daf B16daf
cellBrep216daf B16daf
cellBrep14daf B4daf
cellBrep24daf B4daf
cellBrep18daf B8daf
cellNrep116daf N16daf
cellNrep216daf N16daf
cellNrep316daf N16daf
cellNrep14daf N4daf
cellNrep24daf N4daf
cellNrep34daf N4daf
cellNrep18daf N8daf
cellNrep28daf N8daf
cellNrep38daf N8daf
cellOrep116daf O16daf
cellOrep14daf O4daf
cellOrep24daf O4daf
cellOrep34daf O4daf
cellOrep18daf O8daf
cellOrep28daf O8daf

I wish to perform differential transcript usage analysis to understand how they differ between different time points for same cell type (say differences between B4daf vs B8daf and so on) and then for same time point but different cell types (B16daf vs A16daf). I am using DRIMSeq for that purpose.

I came across this post wherein the section Create the design matrix without intercept due to no clear 'ground state is somewhat unclear to me (the phrase itself is unclear to me). For a scenario like mine where there is no control case condition, how removing the intercept from the design matrix will affect my analysis, and is it desirable to do so because here it is mentioned that we should never remove the intercept term.

I am confused as to how should I go about it. Also setting one_way = FALSE would be appropriate in my case and how should I set contrast if I am comparing 4 tissues at a time harvested on the same day (say 16daf i.e A16daf vs B16daf vs N16daf vs O16daf)?

#DRIMSeq #DTU #contrast • 1.1k views
ADD COMMENT
0
Entering edit mode

So I myself tried comparing the results obtained with/without intercept on my data and I found more genes with differential transcript usage while using intercept. Also, I observed more genes having significant p-values when using intercept. I used the following code:

    set.seed(1)
    d_with_intercept <- dmPrecision(d, design=design_full)
    ###design_null <- model.matrix(~ 1, data=DRIMSeq::samples(d))
    d_with_intercept <- dmFit(d_with_intercept, design = design_full, verbose = 1)

    my.contrasts_with_intercept <- makeContrasts(
        A.vs.B.16daf = Intercept-B16daf,
        A.vs.N.16daf = Intercept-E16daf,
        levels=design_full)
    d_with_intercept <- dmTest(d_with_intercept, contrast = my.contrasts_with_intercept[,"A.vs.N.16daf"], verbose = 1)
    plotPValues(d_with_intercept, level = "feature")
    plotPValues(d_with_intercept, level = "gene")

And without intercept

design_without_intercept <- model.matrix(~0+group, data=DRIMSeq::samples(d))
design_without_intercept
colnames(design_without_intercept) <- gsub('group', '', colnames(design_without_intercept))
colnames(design_without_intercept)
set.seed(1)
d_without_intercept <- dmPrecision(d, design=design_without_intercept)
###design_null <- model.matrix(~ 1, data=DRIMSeq::samples(d))
d_without_intercept <- dmFit(d_without_intercept, design = design_without_intercept, verbose = 1)

my.contrasts_without_intercept <- makeContrasts(
    A.vs.B.16daf = A16daf-B16daf,
    A.vs.N.16daf = A16daf-N16daf,
    levels=design_without_intercept)

d_without_intercept <- dmTest(d_without_intercept, contrast = my.contrasts_without_intercept[,"A.vs.N.16daf"], verbose = 1)

Will it be worthwhile to go with the intercept approach simply because it gives us more genes with differential transcript usage? In case yes, how do I explain this discrepancy?

enter image description here

ADD REPLY
4
Entering edit mode
@gosia-nowicka-9493
Last seen 9 months ago
Basel

Hello,

the set up of the design matrix (with/without intercept) and contrast depends on the question in hand. Based on my experience, I follow this strategy. Whenever I have a study with a reference group and then one or multiple other groups that I want to compare against that reference group, I use design with intercept and I test the remaining coefficients whether they are equal to zero. In case I want to do different pair-wise comparisons I find easier to interpret the design without intercept and setting up contrasts for each pair-wise comparison.

To illustrate the difference between the two designs with and without intercept I have prepared some plots for a case with 3 groups :D They show you the interpretation of the coefficients that are estimated with those two designs.

31DopS.md.jpg 31Dnv2.md.jpg

Just as a reminder the reference level is set up to the first level in your group variable which should be a factor. Let's assume it is A16daf.

With the design with intercept design_full <- model.matrix(~ 1 + group, data=DRIMSeq::samples(d)) if you want to test for the difference between B16daf and A16daf, you need to test only the B16daf coefficient.

my.contrasts_with_intercept <- makeContrasts(
    A.vs.B.16daf = B16daf,
    A.vs.N.16daf = E16daf,
    levels=design_full)

The comparison where you have a contrast Intercept-B16daf is not quite correct for what you want to ask... You compare the size of the blue and orange rectangle, which is asking if the transcript usage in group B is twice the usage in group A.

With the design without intercept design_full <- model.matrix(~ -1 + group, data=DRIMSeq::samples(d)) if you want to test for the difference between B16daf and A16daf, you need to set a contrast:

my.contrasts_without_intercept <- makeContrasts(
    A.vs.B.16daf = A16daf - B16daf,
    A.vs.N.16daf = A16daf - E16daf,
    levels=design_full)

You can define as many contrasts as you want for any pair-wise comparison you are interested in.

Let me know if you have any further questions.

ADD COMMENT
0
Entering edit mode

Wow!! Thanks for your input. I do have one more concern. It seems as if the DRIMSeq doesn't allow for a more complex design formula like the ones we use in the DESeq. for instance ~0+group+time+group:time.

I tried rerunning DRIMSeq using the time (in days after flowering) as an additional variable which I saved in the samples table column library_layout. So my sample table became:

sample_id group libray_layout
cellArep116daf A 16_daf
cellArep216daf A 16daf
cellArep316daf A 16_daf
cellBrep116daf B 16_daf
cellBrep216daf B 16_daf
cellBrep14daf B 4_daf
cellBrep24daf B 4_daf
cellBrep18daf B 8_daf
cellNrep116daf N 16_daf
cellNrep216daf N 16_daf
cellNrep316daf N 16_daf
cellNrep14daf N 4_daf
cellNrep24daf N 4_daf
cellNrep34daf N 4_daf
cellNrep18daf N 8_daf
cellNrep28daf N 8_daf
cellNrep38daf N 8_daf
cellOrep116daf O 16_daf
cellOrep14daf O 4_daf
cellOrep24daf O 4_daf
cellOrep34daf O 4_daf
cellOrep18daf O 8_daf
cellOrep28daf O 8_daf

samples <- data.frame(sample_id = sampleTable$Samplename, group = sampleTable$Sampletype, library_layout = sampleTable$daf)
design_full <- model.matrix(~ 0 + group + library_layout, data=DRIMSeq::samples(d))
d <- dmDSdata(counts=counts, samples= samples)
colnames(design) <- gsub('group', '', colnames(design))
colnames(design)
set.seed(1)
d <- dmPrecision(d, design=design)
d <- dmFit(d, design = design, verbose = 1)

While precision calculation step I get a long array of produceNAs and with an error in the end:

NaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedError in optimHess(par = par, fn = dmlikregG, gr = dmscoreregG, x = x, : non-finite value supplied by optim

ADD REPLY

Login before adding your answer.

Traffic: 554 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6