DiffBind Loss and Gain location
Junsik • 0

Hi,

I have generated two sets (two tissues) of histone modification CUT&RUN data, each consisting of three replicates of WT and three replicates of a mutant.

I successfully made a heatmap using dba.plotProfile and now understand a bit of profileplyr to change some aesthetics.

What I couldn't understand were some details.

  1. In one tissue (data set 1), the plot shows WT on the left and the mutant on the right, but for the other tissue (data set 2) the order is reversed (mutant first, WT second). I found an answer about this and used dba.plotProfile(object, samples=list(first=1:3, second=4:6)) to change the order, but I don't understand what 1:3 and 4:6 indicate. Do these numbers refer to rows in the sample sheet, or to rows or columns of the DBA object? I also don't know why 'first' and 'second' work, since I never used these names in my data set.

  2. A similar question: I don't know how to change the positions of gain and loss. In one data set 'gain' appears on top, but in the other data set it is the opposite.

  3. Regarding the second question: sometimes 'gain' is gain compared to WT, and sometimes compared to the mutant. So I don't know how the program chooses which one is the control and which is the comparison.

By the way, I can't thank you enough for maintaining the program, answering questions, and posting about it. I also found your lecture notes through Google, and they were very helpful. Thank you so much.

Rory Stark ★ 5.1k

Without seeing your sample sheet or script, I can't give precise answers. I'll assume that each Tissue has both Conditions (WT and Mutant) and that the two Tissues are in separate DBA objects, T1 and T2. (You can do this with both Tissues in a single object, but the model is a bit more complex.)

Addressing your third question first:

Sometimes 'gain' is gain compared to WT, and sometimes compared to the mutant. So I don't know how the program chooses which one is the control and which is the comparison.

How are you setting up the contrast(s)? You can control which sample group is the reference group using dba.contrast(). One way is to use the reorderMeta parameter to establish the reference value:

T1 <- dba.contrast(T1, reorderMeta=list(Condition="WT"))
T1 <- dba.contrast(T1)

Intervals with stronger binding in the Mutant will then be Gain sites, and those with stronger binding in the WT will be Loss sites. Alternatively, you can set up the contrast explicitly:

T1 <- dba.contrast(T1, contrast=c("Condition", "Mutant", "WT"))
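To make the sign convention concrete, here is a toy sketch in base R. It does not call DiffBind, and the site names and signal values are invented for illustration. DiffBind's reported Fold comes from the underlying differential analysis rather than a simple subtraction, so only the direction of the sign is being illustrated: with the contrast written as Mutant versus WT, positive values mean more binding in the Mutant.

```r
# Toy values (hypothetical): mean log2 binding signal per group at three sites
sites <- data.frame(
  site   = c("site1", "site2", "site3"),
  Mutant = c(8.0, 5.5, 6.0),
  WT     = c(6.0, 7.0, 6.0)
)

# Mutant versus WT: WT is the reference, so Fold is Mutant minus WT
sites$Fold <- sites$Mutant - sites$WT

# Positive fold = Gain (stronger in Mutant), negative fold = Loss (stronger in WT)
sites$Direction <- ifelse(sites$Fold > 0, "Gain",
                   ifelse(sites$Fold < 0, "Loss", "Unchanged"))

# If the groups were swapped (WT versus Mutant), every sign would flip,
# and the same sites would be labelled the other way round
sites$FoldFlipped <- sites$WT - sites$Mutant

print(sites)
```

If Gain and Loss look swapped between your two data sets, this is the first thing to check: which group ended up as the reference in each object.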

Addressing your other questions:

You can control exactly which sites and samples are included, and in what order, by passing them explicitly to dba.plotProfile().

The samples parameter takes a specification of which samples to include and how to group them. The numbers are the sample numbers shown when you print out the DBA object. For example, using the sample data:

> data(tamoxifen_analysis)
> tamoxifen
11 Samples, 2845 sites in matrix:
       ID Tissue Factor  Condition  Treatment Replicate   Reads FRiP
1  BT4741  BT474     ER  Resistant Full-Media         1  652697 0.16
2  BT4742  BT474     ER  Resistant Full-Media         2  663370 0.15
3   MCF71   MCF7     ER Responsive Full-Media         1  346429 0.31
4   MCF72   MCF7     ER Responsive Full-Media         2  368052 0.19
5   MCF73   MCF7     ER Responsive Full-Media         3  466273 0.25
6   T47D1   T47D     ER Responsive Full-Media         1  399879 0.11
7   T47D2   T47D     ER Responsive Full-Media         2 1475415 0.06
8  MCF7r1   MCF7     ER  Resistant Full-Media         1  616630 0.22
9  MCF7r2   MCF7     ER  Resistant Full-Media         2  593224 0.14
10  ZR751   ZR75     ER Responsive Full-Media         1  706836 0.33
11  ZR752   ZR75     ER Responsive Full-Media         2 2575408 0.22

Design: [~Tissue + Condition] | 1 Contrast:
     Factor     Group Samples     Group2 Samples2 DB.DESeq2

The Responsive MCF7 samples are 3:5. It is often easier to use the built-in sample masks. For example, these same samples could be referenced as tamoxifen$masks$MCF7 & tamoxifen$masks$Responsive. The sample groups will be plotted in the order you specify them (possibly merged). Note that the labels you use in the specification don't really matter (right now), except that they need to be unique.
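The relationship between sample numbers and masks can be seen with a small base R sketch. The data frame below is only a stand-in that copies three columns from the printout above; it is not a DBA object, and the variable names are made up for the illustration. With a real object, the masks in tamoxifen$masks play the role of the logical vectors built here.

```r
# Stand-in for the sample table printed above (not a real DBA object)
samples <- data.frame(
  ID = c("BT4741", "BT4742", "MCF71", "MCF72", "MCF73", "T47D1",
         "T47D2", "MCF7r1", "MCF7r2", "ZR751", "ZR752"),
  Tissue = c("BT474", "BT474", "MCF7", "MCF7", "MCF7", "T47D",
             "T47D", "MCF7", "MCF7", "ZR75", "ZR75"),
  Condition = c("Resistant", "Resistant", "Responsive", "Responsive",
                "Responsive", "Responsive", "Responsive", "Resistant",
                "Resistant", "Responsive", "Responsive")
)

# A mask is a logical vector with one entry per sample
mcf7_mask       <- samples$Tissue == "MCF7"
responsive_mask <- samples$Condition == "Responsive"

# Combining masks with & and converting to positions gives the sample numbers
selected <- which(mcf7_mask & responsive_mask)
print(selected)              # 3 4 5, the same rows as 3:5
print(samples$ID[selected])  # "MCF71" "MCF72" "MCF73"

# The list names are arbitrary labels; the list order is what sets the group order
groups <- list(first  = which(samples$Condition == "Resistant"),
               second = which(samples$Condition == "Responsive"))
print(groups)
```

So 1:3 and 4:6 in a samples list are simply positions in the sample table of the DBA object, in the order it prints them.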

The sites parameter takes a list of groups of sites you want to include. You can specify these using GRanges objects, or a GRangesList object if you want multiple sets of sites. For example, the dba.report() function returns a GRanges object, so you can use a report to pick out the sites you want in each group. Then the groups are plotted in the order they appear in the GRangesList, top to bottom. If you name each GRanges object in the GRangesList, the name will be used as a label.
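As a small illustration of how the site groups are ordered, the sketch below builds a toy GRanges object in place of a real dba.report() result. The coordinates and fold values are invented, and only the GenomicRanges package is assumed to be installed. Swapping the order of the list elements is what would move Loss above Gain.

```r
library(GenomicRanges)

# Toy stand-in for a dba.report() result: three sites with a Fold column
report <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(100, 500, 900), width = 200),
  Fold     = c(1.8, -2.1, 0.9)
)

# Split the sites by the sign of the fold change
gain <- report[report$Fold > 0]
loss <- report[report$Fold < 0]

# The order of elements in the GRangesList is the order of the groups;
# listing Loss first puts it ahead of Gain
loss_on_top <- GRangesList(Loss = loss, Gain = gain)

print(names(loss_on_top))
print(lengths(loss_on_top))
```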

To put this all together, consider an example using the sample data:

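library(DiffBind)
data(tamoxifen_analysis)

# Use parallel processing where available
tamoxifen$config$RunParallel <- TRUE

# Split the differentially bound sites by direction of change
report <- dba.report(tamoxifen)
gain   <- report[report$Fold > 0,]
loss   <- report[report$Fold < 0,]

# Sample groups and site groups are plotted in the order given here
profile <- dba.plotProfile(tamoxifen,
                           samples=list(group1=tamoxifen$masks$Resistant,
                                        group2=tamoxifen$masks$Responsive),
                           sites=GRangesList(Gain=gain,Loss=loss),
                           merge=c(DBA_TISSUE,DBA_REPLICATE))
# Draw the heatmap from the computed profiles
dba.plotProfile(profile)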

Thank you! This definitely solves my problems.
