DESeq2: design - 1 ctl, 2 different treated
charlesh • 0
@charlesh-13279

Hi,
I'm a novice at creating a design for DESeq2.

We have 3 conditions each with replicates:

  • CTL(untreated – 4 replicates)
  • Ce (treated w/ cerium – 4 replicates)
  • nCe (treated w/ modified cerium – 4 replicates)

The samples were sequenced on 2 separate machines and on different lanes.

We’d like to compare each data set to each other, but really the goal is to identify genes that are DE in the nCe samples compared to all others.  We’d like to control for variation due to sequencers / lanes if possible.

We’ve created the SummarizedExperiment object with summarizeOverlaps():

se <- summarizeOverlaps(features=ebg, reads=bamfiles, mode="Union",singleEnd=FALSE, ignore.strand=TRUE, fragments=TRUE )

We are contemplating how to set up the design / contrasts.

We’ve read posts about LRT / ANOVA-style analyses but are still a bit unsure.

One idea is an LRT analysis, controlling for sequencer:

"condition" is defined in the sampleTable (ctl, ce, nce), as is "flowcell" for each sample

dds <- DESeqDataSet(se, design = ~ flowcell + condition)
dds <- DESeq(dds, test = "LRT", reduced = ~ flowcell)

Would this be an appropriate analysis that would identify genes DE in nCe vs others?

thanks

Charles


Thanks Michael - we'll give that a try.

Charles

@mikelove

That works. It will find differences if C is DE relative to A and B, B to A and C, A to B and C, or if they are all distinct. In each of these cases, if you look at one group in particular, say, C, it is DE from at least one other group.
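
A minimal sketch of this LRT setup on simulated counts, assuming the DESeq2 package is installed. The data come from makeExampleDESeqDataSet(), and the condition and flowcell labels are invented for illustration; they are not the real samples from this thread.

```r
library(DESeq2)

# Simulated counts: 1000 genes x 12 samples (no real signal is built in)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical annotation: 3 conditions x 4 replicates,
# with two flowcells balanced across the conditions
dds$condition <- factor(rep(c("ctl", "ce", "nce"), each = 4),
                        levels = c("ctl", "ce", "nce"))
dds$flowcell <- factor(rep(c("fc1", "fc2"), times = 6))
design(dds) <- ~ flowcell + condition

# LRT: the full model (the design) versus a reduced model without condition
dds <- DESeq(dds, test = "LRT", reduced = ~ flowcell)
res <- results(dds)

# pvalue/padj test all condition terms at once; the log2FoldChange column
# only describes a single coefficient of the full model
head(res[order(res$pvalue), ])
```

A small LRT p-value here says that condition matters for the gene, not which group is responsible, so pairwise comparisons are still needed to single out nCe.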


Michael

I received an update regarding the details of the experiment, and the design now incorporates the fact that some samples had phosphate, others not.

Sample Name  Source  condition  phosphate  pH  sequencer  flowcell   lane
CTL1         McGill  untr       absent     7   D00279     C9VUBANXX  1
CTL2         McGill  untr       absent     7   D00279     C9VUBANXX  2
CTL3         McGill  untr       absent     7   D00279     C9VUBANXX  1
CTL4         McGill  untr       absent     7   D00279     C9VUBANXX  2
CTL1-1       IRIC    untr       present    7   HWI-ST942  C3DWVACXX  7_8
CTL1-3       IRIC    untr       present    7   HWI-ST942  C3DWVACXX  7_8
CTL2-3       IRIC    untr       present    7   HWI-ST942  C3DWVACXX  7_8
Ce-1         IRIC    ce         present    7   HWI-ST942  C3DWVACXX  7_8
Ce-3         IRIC    ce         present    7   HWI-ST942  C3DWVACXX  7_8
Ce2          McGill  ce         absent     7   D00279     C9VUBANXX  2
Ce3          McGill  ce         absent     7   D00279     C9VUBANXX  2
Ce4          McGill  ce         absent     7   D00279     C9VUBANXX  1
nCe-1        IRIC    nce        present    7   HWI-ST942  C3DWVACXX  7_8
nCe-3        IRIC    nce        present    7   HWI-ST943  C3DWVACXX  7_8

We had merged the fastq files for some biological samples that were sequenced on two separate lanes (shown as 7_8).

Should we split these apart again to yield separate lane 7 and lane 8 sequences for the same sample?

What we would like to do is:

  • identify DE genes in ctl vs Ce
  • identify DE genes in ctl vs nCe
  • identify DE genes in Ce vs nCe
  • control for phosphate, sequencer variation

Our initial plan was to run:

dds <- DESeqDataSet(se, design = ~ flowcell + condition)
dds <- DESeq(dds, test = "LRT", reduced = ~ flowcell)

To add a control for phosphate, would we use:

dds <- DESeqDataSet(se, design = ~ flowcell + phosphate + condition)
dds <- DESeq(dds, test = "LRT", reduced = ~ flowcell + phosphate)

thanks

Charles
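
One thing worth checking before fitting a design with both flowcell and phosphate: in the sample table as posted, the two variables appear to coincide exactly (all phosphate-absent samples are on C9VUBANXX, all phosphate-present samples on C3DWVACXX). The sketch below, in base R only, transcribes those columns and checks whether the design matrix is full rank. The helper name rank_ok is made up for this illustration. DESeq2 stops with an error when the model matrix is not full rank.

```r
# Sample annotation transcribed from the table in the post above (14 samples)
samples <- data.frame(
  condition = c(rep("untr", 7), rep("ce", 5), rep("nce", 2)),
  phosphate = c(rep("absent", 4), rep("present", 5),
                rep("absent", 3), rep("present", 2)),
  flowcell  = c(rep("C9VUBANXX", 4), rep("C3DWVACXX", 5),
                rep("C9VUBANXX", 3), rep("C3DWVACXX", 2)),
  stringsAsFactors = TRUE
)

# Cross-tabulate: off-diagonal zeros mean the two variables are confounded
table(samples$phosphate, samples$flowcell)

# Hypothetical helper: TRUE if the design matrix has full column rank
rank_ok <- function(formula, data) {
  mm <- model.matrix(formula, data)
  qr(mm)$rank == ncol(mm)
}

rank_ok(~ flowcell + phosphate + condition, samples)
rank_ok(~ phosphate + condition, samples)
```

With these columns the first design is rank deficient and the second is full rank, so only one of flowcell and phosphate can be included, and their effects cannot be separated from each other.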


So I would recommend a different setup if you want to make these comparisons. 

First, you should add together the lanes that represent additional sequencing of the same library; we call these technical replicates. You can use the collapseReplicates() function in DESeq2 for this.

You can use a design of ~phosphate + condition, and then use standard contrasts with the results() function, e.g. your first comparison would look like:

dds <- DESeq(dds)
res <- results(dds, contrast=c("condition","Ce","ctl"))

Then for additional comparisons, you don't rerun DESeq(), just build a new results table:

res2 <- results(dds, contrast=c("condition","nCe","ctl"))

By the way, if "ctl" stands for control, note that the standard way to represent a fold change is to put control in the denominator, not the numerator. That is, put control at the end of the contrast argument, so you get fold changes of Ce / control.
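
The workflow above can be sketched end to end on simulated data, assuming DESeq2 is installed. The counts come from makeExampleDESeqDataSet() and the annotation is hypothetical; the condition labels (untr, ce, nce) follow the sample table posted earlier, since the names in the contrast must match the factor levels in colData exactly.

```r
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical annotation; "untr" is listed first so it is the reference level
dds$condition <- factor(rep(c("untr", "ce", "nce"), each = 4),
                        levels = c("untr", "ce", "nce"))
dds$phosphate <- factor(rep(c("absent", "present"), times = 6))
design(dds) <- ~ phosphate + condition

# Fit the model once
dds <- DESeq(dds)

# One results table per comparison: c(variable, numerator, denominator)
res_ce_vs_untr  <- results(dds, contrast = c("condition", "ce", "untr"))
res_nce_vs_untr <- results(dds, contrast = c("condition", "nce", "untr"))
res_nce_vs_ce   <- results(dds, contrast = c("condition", "nce", "ce"))

summary(res_nce_vs_ce)
```

The third contrast compares two non-reference levels directly, which covers the Ce vs nCe comparison without refitting.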


Thanks again for your advice Michael!

Nomenclature: yes, 'CTL' does stand for control. Thanks for pointing out that it needs to be the last argument (the denominator).

Re: technical replicates

Good to know about collapseReplicates(). All the samples listed are biological replicates, i.e. separate libraries. For example, the library for CTL1-1 was sequenced on 2 lanes, and the fastq files from both lanes were merged. Is this what you were suggesting, i.e. merging technical replicates?

So, would it be correct to create the DESeq2 object using the data as is, with no further merging?

Charles

Yes, no need to merge if you already did so
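
For cases where the lanes were not merged at the fastq level, a toy sketch of collapseReplicates() follows, assuming DESeq2 is installed. The counts, library names and lane labels are made up for illustration.

```r
library(DESeq2)

# Toy counts: 3 genes, two libraries (A, B), each sequenced on two lanes
cts <- matrix(c(10L, 0L, 5L,   12L, 1L, 4L,
                20L, 3L, 7L,   18L, 2L, 9L),
              nrow = 3,
              dimnames = list(paste0("gene", 1:3),
                              c("A_lane7", "A_lane8", "B_lane7", "B_lane8")))
coldata <- data.frame(lib  = factor(c("A", "A", "B", "B")),
                      lane = c("7", "8", "7", "8"),
                      row.names = colnames(cts))

dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ 1)

# Sum the counts of columns that share a library; `run` records what was merged
dds_coll <- collapseReplicates(dds, groupby = dds$lib, run = dds$lane)

counts(dds_coll)        # one column per library
dds_coll$runsCollapsed  # which lanes went into each column
```

Only technical replicates (the same library sequenced again) should be collapsed this way; biological replicates stay as separate columns.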

The analysis ran OK with a design of ~phosphate + condition:

dds <- DESeq(dds)

The design was meant to control for phosphate. However, an MDS plot of the samples shows that not all groups (untr, ce, nce) cluster as expected; for example, not all untr samples cluster together.

Biological variability at its best/worst I suspect.

Are there downstream techniques to deal with this?

Is there a legitimate way to evaluate samples for removal from analyses?

Charles
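
For reference, one common way to produce such a sample-clustering plot is classical MDS on sample-to-sample distances. The sketch below uses base R only, with simulated counts and hypothetical sample names. The log2(x + 1) transform is a simple stand-in; with a real DESeq2 object, vst() or rlog() would normally be used instead.

```r
set.seed(42)

# Simulated counts: 500 genes x 6 samples; the group labels are hypothetical
groups <- rep(c("untr", "ce", "nce"), each = 2)
cts <- matrix(rnbinom(500 * 6, mu = 100, size = 5), nrow = 500,
              dimnames = list(NULL, paste0(groups, "_", rep(1:2, times = 3))))

# Stand-in transformation to stabilise the variance of the counts
logcts <- log2(cts + 1)

# Euclidean distances between samples (dist() works on rows, hence t())
d <- dist(t(logcts))

# Classical MDS down to two dimensions
mds <- cmdscale(d, k = 2)

plot(mds, col = as.integer(factor(groups)), pch = 19,
     xlab = "MDS 1", ylab = "MDS 2")
text(mds, labels = rownames(mds), pos = 3)
```

On a plot like this, a single sample sitting far from every other sample is easier to spot than by comparing groups one at a time.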


better plot image


I posted an answer here recently (this week or last) on when to consider an outlier sample worthy of removal. Basically, only if it really stands out from the entire dataset, and I usually also look for FastQC-type indicators of a problem.


Michael

I found the post - thanks!

A: Sample not clustering as expected in DESeq2 

Charles
