Hi All,

I am trying to collapse technical replicates in DESeq2. I have already looked at the manual, but it is still not clear to me how to do it. I tried to run it but got some errors, so I need to understand how it works properly.

I want to collapse A1 with A1.1, B1 with B1.1, C1 with C1.1, and D2 with D2.1.

dds <- DESeqDataSetFromMatrix(
  countData = countdata,
  colData = coldata,
  design = ~ Subject + Treatment)
dds

> coldata
Subject Treatment Time
A1         1        35    1
A1.1       1        35    1
A2         2        35    1
A3         3        35    1
A4         4        35    1
A5         5        35    1
B1         1        25    1
B1.1       1        25    1
B2         2        25    1
B4         4        25    1
B5         5        25    1
C1         1        35   24
C1.1       1        35   24
C2         2        35   24
C3         3        35   24
C4         4        35   24
C5         5        35   24
D2         2        25   24
D2.1       2        25   24
D4         4        25   24
D5         5        25   24
>

dds$Subject <- factor(sample(paste0("Subject",rep(1:22, c(1,1,2,3,4,5,1,1,2,3,4,5,1,1,2,3,4,5,2,2,4,5)))))??
dds$run <- paste0("run",1:??)

ddsColl <- collapseReplicates(dds, dds$Subject, dds$run)

In the example from the manual, does paste0("run",1:12) mean that there are now 12 rows in the colData?

## Collapse replicates in manual

dds <- makeExampleDESeqDataSet(m=12)

# make data with two technical replicates for three samples
dds$sample <- factor(sample(paste0("sample",rep(1:9, c(2,1,1,2,1,1,2,1,1)))))
dds$run <- paste0("run",1:12)

ddsColl <- collapseReplicates(dds, dds$sample, dds$run)

##

I would also like to know whether, after I collapse the replicates, I need to modify my target file and run DESeqDataSetFromMatrix again.

Thanks,

Catalina

Michael Love wrote:

If we look up the help:

?collapseReplicates

There is information about these arguments:

groupby:     a grouping factor, as long as the columns of object

run:     optional, the names of each unique column in object. if provided, a new column runsCollapsed will be added to the colData which pastes together the names of run

And also information about the result:

Value:     the object with as many columns as levels in groupby.

So, you should make a new column which uniquely identifies the libraries which were sequenced more than once (this is what we refer to as a technical replicate). It looks like this would be:

dds$id <- factor(paste0(dds$Subject, dds$Treatment, dds$Time))

Then provide dds$id to the 'groupby' argument. You should not run a constructor function (like DESeqDataSetFrom*) after you've run collapseReplicates().

Hi Michael, when you define 'groupby' with dds$id, I don't understand where you specify which samples to collapse. In my case the technical replicates are A1 with A1.1, B1 with B1.1, C1 with C1.1, and D2 with D2.1. Would I need to specify that?

Thanks

It collapses by the levels in the factor variable 'groupby'.

That is why the output has as many columns as levels in 'groupby'.

For example, if the original counts matrix has 5 columns, and groupby is A, A, A, B, C, then it adds the counts from columns 1-3 to produce a column "A", and the final count table will have columns A, B, C.
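To make the summing concrete, here is a minimal base-R sketch of that 5-column example. It is only an illustration of the idea, not DESeq2's own code, and the matrix `cts` with its gene names and values is made up:

```r
# Hypothetical 2-gene x 5-column count matrix (values are made up)
cts <- matrix(c( 1,  2,  3,  4,  5,
                10, 20, 30, 40, 50),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("gene1", "gene2"), paste0("col", 1:5)))

# Grouping factor with one entry per column of cts
groupby <- factor(c("A", "A", "A", "B", "C"))

# For each level of groupby, sum the counts of the columns in that group
collapsed <- sapply(levels(groupby), function(lv) {
  rowSums(cts[, groupby == lv, drop = FALSE])
})

collapsed  # 2 x 3 matrix with columns A, B, C; column A is col1 + col2 + col3
```

The result has one column per level of `groupby`, which is why nothing else is needed to say which columns belong together.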

Thanks Michael, now I understand I don't need to define which columns to collapse, but need to change my replicates to have the same ID.
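For the colData posted in the question, a shared ID can be checked in plain R before touching the dds object. This is a sketch under assumptions: the data frame is retyped by hand from the table above, and the "_" separator is an addition to keep the pasted values unambiguous. In the real analysis the same column would be set as dds$id and passed to collapseReplicates() as 'groupby'.

```r
# colData from the question, rebuilt by hand
coldata <- data.frame(
  Subject   = c(1, 1, 2, 3, 4, 5,  1, 1, 2, 4, 5,  1, 1, 2, 3, 4, 5,  2, 2, 4, 5),
  Treatment = c(rep(35, 6), rep(25, 5), rep(35, 6), rep(25, 4)),
  Time      = c(rep(1, 11), rep(24, 10)),
  row.names = c("A1", "A1.1", "A2", "A3", "A4", "A5",
                "B1", "B1.1", "B2", "B4", "B5",
                "C1", "C1.1", "C2", "C3", "C4", "C5",
                "D2", "D2.1", "D4", "D5")
)

# One ID per sequenced library: technical replicates end up with the same ID
coldata$id <- factor(paste(coldata$Subject, coldata$Treatment, coldata$Time,
                           sep = "_"))

# Number of levels = number of columns expected after collapsing
nlevels(coldata$id)

# IDs that occur more than once, and the runs that would be summed for each
dup  <- names(which(table(coldata$id) > 1))
reps <- split(rownames(coldata), coldata$id)[dup]
reps
```

With this table the 21 runs reduce to 17 IDs, and the only IDs with two runs are the four replicate pairs (A1/A1.1, B1/B1.1, C1/C1.1, D2/D2.1).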

What confused me in the example from ?collapseReplicates was that I couldn't tell which were the three samples with technical replicates.

# make data with two technical replicates for three samples
dds$sample <- factor(sample(paste0("sample",rep(1:9, c(2,1,1,2,1,1,2,1,1)))))
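
One way to see which three samples are replicated in that line is to drop the outer sample() call, which only shuffles the order of the labels, and count how often each label occurs. A small base-R sketch (the variable names `ids` and `runs_per_sample` are just for illustration):

```r
# Labels before shuffling: the second argument of rep() gives how many
# times each of sample1..sample9 is repeated
ids <- paste0("sample", rep(1:9, c(2, 1, 1, 2, 1, 1, 2, 1, 1)))

length(ids)  # 12 labels, one per run (matching run1..run12)

# Count the runs per sample
runs_per_sample <- table(ids)

# Samples that appear twice are the ones with two technical replicates
names(runs_per_sample)[runs_per_sample == 2]
# "sample1" "sample4" "sample7"
```

So the 12 runs belong to 9 samples, and collapsing by dds$sample gives an object with 9 columns.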