Question

Normalise to a housekeeping gene in DESeq2

0

ecg1g15 ▴ 20

@ecg1g15-19970

Last seen 3.4 years ago

I am working with gene expression data from an RNA-seq dataset using DESeq2.

I have realised that my housekeeping gene (a gene whose expression is expected to stay constant across samples regardless of condition) is significantly different between two of my condition groups. Therefore, I would like to normalise all my data to the expression of this gene.

Can I approach it like this?

dds <- DESeq(dds)

dds <- estimateSizeFactors(dds, controlGenes=OG_90)

dds <- nbinomWaldTest(dds)

I get the following error when running estimateSizeFactors:

Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc, : object 'OG_90' not found

How can I refer to the housekeeping gene in that function?

DESeq2 housekeeping R RNASeqData • 3.4k views

ADD COMMENT • link 3.4 years ago ecg1g15 ▴ 20

1

I don't know how to mark ATpoint's comment as the answer, so I am adding a post here to mark the thread as answered...

ADD REPLY • link 3.4 years ago Michael Love 41k

1

I have switched them around now. To move a post to the answer box, you first have to click on the ADD ANSWER button to open the dialog box, and then drag the comment there via the little hand icon.

ADD REPLY • link 3.4 years ago Kevin Blighe ★ 3.9k

1

Thanks!

ADD REPLY • link 3.4 years ago Michael Love 41k

score 4 · Accepted Answer · 2020-11-27

4

ATpoint ★ 4.0k

@atpoint-13662

Last seen 16 hours ago

Germany

From the manual:

controlGenes: optional, numeric or logical index vector specifying those genes to use for size factor estimation (e.g. housekeeping or spike-in genes)

That means it must be a logical (TRUE/FALSE) or numeric vector that tells the function which rows of your dds object hold the control genes; a bare gene name such as OG_90 is not accepted. I would first check, though, whether the default normalization already makes sense, for example using MA-plots: the bulk of the data points should be roughly centered on a logFC of zero. Just because a gene is expected not to change does not mean that this assumption holds in reality. I would explore the data before relying on normalization to a single gene.
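A minimal sketch of turning a gene name into the index that `controlGenes` expects. The count matrix, sample names, and condition labels are made up for illustration; only the gene name `OG_90` comes from the question, and it is assumed here to be a row name of the data. The DESeq2 part runs only if the package is installed.

```r
# Toy count matrix; gene names and counts are invented for illustration.
set.seed(1)
cts <- matrix(rpois(60, lambda = 50) + 1L, nrow = 10,
              dimnames = list(c(paste0("gene", 1:9), "OG_90"),
                              paste0("s", 1:6)))

# Numeric index: the row position of the control gene, not its name.
ctrl_idx <- which(rownames(cts) == "OG_90")
stopifnot(length(ctrl_idx) == 1)  # fail early if the name is missing or duplicated

# Equivalent logical index: one TRUE/FALSE per row.
ctrl_lgl <- rownames(cts) %in% "OG_90"

if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  # Hypothetical two-group design, three samples per group.
  coldata <- data.frame(condition = factor(rep(c("A", "B"), each = 3)),
                        row.names = colnames(cts))
  dds <- DESeqDataSetFromMatrix(cts, coldata, design = ~ condition)
  # Either ctrl_idx or ctrl_lgl can be passed here.
  dds <- estimateSizeFactors(dds, controlGenes = ctrl_idx)
  print(sizeFactors(dds))
}
```

On a real object the same lookup would be `which(rownames(dds) == "OG_90")`, which avoids hard-coding a row number that could change if the rows are filtered or reordered.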

ADD COMMENT • link 3.4 years ago ATpoint ★ 4.0k

0

It worked when I passed the gene's row number like this:

dds <- estimateSizeFactors(dds, controlGenes=3970)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)

After this, the housekeeping gene no longer differs between conditions, and I am now more confident in my gene expression results overall. Thanks @ATpoint
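A small base-R sketch of why the housekeeping gene stops differing once it is the only control gene. It assumes the median-of-ratios calculation described in the DESeq2 documentation (each sample's size factor is the median, over the control genes, of count divided by the gene's geometric mean). The counts are made up.

```r
# Invented counts for a single control gene across six samples.
ctrl_counts <- c(s1 = 120, s2 = 95, s3 = 130, s4 = 60, s5 = 55, s6 = 70)

# With one control gene the median is taken over a single ratio,
# so each size factor is simply count / geometric mean of that gene.
geo_mean <- exp(mean(log(ctrl_counts)))
size_factors <- ctrl_counts / geo_mean

# Dividing by the size factors makes the control gene identical in all samples.
normalised <- ctrl_counts / size_factors
print(round(normalised, 3))
```

Under that assumption the control gene is flat after normalization by construction, so its lack of difference says nothing about whether the choice of gene was sound. Any real change in that gene is instead pushed onto every other gene, which is why the MA-plot check suggested above is still worth doing.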

ADD REPLY • link 3.4 years ago ecg1g15 ▴ 20