Normalise to a housekeeping gene in DESeq2
ecg1g15 ▴ 30

I am working with gene expression data from an RNA-seq dataset using DESeq2.

I have realised that my housekeeping gene (a gene whose expression is expected to stay constant across samples, independent of condition) differs significantly between two of my condition groups. I would therefore like to normalise all my data to the expression of this gene.

Can I approach it like this?

dds <- DESeq(dds)
dds <- estimateSizeFactors(dds, controlGenes=OG_90)
dds <- nbinomWaldTest(dds)

I get the following error when running estimateSizeFactors:

Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc, : object 'OG_90' not found

How can I refer to the housekeeping gene in that function?


I don't know how to mark ATpoint's comment as the answer, so I am adding a post here to mark the thread as answered...


I have switched them around now. To move a post to the answer box, you first have to click on the ADD ANSWER button to open the dialog box, and then drag the comment there via the little hand icon.


Thanks!

ATpoint ★ 4.6k

From the manual:

controlGenes: optional, numeric or logical index vector specifying those genes to use for size factor estimation (e.g. housekeeping or spike-in genes)

That means it must be a logical (TRUE/FALSE) or numeric vector that tells the function which rows of your dds object contain the control genes. A bare gene name such as OG_90 is looked up as an R object, which is why you get the "not found" error. That said, I would first check whether the default normalization makes sense, for example using MA-plots: the bulk of the data points should be roughly centered around a logFC of zero. Just because a gene is expected not to change does not mean that this assumption holds in reality. I would explore the data before relying on normalization to a single gene.
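As a rough sketch of how a gene ID can be turned into such an index: the snippet below looks up the row position of a gene by name and passes it to estimateSizeFactors. The count matrix, the sample names, and the placement of OG_90 in row 7 are all made up for illustration, and the DESeq2 part only runs if the package is installed.

```r
# Toy count matrix; gene names and counts are invented for illustration only.
set.seed(1)
counts <- matrix(rpois(40, lambda = 50), nrow = 10,
                 dimnames = list(paste0("gene", 1:10), paste0("sample", 1:4)))
rownames(counts)[7] <- "OG_90"  # hypothetical position of the housekeeping gene

# controlGenes expects row positions (numeric) or a TRUE/FALSE vector,
# not an unquoted gene name.
ctrl_idx <- which(rownames(counts) == "OG_90")  # numeric index
ctrl_lgl <- rownames(counts) %in% "OG_90"       # equivalent logical index
stopifnot(length(ctrl_idx) == 1)  # fail early if the ID is missing or duplicated

# Only attempted when DESeq2 is available.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library(DESeq2))
  coldata <- data.frame(condition = factor(c("A", "A", "B", "B")),
                        row.names = colnames(counts))
  dds <- DESeqDataSetFromMatrix(counts, coldata, design = ~ condition)
  dds <- estimateSizeFactors(dds, controlGenes = ctrl_idx)
  print(sizeFactors(dds))
}
```

On a real dataset the same lookup would be `which(rownames(dds) == "OG_90")`, assuming the gene ID is stored in the row names of the dds object.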


It worked when I passed the row index of the gene like this:

dds <- estimateSizeFactors(dds, controlGenes=3970)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)

After this, the housekeeping gene is no longer significantly different between conditions, and I am now more confident in my gene expression results overall. Thanks @ATpoint
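For anyone wondering why the difference disappears, here is a small base R sketch of the arithmetic, assuming the default median-of-ratios estimator with a single control gene. The counts are made up. With only one gene, each size factor is just that gene's count divided by its geometric mean across samples, so the gene's normalised counts come out identical in every sample.

```r
# Made-up counts for one housekeeping gene across four samples.
hk <- c(s1 = 100, s2 = 200, s3 = 50, s4 = 400)

# With a single control gene, the median of ratios reduces to
# count / geometric mean of that gene across samples.
geo_mean <- exp(mean(log(hk)))
size_factors <- hk / geo_mean

# Dividing by the size factors gives the same value in every sample,
# so no difference between conditions can remain for this gene.
normalised <- hk / size_factors
print(size_factors)
print(normalised)
```

All other genes are then scaled by the same per-sample factors, which is why the choice of control gene drives the whole analysis.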
