Question

DESeq2 estimateSizeFactors controlGenes return an error

veljokisand • 0

Hi,

I searched the web first but could not find a clear answer.

I run:

dds <- estimateSizeFactors(dds, controlGenes = contGenes)

contGenes is a vector, but I get the error message "Error in counts[controlGenes, , drop = FALSE] : subscript out of bounds".

The integer values in contGenes are quite high, and since the example dds <- estimateSizeFactors(dds, controlGenes = 1:200) works fine, I tried

dds <- estimateSizeFactors(dds, controlGenes = contGenes/1000)

and then it runs without error. Why is that? I understood that this should be the abundance of the spiked-in internal standard; should it be in the same range as the real data? Or is it OK to pass an array with the percentage of spiked-in reads out of the total?

Cheers,

Veljo

deseq2 • 1.7k views

ADD COMMENT • link updated 6.4 years ago by Michael Love 41k • written 6.4 years ago by veljokisand • 0

score 0 · Answer 1 · 2017-12-06

Michael Love 41k

"this should be abundance of spiked in internal standard"

controlGenes should specify ~~either names or the~~ numeric or logical indices of the rows of dds (genes) that you want to use for normalization.

If you're not sure about what you should provide as an argument to a function in R, you should always first check the associated help file:

?estimateSizeFactors
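The error from the question can be reproduced without DESeq2. The following is a minimal base-R sketch, assuming a toy count matrix; the gene labels, sample labels, and the values in contGenes are made up for illustration. Only the subsetting expression is taken from the error message above.

```r
# Toy count matrix: 5 genes (rows) x 3 samples (columns); values are made up.
counts <- matrix(1:15, nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("s", 1:3)))

# Hypothetical abundance-like values, far larger than nrow(counts).
contGenes <- c(2500, 4200)

# Same subsetting expression as in the error message: row positions beyond
# nrow(counts) raise "subscript out of bounds".
res <- tryCatch(counts[contGenes, , drop = FALSE],
                error = function(e) conditionMessage(e))
print(res)

# Dividing by 1000 only shrinks the numbers until they happen to be valid
# row positions. R truncates fractional indices (2.5 -> 2, 4.2 -> 4), so
# this silently selects unrelated rows instead of failing.
scaled <- counts[contGenes / 1000, , drop = FALSE]
print(rownames(scaled))
```

So a call with rescaled abundances may run without error, but the rows it normalizes on would be arbitrary rather than the spike-ins.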

ADD COMMENT • link 6.4 years ago Michael Love 41k

Thanks Michael!

OK, I understood it completely wrong; I did not realize that it means a matrix index. Just to double check: in my gene matrix, my spike-in gene is at column 20, so I need to specify it as follows:

dds <- estimateSizeFactors(dds, controlGenes = 20)

Veljo

ADD REPLY • link 6.4 years ago veljokisand • 0

Yes. Having just one spike-in is pretty bad though. You'd typically want many; throwing out a number off the top of my head, something like 20 or more.
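To illustrate why a single control gene is fragile, here is a simplified base-R sketch of a median-of-ratios calculation restricted to control rows. It is not DESeq2's exact implementation; the function name median_of_ratios, the toy counts, and the assumption of strictly positive counts are all made up for this example.

```r
# Simplified median-of-ratios size factors using only the control rows.
# Assumes all control counts are > 0 (no handling of zeros here).
median_of_ratios <- function(counts, control_rows) {
  sub <- counts[control_rows, , drop = FALSE]
  log_geo_means <- rowMeans(log(sub))  # per-gene log geometric mean
  # Per sample: median over control genes of the log ratio to that mean.
  apply(sub, 2, function(cnts) exp(median(log(cnts) - log_geo_means)))
}

# Toy data: sample s2 is sequenced about twice as deeply as s1.
# spike3 is slightly off (90 instead of 100) to mimic a noisy spike-in.
counts <- rbind(
  spike1 = c(s1 = 10,  s2 = 20),
  spike2 = c(s1 = 100, s2 = 200),
  spike3 = c(s1 = 50,  s2 = 90),
  geneA  = c(s1 = 7,   s2 = 300)
)

sf     <- median_of_ratios(counts, 1:3)  # three control genes
sf_one <- median_of_ratios(counts, 3)    # only the noisy one

print(sf[2] / sf[1])          # the median ignores the one noisy spike-in
print(sf_one[2] / sf_one[1])  # with one control, its noise passes straight through
```

With several controls the median can absorb an outlier; with one control, the size factors are determined entirely by that single gene.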

ADD REPLY • link 6.4 years ago Michael Love 41k

Sure, we have 4 for our environmental metatranscriptomes; it is a bit hard to work with 20 or more in this case, since they have to be synthetic.

One final question: I did not manage to figure out how to use gene names. Neither

dds <- estimateSizeFactors(dds, controlGenes = "spike_gene1")

nor

dds <- estimateSizeFactors(dds, controlGenes = spike_gene1)

nor similar variants work.

According to the manual, it should be a logical vector, i.e. c(FALSE, TRUE, FALSE, TRUE, FALSE)?

ADD REPLY • link 6.4 years ago veljokisand • 0

~~Can you confirm that "spike_gene1" is in the rownames(dds)? It needs to be or else this syntax won't work.~~

You can use ~~any kind of index: by name, by~~ number or a logical vector of length nrow(dds).

ADD REPLY • link 6.4 years ago Michael Love 41k

Yes, rownames(dds) lists all gene names including "spike_gene1" without any problems...

ADD REPLY • link 6.4 years ago veljokisand • 0

I'm sorry, I wasn't correct: controlGenes can be either numeric or logical. So you can use:

dds <- estimateSizeFactors(dds,
  controlGenes=rownames(dds) %in% c("spike_gene1", ... ))

where you can put the rest of the spike-in genes in place of the "...".
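The name-to-index step can be checked on its own before calling estimateSizeFactors. This is a minimal base-R sketch on a toy matrix; the row names (including "spike_gene2") and the variable names spikes, is_control, and control_idx are hypothetical and only serve the illustration.

```r
# Toy count matrix with named rows; values and names are made up.
counts <- matrix(1:15, nrow = 5,
                 dimnames = list(c("geneA", "geneB", "spike_gene1",
                                   "geneC", "spike_gene2"),
                                 paste0("s", 1:3)))

spikes <- c("spike_gene1", "spike_gene2")  # hypothetical spike-in names

# Guard against typos: %in% returns FALSE for unknown names without
# complaining, so check explicitly that every name is present.
not_found <- setdiff(spikes, rownames(counts))
stopifnot(length(not_found) == 0)

# Logical vector of length nrow(counts), TRUE for the control rows.
is_control <- rownames(counts) %in% spikes

# Equivalent numeric row positions.
control_idx <- which(is_control)

print(is_control)
print(control_idx)

# Both forms select the same rows.
same <- identical(counts[is_control, , drop = FALSE],
                  counts[control_idx, , drop = FALSE])
print(same)
```

Either is_control or control_idx would then be the kind of value to pass as controlGenes, with rownames(dds) taking the place of rownames(counts).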

ADD REPLY • link 6.4 years ago Michael Love 41k