Factor vs Character in design
@7aa55f9e

I have no code to post. This is a question about the differing results I get when I change a comparison variable in the design formula from character to factor.

I am comparing differential expression across age groups.

The data has a variable 'age' with the values 20, 25, 30, 35, 40, 45, 50, with 20 as the base comparison level.

When I run results for 'age' as a factor I get:

 Gene     baseMean   log2FoldChange  lfcSE       stat         pvalue        padj
 GeneZ    2.0324404  -0.0230828518   0.17758857  -0.12997938  0.8965827428  0.96129754

But when I run it with 'age' as a character I get:

 Gene     baseMean   log2FoldChange  lfcSE       stat         pvalue        padj
 GeneZ    2.0324404  -0.013965354    0.17827642  -0.07833539  9.375613e-01  9.842875e-01

Is R treating the factor data as numerical ordinal?

So, which should I use?

(A single gene is shown as an example; I note the padj.)

Many thanks.

DESeq2 • 329 views
@james-w-macdonald-5106

Using a character age rather than a factor age won't make a difference. R converts the character age to a factor with the same level order and then proceeds. As an example:

> fakeo <- data.frame(vals = rnorm(70), age = as.character(rep(seq(20,50,5), each = 10)))
> summary(lm(vals~age, fakeo))

Call:
lm(formula = vals ~ age, data = fakeo)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8111 -0.5374  0.1354  0.5590  2.1417 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.01985    0.31871  -0.062    0.951
age25       -0.26316    0.45072  -0.584    0.561
age30       -0.52770    0.45072  -1.171    0.246
age35       -0.27076    0.45072  -0.601    0.550
age40        0.39448    0.45072   0.875    0.385
age45        0.33408    0.45072   0.741    0.461
age50        0.11166    0.45072   0.248    0.805

Residual standard error: 1.008 on 63 degrees of freedom
Multiple R-squared:  0.0978,    Adjusted R-squared:  0.01188 
F-statistic: 1.138 on 6 and 63 DF,  p-value: 0.3509

> fakeo$age <- factor(fakeo$age)
> fakeo$age
 [1] 20 20 20 20 20 20 20 20 20 20 25 25 25 25 25 25 25 25 25 25 30 30 30 30 30 30 30 30 30 30 35 35 35 35 35 35 35
[38] 35 35 35 40 40 40 40 40 40 40 40 40 40 45 45 45 45 45 45 45 45 45 45 50 50 50 50 50 50 50 50 50 50
Levels: 20 25 30 35 40 45 50
> summary(lm(vals~age, fakeo))

Call:
lm(formula = vals ~ age, data = fakeo)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8111 -0.5374  0.1354  0.5590  2.1417 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.01985    0.31871  -0.062    0.951
age25       -0.26316    0.45072  -0.584    0.561
age30       -0.52770    0.45072  -1.171    0.246
age35       -0.27076    0.45072  -0.601    0.550
age40        0.39448    0.45072   0.875    0.385
age45        0.33408    0.45072   0.741    0.461
age50        0.11166    0.45072   0.248    0.805

Residual standard error: 1.008 on 63 degrees of freedom
Multiple R-squared:  0.0978,    Adjusted R-squared:  0.01188 
F-statistic: 1.138 on 6 and 63 DF,  p-value: 0.3509

Same results, regardless.
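
One way to see why the two fits agree is to compare the design matrices directly. The sketch below uses base R only; the toy data frame and the names `toy`, `X_chr`, `X_fac`, `X_num` and `same_coding` are made up for illustration, and nothing here is DESeq2 output. It also shows what treating age as a number would look like: a single slope column rather than one indicator column per age group.

```r
# Toy data: seven age groups, three samples each (illustrative only)
ages <- rep(seq(20, 50, 5), each = 3)
toy <- data.frame(
  age_chr = as.character(ages),  # character column
  age_fac = factor(ages),        # factor column, levels 20, 25, ..., 50
  age_num = ages,                # numeric column
  stringsAsFactors = FALSE
)

# Design matrices as R's formula machinery builds them
X_chr <- model.matrix(~ age_chr, data = toy)
X_fac <- model.matrix(~ age_fac, data = toy)
X_num <- model.matrix(~ age_num, data = toy)

# Character and factor give the same dummy coding (only the column
# name prefix differs), so the fitted model is the same.
same_coding <- all(unname(X_chr) == unname(X_fac))
same_coding

dim(X_chr)  # rows x (intercept + one indicator per non-reference age)
dim(X_num)  # rows x (intercept + one slope)
```

If `same_coding` is TRUE for your own sample table, the character/factor switch by itself is unlikely to explain a change in the results.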


Put a different way, the difference is more likely to come from computing a different contrast somehow than from how R handles numeric-looking characters.


The only thing I could imagine is that when using a factor, 20 is not the base level, while when using a character, the internal conversion makes 20 the base level. There are posts here showing that factor level order can make a slight difference in how DESeq2 estimates model parameters.
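
To rule out a reference-level mismatch, the level order can be inspected and set explicitly. This is a minimal base R sketch with made-up vectors; `age_a`, `age_b` and `odd` are illustrative names, not objects from the original analysis. The last lines show a case where string sorting and numeric sorting disagree, which does not happen for the values 20 to 50 but would for values such as 5, 10 and 100.

```r
ages <- rep(seq(20, 50, 5), each = 2)

# Default factor(): levels are the sorted unique values, so "20" comes first
age_a <- factor(as.character(ages))
levels(age_a)[1]

# A factor built with a different level order has a different reference level
age_b <- factor(as.character(ages),
                levels = c("50", "20", "25", "30", "35", "40", "45"))
levels(age_b)[1]

# relevel() moves the chosen level to the front without changing the data
age_b <- relevel(age_b, ref = "20")
levels(age_b)[1]

# String sorting differs from numeric sorting when digit counts differ
odd <- c(5, 10, 100)
levels(factor(odd))                # values sorted as numbers
levels(factor(as.character(odd)))  # values sorted as strings
```

Checking `levels()` on the design variable in both runs is a quick way to confirm whether the same base level was used each time.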


That's possible as well, although OP says 20 is the baseline.


That would seem a logical and reasonable explanation of what I'm seeing.

@7aa55f9e

Excellent explanation - thank you very much.
