Question

Not seeing gene ID in glmQLFit headers

0

Entering edit mode

Mike Lawrence • 0

@mike-lawrence-24010

Last seen 3.7 years ago

University of Manitoba

Hi everyone,

I'm relatively new to using edgeR and I'm doing some work looking at a 2 factor toxicology (treatment [4 levels] & time [3 levels]) transcript project.

I've been following along with a couple bioconductor vignettes and am at the point where I'm fitting my data to the glmQLfit portion. However, when I look at the output for the headers, rather then getting the gene ID's and the groups (as in the edgeR user guide pg 71), I'm getting the intercept and the groups. Is there anything that would cause this? I've included the code that I'm using to get the result explained here. Thanks in advance!

Code:

>design <- model.matrix(~group+time)

>normfact <- estimateDisp(normfact, design, robust = TRUE)
>normfact

>normfact$common.dispersion
>summary(normfact$tagwise.dispersion)


>plotBCV(normfact)

>fit <- glmQLFit(normfact, design, robust=TRUE)

>head(fit$coefficients)

OUTPUT:



  (Intercept)        groupN      groupT      groupTN     timeB12      timeC24
1   -12.63176 -0.0467496225 -0.34751354 -0.116757410 -0.03835614  0.053469181
2   -10.47211  0.0502074972  0.04794780  0.040551811 -0.01753715 -0.022839745
4   -13.73637 -0.2918577680 -0.46750819 -0.118112091 -0.27250320 -0.320790026
6   -12.90871 -0.2677782345  0.04047747 -0.004869292  0.27539573  0.110037160
8   -14.13635 -0.1915214490 -0.04989598 -0.163178952  0.08314316  0.123834088
9   -10.32999 -0.0002245027 -0.02992052 -0.028494481 -0.01449350 -0.005435779

glmQLFit edgeR header • 1.1k views

ADD COMMENT • link updated 3.7 years ago by Gordon Smyth 50k • written 3.7 years ago by Mike Lawrence • 0

score 2 · Answer 1 · 2020-08-17

2

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 3 hours ago

United States

You will only have the gene IDs as the row.names of your fitted object if those were the row.names of the counts matrix you started with. Evidently you just had 1:N as the row.names, so that's what you see when you extract the coefficients from your fitted model object.

ADD COMMENT • link 3.7 years ago James W. MacDonald 65k

0

Entering edit mode

Thank you for the quick response! So is that necessary then moving forward in DE analyses then? Or will it bring the gene ID's in from the DGE object? I only ask as example 4.4. from the edgeR user guide didn't appear to do anything special in the design matrix for including the gene ID's unless I'm missing something? Maybe I'm not quite following correctly?

ADD REPLY • link 3.7 years ago Mike Lawrence • 0

1

Entering edit mode

If you add the gene annotations to your DGEList at the outset, when you use topTags, they will be added to your list of genes.

ADD REPLY • link 3.7 years ago James W. MacDonald 65k

1

Entering edit mode

edgeR will use whatever gene IDs are in the data object normfact.

You can insert gene IDs and gene annotation into the edgeR pipeline at any point, but they are not input via the design matrix. Usually people insert the gene annotation at the beginning when the DGEList is created. In any case, it doesn affect the DE results.

Your data object normfact might even contain a data.frame of gene annotation, which would be incorporated into any topTags table produced by edgeR. From the code you show, we can't telll whether normfact contains such annotation or not.

ADD REPLY • link 3.7 years ago Gordon Smyth 50k

0

Entering edit mode

Perfect! Thank you both for the help!

ADD REPLY • link 3.7 years ago Mike Lawrence • 0

score 1 · Answer 2 · 2020-08-17

1

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 3 hours ago

WEHI, Melbourne, Australia

The output you show is analogous to that shown on page 71 of the User's Guide.

In the User's Guide case study, the row names are Entrez Gene IDs. In your data, the row names are just 1,2,3 etc. As James and I have explained, edgeR will use whatever row names you entered in the first place.

In the Users's Guide case study, the intercept was removed from the linear model by including 0+ in the model formula. You haven't done that, nor do you necessarily need to. The coefficient columns as output by glmQLFit are the same as the columns of the design matrix.

None of this necessarily indicates any problem for your analysis.

In short, you make the row names when you create the DGEList or count matrix and you make the column names when you make the design matrix. edgeR just follows your lead.

ADD COMMENT • link 3.7 years ago Gordon Smyth 50k

0

Entering edit mode

Ok thank you. Yes I have all of my gene ID's in the DGElist as they were imported from the original text file. This makes much more sense now, I just didn't want to move forward if the program was wrongly interpreting my input. Thank you so much for the help!

ADD REPLY • link 3.7 years ago Mike Lawrence • 0

0

Entering edit mode

Ok thank you. Yes I have all of my gene ID's in the DGElist as they were imported from the original text file. This makes much more sense now, I just didn't want to move forward if the program was wrongly interpreting my input. Thank you so much for the help!

ADD REPLY • link 3.7 years ago Mike Lawrence • 0