I am trying to do differential gene expression analysis on a transcriptome I generated. I generated a normalized count matrix called Trans_Atlas first and now I am running into an error using glmFit command.
Here is my script so far:
> Trans_Atlas <- read.table("normalized_count_matrix", header=T, row.name=1, com='', check.names=F)
> group <- factor(c(1,1,2,2,3,3,4,4,5,5,6,6,7,7))
> design <- model.matrix(~0+group, data=Trans_Atlas$samples)
> test <- estimateDisp(Trans_Atlas,design)
> fit <- glmFit(test,design)
Error in glmFit.default(test, design) : y is not a numeric matrix
I'm confused on this error as test is a list with 6 values:
common.dispersion : num 0.482
trended.dispersion : num [1:30558]
tagwise.dispersion : num [1:30558]
span : num 0.281
prior.df : num 6.21
prior.n : num 0.888
I'm new to R where did I mess up?
Thanks
Hi even I had this problem, honestly, I read the manual and I am still having difficulty. It is not just about reading manual it is about knowing how to use these functions. I like the user borpano do not know what to do when I have this issue.
As for normalization factors I would like to know how you suggest doing TMM normalization Aaron Lun as you are supposed to supply the original counts?
Yes, see
?calcNormFactors
.I know this one however, I don't think we can just supply it to a in the following command
y <- estimateDisp( a, design) .
I wanted to check as I am not used to using this package. Any help you can give would be much appreciated.
If you read
?estimateDisp
, you will see that the output ofestimateDisp
changes depending on whethera
is aDGEList
or a matrix. All use cases in the user's guide will supply aDGEList
toestimateDisp
; this is what you should be doing as well.I get that we should be supplying a
DGEList
but can we supply TMM normalized values as it does not seem okay to in the documentation to supply it toestimateDisp
? However, TMM normalization in general is encouraged, I believe?I get that we should be supplying a DGEList
Then do it.
can we supply TMM normalized values
edgeR has no concept of "TMM normalized values", see Section 2.7.6 of the user's guide. There are TMM normalization factors, but the counts should not be modified during the course of the analysis. You should not supply any kind of scaled counts to
estimateDisp
or the other edgeR functions.it does not seem okay to in the documentation to supply it to estimateDisp?
I don't understand what you're complaining about. Just create a
DGEList
object and use it for your analysis steps includingestimateDisp
- it's that easy. Why are you trying to fight this?However, TMM normalization in general is encouraged, I believe?
Use
calcNormFactors
on aDGEList
, and you will get aDGEList
in return. This can be used for input into the remaining edgeR functions, includingestimateDisp
.I was not trying to fight I just didn't understand so from that I get that I should just use a dgelist of counts fed in estimateDisp. Thank you for all your help! :)
If you have been following the instructions in the user's guide, you cannot possibly have the problems described by the OP, because we never tell users to run the commands in that manner! Specifically, all calls to
estimateDisp
are performed onDGEList
objects, not on the count matrices themselves. In addition, we do not accept "normalized counts" as input - see the explanation in Section 2.7.6. I would suggest the following:?estimateDisp
.Oh in which case, while I see your point, I have used a DGEList (I did not see that the person did not use a DGEList) and yet I am still getting the error that this person got. As for the user guide please do not refer to that I do not understand what I am doing wrong, I have read the manual.
I see you must have ignored point 4 in my comment above. I'm not psychic, I can't figure out what you did wrong if you don't show any code! In the absence of any further information, I can only refer you to the user's guide, or the workflow that I mentioned above, or
?estimateDisp
.I did not ignore it intentionally. I did not realize that I needed to in this case. Sorry. Thank you for the help though.
I'm confused as to why the estimateDisp is creating this list object instead of the correct matrix. To generate this count matrix I used some of the perl scripts include in the Trinity de novo transcriptome Assember which output this R script to create both a DGEList and normalized via normalization factors all using the edgeR package.
> library(edgeR)
Loading required package: limma
>
> rnaseqMatrix = read.table("RSEM.isoform.TPM.not_cross_norm", header=T, row.names=1, com='', check.names=F)
> rnaseqMatrix = as.matrix(rnaseqMatrix)
> rnaseqMatrix = round(rnaseqMatrix)
> exp_study = DGEList(counts=rnaseqMatrix, group=factor(colnames(rnaseqMatrix)))
> exp_study = calcNormFactors(exp_study)
> exp_study$samples$eff.lib.size = exp_study$samples$lib.size * exp_study$samples$norm.factors
> write.table(exp_study$samples, file="RSEM.isoform.TPM.not_cross_norm.TMM_info.txt", quote=F, sep="\t", row.names=F)
>
Do I have to reformat the "Trans_Atlas into a DGEList data structure?
Yes. Read the "Value" section in
?estimateDisp
, especially pertaining toestimateDisp.default
.This solved my issue thank you very much.