Hi guys,
At the moment I have a data frame in which the gene IDs are in one column and the remaining columns are samples, with sample IDs as column names and expression values as observations.
Can I simply create a DGEList from this data frame, or do I have to turn the gene ID column into row names first?
If so, I ran the code below:
wide_vgsID <- column_to_rownames(vgsID , "GENE.SYMBOL")
#then I got
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ........
I have used the duplicated() function to check, but the console returns FALSE.
Does anyone have any idea how to solve this? Thanks!
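
For readers hitting the same error, here is a minimal base-R sketch of how duplicated gene symbols could be located. The toy `vgsID` table and its sample columns `S1` and `S2` are hypothetical; only the column name `GENE.SYMBOL` comes from the post. One possible reason a check comes back FALSE is that `duplicated()` applied to a whole data frame compares entire rows, so two rows that share a symbol but differ in their values are not flagged. Summing the rows that share a symbol is just one way to obtain unique row names, shown here as an assumption, and may not suit every data set.

```r
# Hypothetical toy data standing in for vgsID (not the poster's real data)
vgsID <- data.frame(
  GENE.SYMBOL = c("TP53", "BRCA1", "TP53", "EGFR"),
  S1 = c(10, 5, 3, 0),
  S2 = c(7, 2, 1, 4)
)

# Checking whole rows finds nothing, because the values differ between rows
any(duplicated(vgsID))

# Checking only the gene symbol column reveals the repeats
dup <- duplicated(vgsID$GENE.SYMBOL)
any(dup)
dup_symbols <- unique(vgsID$GENE.SYMBOL[dup])

# Assumption: counts for rows sharing a symbol can be added together
counts <- rowsum(as.matrix(vgsID[, -1]), group = vgsID$GENE.SYMBOL)

# counts now has unique row names and could be passed on, e.g.
# y <- edgeR::DGEList(counts = counts)
```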
thank you Gordon!
Hi Gordon,
May I ask a follow-up question?
My gene expression counts need transformation for sure. Should I do it before generating the list, i.e. log-transform the expression values and then generate the DGEList? Or can I do it afterwards?
Also, I have blank spaces in the df; I assume I need to replace them with zero when I create the list?
Thanks
Sequence read counts are never transformed, either before or after creating the DGEList.
edgeR always works on actual counts, not on transformed quantities. Please see the edgeR User's Guide and associated documentation.
Sorry, but I have no idea what you mean by "blank spaces". There are no data generation processes that I know of that produce blank spaces where data points should be. Please explain what your data actually represents because, on the face of it, it seems you might be trying to do an analysis that is not appropriate for your data.
Thank you for the transformation part.
The df I'm working on was originally a long-format table like the one below:
Then, to generate the DGEList, I decided to convert it to wide format and got the table below:
Because some samples may not express a certain gene, those cells are left blank when I pivot it wider. When I view() the df, they show as blank, but when I run summary() they show as NULL in the console.
Right now I am trying to use apply() to replace the blanks with 0, but with no luck: all values turned into 0.
I am still very new to this and would appreciate any advice.
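
As an illustration of the replacement step described above, here is a minimal sketch that assumes the blank cells are stored as `NA` in numeric columns. The `wide` table and its column names are hypothetical. If the pivot instead produced list columns containing `NULL`, this will not apply directly. Whether zero is the right value for an unobserved gene depends on what the data represent.

```r
# Hypothetical wide table with missing entries stored as NA
wide <- data.frame(
  GENE.SYMBOL = c("A", "B", "C"),
  S1 = c(10, NA, 3),
  S2 = c(NA, 2, 1)
)

# Move the gene IDs into row names of a numeric matrix
counts <- as.matrix(wide[, -1])
rownames(counts) <- wide$GENE.SYMBOL

# Logical indexing replaces only the missing cells and leaves the rest intact
counts[is.na(counts)] <- 0
```

Indexing with `is.na()` touches only the missing cells, which avoids the problem of every value being overwritten.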
Is this proteomics data?
Your data is not actually in standard wide format. It is rather a type of matrix-market format where the unobserved genes are omitted. IMO it is dangerous to use general-purpose pivot or reshape functions to convert this to wide format because those functions don't know what to do with the missing values. On the other hand, converting to wide format using base R is straightforward. First, read your long format table into R as a data.frame. I will call it LongForm. Then