Problem converting first column to rownames
A ▴ 40
@a-14337

Hi all, 

 

I am having a really frustrating problem reading in a CSV file (I hope this is an appropriate forum, as the data are for use directly with DESeq2).

I am reading in a CSV count matrix with gene names in the first column. I am trying to turn the first column into row names so that the DESeqDataSetFromMatrix function sees the count columns starting from the first sample. The code I am executing is as follows:

countdata<-read.csv(file = "non-norm_counts.csv")

countdata1<-countdata[,-1]
rownames(countdata1)<-countdata1[, 1]

I get the following error:  duplicate 'row.names' are not allowed
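
One thing worth checking in the snippet above: after `countdata[,-1]` the gene column is already gone, so `countdata1[, 1]` is the first sample's counts rather than the gene names, and repeated count values alone would trigger this error. The following is a minimal sketch of the intended step, assuming a toy data frame in place of the real file (the column names `gene`, `s1`, `s2` are hypothetical). It will still fail if the gene names themselves contain duplicates.

```r
# Toy stand-in for the read.csv() result; column names are hypothetical.
countdata <- data.frame(
  gene = c("A", "B", "C"),
  s1 = c(5L, 5L, 0L),   # repeated counts: unusable as row names
  s2 = c(1L, 2L, 3L),
  stringsAsFactors = FALSE
)

# Drop the gene column, then take row names from the ORIGINAL first column.
countdata1 <- countdata[, -1]
rownames(countdata1) <- countdata[, 1]

# Shortcut when reading from file (file name taken from the post):
# countdata1 <- read.csv("non-norm_counts.csv", row.names = 1)
print(countdata1)
```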

I have tried many ways of getting this to work, including solutions other people have posted, but to no avail. Of course, the DESeq2 function now fails because ncol(countData) == nrow(colData) is not true.

Previously, I read in the CSV and used the following code to simply remove the gene data: countdata[1:ncol(Deannacountdata)]

Ultimately, in the DESeq2 results, I would like the gene names next to their corresponding statistics (log2 fold change, p-value, adjusted p-value, etc.).

But now, after results(dds) and further downstream analysis, I have no idea which genes I am working with apart from the corresponding row number. It is worse than this: I am trying to use the DEGreport package (degPatterns function) to find clusters of gene expression over time, but the significant DE genes after the LRT do not map back to any gene names because the column is missing. The package therefore throws its own errors, as it cannot identify any significantly expressed genes in my list.

Someone save me!!... If there is any further information, I will be happy to provide!

Many thanks!

 

 

deseq2 countdata • 2.1k views
@mikelove

Duplicate row.names are an issue, because the gene names are used as identifiers. If two rows of countData have the same name, how can you pull out the correct row later with character indexing?

A quick fix is make.unique():

> make.unique(c("x","x","y","z"))
[1] "x"   "x.1" "y"   "z"  
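
Applied to a count table, the same call handles every duplicate in one pass, so nothing needs to be renamed by hand. This is a minimal sketch on a made-up data frame (the column names `gene`, `s1`, `s2` are hypothetical); `as.character()` is there because `make.unique()` only accepts character vectors and the gene column may have been read in as a factor.

```r
# Toy stand-in for the count table; column names are hypothetical.
countdata <- data.frame(
  gene = c("x", "x", "y", "z"),
  s1 = c(4L, 7L, 0L, 2L),
  s2 = c(5L, 6L, 1L, 3L),
  stringsAsFactors = FALSE
)

# Build a numeric matrix from the sample columns only.
counts <- as.matrix(countdata[, -1])

# Make all gene names unique at once and use them as row names.
rownames(counts) <- make.unique(as.character(countdata[, 1]))
print(counts)
```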

But you may also want to investigate the genes that are duplicated:

countdata[ duplicated(countdata[,1]), 1]
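
Note that `duplicated()` only flags the second and later occurrences of a name. To compare all rows that share a name side by side, something like the following sketch may help; it assumes a toy data frame with hypothetical column names `gene`, `s1`, `s2`.

```r
# Toy stand-in for the count table; column names are hypothetical.
countdata <- data.frame(
  gene = c("Il11ra2", "Ccl27a", "Il11ra2", "Crybg3", "Ccl27a"),
  s1 = c(10L, 3L, 2L, 8L, 0L),
  s2 = c(12L, 4L, 0L, 9L, 1L),
  stringsAsFactors = FALSE
)

genes <- as.character(countdata[, 1])

# Names that occur more than once.
dup_names <- unique(genes[duplicated(genes)])

# Every row carrying one of those names, including first occurrences.
dup_rows <- countdata[genes %in% dup_names, ]
dup_rows <- dup_rows[order(dup_rows$gene), ]
print(dup_rows)

# How often each duplicated name appears.
print(table(genes)[table(genes) > 1])
```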

Just a quick note on the entries that are duplicated. They look as follows:

 

      <NA>          <NA>          <NA>          <NA>          <NA>          37316         <NA>          <NA>         
 [28] <NA>          <NA>          <NA>          <NA>          <NA>          <NA>          Fam205a2      <NA>          Crybg3       
 [37] <NA>          <NA>          <NA>          <NA>          <NA>          Pcdha11       Ccl27a        <NA>          <NA>         
 [46] <NA>          <NA>          Il11ra2       Il11ra2       Ccl27a        <NA>          <NA>          <NA>          Gm16701      
 [55] <NA>          <NA>          <NA>          <NA>          <NA>          <NA>          <NA>          <NA>          <NA>         
 [64] <NA>          <NA>          <NA>          <NA>          <NA>          Gm4430        <NA>          <NA>          <NA>         
 [73] <NA>          <NA>          <NA>          <NA>          <NA>          <NA>          <NA>          <NA>          <NA>         
 

 

Two questions. First, is there a reason that the gene names appear in columns that should contain count data? Second, make.unique seems like a very laborious task given the number of duplicates present (my paste is just a small segment). Is there a way of assigning all duplicates unique identifiers automatically, without changing them manually?

 

Many thanks!


You should figure out what to do with the NAs. How are you going to report results for a gene with an ID of NA if it is differentially expressed according to DESeq2? One choice would be to remove these rows; another would be to figure out the problem upstream.
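
The first option (removing rows with no gene ID) could be sketched as follows. The data frame and its column names `gene`, `s1`, `s2` are made up for illustration, and whether dropping these rows is appropriate depends on why the IDs are missing.

```r
# Toy stand-in for the count table; column names are hypothetical.
countdata <- data.frame(
  gene = c("Il11ra2", NA, "Ccl27a", NA, "Il11ra2"),
  s1 = c(10L, 0L, 3L, 7L, 2L),
  s2 = c(12L, 1L, 4L, 9L, 0L),
  stringsAsFactors = FALSE
)

# Flag rows that have a usable gene ID and report how many do not.
has_id <- !is.na(countdata$gene)
cat(sum(!has_id), "rows have no gene ID\n")

# Keep only identified rows, then make the remaining names unique.
kept <- countdata[has_id, ]
counts <- as.matrix(kept[, -1])
rownames(counts) <- make.unique(as.character(kept$gene))
print(counts)
```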

Play around with make.unique() in your R session. And read the help ?make.unique.

I use this forum to point people down the right path, but you'll learn more by experimenting and reading documentation.


Many thanks! I will play around and see if I can resolve the issue.
