How to add row data to DESeqDataSetFromMatrix
@rattray56-15737

I have a raw-count Illumina RNA-seq data set and a metadata table, but when I try to build a data set with DESeqDataSetFromMatrix it tells me I have one extra column in my count data. I do: it is the gene names. How can I get around this? I have tried adding nrow= gene_ID and then just listing the genes, but I don't like the idea of separating the gene names from the count table. It just seems too risky. Any suggestions are really appreciated. Alison


You need to post the top few lines of your files for anyone to be able to help you. Have you looked at the file formats used in datasets from tutorials, to see how they differ from what you have?

@mikelove

When you read the file into R you should specify that the first column is the rownames.
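A minimal sketch of what that could look like with base R's `read.table`, using its `row.names = 1` argument. The temporary file, gene names, and sample names below are made up for illustration and are not the poster's data.

```r
# Write a tiny tab-separated count table to a temporary file (made-up values).
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "geneid\tsampleA\tsampleB",
  "gene1\t10\t20",
  "gene2\t0\t5"
), tmp)

# row.names = 1 tells read.table to use the first column as the row names,
# so the gene identifiers are no longer counted as a data column.
cts <- read.table(tmp, header = TRUE, sep = "\t", row.names = 1)

# Convert to a matrix: the gene identifiers stay attached as row names.
cts <- as.matrix(cts)

dim(cts)       # genes x samples, with no extra gene-name column
rownames(cts)  # the gene identifiers
```

Because the identifiers travel with the matrix as row names, they are never separated from the counts.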


I am logged in and have tried to post my code twice now... is there some trick?


Not so sure how to do that... help? Here is what I am attempting to do.
Also removed column 1 from cts data but I don't think that is the real issue (though advice appreciated!)

Step 1: import the counts and column data, delete unwanted columns.

Import raw count data:

cts <- read.table("RawCountFile_rsemgenes.txt", header = TRUE, sep = "\t")
dim(cts)

[1] 47643 26

head(cts)

                       geneid clone57RNA clone43RNA2 clone67_RNA
1  ENSMUSG00000000001.4_Gnai3      10634        6954        6835
2  ENSMUSG00000000003.15_Pbsn          0           0           0
3 ENSMUSG00000000028.14_Cdc45        559        1570         807
4   ENSMUSG00000000031.15_H19       5748         174        4103
5 ENSMUSG00000000037.16_Scml2         37         194          49
6  ENSMUSG00000000049.11_Apoh          0           3           1

  clone55RNA clone7RNA clone45RNA clone88RNA clone26RNA clone25RNA
1       6510     11463       7221       6256       7530       7268
2          0         0          0          0          0          0
3       1171      1089       1069        800       1088       1071
4        146     23529        435       1318      16302        101
5         96        52        147         45         97         84
6          0         0          0          0          0          0

(cut this off to save space)

Import column data:

coldat <- read.csv("brca2metadata.csv", header = TRUE, sep = ",")
head(coldat)

     clone_ID condition
1 clone57_RNA   control
2 clone43RNA2   treated
3 clone67_RNA   treated
4 clone55_RNA   treated
5  clone7_RNA   treated
6 clone45_RNA   treated

dim(coldat)

[1] 25 2

Notice that there is one more column in the cts data (presumably gene names, but let's find out):

colnames(cts)

[1] "geneid" "clone57RNA" "clone43RNA2" "clone67_RNA"
[5] "clone55RNA" "clone7RNA" "clone45RNA" "clone88RNA"
[9] "clone26RNA" "clone25RNA" "clone11RNA" "clone35RNA_2"
[13] "clone91RNA" "clone83RNA" "clone3RNA" "clone53RNA"
[17] "clone6RNA" "clone12RNA" "clone69RNA" "clone94RNA"
[21] "clone95RNA" "clone70RNA" "clone36RNA" "clone29RNA"
[25] "clone58RNA" "clone54RNA_2"

rownames(coldat)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14"
[15] "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25"

rownames(coldat[1:25, 1])
NULL

So how do I get it to use the actual clone names as the row names of coldat? Clearly it is not treating the clone_ID column as row names... very frustrating! Until I can get those names to be the same, it will not be possible to construct a DESeq2 dataset! I can move the gene_ID names to a rownames column, but I cannot see how to remove the numbers on the coldata!
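For readers who land here with the same problem, one possible way to line the two tables up is sketched below. The thread does not record the fix the poster eventually used, so this is an assumption rather than their solution. The toy `cts` and `coldat` objects and their values are made up; only the column names `geneid`, `clone_ID`, and `condition` come from the post. Note that `rownames(coldat[1:25, 1])` returns NULL because selecting a single column of a data frame gives a plain vector, which has no row names.

```r
# Toy stand-ins for the objects in the post (values are made up).
cts <- data.frame(
  geneid = c("gene1", "gene2"),
  cloneA = c(10L, 0L),
  cloneB = c(20L, 5L)
)
coldat <- data.frame(
  clone_ID  = c("cloneB", "cloneA"),
  condition = c("treated", "control")
)

# Move the gene identifiers into the row names, then drop that column.
rownames(cts) <- cts$geneid
cts <- as.matrix(cts[, -1])

# Replace the default "1", "2", ... row names with the sample identifiers.
rownames(coldat) <- coldat$clone_ID

# Every sample must appear in both tables under exactly the same name.
stopifnot(setequal(colnames(cts), rownames(coldat)))

# Reorder the count columns so they follow the metadata rows.
cts <- cts[, rownames(coldat)]
stopifnot(all(colnames(cts) == rownames(coldat)))

# With DESeq2 installed, the data set could then be built along these lines:
# dds <- DESeq2::DESeqDataSetFromMatrix(countData = cts,
#                                       colData   = coldat,
#                                       design    = ~ condition)
```

The `setequal` check is worth keeping: the names must match character for character, and in the output above the count columns appear as `clone57RNA` while the metadata shows `clone57_RNA`. That difference may only be lost formatting in the post, but if it is real the names would need to be reconciled first.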


Thanks... I finally figured it out with some local help. Why was my question removed?
