Question

How to add row data to DESeqDataSetFromMatrix

rattray56 • 0

I have a raw-count Illumina RNA-seq data set and a metadata table, but when I try to build a dataset with DESeqDataSetFromMatrix, it tells me I have one extra column in my count data. I do: it is the gene names. How can I get around this? I have tried adding nrow = gene_ID and then just listing the genes, but I don't like the idea of separating the gene names from the count table; it just seems too risky. Any suggestions are really appreciated. Alison

deseq2 • 1.2k views

updated 4.9 years ago by Michael Love 41k • written 4.9 years ago by rattray56 • 0

You need to post the top few lines of your files for anyone to be able to help you. Have you looked at the file formats used in datasets from tutorials, to see how they differ from what you have?

4.9 years ago swbarnes2 ★ 1.3k

score 2 · Answer 1 · 2019-06-17

Michael Love 41k

When you read the file into R, you should specify that the first column contains the row names.
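A minimal sketch of this suggestion, assuming the gene identifiers sit in the first column of a tab-separated file. The file written below is a made-up stand-in (hypothetical gene and sample names), not the poster's data; row.names = 1 is the standard read.table argument for declaring which column holds the row names.

```r
# Toy stand-in for the count file: column 1 holds the gene identifiers.
# (Contents and names are invented for illustration only.)
path <- tempfile(fileext = ".txt")
writeLines(c(
  "gene_id\tsampleA\tsampleB\tsampleC",
  "geneX\t10\t0\t5",
  "geneY\t3\t7\t1"
), path)

# row.names = 1 makes read.table use column 1 as the row names
# instead of keeping it as a data column.
cts <- read.table(path, header = TRUE, sep = "\t", row.names = 1)

dim(cts)       # the gene column is no longer counted among the columns
rownames(cts)  # gene identifiers stay attached to their counts

# DESeqDataSetFromMatrix takes a count matrix, so convert the data frame.
cts <- as.matrix(cts)
```

Because the identifiers become row names rather than a separate object, they travel with the counts through any later subsetting or reordering.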

4.9 years ago Michael Love 41k

I am logged in and have tried to post my code twice now... is there some trick?

4.9 years ago rattray56 • 0

Not so sure how to do that... help? Here is what I am attempting to do.
Also removed column 1 from cts data but I don't think that is the real issue (though advice appreciated!)

Step 1: import the counts and column data, delete unwanted columns.

Import raw count data:

cts <- read.table("RawCountFile_rsemgenes.txt", header = TRUE, sep = "\t")
dim(cts)
[1] 47643 26

head(cts)
                      gene_id clone57_RNA clone43_RNA_2 clone67_RNA
1  ENSMUSG00000000001.4_Gnai3       10634          6954        6835
2  ENSMUSG00000000003.15_Pbsn           0             0           0
3 ENSMUSG00000000028.14_Cdc45         559          1570         807
4   ENSMUSG00000000031.15_H19        5748           174        4103
5 ENSMUSG00000000037.16_Scml2          37           194          49
6  ENSMUSG00000000049.11_Apoh           0             3           1
  clone55_RNA clone7_RNA clone45_RNA clone88_RNA clone26_RNA clone25_RNA
1        6510      11463        7221        6256        7530        7268
2           0          0           0           0           0           0
3        1171       1089        1069         800        1088        1071
4         146      23529         435        1318       16302         101
5          96         52         147          45          97          84
6           0          0           0           0           0           0

(cut this off to save space)

Import column data:

coldat <- read.csv("brca2metadata.csv", header = TRUE, sep = ",")
head(coldat)
       clone_ID condition
1   clone57_RNA   control
2 clone43_RNA_2   treated
3   clone67_RNA   treated
4   clone55_RNA   treated
5    clone7_RNA   treated
6   clone45_RNA   treated

dim(coldat)
[1] 25 2

Notice that there is one more column in the cts data (presumably gene names, but let's find out):

colnames(cts)
 [1] "gene_id"       "clone57_RNA"   "clone43_RNA_2" "clone67_RNA"
 [5] "clone55_RNA"   "clone7_RNA"    "clone45_RNA"   "clone88_RNA"
 [9] "clone26_RNA"   "clone25_RNA"   "clone11_RNA"   "clone35_RNA_2"
[13] "clone91_RNA"   "clone83_RNA"   "clone3_RNA"    "clone53_RNA"
[17] "clone6_RNA"    "clone12_RNA"   "clone69_RNA"   "clone94_RNA"
[21] "clone95_RNA"   "clone70_RNA"   "clone36_RNA"   "clone29_RNA"
[25] "clone58_RNA"   "clone54_RNA_2"

rownames(coldat)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14"
[15] "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25"

rownames(coldat[1:25, 1])
NULL

So how do I get it to use the actual clone names as the row names of coldat? The numbers are clearly not being counted as a column... very frustrating! Until I can get those names to be the same, it will not be possible to construct a DESeq2 dataset! I can move the gene_ID names to a rownames column, but I cannot see how to remove the numbers on the coldata!
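One possible way to handle the coldata side, sketched on toy data rather than the files above: the sample identifiers are copied from the clone_ID column into the row names, and the count columns are then put in the same order. The gene and sample names are invented for illustration, and the DESeq2 call is guarded so the sketch still runs where the package is not installed. Reading the metadata with read.csv(..., row.names = 1) would be an alternative, assuming the identifiers are in the first column of that file.

```r
# Toy stand-ins for the two tables (names and values are illustrative only).
cts <- matrix(c(10L, 3L, 0L, 7L, 5L, 1L), nrow = 2,
              dimnames = list(c("geneX", "geneY"),
                              c("sampleA", "sampleB", "sampleC")))
coldat <- data.frame(clone_ID  = c("sampleA", "sampleB", "sampleC"),
                     condition = factor(c("control", "treated", "treated")),
                     stringsAsFactors = FALSE)

# Selecting a single column returns a plain vector, and vectors carry
# no row names, which is why this gives NULL.
rownames(coldat[1:3, 1])

# Replace the default "1", "2", ... row names with the sample identifiers.
rownames(coldat) <- coldat$clone_ID

# Put the count columns in the same order as the colData rows.
stopifnot(all(rownames(coldat) %in% colnames(cts)))
cts <- cts[, rownames(coldat), drop = FALSE]
stopifnot(identical(colnames(cts), rownames(coldat)))

# Build the dataset only if DESeq2 is available in this R session.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = cts,
                                        colData   = coldat,
                                        design    = ~ condition)
}
```

The explicit identical() check is there because matching names in a different order would otherwise pair samples with the wrong metadata.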

4.9 years ago rattray56 • 0

Thanks... I finally figured it out with some local help. Why was my question removed?

4.9 years ago rattray56 • 0