Question

Deseq issues - pipeline all of a sudden stopped working

skamboj • 0

@skamboj-21199

The following was working just fine last week, but now I keep getting different errors and I don't know why.

One error I get is: duplicate 'row.names' are not allowed

When I delete row.names, I get either "ncol doesn't equal ncount" or a message that there are negative values in my count data, which there are not. I simply cannot figure out what's wrong. Any suggestions?

I've tried playing around with header and a few other things, but can't figure it out.

I think the file is not being read correctly: the first column holds the gene names and the first row holds the sample names that the count values in the matrix belong to, but I don't know how to fix it.

library("DESeq2")

genecountmatrix <- as.matrix(read.csv("/home/gordovezfa/R/3.5/Shawn/file.csv", row.names = "geneid"))
View(genecount_matrix)

RNAseq <- read.csv("/home/gordovezfa/R/3.5/Shawn/file.csv", row.names = "ids")
View(SMS_RNAseq)

Disease <- factor(c("SMS","Con"))

genecountmatrix <- as.data.frame(genecountmatrix)

RNAseq <- as.data.frame(RNAseq)

dds <- DESeqDataSetFromMatrix(countData = genecountmatrix, colData = RNAseq, design = ~Disease, tidy = TRUE)

dds

dds <- dds[ rowSums(counts(dds)) > 5,]
dds <- DESeq(dds)

res <- results(dds, contrast = c("Disease", "Effec", "Con"), tidy = TRUE)
res <- res[order(res$padj),]
sum(res$padj < 0.05, na.rm = TRUE)

res <- res[ !is.na(res$padj), ]
res <- res[ !is.na(res$pvalue), ]
res <- res[, -which(names(res) == "padj")]

sum(res$padj < 0.05)

write.csv(as.data.frame(res), file="SMS.csv")

deseq2 • 1.4k views

ADD COMMENT • link updated 5.5 years ago by Michael Love 43k • written 5.5 years ago by skamboj • 0

score 0 · Answer 1 · 2019-07-01

Michael Love 43k

@mikelove

United States

DESeq2 didn't have any changes that would affect you.

However, you can easily check an object x for duplicate row names yourself:

any(duplicated(rownames(x)))

There's no good reason to have duplicated row names during an analysis; they can cause all kinds of confusion.
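As a minimal sketch of that check, here is a toy example in base R. The data frame `x`, its `gene_id` column, and the gene names are made up for illustration. A data.frame refuses duplicate row names outright, so the duplicates are easiest to find in the ID column before it becomes the row names; a matrix accepts them silently, which is where the check on `rownames()` is useful.

```r
# Toy count table with a repeated gene ID (hypothetical data)
x <- data.frame(
  gene_id = c("geneA", "geneB", "geneA", "geneC"),
  s1 = c(10L, 0L, 3L, 7L),
  s2 = c(12L, 1L, 4L, 9L),
  stringsAsFactors = FALSE
)

# Check the ID column before promoting it to row names
has_dups <- any(duplicated(x$gene_id))
dup_ids  <- unique(x$gene_id[duplicated(x$gene_id)])

# A matrix tolerates duplicated row names, so the check on rownames() applies here
m <- as.matrix(x[, c("s1", "s2")])
rownames(m) <- x$gene_id
matrix_has_dups <- any(duplicated(rownames(m)))
```

Here `dup_ids` holds the names that occur more than once, which tells you which rows to merge or drop before building the dataset.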

ADD COMMENT • link 5.5 years ago Michael Love 43k

any(duplicated(rownames(genecountmatrix)))
[1] FALSE

geneNames = row.names(genecountmatrix)
geneNames[duplicated(geneNames)]
character(0)

https://imgur.com/a/hDG7Av6

Current error:

Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘0’, ‘1’, ‘10’, ‘100

ADD REPLY • link 5.5 years ago skamboj • 0

It’s your colData that’s giving the error though, right?

ADD REPLY • link 5.5 years ago Michael Love 43k

I posted an image of the colData as well, if you scroll down.

Also:

geneNames = row.names(SMS_RNAseq)
geneNames[duplicated(geneNames)]
character(0)
any(duplicated(colnames(SMS_RNAseq)))
[1] FALSE
any(duplicated(rownames(SMS_RNAseq)))
[1] FALSE

ADD REPLY • link 5.5 years ago skamboj • 0

score 0 · Answer 2 · 2019-07-01

swbarnes2 ★ 1.4k

@swbarnes2-14086

San Diego

genecountmatrix <- as.matrix(read.csv("/home/gordovezfa/R/3.5/Shawn/file.csv", row.names = "geneid"))
View(genecount_matrix)

RNAseq <- read.csv("/home/gordovezfa/R/3.5/Shawn/file.csv", row.names = "ids")
View(SMS_RNAseq)

Are you sure that's right, pulling both the count data and the sample info from the same file? Your counts should have the sample names as column names, while the sample info should have the sample names as row names.
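A quick way to confirm that layout is to compare the column names of the counts with the row names of the sample table. This is only a sketch on toy inputs: the objects `counts` and `coldata`, the sample names `s1` to `s4`, and the gene names are all hypothetical.

```r
# Hypothetical toy inputs: 3 genes x 4 samples
counts <- matrix(
  c(5L, 0L, 12L, 7L, 3L, 9L, 1L, 0L, 4L, 8L, 2L, 6L),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3", "s4"))
)
coldata <- data.frame(
  Disease = factor(c("SMS", "SMS", "Con", "Con")),
  row.names = c("s1", "s2", "s3", "s4")
)

# Same set of samples on both sides, and in the same order?
same_samples <- setequal(colnames(counts), rownames(coldata))
same_order   <- identical(colnames(counts), rownames(coldata))

# If the samples match but the order differs, reorder the counts to follow coldata
if (same_samples && !same_order) {
  counts <- counts[, rownames(coldata), drop = FALSE]
}
```

If `same_samples` is FALSE, the two tables do not describe the same samples, which is what you would expect when both were read from one file by mistake.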

ADD COMMENT • link 5.5 years ago swbarnes2 ★ 1.4k

genecountmatrix <- as.matrix(read.csv("file.csv", row.names = "gene_id"))

SMS_RNAseq <- read.csv("RNAseq.csv", row.names = "ids")

https://imgur.com/a/hDG7Av6

Current error:

Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘0’, ‘1’, ‘10’, ‘100

ADD REPLY • link 5.5 years ago skamboj • 0

That figure does not seem to show your colData. Is there only a single column of gene names?

ADD REPLY • link 5.5 years ago Kevin Blighe ★ 4.0k

If you scroll down in the imgur link you'll see the colData layout. Correct, all genes are in the geneid column.

Also:

geneNames = row.names(SMS_RNAseq)
geneNames[duplicated(geneNames)]
character(0)
any(duplicated(colnames(SMS_RNAseq)))
[1] FALSE
any(duplicated(rownames(SMS_RNAseq)))
[1] FALSE

ADD REPLY • link 5.5 years ago skamboj • 0

Thanks - now I see it. Why is the first column of your count matrix the gene names? They should be the row names. I do not immediately see anything else that is unusual. This is probably something that we could fix quickly if we were at your computer, but it is difficult to do remotely. May I suggest that you go back to the start and redo each step, ensuring that each [step] does exactly what you expect? Also check the types of your variables via the str() function.
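One possible way to get exactly this warning, offered as a guess rather than something confirmed in this thread: the DESeq2 documentation describes `tidy = TRUE` in DESeqDataSetFromMatrix as meaning that the first column of countData holds the row names. If the gene IDs were already moved into the row names by read.csv, the first remaining column is a sample's counts, and values such as 0 and 1 repeat. The base R sketch below imitates that step on a made-up CSV (`csv_text`, `gene_id`, `s1`, `s2` are all hypothetical) and uses str() to inspect what was read.

```r
# Hypothetical CSV contents standing in for the real count file
csv_text <- "gene_id,s1,s2
geneA,0,4
geneB,1,0
geneC,0,9
geneD,1,2"

# Gene IDs are moved into the row names while reading
counts <- read.csv(text = csv_text, row.names = "gene_id")
str(counts)  # only the sample columns remain; gene IDs are row names

# Treating the first column as row names now picks up the s1 counts
candidate_names <- as.character(counts[, 1])
dups_in_first_col <- any(duplicated(candidate_names))

# Assigning those repeated values as row names of a data.frame fails
bad <- counts[, -1, drop = FALSE]
msg <- tryCatch(
  suppressWarnings({
    rownames(bad) <- candidate_names
    "no error"
  }),
  error = function(e) conditionMessage(e)
)
```

If this is what is happening, then either keeping the gene IDs as an ordinary first column together with `tidy = TRUE`, or keeping them as row names and dropping `tidy = TRUE`, should avoid the clash.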

ADD REPLY • link 5.5 years ago Kevin Blighe ★ 4.0k