DESeq2 issues: pipeline suddenly stopped working
skamboj • 0
@skamboj-21199
Last seen 4.8 years ago

The following was working just fine last week; however, I keep getting different errors all of a sudden and I don't know why.

I've been getting either: duplicate 'row.names' are not allowed

When I delete row.names, I get either that ncol doesn't equal ncount, or that there are negative values in my count data, which there are not. I simply cannot figure out what's wrong. Any suggestions?

I've tried playing around with header and a few other things, but can't figure it out.

I think it has something to do with R not correctly reading the first column, which holds the gene names, and the first row, which holds the sample names that the count values in the matrix belong to, but I don't know how to fix it.

library("DESeq2")

genecountmatrix <- as.matrix(read.csv("/home/gordovezfa/R/3.5/Shawn/file.csv", row.names = "geneid"))
View(genecount_matrix)

RNAseq <- read.csv("/home/gordovezfa/R/3.5/Shawn/file.csv", row.names = "ids")
View(SMS_RNAseq)

Disease <- factor(c("SMS","Con"))

genecountmatrix <- as.data.frame(genecountmatrix)

RNAseq <- as.data.frame(RNAseq)

dds <- DESeqDataSetFromMatrix(countData = genecountmatrix, colData = RNAseq, design = ~Disease, tidy = TRUE)

dds

dds <- dds[ rowSums(counts(dds)) > 5,]
dds <- DESeq(dds)

res <- results(dds, contrast = c("Disease", "Effec", "Con"), tidy = TRUE)
res <- res[order(res$padj),]
sum(res$padj < 0.05, na.rm = TRUE)

res <- res[ !is.na(res$padj), ]
res <- res[ !is.na(res$pvalue), ]
res <- res[, -which(names(res) == "padj")]

sum(res$padj < 0.05)

write.csv(as.data.frame(res), file="SMS.csv")

deseq2 • 1.1k views
ADD COMMENT
@mikelove
Last seen 1 day ago
United States

DESeq2 didn't have any changes that would affect you.

However, you can easily check for duplicate row names yourself. For an object x:

any(duplicated(rownames(x)))

There's no good reason to have duplicated row names during an analysis; they can cause all kinds of confusion.
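As a rough sketch of that check applied before the IDs ever become row names (base R only; the table `x` and its `gene_id` column below are made up for illustration and are not from the original post), reading the file without `row.names` and inspecting the ID column shows which values repeat:

```r
# Hypothetical toy table; in practice this would come from read.csv()
# called WITHOUT the row.names argument, so duplicates do not abort the read.
x <- data.frame(
  gene_id = c("geneA", "geneB", "geneA", "geneC"),
  sample1 = c(10, 0, 3, 7),
  sample2 = c(12, 1, 4, 9),
  stringsAsFactors = FALSE
)

# TRUE if any ID occurs more than once
has_dups <- any(duplicated(x$gene_id))

# The repeated IDs themselves
dup_ids <- unique(x$gene_id[duplicated(x$gene_id)])

print(has_dups)
print(dup_ids)
```

If `dup_ids` comes back empty, the ID column is not the source of the duplicate row names and the problem is likely elsewhere.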

ADD COMMENT

any(duplicated(rownames(genecountmatrix)))
[1] FALSE

geneNames = row.names(genecountmatrix)
geneNames[duplicated(geneNames)]
character(0)

https://imgur.com/a/hDG7Av6

Current error:

Error in .rowNamesDF<-(x, value = value) : duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘0’, ‘1’, ‘10’, ‘100

ADD REPLY

It’s your colData, though, that’s giving the error, right?

ADD REPLY

I posted an image of the colData as well, if you scroll down.

Also:

geneNames = row.names(SMS_RNAseq)
geneNames[duplicated(geneNames)]
character(0)
any(duplicated(colnames(SMSRNAseq)))
[1] FALSE
any(duplicated(rownames(SMSRNAseq)))
[1] FALSE

ADD REPLY
swbarnes2 ★ 1.3k
@swbarnes2-14086
Last seen 1 day ago
San Diego

genecountmatrix <- as.matrix(read.csv("/home/gordovezfa/R/3.5/Shawn/file.csv", row.names = "geneid"))
View(genecount_matrix)

RNAseq <- read.csv("/home/gordovezfa/R/3.5/Shawn/file.csv", row.names = "ids")
View(SMS_RNAseq)

Are you sure that's right, pulling both the count data and the sample info from the same file? Your counts should have sample names as column names; the sample info should have sample names as row names.
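A minimal sketch of that layout and of a quick consistency check, using base R only. The objects `counts` and `coldata` and the sample names are invented for illustration; DESeq2 itself is not called here, but it does expect the columns of the count matrix and the rows of the sample table to be in the same order.

```r
# Toy count matrix: genes as rows, samples as columns (hypothetical names).
counts <- matrix(
  c(10L, 0L, 3L, 12L, 1L, 4L, 8L, 2L, 5L),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3"))
)

# Toy sample table: one row per sample, sample names as row names,
# deliberately in a different order from the count columns.
coldata <- data.frame(
  Disease = factor(c("SMS", "Con", "SMS")),
  row.names = c("s3", "s1", "s2")
)

# Do both objects describe the same set of samples?
same_samples <- setequal(colnames(counts), rownames(coldata))

# Reorder the sample table to follow the count columns, then confirm.
coldata <- coldata[colnames(counts), , drop = FALSE]
aligned <- identical(colnames(counts), rownames(coldata))

print(same_samples)
print(aligned)
```

If `same_samples` is FALSE, the two objects were probably not built from matching files, which would fit the concern about both being read from one CSV.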

ADD COMMENT

genecountmatrix <- as.matrix(read.csv("file.csv", row.names = "gene_id"))

SMS_RNAseq <- read.csv("RNAseq.csv", row.names = "ids")

https://imgur.com/a/hDG7Av6

Current error:

Error in .rowNamesDF<-(x, value = value) : duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘0’, ‘1’, ‘10’, ‘100

ADD REPLY

That figure is not showing your colData (?) - there is only a single column of gene names?

ADD REPLY

If you scroll down in the imgur link you'll see the colData layout. Correct, all genes are in the geneid column.

Also:

geneNames = row.names(SMS_RNAseq)
geneNames[duplicated(geneNames)]
character(0)
any(duplicated(colnames(SMSRNAseq)))
[1] FALSE
any(duplicated(rownames(SMSRNAseq)))
[1] FALSE

ADD REPLY

Thanks, now I see it. Why is the first column of your count matrix the gene names? They should be row names. I do not immediately see anything else that is unusual. This is probably something that we could fix quickly if we were at your computer, but it is difficult to do remotely. May I suggest that you go back to the start and redo each step, ensuring that each step does exactly what you expect. Also check the encoding of your variables via the str() function.
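One thing that may be worth ruling out, offered as a guess rather than a confirmed diagnosis: the values in the warning (‘0’, ‘1’, ‘10’) look like counts, not gene IDs. That is what one would expect if a column of counts were being used as row names, for example when the gene IDs are already row names but `tidy = TRUE` is still passed, which asks for the first column of countData to be treated as the IDs. The base R sketch below imitates that step on a made-up table; `cts` and its contents are hypothetical, and DESeq2 is not called.

```r
# Made-up counts with gene IDs already stored as row names.
cts <- data.frame(
  s1 = c(0L, 1L, 0L, 10L),
  s2 = c(5L, 2L, 7L, 3L),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# str() shows what each column actually holds (here: integers only, no ID column).
str(cts)

# Treating the first column as IDs now means using counts as row names.
# Repeated count values make that assignment fail; capture the message.
result <- tryCatch(
  {
    rownames(cts) <- cts[[1]]
    "ok"
  },
  error = function(e) conditionMessage(e)
)
print(result)
```

If the message matches the error in the thread, two consistent options are to keep the gene IDs as row names and drop `tidy = TRUE`, or to keep the IDs as an ordinary first column and leave `tidy = TRUE` in place.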

ADD REPLY
