Hi,
I am doing differential gene expression analysis with DESeq2. My experimental design is very simple: I treated a cancer cell line with a drug and did total RNA-seq on both control and drug-treated cells, with three replicates each. At the step "Convert sample variable mappings to an appropriate form that DESeq2 can read", I got an error: "Error in [.data.frame(sampleInfo, , keep) : undefined columns selected". I tried "many answers", but it still failed. I really need help!
The detailed code is below:
```
> myData <- read.csv("gene_count_matrix.csv")
> head(myData)
gene_id sample1 sample2 sample3 sample4 sample5 sample6
1 MSTRG.1|DDX11L1 12 8 13 20 28 39
2 MSTRG.2 8647 8341 8044 18022 17711 20429
3 MSTRG.3|MIR1302-2HG 0 0 0 0 0 0
4 MSTRG.3|MIR1302-2 0 0 0 2 0 0
5 MSTRG.4|FAM138A 0 0 0 0 0 0
6 MSTRG.5|OR4G4P 0 0 0 0 0 0
> geneID <- myData$gene_id
> sampleIndex <- grepl("sample\\d+",colnames(myData))
> myData <- as.matrix(myData[,sampleIndex])
> rownames(myData) <- geneID
> head(myData)
sample1 sample2 sample3 sample4 sample5 sample6
MSTRG.1|DDX11L1 " 12" " 8" " 13" " 20" " 28" " 39"
MSTRG.2 " 8647" " 8341" " 8044" " 18022" " 17711" " 20429"
MSTRG.3|MIR1302-2HG " 0" " 0" " 0" " 0" " 0" " 0"
MSTRG.3|MIR1302-2 " 0" " 0" " 0" " 2" " 0" " 0"
MSTRG.4|FAM138A " 0" " 0" " 0" " 0" " 0" " 0"
MSTRG.5|OR4G4P " 0" " 0" " 0" " 0" " 0" " 0"
> sampleInfo <- read.csv("PHENO_DATA.csv")
> head(sampleInfo)
ids.........groups
1 sample1 control1
2 sample2 control2
3 sample3 control3
4 sample4 lycorine1
5 sample5 lycorine2
6 sample6 lycorine3
> rownames(sampleInfo) <- sampleInfo$ids
> keep <- c("ids", "groups")
> sampleInfo <- sampleInfo[,keep]
Error in `[.data.frame`(sampleInfo, , keep) : undefined columns selected
```
Hi, do you mean I should just leave PHENO_DATA.csv as it was? But how do I delete the first column, I mean the numbers "1, 2, 3, 4, 5, 6"? I checked a lot of examples, and they set up the sampleData (sampleInfo in my case) as below:
ids groups
sample1 control1
sample2 control2
sample3 control3
sample4 lycorine1
sample5 lycorine2
sample6 lycorine3
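For what it is worth, the numbers 1 to 6 printed by `head()` are default row names rather than a column, so there is nothing to delete. A minimal sketch, using a hypothetical data frame built by hand to mirror the intended sample sheet (it is not read from the real PHENO_DATA.csv):

```r
# Hypothetical stand-in for a correctly parsed sample sheet.
sampleInfo <- data.frame(
  ids    = paste0("sample", 1:6),
  groups = c(paste0("control", 1:3), paste0("lycorine", 1:3))
)

# The leading 1..6 are row names, not data: there are only two columns.
ncol(sampleInfo)
rownames(sampleInfo)

# Replacing the default row names with the sample ids removes the numbers.
rownames(sampleInfo) <- sampleInfo$ids
head(sampleInfo)
```

After this, the row names line up with the column names of the count matrix, which is the layout the examples mentioned above appear to use.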
You just need to do something like:
Then `myData` should represent numerical data, and the other issues should vanish. By the way, your question is unrelated to DESeq2 and should probably have been asked on a more generic bioinformatics website.
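The exact command suggested in this reply is not shown in the thread. As an illustration only, here is one possible way to turn a character matrix like the one printed by `head(myData)` into a numeric one. The small matrix `counts_chr` is a made-up example, and `storage.mode<-` is just one option, not necessarily what was meant:

```r
# Made-up character matrix imitating the quoted, space-padded counts above.
counts_chr <- matrix(
  c(" 12", " 8647", "  8", " 8341"),
  nrow = 2,
  dimnames = list(c("MSTRG.1|DDX11L1", "MSTRG.2"), c("sample1", "sample2"))
)
is.numeric(counts_chr)   # FALSE: the values are strings

# Convert in place; dimensions and dimnames are preserved.
counts_num <- counts_chr
storage.mode(counts_num) <- "integer"

is.numeric(counts_num)   # TRUE
anyNA(counts_num)        # any entry that could not be parsed would show up as NA
counts_num
```

If `anyNA()` reports `TRUE` on real data, some entries were not plain integers, which points to a problem in how the count file was produced or read.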
Kevin Blighe, thank you for your help! Do you mean I got this error because I loaded my count data the wrong way? But the error "Error in [.data.frame(sampleInfo, , keep) : undefined columns selected" comes from my sampleInfo data.
Yes, my suggestion will help to solve the ultimate error that you receive. Please check the input and output of every command that you are running; however, please start by first using:
There are other general issues. For example, when you run `head(myData)`, one can clearly see how your object, `myData`, is non-numeric - all numbers are wrapped in quotation marks and have leading whitespace - why is this? How was gene_count_matrix.csv produced? Please show your screen to the person who produced this file (gene_count_matrix.csv).

Later, when you run `sampleInfo <- read.csv("PHENO_DATA.csv")`, you can see that it is not detecting the delimiter. Please use, with `read.csv()`, the correct value for `sep`, which is usually `sep = ','` or `sep = '\t'`.
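To illustrate the delimiter point, here is a small sketch. It assumes, purely for illustration, that the sample sheet is separated by runs of spaces, which would be consistent with the single `ids.........groups` column name in the question. The temporary file is a hypothetical stand-in for PHENO_DATA.csv, whose real delimiter should be checked in a text editor:

```r
# Hypothetical stand-in for PHENO_DATA.csv, with columns separated by spaces.
path <- tempfile(fileext = ".csv")
writeLines(c("ids         groups",
             "sample1     control1",
             "sample4     lycorine1"), path)

# Default read.csv() splits on commas, finds none, and returns ONE column
# whose name is the whole header line with the spaces turned into dots.
bad <- read.csv(path)
ncol(bad)
colnames(bad)

# Passing the real delimiter (sep = "" means any whitespace) gives two columns,
# so selecting them by name no longer fails.
good <- read.csv(path, sep = "")
colnames(good)
good[, c("ids", "groups")]
```

For a tab-delimited file the same call would use `sep = '\t'`, and for a true comma-separated file the default `sep = ','` already applies.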