Question

undefined columns selected

waltsonwang88

United States

Hi,

I was doing a differential gene expression analysis with DESeq2. My experimental design is very simple. I treated a cancer cell line with a drug and did total RNA-seq on both the control and the drug-treated cells, with three replicates each. When I ran the "Convert sample variable mappings to an appropriate form that DESeq2 can read" step, I got an error: "Error in `[.data.frame`(sampleInfo, , keep) : undefined columns selected". I tried many of the suggested answers, but it still failed. I really need help!

The detailed code is below:

``` 
> myData <- read.csv("gene_count_matrix.csv")
> head(myData)
              gene_id sample1 sample2 sample3 sample4 sample5 sample6
1     MSTRG.1|DDX11L1      12       8      13      20      28      39
2             MSTRG.2    8647    8341    8044   18022   17711   20429
3 MSTRG.3|MIR1302-2HG       0       0       0       0       0       0
4   MSTRG.3|MIR1302-2       0       0       0       2       0       0
5     MSTRG.4|FAM138A       0       0       0       0       0       0
6      MSTRG.5|OR4G4P       0       0       0       0       0       0
> geneID <- myData$gene_id
> sampleIndex <- grepl("sample\\d+",colnames(myData))
> myData <- as.matrix(myData[,sampleIndex])
> rownames(myData) <- geneID
> head(myData)
                    sample1   sample2   sample3   sample4   sample5   sample6  
MSTRG.1|DDX11L1     "     12" "      8" "     13" "     20" "     28" "     39"
MSTRG.2             "   8647" "   8341" "   8044" "  18022" "  17711" "  20429"
MSTRG.3|MIR1302-2HG "      0" "      0" "      0" "      0" "      0" "      0"
MSTRG.3|MIR1302-2   "      0" "      0" "      0" "      2" "      0" "      0"
MSTRG.4|FAM138A     "      0" "      0" "      0" "      0" "      0" "      0"
MSTRG.5|OR4G4P      "      0" "      0" "      0" "      0" "      0" "      0"
> sampleInfo <- read.csv("PHENO_DATA.csv")
> head(sampleInfo)
     ids.........groups
1  sample1     control1
2  sample2     control2
3  sample3     control3
4 sample4     lycorine1
5 sample5     lycorine2
6 sample6     lycorine3
> rownames(sampleInfo) <- sampleInfo$ids
> keep <- c("ids", "groups")
> sampleInfo <- sampleInfo[,keep]
Error in `[.data.frame`(sampleInfo, , keep) : undefined columns selected
```

DESeq2

Written 3.6 years ago by waltsonwang88; last updated by Kevin Blighe

Answer 1 · 2021-07-07

swbarnes2

San Diego

Clearly, everything you did after importing the count data made things worse. Why not set the row names as you import?

swbarnes2 · 3.6 years ago

Hi, do you mean I should just leave PHENO_DATA.csv as it was? But how do I delete the first column, i.e. the numbers "1, 2, 3, 4, 5, 6"? I checked a lot of examples, and they set up the sample data (sampleInfo in my case) like this:

| ids | groups |
|---|---|
| sample1 | control1 |
| sample2 | control2 |
| sample3 | control3 |
| sample4 | lycorine1 |
| sample5 | lycorine2 |
| sample6 | lycorine3 |

waltsonwang88 · 3.6 years ago
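The numbers 1 to 6 printed by head() are row names that R adds automatically. They are not a column in the file, so there is nothing to delete. The following is a minimal sketch that assumes PHENO_DATA.csv is comma-separated with the ids and groups columns shown above. Inline text stands in for the file here, and the sample IDs are read directly into the row names. The condition column is an illustrative assumption. DESeq2 compares groups of replicates, so a label without the replicate number (control or lycorine) is likely what the design needs. Labels such as control1 and control2 would put every sample in its own group.

```r
# Inline stand-in for PHENO_DATA.csv (assumed comma-separated).
pheno_text <- "ids,groups
sample1,control1
sample2,control2
sample3,control3
sample4,lycorine1
sample5,lycorine2
sample6,lycorine3"

# row.names = 1 turns the "ids" column into row names, so the automatic
# 1..6 row names are replaced and only "groups" remains as a column.
sampleInfo <- read.csv(text = pheno_text, row.names = 1,
                       stringsAsFactors = FALSE)

# Assumption: strip the trailing replicate number to get the condition
# (control / lycorine) that the samples should be grouped by.
sampleInfo$condition <- factor(sub("\\d+$", "", sampleInfo$groups),
                               levels = c("control", "lycorine"))
print(sampleInfo)
```

With the real file, the equivalent call would be read.csv("PHENO_DATA.csv", row.names = 1), provided the separator matches the file.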

You just need to do something like:

`myData <- read.csv('gene_count_matrix.csv', row.names = 1, header = TRUE)`

Then myData should contain numeric data, with the gene IDs as row names, and the other issues should disappear.
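Here is a small illustration of this suggestion. The sketch uses the first three rows shown in the question as inline text in place of gene_count_matrix.csv, and it assumes the file is comma-separated. Reading with row.names = 1 turns the gene_id column into row names during import. as.matrix() then returns a numeric (integer) matrix rather than a character one.

```r
# Inline stand-in for the first rows of gene_count_matrix.csv
# (assumed comma-separated, gene IDs in the first column).
counts_text <- "gene_id,sample1,sample2,sample3,sample4,sample5,sample6
MSTRG.1|DDX11L1,12,8,13,20,28,39
MSTRG.2,8647,8341,8044,18022,17711,20429
MSTRG.3|MIR1302-2HG,0,0,0,0,0,0"

# row.names = 1 uses gene_id as row names while reading, so no later
# column selection or rownames() assignment is needed.
myData <- read.csv(text = counts_text, row.names = 1, header = TRUE)

# All columns are numeric, so as.matrix() keeps them numeric.
countMatrix <- as.matrix(myData)
str(countMatrix)
```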

By the way, your question is unrelated to DESeq2 and should probably have been asked on a more generic bioinformatics website.

Kevin Blighe · 3.6 years ago

Kevin Blighe, thank you for your help! Do you mean I got this error because I loaded my count data the wrong way? But the error "Error in `[.data.frame`(sampleInfo, , keep) : undefined columns selected" comes from my sampleInfo data.

waltsonwang88 · 3.6 years ago

Yes, my suggestion will help to solve the error that you ultimately receive. Please check the input and output of every command that you run; however, please start by using:

`myData <- read.csv('gene_count_matrix.csv', row.names = 1, header = TRUE)`

There are other general issues. For example, when you run head(myData), one can clearly see that your object, myData, is non-numeric: all numbers are wrapped in quotation marks and have leading whitespace. Why is this? How was gene_count_matrix.csv produced? Please show your screen to the person who produced this file (gene_count_matrix.csv).
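The original file is not available, so this is only an assumption, but here is one way quoted, space-padded numbers like these can appear. If even one of the selected columns is read as character, as.matrix() on the data frame returns a character matrix. It also converts every numeric column with format(), which pads the values with leading spaces to a common width. The hypothetical data frame df below reproduces this behaviour. The last step shows one possible repair that trims whitespace and converts each column to numeric. Any entry that is not a number would become NA with a warning, which is worth checking before trusting the counts.

```r
# Hypothetical data frame: sample1 was read as numeric, but sample2 was
# read as character (e.g. because of a stray non-numeric entry in the file).
df <- data.frame(sample1 = c(12, 8647, 0),
                 sample2 = c("8", "8341", "0"),
                 stringsAsFactors = FALSE)

# Mixed column types: as.matrix() returns a character matrix, and the
# numeric column is converted with format(), which pads it with spaces.
m <- as.matrix(df)
print(m)

# One possible repair: trim whitespace and convert every column to numeric.
# Entries that are not numbers would become NA (with a warning).
fixed <- sapply(df, function(col) as.numeric(trimws(as.character(col))))
print(fixed)
```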

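Later, when you run sampleInfo <- read.csv("PHENO_DATA.csv"), you can see that the delimiter is not being detected. The header is read as a single column called ids.........groups, so sampleInfo has no columns named ids or groups, and sampleInfo[,keep] therefore fails with "undefined columns selected". Please pass read.csv() the correct value for sep, which is usually sep = ',' or sep = '\t'.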
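Here is a minimal reproduction of this diagnosis. It uses a tab-separated stand-in for PHENO_DATA.csv, which is an assumption because the real file's separator is not known. With the default sep = ",", each line is read as a single field, so selecting ids and groups fails with exactly the error from the question. With sep = "\t", the file is read as two columns. If the file is instead separated by runs of spaces, read.table(..., header = TRUE) may be a better fit, because its default sep = "" splits on any whitespace.

```r
# Hypothetical tab-separated stand-in for PHENO_DATA.csv.
pheno_tab <- "ids\tgroups\nsample1\tcontrol1\nsample4\tlycorine1"

# read.csv() defaults to sep = ",", so each line is read as one field.
wrong <- read.csv(text = pheno_tab)
print(ncol(wrong))
print(colnames(wrong))

# Selecting columns that do not exist reproduces the error.
err <- tryCatch(wrong[, c("ids", "groups")],
                error = function(e) conditionMessage(e))
print(err)

# Supplying the matching separator recovers both columns.
right <- read.csv(text = pheno_tab, sep = "\t")
print(colnames(right))
```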

Kevin Blighe · 3.6 years ago