phenotypic table for DESeqDataSetFromHTSeqCount
rbenel ▴ 40
@rbenel-13642

Hi, I am trying to pass a sampleTable read from a .txt file to DESeqDataSetFromHTSeqCount, the same way I normally would when importing a count matrix with DESeqDataSetFromMatrix. However, when I do this the function doesn't find the proper path for the files, and I receive the error message listed below.

I have tried to assign the rownames of the table to be the same as the names of each sample in the directory and that hasn't helped either.

Any ideas what I am missing?

If I use the code suggested in the tutorial, i.e. build a sample table from the names of the files in the directory, the function works, but I have a lot of information I would like to include from a separate sampleTable.

library(DESeq2)
ExonCountFiles <- "/Bigdata/Dropbox (Technion Dropbox)/Rina_Benel/Shared/CMV/CMV_sep19/htseq_6.5.21/exon_count"

list.files(ExonCountFiles)

TamisDesign <- read.table("/Bigdata/Dropbox (Technion Dropbox)/Rina_Benel/Home/Rina/CMV_Sep2019/data/sampleSheetTami.txt",
                        sep = "\t", header = FALSE, row.names = 1)
head(TamisDesign)
colnames(TamisDesign) <- c("sampleName", "infectionStatus", "placentaStatus",
                           "NumWomen", "BioRep")

TamisDesign <- TamisDesign[order(TamisDesign$sampleName), ]

head(TamisDesign)

TamisDesign$NumWomen <- factor(TamisDesign$NumWomen)
TamisDesign$infectionStatus <- factor(TamisDesign$infectionStatus)
TamisDesign$placentaStatus <- factor(TamisDesign$placentaStatus)
Error in file(file, "rt") : cannot open the connection

This however does work...

sampleFiles <- list.files(ExonCountFiles)
sampleCondition <- gsub('exon_count_', "", sampleFiles)
sampleTable <- data.frame(sampleName = sampleFiles,
                          fileName = sampleFiles,
                          condition = sampleCondition)
sampleTable$condition <- factor(sampleTable$condition)

Thanks!

@mikelove

Check this:

file.exists(sampleFiles)

This will help you debug further.


The vector contains just FALSE...

So where is the directory parameter set to read files from? Currently I have directory set to the path of the folder that contains all of the HTSeq output files (80 separate files).

head(list.files(ExonCountFiles))
[1] "exon_count_CMV1_0003" "exon_count_CMV1_0006" "exon_count_CMV1_0012"
[4] "exon_count_CMV1_0013" "exon_count_CMV1_0017" "exon_count_CMV1_0018"

Oh, sorry, I didn't see any code including a directory argument in your post.

Then to debug do:

file.exists(file.path(directory, sampleFiles))
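
With 80 files it can help to list exactly which paths are missing rather than scanning a long vector of TRUE/FALSE. The following is only a sketch of that check: the temporary directory and the three file names are stand-ins for the real count directory and files, and one file is deliberately left out so that something gets reported.

```r
# Hypothetical stand-in for the HTSeq output directory
directory <- file.path(tempdir(), "exon_count_demo")
dir.create(directory, showWarnings = FALSE)

# Create only two of the three expected files so that one is missing
sampleFiles <- c("exon_count_CMV1_0003",
                 "exon_count_CMV1_0006",
                 "exon_count_CMV1_0012")
invisible(file.create(file.path(directory, sampleFiles[1:2])))

# Bare file names are resolved against getwd(), which is why
# file.exists(sampleFiles) can be all FALSE even when the files exist
bareCheck <- file.exists(sampleFiles)

# The full paths are what actually has to exist
fullPaths <- file.path(directory, sampleFiles)
found <- file.exists(fullPaths)

# Names of the files that could not be found under 'directory'
missingFiles <- sampleFiles[!found]
print(missingFiles)
```

An empty result for missingFiles means every expected file is present under the directory.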

Sorry, the directory variable in my code was ExonCountFiles.

A vector of TRUE :)

Which means to me that the function is looking for the files in the correct place...

So what seems to be the issue?


Another issue is whether R is allowed to read those files I guess.

Try reading one in with read.table or scan(..., what="char")


Yea, no issue there...

FirstFile <- read.table(file.path(ExonCountFiles, sampleFiles)[1])
head(FirstFile)
                  V1 V2
1 ENSG00000000003.15 61
2  ENSG00000000005.6  0
3 ENSG00000000419.12  7
4 ENSG00000000457.14 27
5 ENSG00000000460.17  0
6 ENSG00000000938.13 35

Alternatively, is there a workaround for DESeqDataSetFromHTSeqCount? I guess I could convert the files to a count matrix and then use DESeqDataSetFromMatrix?
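
For reference, the count-matrix route could look roughly like the sketch below. It assumes each file has two tab-separated columns (feature ID, count) as in the read.table output above, and that htseq-count summary rows start with "__"; the demo directory, file contents and counts are made up for illustration.

```r
# Hypothetical demo directory with two tiny htseq-count style files
directory <- file.path(tempdir(), "htseq_matrix_demo")
dir.create(directory, showWarnings = FALSE)
writeLines(c("ENSG00000000003.15\t61", "ENSG00000000005.6\t0", "__no_feature\t100"),
           file.path(directory, "exon_count_CMV1_0003"))
writeLines(c("ENSG00000000003.15\t12", "ENSG00000000005.6\t3", "__no_feature\t80"),
           file.path(directory, "exon_count_CMV1_0006"))

sampleFiles <- c("exon_count_CMV1_0003", "exon_count_CMV1_0006")

# Read each file: column 1 = feature ID, column 2 = count
countList <- lapply(file.path(directory, sampleFiles), read.table,
                    sep = "\t", header = FALSE, stringsAsFactors = FALSE)

# All files must list the same features in the same order
featureIds <- countList[[1]][, 1]
stopifnot(all(vapply(countList,
                     function(x) identical(x[, 1], featureIds),
                     logical(1))))

# Bind the count columns into a matrix, one column per sample
countMatrix <- do.call(cbind, lapply(countList, function(x) x[, 2]))
rownames(countMatrix) <- featureIds
colnames(countMatrix) <- sampleFiles

# Drop htseq-count summary rows such as "__no_feature"
countMatrix <- countMatrix[!grepl("^__", rownames(countMatrix)), , drop = FALSE]
print(countMatrix)
```

The resulting matrix could then be passed as countData to DESeqDataSetFromMatrix, together with a colData table whose rows are in the same order as the matrix columns.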


I can't help debug because you haven't posted the code where you actually get the error. Can you post that, and also the output from traceback() after you get the error?


I posted it above, but maybe it wasn't clear. Here it is again:

library(DESeq2)
ExonCountFiles <- "/Bigdata/Dropbox (Technion Dropbox)/Rina_Benel/Shared/CMV/CMV_sep19/htseq_6.5.21/exon_count"

head(list.files(ExonCountFiles))

[1] "exon_count_CMV1_0003" "exon_count_CMV1_0006" "exon_count_CMV1_0012"
[4] "exon_count_CMV1_0013" "exon_count_CMV1_0017" "exon_count_CMV1_0018"


TamisDesign <- read.table("/Bigdata/Dropbox (Technion Dropbox)/Rina_Benel/Home/Rina/CMV_Sep2019/data/sampleSheetTami.txt",
                        sep = "\t", header = FALSE, row.names = 1)
head(TamisDesign)
colnames(TamisDesign) <- c("sampleName", "infectionStatus", "placentaStatus",
                           "NumWomen", "BioRep")

TamisDesign <- TamisDesign[order(TamisDesign$sampleName), ]

head(TamisDesign)

TamisDesign$NumWomen <- factor(TamisDesign$NumWomen)
TamisDesign$infectionStatus <- factor(TamisDesign$infectionStatus)
TamisDesign$placentaStatus <- factor(TamisDesign$placentaStatus)

file.exists(ExonCountFiles)

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = TamisDesign,
                                       directory = ExonCountFiles,
                                       design = ~ infectionStatus + placentaStatus)

Here is the error that DESeqDataSetFromHTSeqCount produces:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file '/Bigdata/Dropbox (Technion Dropbox)/Rina_Benel/Shared/CMV/CMV_sep19/htseq_6.5.21/exon_count/mock': No such file or directory

The path in the warning ends in "mock", so it is clear that the function is not reading the file names properly: there is no file that starts with mock.


The documentation says:

sampleTable - for htseq-count: a data.frame with three or more columns. Each row describes one sample. The first column is the sample name, the second column the file name of the count file generated by htseq-count, and the remaining columns are sample metadata which will be stored in colData.

It isn't clear from the above that you've specified the file name (the second column) in TamisDesign. It looks like your second column is the infection status.
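
As a rough sketch of a table in the layout the documentation describes: the small TamisDesign below is a made-up stand-in (only "mock" comes from the error message), the directory is a placeholder, and the "exon_count_<sampleName>" naming is an assumption based on the list.files output, so the file names should be built however they really map to the sample sheet.

```r
# Hypothetical stand-in for TamisDesign as read in the post: after
# row.names = 1, sampleName is column 1 and infectionStatus is column 2
TamisDesign <- data.frame(
  sampleName      = c("CMV1_0003", "CMV1_0006"),
  infectionStatus = c("mock", "infected"),
  placentaStatus  = c("early", "late"),
  stringsAsFactors = FALSE
)

directory <- "/path/to/exon_count"  # placeholder, not a real path

# Judging by the error message, the file that gets opened is
# directory + column 2, which here ends in "mock"
attemptedPath <- file.path(directory, TamisDesign[1, 2])
print(attemptedPath)

# Assumption: count files are named "exon_count_<sampleName>"
fileName <- paste0("exon_count_", TamisDesign$sampleName)

# Column 1 = sample name, column 2 = file name, the rest = metadata
metaCols <- setdiff(colnames(TamisDesign), "sampleName")
sampleTable <- data.frame(
  sampleName = TamisDesign$sampleName,
  fileName   = fileName,
  TamisDesign[, metaCols, drop = FALSE],
  stringsAsFactors = FALSE
)
sampleTable$infectionStatus <- factor(sampleTable$infectionStatus)
sampleTable$placentaStatus  <- factor(sampleTable$placentaStatus)
print(sampleTable)

# With DESeq2 loaded and real files in 'directory', the call from the
# post would then take this table:
# ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
#                                        directory = directory,
#                                        design = ~ infectionStatus + placentaStatus)
```

Running file.exists(file.path(directory, sampleTable$fileName)) on the real table before the DESeq2 call is a quick way to confirm the second column points at actual files.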


Thank you!

I guess that for the DESeqDataSetFromMatrix function the rownames of the sampleTable are the same as the colnames of the counts, and that is how my sampleTable was set up.
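
For the matrix route, a small sketch of checking that the sample table rows line up with the count matrix columns before building the data set. The gene names, counts and metadata values are invented for illustration.

```r
# Hypothetical count matrix: genes in rows, samples in columns
countMatrix <- matrix(c(61L, 0L, 12L, 3L), nrow = 2,
                      dimnames = list(c("geneA", "geneB"),
                                      c("CMV1_0003", "CMV1_0006")))

# Hypothetical sample table whose rows are in a different order
colData <- data.frame(infectionStatus = c("infected", "mock"),
                      row.names = c("CMV1_0006", "CMV1_0003"))

# Same set of samples on both sides?
sameSet <- setequal(rownames(colData), colnames(countMatrix))

# Reorder the sample table rows to follow the matrix columns
colData <- colData[colnames(countMatrix), , drop = FALSE]

# After reordering, names should match position by position
sameOrder <- identical(rownames(colData), colnames(countMatrix))
print(c(sameSet = sameSet, sameOrder = sameOrder))
```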
