phenotypic table for DESeqDataSetFromHTSeqCount
rbenel ▴ 20
@rbenel-13642

Hi, I am trying to use a sampleTable read from a .txt file, as I normally would when importing counts as a count matrix with DESeqDataSetFromMatrix. However, when I pass it to DESeqDataSetFromHTSeqCount, the function doesn't find the proper path for the files, and I receive the error message listed below.

I have tried setting the row names of the table to match the names of the samples in the directory, and that hasn't helped either.

Any ideas what I am missing?

If I use the code suggested in the tutorial, i.e. build a sample table from the names of the files in the directory, the function works, but I have a lot of information I would like to include in a separate sampleTable.

library(DESeq2)
ExonCountFiles <- "/Bigdata/Dropbox (Technion Dropbox)/Rina_Benel/Shared/CMV/CMV_sep19/htseq_6.5.21/exon_count"

list.files(ExonCountFiles)

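TamisDesign <- read.table("/Bigdata/Dropbox (Technion Dropbox)/Rina_Benel/Home/Rina/CMV_Sep2019/data/sampleSheetTami.txt",
sep = "\t", header = F, row.names = 1)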
colnames(TamisDesign) <- c("sampleName", "infectionStatus", "placentaStatus",
"NumWomen", "BioRep")

TamisDesign <- TamisDesign[order(TamisDesign$sampleName), ]
head(TamisDesign)

TamisDesign$NumWomen <- factor(TamisDesign$NumWomen)
TamisDesign$infectionStatus <- factor(TamisDesign$infectionStatus)
TamisDesign$placentaStatus <- factor(TamisDesign$placentaStatus)

Error in file(file, "rt") : cannot open the connection

This, however, does work...

sampleFiles <- list.files(ExonCountFiles)
sampleCondition <- gsub('exon_count_', "", sampleFiles)
sampleTable <- data.frame(sampleName = sampleFiles,
                          fileName = sampleFiles,
                          condition = sampleCondition)
sampleTable$condition <- factor(sampleTable$condition)

Thanks!

@mikelove

Check this:

file.exists(sampleFiles)

This will help you debug further.

The vector contains just FALSE... So where is the directory parameter set to read files from? Currently I have directory set to the path of the folder that contains all of the HTSeq output files (80 separate files)...

head(list.files(ExonCountFiles))
[1] "exon_count_CMV1_0003" "exon_count_CMV1_0006" "exon_count_CMV1_0012"
[4] "exon_count_CMV1_0013" "exon_count_CMV1_0017" "exon_count_CMV1_0018"

Oh sorry, I didn't see any code including a directory argument in your post. Then to debug, do:

file.exists(file.path(directory, sampleFiles))

Sorry, the directory variable in my code was ExonCountFiles.

A vector of TRUE :) Which means to me that the function is looking for the files in the correct place... So what seems to be the issue?

Another issue is whether R is allowed to read those files, I guess. Try reading one in with read.table or scan(..., what="char").

Yea, no issue there...

FirstFile <- read.table(file.path(ExonCountFiles, sampleFiles)[1])
head(FirstFile)
                  V1 V2
1 ENSG00000000003.15 61
2  ENSG00000000005.6  0
3 ENSG00000000419.12  7
4 ENSG00000000457.14 27
5 ENSG00000000460.17  0
6 ENSG00000000938.13 35

Alternatively, is there a workaround for DESeqDataSetFromHTSeqCount? I guess I could convert it to a count matrix and then use DESeqDataSetFromMatrix?

I can't help debug because you haven't posted the code where you actually get the error. Can you post that, and also the output from traceback() after you get the error?

I posted it above, but maybe it wasn't clear. Here it is again:

library(DESeq2)
ExonCountFiles <- "/Bigdata/Dropbox (Technion Dropbox)/Rina_Benel/Shared/CMV/CMV_sep19/htseq_6.5.21/exon_count"

head(list.files(ExonCountFiles))
[1] "exon_count_CMV1_0003" "exon_count_CMV1_0006" "exon_count_CMV1_0012"
[4] "exon_count_CMV1_0013" "exon_count_CMV1_0017" "exon_count_CMV1_0018"

TamisDesign <- read.table("/Bigdata/Dropbox (Technion Dropbox)/Rina_Benel/Home/Rina/CMV_Sep2019/data/sampleSheetTami.txt",
sep = "\t", header = F, row.names = 1)
head(TamisDesign)
colnames(TamisDesign) <- c("sampleName", "infectionStatus", "placentaStatus",
"NumWomen", "BioRep")

TamisDesign <- TamisDesign[order(TamisDesign$sampleName), ]

TamisDesign$NumWomen <- factor(TamisDesign$NumWomen)
TamisDesign$infectionStatus <- factor(TamisDesign$infectionStatus)
TamisDesign$placentaStatus <- factor(TamisDesign$placentaStatus)

file.exists(ExonCountFiles)

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = TamisDesign,
directory = ExonCountFiles,
design = ~ infectionStatus + placentaStatus)


Here is the error that DESeqDataSetFromHTSeqCount produces:

Error in file(file, "rt") : cannot open the connection
In file(file, "rt") :
cannot open file '/Bigdata/Dropbox (Technion Dropbox)/Rina_Benel/Shared/CMV/CMV_sep19/htseq_6.5.21/exon_count/mock': No such file or directory


It is clear from the end of the path in the error (it ends in "mock") that the function is not reading the file names properly, since there is no file that starts with "mock".


The documentation says:

sampleTable - for htseq-count: a data.frame with three or more columns. Each row describes one sample. The first column is the sample name, the second column the file name of the count file generated by htseq-count, and the remaining columns are sample metadata which will be stored in colData.

It isn't clear from the above that you've specified the file name (the second column) in TamisDesign. It looks like your second column is the infection status.
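
As a minimal sketch of the layout the documentation describes, the following uses only base R so that it runs without the real count files. The toy design values, the temporary directory, and the assumption that each count file is named exon_count_<sampleName> are illustrative and not taken from the actual sample sheet.

```r
# Stand-in for the HTSeq output directory, with two empty placeholder files.
countDir <- file.path(tempdir(), "exon_count_demo")
dir.create(countDir, showWarnings = FALSE)
file.create(file.path(countDir, c("exon_count_CMV1_0003", "exon_count_CMV1_0006")))

# Toy design table laid out like TamisDesign (all values are made up).
design <- data.frame(
  sampleName      = c("CMV1_0003", "CMV1_0006"),
  infectionStatus = c("mock", "infected"),
  placentaStatus  = c("early", "late"),
  stringsAsFactors = FALSE
)

# Here the second column holds metadata, so a path built from it points at
# a non-existent file such as ".../mock".
file.exists(file.path(countDir, design[[2]]))

# Assumption: each count file is named "exon_count_<sampleName>".
# Column 1 = sample name, column 2 = file name, remaining columns = metadata.
sampleTable <- data.frame(
  sampleName = design$sampleName,
  fileName   = paste0("exon_count_", design$sampleName),
  design[, c("infectionStatus", "placentaStatus")],
  stringsAsFactors = FALSE
)
sampleTable$infectionStatus <- factor(sampleTable$infectionStatus)
sampleTable$placentaStatus  <- factor(sampleTable$placentaStatus)

# Now the second column resolves to files that exist in the directory.
file.exists(file.path(countDir, sampleTable[[2]]))
```

A table built this way could then be passed as sampleTable to DESeqDataSetFromHTSeqCount, with directory pointing at the folder of count files.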


Thank you!

I guess that in the DESeqDataSetFromMatrix function the rownames of the sampleTable are the same as the colnames of the counts, and so that is how my sampleTable was set up.
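
For contrast, here is a small base-R sketch of the count-matrix convention mentioned here, where the sample table is keyed by row names rather than by a file-name column. The gene names, counts, and sample labels are hypothetical; the DESeq2 vignette asks for the rows of the column data to be in the same order as the columns of the count matrix.

```r
# Toy count matrix: genes in rows, samples in columns (values are made up).
counts <- matrix(
  c(61L, 0L, 7L, 27L, 12L, 3L),
  nrow = 3,
  dimnames = list(c("gene1", "gene2", "gene3"),
                  c("CMV1_0003", "CMV1_0006"))
)

# Toy sample table whose row names are the sample names, deliberately
# listed in a different order than the matrix columns.
coldata <- data.frame(
  infectionStatus = factor(c("mock", "infected")),
  row.names = c("CMV1_0006", "CMV1_0003")
)

# Reorder the sample table so its rows line up with the matrix columns.
coldata <- coldata[colnames(counts), , drop = FALSE]

# Check the alignment before handing both objects to DESeqDataSetFromMatrix.
stopifnot(identical(rownames(coldata), colnames(counts)))
```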
