edgeR: Error in taglist[[i]] : subscript out of bounds
Nick N ▴ 60
@nick-n-6370
Last seen 8.6 years ago
United Kingdom

I have RNA-seq data (18 samples) from which I produced raw counts using htseq-count. Now I want to analyze the data using edgeR. I've done this dozens of times without any issues, but not this time. When I execute

samples <- read.csv(file="metadata_composite.csv",header=TRUE,sep=",")
counts <- readDGE(samples$CountFiles)$counts

I get:

Error in taglist[[i]] : subscript out of bounds

I've analyzed the same data using DESeq2 and had no issues whatsoever, so it seems to be something related to edgeR. Here is my data: it consists of 18 directories, each containing a count file (accepted_hits.count), plus a directory called edgeR containing "metadata_composite.csv", the metadata file that I load in the code snippet above.

Can you tell what the problem is?

 

 

edger rna-seq • 3.8k views
Aaron Lun ★ 28k
@alun
Last seen 15 hours ago
The city by the bay

You've got a couple of duplicated entries in samples$CountFiles. Each entry of taglist is named according to the count file, so if you have the same count file for different samples, information from the later samples will overwrite that of earlier samples instead of forming a new entry in taglist. The offending samples seem to be WT-H2 and WT-H3 in your CSV file, both of which refer to the WT-H1 directory to extract the count file. Redirect these to get the count files from their appropriate directories, and you should be fine.
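
A quick way to spot this kind of problem before calling readDGE is to look for repeated paths in the count file column. The following is a minimal sketch in base R; the toy table, the extra KO-H1 sample and its path are made up for illustration and are not taken from the actual CSV file.

```r
# Toy metadata table; sample names and paths are invented for illustration.
samples <- data.frame(
  Sample     = c("WT-H1", "WT-H2", "WT-H3", "KO-H1"),
  CountFiles = c("WT-H1/accepted_hits.count",
                 "WT-H1/accepted_hits.count",
                 "WT-H1/accepted_hits.count",
                 "KO-H1/accepted_hits.count"),
  stringsAsFactors = FALSE
)

# Flag every row whose count file path also appears in another row.
files <- as.character(samples$CountFiles)
is_dup <- duplicated(files) | duplicated(files, fromLast = TRUE)
dup_rows <- samples[is_dup, ]
print(dup_rows)
```

Any rows printed here share a count file with at least one other sample and need to be pointed at their own directories.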


I got the same error as Nick N.

> s=read.csv("Samples.csv")

> counts=readDGE(s$cf)$counts

Error in taglist[[i]] : subscript out of bounds

I can't open the link he provided, and I couldn't find an error in my CSV file. Could you please point out what changes I need to make to fix the error?

LibraryName LibraryLayout fastq1 fastq2 condition shortname
ATCC26_30 PAIRED AT30.left.fastq AT30.right.fastq Saphrophyre AT_30
ATCC26_37 PAIRED AT37.left.fastq AT37.right.fastq Saphrophyre AT_37
CI1123_30 PAIRED CI30.left.fastq CI30.right.fastq Surgery CI_30
CI1123_37 PAIRED CI37.left.fastq CI37.right.fastq Surgery CI_37
CI1698_30 PAIRED YS30.left.fastq YS30.right.fastq Healed YS_30
CI1698_37 PAIRED YS37.left.fastq YS37.right.fastq Healed YS_37

 

The above is the metadata table that I am using to run the commands.


Well, for starters, there's no cf column in your metadata table.


I used the paste function to add a cf column to the Samples.csv file. In the terminal window I can see the cf column with its values, but I cannot see it when I open the CSV file directly. I then entered the cf column into the CSV file manually, but a different error occurred:

counts=readDGE(s$cf)$counts
Error in read.table(file = file, header = header, sep = sep, quote = quote,  :
  no lines available in input.

I have one question: we can't read a CSV file in R without assigning it to a variable, am I right? I ask because the protocol I am following reads the CSV file into R directly and manipulates its columns, but I could not do that.

Any suggestions for fixing this error?

 

 


It's not clear to me what you're actually doing. All I can say is to check that:

  • there is a column named cf in your data frame s.
  • s$cf is a character vector that contains paths to count files.
  • the count files aren't empty and are properly formatted.
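
These checks can be run in base R before calling readDGE. The sketch below is only an illustration: check_count_files is a hypothetical helper (not part of edgeR), and the temporary files stand in for real count files.

```r
# Hypothetical helper (not an edgeR function): list problems with the
# count file paths stored in one column of a metadata data frame.
check_count_files <- function(df, column = "cf") {
  if (!column %in% names(df)) {
    return(sprintf("no column named '%s' in the data frame", column))
  }
  paths <- as.character(df[[column]])  # factors become plain strings
  problems <- character(0)
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    problems <- c(problems, paste("file not found:", missing))
  }
  # An empty file makes read.table fail with "no lines available in input".
  empty <- paths[file.exists(paths) & file.size(paths) == 0]
  if (length(empty) > 0) {
    problems <- c(problems, paste("file is empty:", empty))
  }
  problems
}

# Demo on temporary files: one valid, one empty, one that does not exist.
tmp <- tempfile("counts_")
dir.create(tmp)
good  <- file.path(tmp, "good.count")
empty <- file.path(tmp, "empty.count")
writeLines(c("geneA\t10", "geneB\t0"), good)
invisible(file.create(empty))
s <- data.frame(cf = c(good, empty, file.path(tmp, "absent.count")),
                stringsAsFactors = FALSE)
problems <- check_count_files(s)
print(problems)
```

An empty result means the column exists and every path points to a non-empty file. It does not verify that the files are properly formatted.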

I would also suggest that you find someone local to help you, since it seems like you're new to this. This support site is not meant to be a place to learn R or Bioconductor.

@gordon-smyth
Last seen 4 hours ago
WEHI, Melbourne, Australia

To add to Aaron's post, there are also a couple of tweaks that you need to make to accommodate htseq-count output. First, htseq-count output has no header line, so

dge <- readDGE(samples$CountFiles, header=FALSE)

Second, the last 5 lines of htseq output are not real genes, so you need to remove them:

realgene <- grep("^ENS",rownames(dge))
dge <- dge[realgene,]
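
The pattern "^ENS" only works when the gene identifiers are Ensembl IDs. As a rough alternative, assuming an htseq-count version that prefixes its summary rows with a double underscore (for example __no_feature), the rows can be dropped by matching that prefix instead. The sketch below uses a plain matrix with invented gene names and counts rather than a real DGEList.

```r
# Toy count matrix mimicking htseq-count output; all values are invented.
counts <- matrix(
  c(12, 0, 7, 340, 25, 3, 90, 410,
    15, 1, 9, 310, 30, 2, 85, 395),
  ncol = 2,
  dimnames = list(
    c("geneA", "geneB", "geneC", "__no_feature", "__ambiguous",
      "__too_low_aQual", "__not_aligned", "__alignment_not_unique"),
    c("sample1", "sample2")
  )
)

# Summary rows start with "__"; keep everything else.
special <- grepl("^__", rownames(counts))
counts_genes <- counts[!special, , drop = FALSE]
print(counts_genes)
```

If the summary rows in your files have no prefix, match their names explicitly instead.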