DEXSeq Error in lf[[1]] : subscript out of bounds
rahel14350 ▴ 10
@rahel14350-8472
Last seen 5.8 years ago
United States

Dear all,
I have a large RNA-seq data set with 63 samples in 12 conditions, with different numbers of replicates per condition. I want to run R on a server to speed things up, but I get the error below.
Could you please help me? Error:
"Error in lf[[1]] : subscript out of bounds
Calls: DEXSeqDataSetFromHTSeq
Execution halted"

I have R version 3.2.0. Here is a short example of the code I am using:
require("methods")
library("DEXSeq")
library(parallel)


countFiles=list.files(inDir, pattern="*dexseq_count.txt", full.names=TRUE)

flattenedFile = list.files(inDir, pattern="Genecode.v19.DEXSeq.exon-introns.gtf",full.names=TRUE)

sampleTable = data.frame(
  row.names = c("C1675", "C1492", "D0742", "D0743", "C1670", "C1671"),
  condition = factor(c("CD_SIG_b", "CD_SIG_nb", "CD_TILE_b", "CD_TILE_b", "CD_TILE_nb", "CD_TILE_nb")),
  replicate = factor(c("1", "2", "1", "2", "1", "2")),
  type = factor(rep("paired-end", 6)),
  sex = factor(c("w", "w", "m", "m", "w", "w")))

Many thanks,


Hi Rahel,

You seem to be double posting in the forum, which makes the problem difficult to trace. Could you show the first lines of your annotation file and of your count files?

Alejandro
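The first lines of the annotation and count files can be printed with a few lines of base R. The sketch below is only an illustration: `preview_files` is a hypothetical helper, not part of DEXSeq, and `countFiles` and `flattenedFile` are the objects from the original post. The helper also stops when no file is found, because indexing an empty list with `[[1]]` gives exactly a "subscript out of bounds" error in R. Whether `lf` inside `DEXSeqDataSetFromHTSeq` is the list of count tables is an assumption here, not something confirmed in this thread.

```r
# Hypothetical helper: return the first `n` lines of each file, named by
# file name, so they can be pasted into a reply.
preview_files <- function(paths, n = 5) {
  if (length(paths) == 0) {
    # An empty vector means list.files() matched nothing
    # (wrong directory or wrong pattern).
    stop("no files found; check the directory and the pattern")
  }
  previews <- lapply(paths, readLines, n = n)
  names(previews) <- basename(paths)
  previews
}

# Usage with the objects from the original post (assumed to exist):
# preview_files(flattenedFile)
# preview_files(countFiles[1:2])
# nrow(sampleTable) == length(countFiles)   # one count file per sample
```

If the helper stops with "no files found", the problem lies in `inDir` or in the `pattern` argument rather than in the files themselves.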


Dear Alejandro,

Thanks for your reply. I am sorry for double posting; I will delete the other posts.

Here is what I have:

> sampleAnnotation( dxd )
DataFrame with 63 rows and 5 columns
      sample  condition replicate       type      sex
    <factor>   <factor>  <factor>   <factor> <factor>
1      C1675   CD_SIG_b         1 paired-end        w
2      C1492   CD_SIG_b         2 paired-end        w
3      D0742   CD_SIG_b         3 paired-end        w
4      D0743   CD_SIG_b         4 paired-end        m
5      C1670  CD_SIG_nb         1 paired-end        w
...      ...        ...       ...        ...      ...
59     C1650 UC_TILE_nb         2 paired-end        m
60     C1653 UC_TILE_nb         3 paired-end        w
61     C1652 UC_TILE_nb         4 paired-end        m
62     C2484 UC_TILE_nb         5 paired-end        w
63     C1678 UC_TILE_nb         6 paired-end        w

> colData(dxd)
DataFrame with 126 rows and 6 columns
      sample  condition replicate       type      sex     exon
    <factor>   <factor>  <factor>   <factor> <factor> <factor>
1      C1675   CD_SIG_b         1 paired-end        w     this
2      C1492   CD_SIG_b         2 paired-end        w     this
3      D0742   CD_SIG_b         3 paired-end        w     this
4      D0743   CD_SIG_b         4 paired-end        m     this
5      C1670  CD_SIG_nb         1 paired-end        w     this
...      ...        ...       ...        ...      ...      ...
122    C1650 UC_TILE_nb         2 paired-end        m   others
123    C1653 UC_TILE_nb         3 paired-end        w   others
124    C1652 UC_TILE_nb         4 paired-end        m   others
125    C2484 UC_TILE_nb         5 paired-end        w   others
126    C1678 UC_TILE_nb         6 paired-end        w   others
> head( counts(dxd), 5 )
                        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ENSG00000000003.10:E001  422  476  485  220  935  201  103  187  100   458
ENSG00000000003.10:E002  124  159  188   91  336   73   44   58   49   135
ENSG00000000003.10:E003   88  119  124   61  238   54   28   36   44    91
ENSG00000000003.10:E004   83   94   97   49  200   45   18   31   42    91
ENSG00000000003.10:E005   86  109   97   48  222   50   18   33   44    91


I also tried parallelization to make DEXSeq run faster without a script (my last run took more than 10 days and I just stopped it):

> library(BiocParallel)
> BPPARAM = MultiCoreParam(workers=4)
Error: could not find function "MultiCoreParam"


and I get the error shown above.

I would be thankful if you could help with either of these errors.

Kind Regards,

Rahel


Hi Rahel, 

For this specific error, it is simply that the function is misspelled (it is MulticoreParam).

Regarding running times, I would suggest also parallelizing by chromosome; you could also filter out counting bins with small counts.

Alejandro 
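A minimal sketch of the two points in this reply: the corrected constructor name, and filtering of low-count bins before the slow steps. `keep_bins` and `run_dexseq_filtered` are hypothetical helpers, and the threshold of 10 total reads is an arbitrary illustrative value, not a recommendation from this thread. `dxd` is assumed to be an existing DEXSeqDataSet.

```r
library(DEXSeq)  # also attaches BiocParallel

# Note the capitalisation: MulticoreParam, not MultiCoreParam.
BPPARAM <- MulticoreParam(workers = 4)

# Hypothetical helper: TRUE for counting bins whose total count across
# samples reaches `min_total` (10 is an arbitrary illustrative value).
keep_bins <- function(count_matrix, min_total = 10) {
  rowSums(count_matrix) >= min_total
}

# Hypothetical wrapper around the usual DEXSeq steps, run on the
# filtered object only.
run_dexseq_filtered <- function(dxd, BPPARAM, min_total = 10) {
  # featureCounts() returns the per-bin counts, one column per sample.
  dxd <- dxd[keep_bins(featureCounts(dxd), min_total), ]
  dxd <- estimateSizeFactors(dxd)
  dxd <- estimateDispersions(dxd, BPPARAM = BPPARAM)
  dxd <- testForDEU(dxd, BPPARAM = BPPARAM)
  DEXSeqResults(dxd)
}

# Usage (dxd assumed to exist):
# res <- run_dexseq_filtered(dxd, BPPARAM)
```

Fewer bins means fewer model fits, so the filter reduces running time roughly in proportion to the number of rows removed.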


Dear Alejandro,

Many, many thanks for your reply. It works now.

As I am new to DEXSeq and just got the data set from a colleague, what do you mean by parallelizing by chromosome? Do I need to add more chromosome information to my input data?

I am so grateful for your help.

Rahel


Hi Rahel,

DEXSeq fits several GLMs for each row (counting bin) of the DEXSeqDataSet object, and as the number of samples increases this can take a long time to compute. In these cases it is useful to run DEXSeq on each chromosome independently. To answer your question: the rowRanges of the DEXSeqDataSet object contain the chromosome information, which can be used to split the DEXSeqDataSet object.

Alejandro
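The split described above could look roughly like the sketch below. `rows_by_chromosome` and `run_dexseq_by_chromosome` are hypothetical helpers written for illustration, and `dxd` is assumed to be an existing DEXSeqDataSet. Size factors are estimated once on the full object because they are per-sample quantities.

```r
library(DEXSeq)  # also attaches BiocParallel and SummarizedExperiment

# Hypothetical helper: group row indices by chromosome name.
rows_by_chromosome <- function(chrom) {
  chrom <- as.character(chrom)
  split(seq_along(chrom), chrom)
}

# Hypothetical wrapper: estimate dispersions and test for differential
# exon usage one chromosome at a time.
run_dexseq_by_chromosome <- function(dxd, BPPARAM = SerialParam()) {
  # Per-sample size factors, computed on all counting bins.
  dxd <- estimateSizeFactors(dxd)
  # Chromosome of every counting bin, taken from the row ranges.
  chrom <- seqnames(rowRanges(dxd))
  lapply(rows_by_chromosome(chrom), function(idx) {
    dxd_chr <- dxd[idx, ]
    dxd_chr <- estimateDispersions(dxd_chr, BPPARAM = BPPARAM)
    dxd_chr <- testForDEU(dxd_chr, BPPARAM = BPPARAM)
    DEXSeqResults(dxd_chr)
  })
}

# Usage (dxd assumed to exist):
# res_list <- run_dexseq_by_chromosome(dxd, MulticoreParam(workers = 4))
```

Two caveats apply to this sketch. Adjusted p-values computed within one chromosome are not genome-wide, so the raw p-values from all chromosomes should be pooled and adjusted together, for example with `p.adjust(..., method = "BH")`. Very small sequences may also contain too few bins for a stable dispersion fit. On a cluster, each chromosome could instead be submitted as a separate job and the results combined afterwards.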

 

 

 


Dear Alejandro,

Many thanks for the explanation and your help. I am going to run it again and hope it works now. I might come back with new questions!

Kind Regards,

Rahel
