Matteo Carrara
Dear Alejandro,
thank you for the quick reply. After you mentioned the coherence of
the input, I dug deeper into it. It looks like the problem was quite
trivial: I accidentally left the "paired" option of the counting
script at its default, even though my datasets were all paired-end. I
have run the script again with the correct parameters and the loading
now completes successfully.
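For readers hitting the same error, the corrected counting call might look roughly like the sketch below. It assumes that the "paired" option mentioned above is the -p flag of dexseq_count.py (default "no"). The GTF and counts file names follow the original message, the SAM file name is a placeholder, and the command is only printed, not executed.

```shell
# Placeholder file names: the GTF and counts names follow the original
# message, the SAM name is hypothetical.
GTF="flattened_mm9.gtf"
SAM="wt_mapped.sam"
OUT="wt_mapped.counts"

# "-p yes" marks the reads as paired-end; the assumed default is "-p no".
CMD="python dexseq_count.py -p yes $GTF $SAM $OUT"

# Dry run: print the command instead of executing it.
echo "$CMD"
```

Remove the echo and run the command directly once the paths point to real files.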
Your question about the samples is far from unrelated, since it
has an impact on the next steps of the analysis.
When I wrote the message I was just testing the function, so I limited
the loading to a single dataset. The real design comprises two
different conditions with two biological replicates, each of them with
two technical replicates, for a total of four datasets per condition.
I decided to use DEXSeq precisely to assess the differential exon usage.
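The design described above can be written out as a small sample sheet. The sketch below only enumerates the eight datasets; the condition labels A and B and the sample naming scheme are made up for illustration and do not come from the actual experiment.

```shell
# Hypothetical labels: conditions A/B, biological replicates 1-2,
# technical replicates 1-2 (2 x 2 x 2 = 8 datasets, 4 per condition).
make_sample_sheet() {
  # Header row, tab-separated.
  printf 'sample\tcondition\tbio_rep\ttech_rep\n'
  for cond in A B; do
    for bio in 1 2; do
      for tech in 1 2; do
        # One row per dataset, e.g. A_b1_t1 for condition A,
        # biological replicate 1, technical replicate 1.
        printf '%s_b%s_t%s\t%s\t%s\t%s\n' \
          "$cond" "$bio" "$tech" "$cond" "$bio" "$tech"
      done
    done
  done
}

make_sample_sheet
```

Each row corresponds to one counts file, so the counting script would be run once per row.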
Thank you again.
Best,
--
Matteo Carrara
PhD Student in Complex Systems for Life Sciences
Department of Biotechnology and Health Sciences
MBC - Molecular Biotechnology Center
via Nizza, 52 Torino
ITALY
On Tue, Jan 29, 2013 at 6:49 PM, Alejandro Reyes
<alejandro.reyes at embl.de> wrote:
>
> Dear Matteo Carrara,
>
> Could you add to this e-mail the first 10 and last 10 lines of your
> input files (both counts and annotation files produced by the python
> scripts)?
>
> On unrelated topics, I noticed that you have only one sample, what
> exactly do you want to do with DEXSeq in this case?
> Note that DEXSeq is designed to test for differences in exon usage
> between different conditions with replicates.
>
> Best wishes,
> Alejandro Reyes
>
>
>> Hello,
>>
>> I have been trying to learn how to perform a differential expression
>> analysis of RNA-seq data using the DEXSeq package lately and I encountered
>> an unexpected behaviour in the function read.HTSeqCount: the function fails
>> to load the file obtained from the python script "dexseq_count.py" with the
>> following error message:
>>
>> Error in strsplit(rownames(dcounts), ":") : non-character argument
>>
>> I would really appreciate any pointers that might help me correct my code
>> or my input files.
>>
>> Here is what I have done:
>> - downloaded the mm9 GTF gene set from www.ensembl.org and ran the script
>> "dexseq_prepare_annotation.py"
>> - mapped my raw RNA-seq reads on the mm9 genome using tophat, converting
>> the output into sorted SAM format
>> - ran the script "dexseq_count.py" using the "flattened" GTF and the SAM
>> file obtained before
>> - loaded the dataset in R using the function read.HTSeqCount() as
>> follows:
>>
>> --------------------------------------
>> > library(DEXSeq)
>> > wt <- read.HTSeqCount("./wt_mapped.counts", "WT",
>> +     flattenedfile="./flattened_mm9.gtf")
>> Error in strsplit(rownames(dcounts), ":") : non-character argument
>> --------------------------------------
>>
>> As far as I could understand, the "pasilla" package, used for the examples
>> in the vignette, provided a counts file under the name
>> "pasilla_gene_counts.tsv". Loading that file, however, results in the same
>> error message.
>>
>> All I could do was pinpoint the source of the error in the code of the
>> function, although that did not help me in finding a solution or a
>> workaround:
>> After creating the data frame "dcounts" storing the counts and setting the
>> row names, that same data frame is subset:
>>
>> dcounts <- dcounts[substr(rownames(dcounts), 1, 1) != "_", ]
>>
>> This code, however, changes the object dcounts in such a way that the
>> "rownames()" function returns NULL. The next statement is then bound to
>> fail since it requires rownames(dcounts) to be a character vector:
>>
>> genesrle <- sapply(strsplit(rownames(dcounts), ":"), "[[", 1)
>>
>> I am running R 2.15.2 and DEXSeq 1.4.0 from Bioconductor version 2.11,
>> but I was able to reproduce this on the devel version of R (2013-01-22
>> r61734) using DEXSeq_1.5.6 from Bioconductor version 2.12.
>>
>> ---------------------------------
>> > sessionInfo()
>>
>> R version 2.15.2 (2012-10-26)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] BiocInstaller_1.8.3 DEXSeq_1.4.0 Biobase_2.18.0
>> [4] BiocGenerics_0.4.0
>>
>> loaded via a namespace (and not attached):
>> [1] biomaRt_2.14.0 hwriter_1.3 RCurl_1.95-3 statmod_1.4.16 stringr_0.6.2
>> [6] tools_2.15.2 XML_3.95-0.1
>> --------------------------------
>>
>> Thank you in advance for any help you can provide.
>> Best Regards,
>
>