Entering edit mode
Elena Sorokin
▴
160
@elena-sorokin-4659
Last seen 10.2 years ago
Hello again,
As a result of my recent post, I'm now delving into the DEXseq
package.
But I am having struggling with building my exonCountSet object, as
described in the vignette/ R documentation. First I went through the
example in the vignette, and was able to complete the pasilla
analysis.
With my own data, though, the differential expression testing doesn't
work, and additionally, the annotation GFF file doesn't seem to have
loaded. Could somebody kindly write out the exact code for reading in
the HTSeq counts files using the pasilla input files (p.11 of the
vignette), or better yet, point out where my code below is wrong?
My goal is to be able to use the neat graphics as well as DE testing
on
my dataset, so I don't want to do the bare-bones loading. My code is
posted below... I apologize that it's lengthy.
Many thanks in advance to anybody out there who can help me,
Elena
_____________________
# For reference, I have four counts files, called veh1_counts.txt,
veh2_counts.txt, drug1_counts.txt, drug2_counts.txt that I made using,
for example:
*python dexseq_count.py DEXSeq_annotations.gff <samfile>
veh1_counts.txt *
and one GFF file, called DEXseq_annotations.gff, that I made using:
*
python dexseq_prepare_annotation.py Ce.WS220.65_copy.gtf
DEXSeq_annotations.gff*
# load annotation file
> annotationfile = file.path("DEXSeq_annotations.gff")
> annotationfile
[1] "DEXSeq_annotations.gff"
# I made a data frame called "samples":
> samples
condition replicate
veh1 vehicle 1
veh2 drug 1
drug1 vehicle 2
drug2 drug 2
# make exon count set object
ecs = read.HTSeqCounts(countfiles = file.path("C:\\Rdata\\DEXseq",
paste(paste(rownames(samples),"counts", sep="_"), "txt", sep=".")),
design = samples,
flattenedfile = annotationfile)
# I thought everything seemed OK, so went on...
# Size factors
> ecs <- estimateSizeFactors(ecs)
# Dispersion
> ecs <- estimateDispersions(ecs)
# Fit Dispersion
> ecs <- fitDispersionFunction(ecs)
# Plot Individual exons via mean expression
> meanvalues <- rowMeans(counts(pasillaExons))
> plot(meanvalues, fData(pasillaExons)$dispBeforeSharing, log="xy",
main="mean vs CR dispersion")
Error in plot.window(...) : need finite 'ylim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) :
28 x values <= 0 omitted from logarithmic plot
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
# Test for Expression Differences
> test <- testGeneForDEU(ecs)
Error in geneIDs(ecs) == geneID : 'geneID' is missing
# There must be something wrong with the geneIDs column in my exon
count set
> head(geneIDs(ecs))
2L52.1:001 2L52.1:002 2L52.1:003 2L52.1:004 2L52.1:005 2L52.1:006
2L52.1 2L52.1 2L52.1 2L52.1 2L52.1 2L52.1
41112 Levels: 2L52.1 2L52.2 2RSSE.1 2RSSE.2 2RSSE.3 2RSSE.4 2RSSE.5
2RSSE.6 2RSSE.7 2RSSE.8 3R5.1+K08E3.8 3R5.2 4R79.1 4R79.2 4R79.4
4R79.5
6R55.2 AC3.1 ... ZK994.t3
# Are there any other problems with my exon count set? Yes, also the
annotation file is not loaded.
> ecs
ExonCountSet (storageMode: environment)
assayData: 176762 features, 4 samples
element names: counts
protocolData: none
phenoData
sampleNames: veh1 veh2 drug1 drug2
varLabels: sizeFactor condition replicate type
varMetadata: labelDescription
featureData
featureNames: 2L52.1:001 2L52.1:002 ... ZK994.t3:001 (176762 total)
fvarLabels: geneID exonID ... transcripts (13 total)
fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:
# Looking at the fData, I note that annotation info is missing.
> head(fData(ecs))
geneID exonID testable dispBeforeSharing dispFitted
dispersion pvalue padjust chr start end strand transcripts
2L52.1:001 2L52.1 E001 FALSE NA 0.5118
0.5118 NA NA <na> NA NA <na> <na>
2L52.1:002 2L52.1 E002 TRUE 1.15e-09 0.0884
0.0884 NA NA <na> NA NA <na> <na>
2L52.1:003 2L52.1 E003 TRUE 1.40e-02 0.1418
0.1418 NA NA <na> NA NA <na> <na>
2L52.1:004 2L52.1 E004 TRUE 1.54e-09 0.2004
0.2004 NA NA <na> NA NA <na> <na>
2L52.1:005 2L52.1 E005 TRUE 6.67e-10 0.0963
0.0963 NA NA <na> NA NA <na> <na>
2L52.1:006 2L52.1 E006 TRUE 2.16e-09 0.1222
0.1222 NA NA <na> NA NA <na> <na>
> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] pasilla_0.2.11 DESeq_1.6.1 locfit_1.5-6 lattice_0.20-0
akima_0.5-7 DEXSeq_1.0.2 Biobase_2.14.0
loaded via a namespace (and not attached):
[1] annotate_1.32.1 AnnotationDbi_1.16.11 DBI_0.2-5
genefilter_1.36.0 geneplotter_1.32.1 grid_2.14.0
hwriter_1.3
[8] IRanges_1.12.6 plyr_1.7.1 RColorBrewer_1.0-5
RSQLite_0.11.1 splines_2.14.0 statmod_1.4.14
stringr_0.6
[15] survival_2.36-10 tools_2.14.0 xtable_1.6-0
>
[[alternative HTML version deleted]]