Please read the posting guide and do not break the thread from the
list.
More will learn from the dialogue, and you may have the benefit of
expertise
of other readers who know more about SNP file formats than I do.
Clearly this is not a problem with GGtools. You have a ped file that
read.snps.pedfile cannot process. This is one of the drawbacks of
using
file formats that lack formal validators that can give specific
indications
on format violations. I cannot see the flaw in the ped file based on
your
excerpt, but see below.
On Wed, Mar 31, 2010 at 10:38 PM, claire pujoll <cpujoll@gmail.com>
wrote:
>
> Dear Prof Carey,
>
> many thanks for your time and quick response.
>
> I read your two papers discussing the data structures for genetics
of gene
> expression. I think it will be useful to add a note in the new
manuel to
> tell the reader that smlSet has be replaced by racExSet as there as
several
> tutorial and vignette on the web discussing racExSet.
>
I will revise the vignette to indicate that the former racExSet class
is
superseded by the smlSet class.
>
> # GWAS data
> setwd("home/hamming/SNPmatrix")
> map <- read.pedfile.map("Filt_CommonSNPChrom24.map")
> ped <- read.snps.pedfile("Filt_CommonSNPChrom24.ped")
>
>
> # Gene expression data
> library("wp5Express")
> data(wp5Express)
> es <- exprs(wp5Express)
>
> ## Create smlSet class
> smlSetTest <- make_smlSet(es, # Gene
expression
> sml,
# SNP
> data
> organism = "Homo sapiens")
>
>
> I tried to read SNP data but I had an error. Please note that I have
PED
> and MAP by chromosome and
> also .txt genotype file with SNP as columon and row as individuals.
>
> > ped <- read.snps.pedfile("Filt_CommonSNPChrom24.ped")
> Found accompanying map file, reading it first
> Reading Filt_CommonSNPChrom24.ped ...
> Can take a while...
> Found 32 fields in the header, i.e. 13 snps, hopefully...
> malformed input line: [CXMA-01-001 CXMA-01-001 0 0
> 1 2 C T A C C C T T G G A G G T
G
> T C T G G C T C C A C
> ...]
> Error in read.snps.pedfile("Filt_CommonSNPChrom24.ped") :
> ped file invalid - see error message above
>
>
You must try to make a valid ped file for your data. Here is an
example of
a valid one:
1 1 0 0 1 1 A A G T
2 1 0 0 1 1 A C T G
3 1 0 0 1 1 C C G G
4 1 0 0 1 2 A C T T
5 1 0 0 1 2 C C G T
6 1 0 0 1 2 C C T T
it is test.ped from the plink distribution
It seems to me that having hyphenated text in the second column, as in
your
example, may be problematic.
the associated map file is:
1 snp1 0 1
1 snp2 0 2
if we apply read.snps.pedfile to test.ped, we see
> mys = read.snps.pedfile("test.ped")
Found accompanying map file, reading it first
Reading test.ped ...
Can take a while...
Found 10 fields in the header, i.e. 2 snps, hopefully...
last line [5] : 6 1 0 0 1 2 C C T...
EOF reached after 6 samples
Read 6 samples from input, now converting...
...done, 6 samples with 2 snps
> mys
$snp.data
A snp.matrix with 6 rows and 2 columns
Row names: Family.1.Individual.1 ... Family.6.Individual.1
Col names: snp1 ... snp2
$subject.support
Family Member Father Mother Sex Affected
Family.1.Individual.1 1 1 NA NA Male Unaffected
Family.2.Individual.1 2 1 NA NA Male Unaffected
Family.3.Individual.1 3 1 NA NA Male Unaffected
Family.4.Individual.1 4 1 NA NA Male Affected
Family.5.Individual.1 5 1 NA NA Male Affected
Family.6.Individual.1 6 1 NA NA Male Affected
$snp.support
snp.names position chromosome assignment
snp1 snp1 1 1 A/C
snp2 snp2 2 1 G/T
Let us make an smlSet using this and related information. First we
need an
ExpressionSet
instance
ex = matrix(rnorm(12), nr=2) # fake expression data
rownames(ex) = c("A", "B")
es = new("ExpressionSet", exprs=ex)
es$x = rnorm(6) # fake covariate
of course more metadata should be supplied, but at this point
> es
ExpressionSet (storageMode: lockedEnvironment)
assayData: 2 features, 6 samples
element names: exprs
protocolData: none
phenoData
sampleNames: 1, 2, ..., 6 (6 total)
varLabels and varMetadata description:
x: NA
featureData: none
experimentData: use 'experimentData(object)'
Annotation:
now make the smlSet:
msl = make_smlSet(es, list(`1`=mys$snp.data))
sampleNames(msl) = rownames(mys$snp.data)
we need a name on the snp matrix list elements, to enumerate
chromosomes --
you can use any relevant tokenization. for most examples, we have one
named
element per chromosome, but other organizations (such as a single
snp.matrix
that is genome-wide) are possible
once this is in hand, we can perform tests:
> f1 = gwSnpTests(probeId("A")~x, msl, chrnum("1"))
Warning messages:
1: 'DESCRIPTION' file has 'Encoding' field and re-encoding is not
possible
2: 'DESCRIPTION' file has 'Encoding' field and re-encoding is not
possible
> topSnps(f1)
p.val
snp2 0.1470954
snp1 0.9257000
ignore the warnings
> sessionInfo()
R version 2.11.0 Under development (unstable) (2010-03-24 r51388)
x86_64-apple-darwin10.2.0
locale:
[1] C
attached base packages:
[1] splines stats graphics grDevices datasets tools utils
[8] methods base
other attached packages:
[1] GGtools_3.5.71 ff_2.1-2 bit_1.1-3
[4] annotate_1.25.0 AnnotationDbi_1.9.6 GGBase_3.7.7
[7] RSQLite_0.8-4 DBI_0.2-5 Biobase_2.7.5
[10] snpMatrix_1.11.2 survival_2.35-7 weaver_1.13.0
[13] codetools_0.2-2 digest_0.4.1
loaded via a namespace (and not attached):
[1] BSgenome_1.15.18 Biostrings_2.15.25 GSEABase_1.9.0
[4] GenomicRanges_0.0.9 IRanges_1.5.74 RCurl_1.3-0
[7] XML_2.6-0 annaffy_1.19.0 graph_1.25.1
[10] multicore_0.1-3 rtracklayer_1.7.12 xtable_1.5-5
Warning messages:
1: 'DESCRIPTION' file has 'Encoding' field and re-encoding is not
possible
2: 'DESCRIPTION' file has 'Encoding' field and re-encoding is not
possible
>
>
> many thanks for your help,
>
> best regards,
> Claire
>
>
>
>
>
>
>
> On Thu, Apr 1, 2010 at 3:16 AM, Vincent Carey
<stvjc@channing.harvard.edu>wrote:
>
>> Dear Claire,
>>
>> If you are seeing references to racExSet, you are looking at an
outdated
>> version of the GGtools package. The GGBase package includes a
function
>> make_smlSet that helps create the structure. As you note, the
snpMatrix
>> package defines the snp.matrix class. The snpMatrix package has
functions
>> that generate snp.matrix instances from ped format, hapmap-
formatted data,
>> plink-formatted data, or chiamo-formatted data. If you do not use
any of
>> these formats, you will have to create a list of snp.matrix
objects, each
>> element created according to specifications in the snpMatrix
package.
>>
>> If you have up-to-date versions of the software and would like
further
>> guidance, provide a small excerpt of your genotyping data file and
I will
>> provide illustration of how to create the associated smlSet, using
some
>> dummy expression and phenotype data. Clearly the most convenient
approach
>> will be based on a conventional format for genotyping data such as
those
>> enumerated above.
>>
>>
>> On Wed, Mar 31, 2010 at 8:29 PM, claire pujoll <cpujoll@gmail.com>
wrote:
>>
>>> Dear Prof Carey, dear Bioconductor user,
>>>
>>> Could you please tel me where can I find a clear tutorial on how
>>> creating an smlSet class so I can use GGtools (I read the
vignettes but
>>> still not able to find clear information regarding snp data). For
gene
>>> expression I can create an eSet but how can I create SNP data slot
from my
>>> genotype file. I know That I should use snp.matrix but could we
have an
>>> example (not based on HapMap data).
>>>
>>> also it would be useful to clarify regarding racExSet and smlist
as this
>>> is realy very confusing
>>>
>>>
>>> I have 3 matrices:
>>>
>>> gene expression (log2 scale)
>>> indiv1 indiv2 ... indiv 1000
>>>
>>> probe1 .....
>>>
>>> probe2
>>>
>>> probe 48,000 .....
>>>
>>>
>>> SNP data
>>> SNP1 SNP2 SNP 700,000
>>> indiv1 AA AB BB
>>> indiv2
>>> .....
>>> indiv1000 ..... ....... ......
>>>
>>> and pheno data
>>>
>>> thanks for your help.
>>>
>>>
>>> best regards,
>>>
>>> Claire Pujoll, PhD
>>>
>>>
>>>
>>
>
[[alternative HTML version deleted]]