Missing ProbeSets in Affymetrix MoGene 1.0 ST chips

Mark Cowley ▴ 910

@mark-cowley-2951

Last seen 9.7 years ago

Dear list,

There are 93 transcript_cluster_ids on the MoGene 1.0 ST chip that are listed in the CSV annotation file, and searchable in the MoGene chip at NetAffx, but that are not present in the [unsupported] CDF file from NetAffx.

45 of these IDs are present in the MoGene PGF file, and correspond to the antigenomic probesets, but the remaining 48 are not in the PGF file either.

From NetAffx, the 48 non-control probesets are: 11 snRNAs, a RefSeq gene (Lphn2) and 2 other novel transcripts, with the remaining 44 having no annotation other than their genomic location. This isn't a problem, unless Lphn2 is your gene of interest, which it isn't in my case, but it would be nice to know what's going on here!

If you RMA normalise using the CDF file (like GeneSpring does) then you end up with 93 rows of missing data, or if you normalise using the PGF/CLF files then you will end up missing out on the remaining 48 probesets.

Has anyone else come across this, and does anyone know what is going on here?

These transcript_cluster_ids are:
c("10361826", "10362430", "10362444", "10362452", "10502768", "10532622",
"10349381", "10350469", "10354866", "10362438", "10362872", "10369759",
"10374030", "10391748", "10395778", "10411504", "10422960", "10436496",
"10436660", "10446349", "10453719", "10457089", "10458079", "10460144",
"10461932", "10481652", "10482786", "10487009", "10498317", "10501216",
"10502040", "10503414", "10513713", "10521665", "10535929", "10546555",
"10552810", "10553535", "10560364", "10582560", "10582566", "10582570",
"10582576", "10585872", "10586931", "10592453", "10601614", "10602194",
"10338002", "10338005", "10338006", "10338007", "10338008", "10338009",
"10338010", "10338011", "10338012", "10338013", "10338014", "10338015",
"10338016", "10338018", "10338019", "10338020", "10338021", "10338022",
"10338023", "10338024", "10338027", "10338028", "10338030", "10338031",
"10338032", "10338033", "10338034", "10338038", "10338039", "10338040",
"10338043", "10338045", "10338046", "10338048", "10338049", "10338050",
"10338051", "10338052", "10338053", "10338054", "10338055", "10338057",
"10338058", "10338061", "10338062")

cheers,
Mark

-----------------------------------------------------
Mark Cowley, BSc (Bioinformatics)(Hons)
Peter Wills Bioinformatics Centre
Garvan Institute of Medical Research, Sydney, Australia

Annotation cdf GeneSpring

updated 15.7 years ago by James W. MacDonald • written 15.7 years ago by Mark Cowley
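The three-way comparison described in the question (IDs in the CSV annotation, in the CDF, and in the PGF) comes down to a few set operations. Below is a minimal sketch in base R, assuming the IDs from each source have already been read into character vectors. The function name `classify_missing` and the toy IDs `t1` to `t6` are hypothetical and only illustrate the logic; they are not taken from the real files.

```r
# Toy ID sets standing in for the three real sources (hypothetical values).
# In practice these would be extracted from the NetAffx CSV annotation,
# the CDF file and the PGF file respectively.
csv_ids <- c("t1", "t2", "t3", "t4", "t5", "t6")
cdf_ids <- c("t5", "t6")
pgf_ids <- c("t1", "t2", "t5", "t6")

# Hypothetical helper: split the IDs that are annotated in the CSV but
# absent from the CDF into those still present in the PGF and those
# present in neither file.
classify_missing <- function(csv_ids, cdf_ids, pgf_ids) {
  missing_from_cdf <- setdiff(csv_ids, cdf_ids)
  list(
    missing_from_cdf = missing_from_cdf,
    in_pgf_only      = intersect(missing_from_cdf, pgf_ids),
    in_neither       = setdiff(missing_from_cdf, pgf_ids)
  )
}

res <- classify_missing(csv_ids, cdf_ids, pgf_ids)
print(lengths(res))
```

With the real ID vectors, the three list elements would correspond to the 93, 45 and 48 IDs mentioned in the question, provided the IDs are compared as character strings in all three sources.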

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 12 hours ago

United States

Have you asked anybody at Affy?

--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

written 15.7 years ago by James W. MacDonald

No, not yet! I will do now.

On 04/09/2008, at 10:52 PM, James W. MacDonald wrote:
> Have you asked anybody at Affy?

written 15.7 years ago by Mark Cowley

Hi folks,

I got a reply from Affymetrix today regarding the missing probesets on the MoGene chip.

Reply from Casey Gates from Affymetrix:
--------------------------------------------------------------------
The 48 transcript cluster IDs that you have identified as not in the PGF file are from what we call low-coverage transcript clusters: those having less than 4 probes. These tend to be very short, non-biologically interesting sequences and were excluded from the PGF with the intent that they should not be analyzed by users. So the advice is that you can safely ignore them.

The reason they are in the NetAffx CSV file is that the NetAffx team used the GFF files as a source for the array design data, which contain these low-coverage transcript clusters. They should have been excluded from the CSV annotation files and NetAffx website, and they will be excluded in future annotation releases.
----------------------------------------------------------------------

I hope that helps everyone,
Mark

On 05/09/2008, at 10:27 AM, Mark Cowley wrote:
> no, not yet! I will do now.

written 15.7 years ago by Mark Cowley
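The "fewer than 4 probes" rule in the Affymetrix reply can be checked, or applied to your own data, by counting probes per transcript cluster. The following is a rough sketch in base R, assuming a probe-to-cluster table is available. The data frame `probes`, the cluster names `tc_A` to `tc_D` and the matrix `expr` are all made up for illustration; only the threshold of 4 comes from the reply above.

```r
# Hypothetical probe-to-cluster mapping, one row per probe.
probes <- data.frame(
  probe_id = 1:12,
  transcript_cluster_id = c(rep("tc_A", 6), rep("tc_B", 3),
                            rep("tc_C", 2), "tc_D"),
  stringsAsFactors = FALSE
)

# Count how many probes belong to each transcript cluster.
probe_counts <- table(probes$transcript_cluster_id)

# Threshold taken from the Affymetrix reply: clusters with fewer than
# 4 probes are treated as low-coverage.
min_probes <- 4
low_coverage <- names(probe_counts)[probe_counts < min_probes]

# Toy expression matrix with one row per transcript cluster
# (values are arbitrary placeholders).
expr <- matrix(1:8, nrow = 4,
               dimnames = list(c("tc_A", "tc_B", "tc_C", "tc_D"),
                               c("s1", "s2")))

# Drop the low-coverage clusters before any downstream analysis.
expr_filtered <- expr[!rownames(expr) %in% low_coverage, , drop = FALSE]
print(low_coverage)
print(rownames(expr_filtered))
```

Using `drop = FALSE` keeps the result a matrix even when only one cluster survives the filter, which avoids surprises in later steps that expect row names.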
