Missing ProbeSets in Affymetrix MoGene 1.0 ST chips
Mark Cowley ▴ 910
@mark-cowley-2951
Last seen 9.7 years ago
Dear list,

There are 93 transcript_cluster_ids on the MoGene 1.0 ST chip that are listed in the CSV annotation file, and searchable for the MoGene chip at NetAffx, but that are not present in the [unsupported] CDF file from NetAffx.

45 of these IDs are present in the MoGene PGF file and correspond to the antigenomic probesets, but the remaining 48 are not in the PGF file either.

According to NetAffx, the 48 non-control probesets are: 11 snRNAs, a RefSeq gene (Lphn2) and 2 other novel transcripts, with the remaining 44 having no annotation other than their genomic location. This isn't a problem unless Lphn2 is your gene of interest, which it isn't in my case, but it would be nice to know what's going on here!

If you RMA normalise using the CDF file (as GeneSpring does) then you end up with 93 rows of missing data; if you normalise using the PGF/CLF files then you still miss out on the remaining 48 probesets.

Has anyone else come across this, and does anyone know what is going on here?

These transcript_cluster_ids are:
c("10361826", "10362430", "10362444", "10362452", "10502768", "10532622",
  "10349381", "10350469", "10354866", "10362438", "10362872", "10369759",
  "10374030", "10391748", "10395778", "10411504", "10422960", "10436496",
  "10436660", "10446349", "10453719", "10457089", "10458079", "10460144",
  "10461932", "10481652", "10482786", "10487009", "10498317", "10501216",
  "10502040", "10503414", "10513713", "10521665", "10535929", "10546555",
  "10552810", "10553535", "10560364", "10582560", "10582566", "10582570",
  "10582576", "10585872", "10586931", "10592453", "10601614", "10602194",
  "10338002", "10338005", "10338006", "10338007", "10338008", "10338009",
  "10338010", "10338011", "10338012", "10338013", "10338014", "10338015",
  "10338016", "10338018", "10338019", "10338020", "10338021", "10338022",
  "10338023", "10338024", "10338027", "10338028", "10338030", "10338031",
  "10338032", "10338033", "10338034", "10338038", "10338039", "10338040",
  "10338043", "10338045", "10338046", "10338048", "10338049", "10338050",
  "10338051", "10338052", "10338053", "10338054", "10338055", "10338057",
  "10338058", "10338061", "10338062")

cheers,
Mark
-----------------------------------------------------
Mark Cowley, BSc (Bioinformatics)(Hons)
Peter Wills Bioinformatics Centre
Garvan Institute of Medical Research, Sydney, Australia
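The comparison described in this post can be sketched in base R with `setdiff()` and `intersect()`. This is only an illustrative sketch: the vectors `csv_ids`, `cdf_ids` and `pgf_ids` and the IDs in them are made-up placeholders, and how the IDs are actually extracted from the CSV, CDF and PGF files is not shown here.

```r
# Made-up placeholder IDs (assumption); in practice these vectors would hold
# the transcript_cluster_ids read from the NetAffx CSV annotation, the CDF
# and the PGF file respectively.
csv_ids <- c("tc_a", "tc_b", "tc_c", "tc_d", "tc_e")
cdf_ids <- c("tc_a")
pgf_ids <- c("tc_a", "tc_b", "tc_c")

# IDs that are annotated in the CSV but absent from the CDF
missing_from_cdf <- setdiff(csv_ids, cdf_ids)

# Split them into IDs the PGF still defines and IDs missing from both files
in_pgf_only <- intersect(missing_from_cdf, pgf_ids)
in_neither  <- setdiff(missing_from_cdf, pgf_ids)

print(in_pgf_only)
print(in_neither)
```

Applied to the real files, `missing_from_cdf` would presumably correspond to the 93 IDs listed above, `in_pgf_only` to the 45 antigenomic probesets and `in_neither` to the remaining 48.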
Annotation cdf GeneSpring
@james-w-macdonald-5106
Last seen 12 hours ago
United States
Have you asked anybody at Affy?

> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
no, not yet! I will do now.

On 04/09/2008, at 10:52 PM, James W. MacDonald wrote:
> Have you asked anybody at Affy?
Hi folks,

I got a reply from Affymetrix today regarding the missing probesets on the MoGene chip.

Reply from Casey Gates at Affymetrix:
--------------------------------------------------------------------
The 48 transcript cluster IDs that you have identified as not in the PGF file are from what we call low-coverage transcript clusters: those having less than 4 probes. These tend to be very short, non-biologically interesting sequences and were excluded from the PGF with the intent that they should not be analyzed by users. So the advice is that you can safely ignore them.

The reason they are in the NetAffx CSV file is that the NetAffx team used the GFF files as a source for the array design data, which contain these low-coverage transcript clusters. They should have been excluded from the CSV annotation files and NetAffx website, and they will be excluded in future annotation releases.
----------------------------------------------------------------------

I hope that helps everyone,
Mark

On 05/09/2008, at 10:27 AM, Mark Cowley wrote:
> no, not yet! I will do now.
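The "low-coverage" rule in the Affymetrix reply (transcript clusters with fewer than 4 probes) could be checked along the following lines. This is a rough sketch only: the `probes` data frame, its column names and its contents are hypothetical, and building such a probe-to-cluster table from the array design files is assumed rather than shown.

```r
# Hypothetical probe-to-cluster table (assumption): one row per probe, with
# the transcript cluster that the probe belongs to.
probes <- data.frame(
  transcript_cluster_id = c("tc_a", "tc_a", "tc_a", "tc_a",
                            "tc_b", "tc_b", "tc_c"),
  probe_id = 1:7,
  stringsAsFactors = FALSE
)

# Threshold taken from the Affymetrix reply: fewer than 4 probes
min_probes <- 4

# Count the probes in each transcript cluster
probe_counts <- table(probes$transcript_cluster_id)

# Clusters below the threshold are the low-coverage ones
low_coverage <- names(probe_counts)[as.vector(probe_counts) < min_probes]

print(low_coverage)
```

With the toy table above, `tc_a` has 4 probes and is kept, while `tc_b` and `tc_c` fall below the threshold and are flagged.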