Frequently asked questions
I have an Affymetrix scanner in my laboratory. Can I use your array?
Yes, you can scan the Scripps/GNF malaria array with any Affymetrix scanner. However, the array does not have the "mismatch" features that Affymetrix MAS5 needs to calculate expression levels. For the MOID algorithm we use 4000 generic non-Plasmodium probes for background subtraction. If you obtain the array, either through MR4 or directly from Affymetrix, we can assist in analyzing your data with MOID or point you toward other programs that will work with the array. All of the sequences on the array that were used in the analysis, as well as the data, are available for download and are not proprietary.
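To illustrate the general idea of background subtraction without mismatch features, here is a minimal sketch. It is not the MOID implementation: the function name, the use of the median of the background probes as the background estimate, and the flooring of negative values at zero are all assumptions made for illustration.

```python
from statistics import median


def subtract_background(gene_probe_signals, background_signals, floor=0.0):
    """Subtract one background estimate from every gene probe signal.

    ASSUMPTION: the background is summarized as the median intensity of the
    generic non-Plasmodium probes. MOID's actual procedure may differ.
    """
    background = median(background_signals)
    # Signals that fall below background are clamped to `floor`.
    return [max(signal - background, floor) for signal in gene_probe_signals]


# Hypothetical intensities, for illustration only.
background_probes = [35.0, 50.0, 45.0, 60.0]
gene_probes = [250.0, 90.0, 40.0]
corrected = subtract_background(gene_probes, background_probes)
print(corrected)
```

Any probe whose raw signal is at or below the background estimate ends up at the floor value, which is one reason genes with very few probes are harder to call reliably.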
I know my gene is cell cycle expressed but it isn't in one of the clusters. Does this mean the assignments are poor?
No, we were conservative in choosing what to cluster. If a gene falls below our detection threshold in one erythrocytic cycle but not the other, if there are only one or two probes per gene on the array or if the gene is not on the array, it may not have been clustered. We encourage you to download the spreadsheet and examine your gene, or go to www.plasmodb.org. The cluster assignments are mostly to direct researchers with little genome training to the data that we think would have the best likelihood of being reproducible. The data can also be reclustered using your favorite method.
I prefer using RMA or D-chip to analyze my multiprobe expression data. Can I use these programs on your data?
By all means try it. We use MOID because it was developed in house. Just remember that we have no mismatches on the array.
What are some of the differences between this data and other expression data described at www.plasmodb.org?
Some discrepancies may result from the choice of strain. We used 3D7 for our studies. Other researchers have used other strains.
What are the advantages of this microarray platform?
The primary advantage of our platform is that, because we have many probes per gene, we can perform probe-level statistics. For example, if an array carries only a single probe per gene and that probe happens to misbehave, you will collect poor-quality data regardless of how many times the probe is spotted or how many times you repeat the experiment. Because we generally have multiple independent probes per gene, we can consider the expression change from each probe independently and then calculate the fold change from the percentile ranking of the per-probe fold changes, rather than from the fold change of a single or average expression value. We can also discard outliers. This doesn’t mean that expression ratios calculated from a single oligonucleotide or a single spotted cDNA are of poor quality, of course; it just means that one has less confidence in the measurement.
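The benefit of probe-level statistics can be sketched in a few lines. This is an illustration only, not the MOID algorithm: the function name, the choice of the median (the 50th percentile) as the summary, the `min_signal` floor, and the intensities are all assumptions.

```python
from statistics import mean, median


def probe_level_fold_change(baseline, treated, min_signal=1.0):
    """Summarize per-probe fold changes for one gene by their median.

    ASSUMPTION: the median stands in for the percentile-based summary
    described in the text; the real method may use a different percentile.
    """
    if len(baseline) != len(treated):
        raise ValueError("each probe needs a signal in both conditions")
    # One fold change per probe; tiny signals are floored to avoid dividing by ~0.
    ratios = [
        max(t, min_signal) / max(b, min_signal)
        for b, t in zip(baseline, treated)
    ]
    return median(ratios)


# Hypothetical probe set: four well-behaved probes and one misbehaving probe.
baseline = [100.0, 200.0, 150.0, 120.0, 50.0]
treated = [200.0, 420.0, 300.0, 228.0, 5000.0]

robust = probe_level_fold_change(baseline, treated)
naive = mean(treated) / mean(baseline)  # fold change of the average signal
print(robust, naive)
```

With these numbers the median of the per-probe ratios stays close to the twofold change reported by the four consistent probes, while the ratio of averaged signals is pulled far upward by the single outlier probe.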
What are the disadvantages of the platform?
Inflexibility. We cannot change the design of the array easily because of the way the arrays are manufactured. We tried to make sure that every gene would be covered by choosing many more probes than would be needed from both coding and noncoding regions. For the analysis we only used about 1/5 of the probes on the array. If annotations change, we may already have the data for the gene.
What do the expression levels mean?
Different oligonucleotide probes have different hybridization properties. This means that one oligonucleotide complementary to one part of a transcript may give a strong signal while another oligonucleotide complementary to a different part of the same transcript may give a weak signal, even if the oligonucleotides were designed to have similar GC content and thermostabilities. These differences may be the result of secondary structure in the target. Thus it is generally believed that one cannot estimate a gene's expression level by measuring the signal intensity from a single probe. However, by examining the distribution of a probe set's signal intensities we can make an educated guess about transcript levels. For our array, levels below 100 are low, levels from 100 to 1000 are moderate, and levels above 1000 are very high. These are estimates obtained by integrating all of the information from a set of probes. The quality of these predictions decreases for genes with only one or two probes.
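If you are working through the downloaded spreadsheet, the rough categories above can be applied with a small helper like the one below. The function name is hypothetical, and treating exactly 100 and exactly 1000 as "moderate" is an assumption, since the boundaries are only approximate.

```python
def classify_expression(level):
    """Map a computed expression level to the rough categories given above.

    ASSUMPTION: boundary values (exactly 100 or 1000) are counted as moderate.
    """
    if level < 100:
        return "low"
    if level <= 1000:
        return "moderate"
    return "very high"


# Hypothetical expression levels, for illustration only.
for value in (42, 350, 2500):
    print(value, classify_expression(value))
```

Keep in mind that these labels are less trustworthy for genes represented by only one or two probes.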
What about present/absent predictions?
For this analysis, we predicted whether or not a gene is likely to be present based on the gene's computed expression level combined with a probability function that relies to some extent on the number of probes for that gene. We generally have little confidence in predictions for genes with a single probe, though this single probe may still be useful for determining expression ratios.
Why haven't you created a searchable database?
Our laboratory is very small and we don't have the resources to provide significant support for this project. However, it is something we would like to do in the future. Gene queries are available from www.PlasmoDB.org.
Why didn't you rank your genes by time of maximum expression in the asexual cycle?
This would be difficult because we included sporozoite and gametocyte data in our analysis. In addition, such an approach doesn’t reflect the fact that genes have different patterns of expression within the erythrocytic asexual cycle: some have a sharp induction (invasion genes) and others have a broad induction (genes encoding the transcription apparatus). Some genes that cycle in the erythrocytic cycle are also expressed in gametocytes and sporozoites and some aren't. It is even conceivable that a gene could peak twice in the asexual cycle. K-means clustering isn't perfect (and if you have a better approach, by all means try it), but it is useful for generating lists that can be used for ontology analysis. The clusters given in the paper merely provide guidelines for interpreting the data, and the numbering of the clusters does not exactly follow the timing of expression in the erythrocytic cycle. The results from hierarchical clustering are also useful if you are interested in finding a gene whose expression is very similar to that of your own.
Can I buy the Scripps/GNF array from Affymetrix?
The array is a non-renewable tool that was designed in our laboratory for our research needs and is not a commercial product supported by Affymetrix. However, the sequences of the probes on the array are not proprietary and can be downloaded from our website and resynthesized by Affymetrix or by other companies. MR4 has been authorized to serve as distributor for the arrays pending interest from the malaria community. If you are interested in using the Scripps/GNF array in your research, send an email to John Rogers (jrogers@niaid.nih.gov), or Shiguang Yang (syang@atcc.org) expressing your interest as they may eventually arrange for the array's continued synthesis and distribution through MR4. Our laboratory is also interested in collaborating but has limited resources.