Mapping the conserved and variable domains of P. falciparum proteins

Data for downloading

Figures 1 and 2 show the linkage disequilibrium analysis of SFP-derived haplotypes, based on 4,369 SFPs that distinguish the chloroquine-resistant strains FCB and Dd2, across the 14 P. falciparum chromosomes. Figure 1 illustrates linkage disequilibrium surrounding pfcrt in FCB and Dd2 on chromosome 7. Scores were generated by calculating the probability of observing the same genotype by chance over a moving 40 kb window (with the probability of observing the same genotype by chance for any one SFP set at 0.33). The plot shows the ratio between this probability and the maximum possible probability for regions with at least four SFPs, with 1 indicating the best possible score. The positions of antigenic variation clusters (vars, stevors or rifins) are marked in blue. SFPs mapping to these genes were excluded from the calculations because our data indicate that mitotic recombination may be occurring in these genes. pfcrt, which is located between bases 307,926 and 311,020 on chromosome 7, is shown as a black triangle. The trough at pfcrt is likely due to the strong selective pressure on pfcrt and is consistent with the observation that FCB and Dd2 carry different alleles of pfcrt, even though published SNP data also show disequilibrium in the surrounding regions [Wootton et al., 2002].
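The windowed score described above can be read in more than one way. The sketch below shows one possible reading, not necessarily the calculation used for the figures: the chance probability for a window is taken as 0.33 raised to the number of SFPs at which the two strains share a genotype, the maximum possible probability as 0.33 raised to the total number of SFPs in the window, and the ratio is taken on the log scale. The function name, the input layout and the 10 kb step size are assumptions made for illustration.

```python
import math

P_CHANCE = 0.33      # per-SFP chance of matching, as stated in the text
WINDOW_BP = 40_000   # width of the moving window
MIN_SFPS = 4         # windows with fewer SFPs are not scored


def window_scores(sfps, step_bp=10_000):
    """Score moving windows along one chromosome.

    sfps: list of (position, same_genotype) tuples, sorted by position,
          where same_genotype is True if both strains share the call.
    step_bp: window step; this value is an assumption, not from the text.
    Returns a list of (window_start, score) with score between 0 and 1.
    """
    if not sfps:
        return []
    scores = []
    last_pos = sfps[-1][0]
    start = 0
    while start <= last_pos:
        calls = [same for pos, same in sfps if start <= pos < start + WINDOW_BP]
        n = len(calls)
        if n >= MIN_SFPS:
            k = sum(calls)                      # SFPs with a shared genotype
            log_p = k * math.log(P_CHANCE)      # observed chance probability
            log_p_max = n * math.log(P_CHANCE)  # all SFPs in the window shared
            scores.append((start, log_p / log_p_max))
        start += step_bp
    return scores


# Made-up positions and calls, for illustration only.
example = [(1_000, True), (12_000, True), (25_000, False), (38_000, True)]
print(window_scores(example))
```

Under this reading the score reduces to the fraction of SFPs in the window that share a genotype, so a window in which every SFP agrees scores 1.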

Table 1. .CEL files (2.3Mb each)

The hybridization data for each strain are provided as a .CEL file. The .CEL files can be retrieved from here. The corresponding .CEL file for each of the P. falciparum strains hybridized is listed in Table 1.

Table 2. Probe file (38Mb)

This file contains the 327,989 P. falciparum-specific 25-mer probes selected for use in this analysis. Probes located in both coding and non-coding sequences, from both strands, are included. Non-specific and unmapped probes have been removed. This file does not contain any probes from the Plasmodium yoelii sequence. Affymetrix standard control probes and probes from human and mouse are marked as "controls" in the "Reporter Usage" column. Generic mismatches are marked as "background" in the "Reporter Usage" column.
The file is tab-separated and contains the following columns (from left to right):


1.     Name          Gene name
2.     Probeset      all _at
3.     X             X co-ordinate of the 25-mer feature on the scrMal malaria full genome array
4.     Y             Y co-ordinate of the 25-mer feature on the scrMal malaria full genome array
5.     Chr           Chromosome number, 1 through 14
6.     EXPOS         Sequence position of the twelfth nucleotide in the 25-mer feature
7.     Direction     Probe positioned on the forward (+) or reverse (-) strand
8.     Sequence      Nucleotide sequence of each probe
9.     Description   Gene annotation for each probe
10.    Sense         Probe located in either the sense or anti-sense strand
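The column layout above can be read with a standard tab-separated parser. The following is a minimal sketch that assumes the file begins with a header line carrying exactly the ten column names listed; the "Reporter Usage" column mentioned in the text is not handled here. The function name and the two sample rows (gene names, coordinates and sequences) are made up for illustration.

```python
import csv
import io

INTEGER_COLUMNS = ("X", "Y", "Chr", "EXPOS")


def read_probes(handle):
    """Yield one dict per probe from a tab-separated handle with a header line."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        for key in INTEGER_COLUMNS:
            value = row.get(key, "")
            # Leave non-numeric entries (e.g. unmapped controls) as text.
            if value.isdigit():
                row[key] = int(value)
        yield row


# Made-up sample rows; the real file would be opened with open(path, newline="").
SAMPLE = (
    "Name\tProbeset\tX\tY\tChr\tEXPOS\tDirection\tSequence\tDescription\tSense\n"
    "GENE_A\tGENE_A_at\t10\t20\t7\t308000\t+\tATGCATGCATGCATGCATGCATGCA\thypothetical protein\tsense\n"
    "GENE_B\tGENE_B_at\t11\t21\t3\t120500\t-\tTTGCATGCATGCATGCATGCATGCA\thypothetical protein\tanti-sense\n"
)

probes = list(read_probes(io.StringIO(SAMPLE)))
chr7_forward = [p for p in probes if p["Chr"] == 7 and p["Direction"] == "+"]
print(len(probes), len(chr7_forward))
```

Reading row by row keeps memory use modest for a 38Mb file; filtering on Chr and Direction as shown is one way to pull out the probes for a region of interest.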

Table 3. SFPdata_1 (3.2Mb)

Download the individual polymorphism data at each gene for every P. falciparum strain hybridized. This tab-separated file contains data for 23,653 polymorphic features across all the P. falciparum strains analysed. A polymorphic probe is assigned a score of 1; a non-polymorphic probe is assigned 0.
The table contains the following columns (from left to right):


Name
Probeset
X and Y co-ordinates of each polymorphic feature (X, Y)
Chromosome number (chr)
EXPOS (position of the twelfth nucleotide in each 25-mer feature)
Direction of probe
Probe sequence
Description
P. falciparum SFP data
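Because each strain is scored 0 or 1 per probe, per-strain SFP counts follow from summing the strain columns. The sketch below assumes a header line, nine annotation columns (Name, Probeset, X, Y, chr, EXPOS, direction, sequence, description) and one column per strain after them; the exact header spellings, the function name and the sample rows are assumptions for illustration.

```python
import csv
import io

# Name, Probeset, X, Y, chr, EXPOS, Direction, Sequence, Description
N_ANNOTATION_COLUMNS = 9


def sfp_counts_per_strain(handle):
    """Return {strain: number of polymorphic probes} from a tab-separated handle."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    strains = header[N_ANNOTATION_COLUMNS:]
    counts = dict.fromkeys(strains, 0)
    for row in reader:
        calls = row[N_ANNOTATION_COLUMNS:]
        for strain, call in zip(strains, calls):
            counts[strain] += int(call)  # 1 = polymorphic, 0 = non-polymorphic
    return counts


# Made-up rows; the strain columns use two strain names mentioned on this page.
SAMPLE = (
    "Name\tProbeset\tX\tY\tchr\tEXPOS\tDirection\tSequence\tDescription\tFCB\tDd2\n"
    "GENE_A\tGENE_A_at\t10\t20\t7\t308000\t+\tATGCATGCATGCATGCATGCATGCA\thypothetical protein\t1\t1\n"
    "GENE_A\tGENE_A_at\t12\t22\t7\t308040\t+\tTTGCATGCATGCATGCATGCATGCA\thypothetical protein\t0\t1\n"
)

print(sfp_counts_per_strain(io.StringIO(SAMPLE)))
```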

Table 4. SFPdata_2 (250kb)

Download the polymorphism data for each gene in the P. falciparum genome. This tab-delimited file contains the following columns: gene name, description, number of unique SFPs (number of polymorphic probes per gene), total number of sense plus antisense probes for each gene, and average SFPs per gene. The average is calculated as the number of unique SFPs divided by the total number of sense plus antisense probes. The table summarizes the data from all P. falciparum strains analyzed. Genes with no data entered are those for which probes mapping to the gene were not included in the analysis. Of the 327,989 single-stranded probes (Table 2), 29,207 are perfect reverse complements of another probe. Because forward and reverse probes do not always exhibit identical hybridization behavior, the forward probe might be capable of revealing an SFP while its complement might not. Thus a reverse complement probe was excluded from the SFP tally only if both the forward probe and its reverse complement detected an SFP. Reverse complements were excluded from the number of probes per gene listed in the table.
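The tallying rule for reverse-complement probes can be sketched as follows. This is one reading of the rule, under the assumption that each forward/reverse-complement pair is collapsed into a single unit that counts once toward the probe total and once toward the SFP tally if either member detects an SFP. The function names are hypothetical, and the short sequences stand in for real 25-mers.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gene_summary(probes):
    """Summarize one gene.

    probes: list of (sequence, is_sfp) tuples for the gene.
    Returns (unique_sfps, total_probes, average_sfps).
    """
    units = {}
    for seq, is_sfp in probes:
        # A probe and its reverse complement share one canonical key.
        key = min(seq, reverse_complement(seq))
        units[key] = units.get(key, False) or is_sfp
    total = len(units)
    unique_sfps = sum(units.values())
    average = unique_sfps / total if total else None
    return unique_sfps, total, average


# Made-up example: AACG and CGTT are reverse complements and both detect an SFP,
# so together they contribute one probe and one SFP.
example = [("AACG", True), ("CGTT", True), ("GGGA", False), ("TTTC", True)]
print(gene_summary(example))
```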

Deletion Analysis
Table 5. all_moid.xls (9.1Mb), Table 6. deletion.xls (223kb), Table 7. deletion6.xls (826kb)

Gene deletions were identified using the match-only integral distribution (MOID) algorithm (Zhou & Abagyan, 2002), which returns a 'present' or 'not present' call based on whether the probe-set distribution for a gene is similar to that of a series of background control probes (Le Roch et al., 2003). The custom-designed Affymetrix malaria full genome array contains 2,397 probes for 100 viral genes that serve as background controls. The control probes, which are not expected to hybridize to human or P. falciparum sequences, are predicted to have little signal associated with them and to show the same pattern as 'deleted' genes. Analysis of the control background sequences indicated that they had only a 2% chance of being misclassified as 'present' if required to have both an intensity level of E>10 and a Kolmogorov-Smirnov test of LogP <-0.5 (Tables 5 & 6). Using these parameters, and after excluding genes with fewer than 6 probes, a total of 63 genes were classed as 'deleted' (Table 7). The '%Deletion' column in Tables 6 & 7 gives the percentage of the three hybridization experiments in which the gene was called 'deleted'. For example, a value of 100% for strain "18.02" means the gene is called absent in 3 of the 3 hybridizations, whereas a value of 67% indicates the gene is absent in only 2 out of 3 hybridizations. If the '%Deletion' is less than 50%, the gene is not considered 'deleted' and hence is not listed in either table.
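The thresholds and the '%Deletion' rule above can be illustrated with a short sketch. It does not implement MOID or the Kolmogorov-Smirnov test; it assumes the per-hybridization E and LogP values have already been computed, and it applies only the cut-offs stated in the text. The function names and the example values are hypothetical.

```python
E_MIN = 10.0      # 'present' requires intensity E > 10
LOGP_MAX = -0.5   # ... and Kolmogorov-Smirnov LogP < -0.5
MIN_PROBES = 6    # genes with fewer probes are excluded
DELETION_CUTOFF = 50.0


def is_present(intensity, log_p):
    """Apply both thresholds from the text to one hybridization."""
    return intensity > E_MIN and log_p < LOGP_MAX


def percent_deletion(hybridizations):
    """hybridizations: list of (E, LogP) tuples, one per replicate."""
    absent = sum(not is_present(e, lp) for e, lp in hybridizations)
    return 100.0 * absent / len(hybridizations)


def call_deleted(n_probes, hybridizations):
    """Return True/False for the deletion call, or None if the gene is excluded."""
    if n_probes < MIN_PROBES:
        return None
    return percent_deletion(hybridizations) >= DELETION_CUTOFF


# Made-up values: absent in two of three hybridizations.
replicates = [(5.0, -0.1), (8.0, -0.2), (50.0, -3.0)]
print(round(percent_deletion(replicates)), call_deleted(12, replicates))
```

With three hybridizations the only values at or above the 50% cut-off are 67% and 100%, which matches the two cases described in the text.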

About the TSRI/GNF malaria array

The arrays (scrmalaria) used in these studies can be ordered in lots of 90-100 from Affymetrix (www.affymetrix.com) at the institutional rate for a microbial array. Custom arrays can also be recreated using all or a subset of the sequences contained on this website. Contact your Affymetrix representative for details. The malaria resource and reagent center, MR4 (www.atcc.org), has also been authorized to distribute these arrays and may do so if there is sufficient interest. However, because these arrays were not designed using the conventional design pipeline, standard GeneChip software cannot currently be used with this array to determine gene expression levels. Furthermore, Affymetrix did not provide the software that we used in the genotyping analysis.