Input sequence files
--------------------

fg.fasta:    246 promoter sequences (1000 bp each) for the GO:GNF0004 gene cluster
fg.X.fasta:  same sequences as fg.fasta, preprocessed with the RepeatMasker program
bg.fasta:    2800 promoter sequences (1000 bp each) for genes outside the cluster
bg.X.fasta:  same sequences as bg.fasta, preprocessed with the RepeatMasker program
all.fasta:   fg.fasta and bg.fasta combined, 3046 sequences
all.X.fasta: fg.X.fasta and bg.X.fasta combined, 3046 sequences

Create Markov background model
------------------------------

Background models of 0th, 1st, and 2nd order were created from the sequences
outside the cluster (bg.fasta and bg.X.fasta).

$MEME_BIN/fasta-get-markov -m 0 < bg.fasta > bg-model-0
$MEME_BIN/fasta-get-markov -m 1 < bg.fasta > bg-model-1
$MEME_BIN/fasta-get-markov -m 2 < bg.fasta > bg-model-2
$MEME_BIN/fasta-get-markov -m 0 < bg.X.fasta > bg-X-model-0
$MEME_BIN/fasta-get-markov -m 1 < bg.X.fasta > bg-X-model-1
$MEME_BIN/fasta-get-markov -m 2 < bg.X.fasta > bg-X-model-2

MEME analysis
-------------

MEME was run on all 12 combinations of two cluster sequence files (fg.fasta
and fg.X.fasta), the three corresponding background models (0th, 1st, and 2nd
order), and two -mod settings (zoops and anr). The motif width was fixed at 8
(-w 8), and the top 10 motif candidates from each run were retained
(-nmotifs 10).

meme fg.fasta -dna -mod anr -w 8 -bfile bg-model-0 -nmotifs 10 -maxsize 1000000 > A8.0.anr.html
meme ../fg.X.fasta -dna -mod anr -w 8 -bfile bg-X-model-0 -nmotifs 10 -maxsize 1000000 > X8.0.anr.html
meme ../fg.fasta -dna -mod zoops -w 8 -bfile bg-model-0 -nmotifs 10 -maxsize 1000000 > A8.0.zoops.html
meme ../fg.X.fasta -dna -mod zoops -w 8 -bfile bg-X-model-0 -nmotifs 10 -maxsize 1000000 > X8.0.zoops.html
meme fg.fasta -dna -mod anr -w 8 -bfile bg-model-1 -nmotifs 10 -maxsize 1000000 > A8.1.anr.html
meme ../fg.X.fasta -dna -mod anr -w 8 -bfile bg-X-model-1 -nmotifs 10 -maxsize 1000000 > X8.1.anr.html
meme ../fg.fasta -dna -mod zoops -w 8 -bfile bg-model-1 -nmotifs 10 -maxsize 1000000 > A8.1.zoops.html
meme ../fg.X.fasta -dna -mod zoops -w 8 -bfile bg-X-model-1 -nmotifs 10 -maxsize 1000000 > X8.1.zoops.html
meme fg.fasta -dna -mod anr -w 8 -bfile bg-model-2 -nmotifs 10 -maxsize 1000000 > A8.2.anr.html
meme ../fg.X.fasta -dna -mod anr -w 8 -bfile bg-X-model-2 -nmotifs 10 -maxsize 1000000 > X8.2.anr.html
meme ../fg.fasta -dna -mod zoops -w 8 -bfile bg-model-2 -nmotifs 10 -maxsize 1000000 > A8.2.zoops.html
meme ../fg.X.fasta -dna -mod zoops -w 8 -bfile bg-X-model-2 -nmotifs 10 -maxsize 1000000 > X8.2.zoops.html

MAST and group specificity analysis
-----------------------------------

For each MEME motif candidate, the PSSM was extracted from the HTML output
file. MAST was then used to search for all occurrences of the motif, using the
sequence file (either all.fasta or all.X.fasta) and the background model that
correspond to the MEME run that produced the PSSM. The E-value threshold was
set to 1.0.

mast pssm_file all_seq_file -bfile bg_model_file -ev 1

All sequences that MAST reports as containing the PSSM are called hits. Hits
are further split into PositiveHits and NegativeHits according to whether the
gene is inside or outside the cluster. A hypergeometric p-value was then
calculated to score the group specificity of each MEME motif candidate.

The final results are in MEME_final.xls.
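
The group-specificity score described above could be computed along the
following lines. This is only a sketch: the notes do not say which tool or
which tail of the distribution was used, so the upper tail P(X >= PositiveHits)
is an assumption here, as are the function name and the example hit counts.
The population size (3046) and cluster size (246) come from the file list.

```python
from math import comb


def hypergeom_pvalue(total, in_cluster, hits, positive_hits):
    """Upper-tail hypergeometric probability P(X >= positive_hits).

    total         -- all sequences searched (fg + bg)
    in_cluster    -- sequences that belong to the cluster (fg)
    hits          -- sequences reported as containing the motif
    positive_hits -- hits that belong to the cluster

    Assumption: a one-sided enrichment test; the original analysis may
    have used a different convention.
    """
    if not 0 <= in_cluster <= total or not 0 <= hits <= total:
        raise ValueError("in_cluster and hits must lie between 0 and total")
    upper = min(hits, in_cluster)
    if not 0 <= positive_hits <= upper:
        raise ValueError("positive_hits is out of range")
    outside = total - in_cluster
    # Count the ways to draw `hits` sequences with at least
    # `positive_hits` of them from the cluster (exact integer arithmetic).
    tail = sum(
        comb(in_cluster, i) * comb(outside, hits - i)
        for i in range(positive_hits, upper + 1)
    )
    return tail / comb(total, hits)


if __name__ == "__main__":
    # Hypothetical motif: 40 hits in all.fasta, 15 of them in the cluster.
    print(hypergeom_pvalue(total=3046, in_cluster=246, hits=40, positive_hits=15))
```

A small p-value means the motif occurs in cluster promoters more often than
would be expected if the hits were drawn at random from all 3046 sequences.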