PHOCUS: PHOnotactic CUe Segmenter

The code to run PHOCUS and a description of all the corpora (with citations) can be downloaded here. Instructions are included in the README file.

JCL Results

In any of the tables below, click on the name of a model to get the segmentation it produces for that particular corpus. Click "Errors" for a given model to get a detailed error report.

WP = Word Precision; WR = Word Recall; WF = Word F0
BP = Boundary Precision; BR = Boundary Recall; BF = Boundary F0
LP = Lexicon Precision; LR = Lexicon Recall; LF = Lexicon F0
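
As a rough illustration of how scores like these can be computed, here is a minimal Python sketch. It is not the PHOCUS evaluation code, and all function names are hypothetical. It assumes that each F score is the harmonic mean of the matching precision and recall (the tabulated values are consistent with this), that a word counts as correct only when both of its edges match the gold segmentation, that boundary scores count utterance-internal boundaries only, and that lexicon scores compare the sets of distinct word types. The `skip` parameter reflects the way the results below ignore an initial block of utterances.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf(n_correct, n_predicted, n_gold):
    """Precision, recall and F as percentages, from raw counts."""
    p = 100.0 * n_correct / n_predicted if n_predicted else 0.0
    r = 100.0 * n_correct / n_gold if n_gold else 0.0
    return p, r, f_score(p, r)


def word_spans(words):
    """Set of (start, end) offsets of each word within its utterance."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def evaluate(gold, predicted, skip=0):
    """Score a segmentation; each corpus is a list of utterances (lists of words).

    Assumes gold and predicted utterances cover the same underlying symbols.
    """
    gold, predicted = gold[skip:], predicted[skip:]
    w_hit = w_pred = w_gold = 0
    b_hit = b_pred = b_gold = 0
    for g_words, p_words in zip(gold, predicted):
        g_spans, p_spans = word_spans(g_words), word_spans(p_words)
        w_hit += len(g_spans & p_spans)
        w_pred += len(p_spans)
        w_gold += len(g_spans)
        # Assumption: only utterance-internal boundaries are scored,
        # so the final position of the utterance is excluded.
        length = sum(len(w) for w in g_words)
        g_bounds = {end for _, end in g_spans if end != length}
        p_bounds = {end for _, end in p_spans if end != length}
        b_hit += len(g_bounds & p_bounds)
        b_pred += len(p_bounds)
        b_gold += len(g_bounds)
    # Lexicon scores compare the sets of distinct word types.
    g_lex = {w for utt in gold for w in utt}
    p_lex = {w for utt in predicted for w in utt}
    return {
        "word": prf(w_hit, w_pred, w_gold),
        "boundary": prf(b_hit, b_pred, b_gold),
        "lexicon": prf(len(g_lex & p_lex), len(p_lex), len(g_lex)),
    }
```

For instance, `f_score(67.67, 71.81)` rounds to 69.68, which matches the WF value reported for PHOCUS-1 on the original BR corpus.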

Original BR Corpus

Models were run on the original BR corpus. Results ignore the first 1000 utterances (leaving a total of 8790 utterances).

Model          WP     WR     WF     BP     BR     BF     LP     LR     LF  Error Report
PHOCUS-1    67.67  71.81  69.68  80.20  87.12  83.51  59.19  52.02  55.37  Errors
PHOCUS-2    65.78  66.40  66.09  81.38  82.46  81.91  57.24  55.42  56.32  Errors
PHOCUS-3    45.26  60.35  51.73  61.45  90.36  73.15  46.62  31.12  37.32  Errors
PHOCUS-1s   76.84  69.66  73.08  91.03  79.03  84.61  45.65  64.05  53.31  Errors
PHOCUS-2s   75.15  64.24  69.26  93.70  74.51  83.01  43.00  63.74  51.36  Errors
PHOCUS-3s   77.69  73.95  75.77  89.66  83.57  86.51  47.25  63.97  54.36  Errors
MBDP-1      67.54  71.28  69.36  80.20  86.48  83.22  59.69  52.41  55.82  Errors
MBDP-Phon   67.62  53.15  59.52  92.12  64.30  75.74  37.16  58.67  45.50  Errors
Goldwater   73.14  69.08  71.05  88.44  81.51  84.84  57.51  49.41  53.15  Errors
Johnson     83.23  84.47      -  90.20  94.04  92.08  77.70  68.41  72.76  Errors

Modified BR Corpus

Models were run on the modified BR corpus. Results ignore the first 1000 utterances (leaving a total of 8790 utterances).

Model          WP     WR     WF     BP     BR     BF     LP     LR     LF  Error Report
PHOCUS-1    59.18  67.42  63.03  73.15  87.51  79.69  54.40  42.60  47.78  Errors
PHOCUS-2    65.02  65.63  65.32  80.77  81.83  81.30  54.42  54.16  54.29  Errors
PHOCUS-3    38.01  54.22  44.69  56.24  90.06  69.24  45.21  26.92  33.75  Errors
PHOCUS-1s   80.16  78.89  79.52  89.12  87.12  88.11  61.57  65.72  63.58  Errors
PHOCUS-2s   80.19  73.07  76.47  93.37  81.67  87.13  57.35  68.25  62.33  Errors
PHOCUS-3s   79.74  81.98  80.84  87.40  90.87  89.10  66.61  66.83  66.72  Errors
MBDP-1      60.25  67.12  63.50  74.26  86.20  79.79  54.46  44.97  49.26  Errors
MBDP-Phon   69.96  58.33  63.61  91.08  69.71  78.97  42.91  59.62  49.90  Errors
Goldwater   72.29  68.84  70.52  87.76  81.86  84.71  60.80  51.46  55.75  Errors
Johnson     85.11  87.83  86.45  91.17  95.29  93.19  77.52  67.70  72.27  Errors

Sesotho Corpus

Models were run on the Sesotho corpus. Results ignore the first 800 utterances (leaving a total of 7702 utterances).

Model          WP     WR     WF     BP     BR     BF     LP     LR     LF  Error Report
PHOCUS-1    19.35  37.60  25.55  32.19  83.16  46.42  28.07   6.90  11.07  Errors
PHOCUS-2    41.33  53.65  46.69  54.63  81.98  65.57  44.54  28.56  34.81  Errors
PHOCUS-3     5.82  18.25   8.82  20.86  95.76  34.26  25.47   1.11   2.13  Errors
PHOCUS-1s   24.86  42.71  31.43  38.61  85.16  53.13  26.61   8.20  12.54  Errors
PHOCUS-2s   41.25  53.55  46.60  54.78  82.21  65.75  44.51  29.08  35.18  Errors
PHOCUS-3s   14.27  30.65  19.47  32.33  94.64  48.19  21.26   3.48   5.97  Errors
MBDP-1      20.09  40.21  26.80  32.42  86.96  47.24  29.43   6.95  11.25  Errors
MBDP-Phon   54.99  56.58  55.78  67.92  71.22  69.53  44.79  37.33  40.72  Errors
Goldwater   32.97  50.43  39.87  44.51  84.09  58.21  39.46   9.96  15.91  Errors
Johnson         -      -   55.6      -      -      -      -      -      -  -