Brian Wichmann is the Editor of Voting matters and a visiting professor of The Open University.
An obvious question is whether the information provided in a ballot can be simplified while retaining its essential content. In this paper, a simple model is proposed which appears to capture the essential information in a preferential ballot.
As an example, consider the case of four candidates, Albert, Bernard, Clare and Diana, with the votes cast as follows:
20 AB
15 CDA
4 ADC
1 B

From this data, we compute the number of times each preference is immediately followed by another, adding both a starting position (s) and a terminating position (e). For instance, the preference for A is followed by B 20 times, and the starting position is 'followed by' A 20+4=24 times. The complete table is therefore:
     A   B   C   D   e
s   24   1  15   0   -
A    -  20   0   4  15
B    0   -   0   0  21
C    0   0   -  15   4
D   15   0   4   -   0

Obviously, a preference for X cannot be followed by X, resulting in the diagonal of dashes. The entry for s followed by e could represent the invalid votes.
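The pair-counting step can be sketched in Python as follows. This is a minimal illustration, not the author's program: the function name transition_counts and the representation of each ballot as a string of single-letter candidates are assumptions made here.

```python
from collections import Counter

START, END = "s", "e"  # starting and terminating positions, as in the table

def transition_counts(votes):
    """Count how often each preference is immediately followed by another.

    votes: (count, ballot) pairs, where a ballot is a string of
    single-letter candidates in preference order, e.g. (20, "AB").
    Returns a Counter keyed by (from, to); absent pairs count as 0.
    """
    table = Counter()
    for count, ballot in votes:
        sequence = [START, *ballot, END]
        for before, after in zip(sequence, sequence[1:]):
            table[before, after] += count
    return table

votes = [(20, "AB"), (15, "CDA"), (4, "ADC"), (1, "B")]
table = transition_counts(votes)
print(table["s", "A"], table["A", "B"], table["A", "e"], table["B", "e"])  # 24 20 15 21
```

An empty ballot paper would be counted under s followed by e, in line with the remark about invalid votes.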
Having computed this table, we can use it to characterise voting behaviour. For instance, 24 out of 40, or 60%, of voters gave A as their first preference. More than this, we can use the table to generate ballot papers having the same statistical properties. For example, if the first preference is A, then the second row of the table shows that the next preference should be B, D or e in the proportions 20:4:15. A candidate already on the ballot cannot appear again, so such candidates are skipped and the remaining proportions rescaled. Fortunately, because the table contains many zeros, we can easily compute the distribution of all the possible ballot papers that can be constructed this way. In decreasing order of frequency, these are:
AB     30.8%  (50.0%)
A      23.1%
CDAB   16.9%
CDA    12.7%  (37.5%)
C       7.9%
ADC     6.1%  (10.0%)
B       2.5%   (2.5%)

The figures in brackets are the frequencies in the original data, which can be seen to be quite different.
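The enumeration can be sketched as a recursive walk over the table, multiplying the proportions along each path. This is a hypothetical sketch (ballot_distribution and TABLE are names invented here); it skips candidates already on the ballot and rescales the rest, which is what the quoted percentages imply, and it uses exact fractions so that no rounding accumulates.

```python
from fractions import Fraction

# Pair counts from the table above (row = current position, column = next).
TABLE = {
    "s": {"A": 24, "B": 1, "C": 15, "D": 0},
    "A": {"B": 20, "C": 0, "D": 4, "e": 15},
    "B": {"A": 0, "C": 0, "D": 0, "e": 21},
    "C": {"A": 0, "B": 0, "D": 15, "e": 4},
    "D": {"A": 15, "B": 0, "C": 4, "e": 0},
}

def ballot_distribution(table, state="s", ballot="", prob=Fraction(1)):
    """Map every ballot the model can generate to its exact probability.

    Assumption (inferred from the quoted percentages): candidates already
    on the ballot are skipped and the remaining counts rescaled.
    """
    options = {nxt: w for nxt, w in table[state].items()
               if w > 0 and nxt not in ballot}
    total = sum(options.values())
    if total == 0:  # assumption: a dead end is treated as the end of the ballot
        return {ballot: prob}
    result = {}
    for nxt, weight in options.items():
        p = prob * Fraction(weight, total)
        if nxt == "e":
            result[ballot] = p  # the ballot stops here
        else:
            result.update(ballot_distribution(table, nxt, ballot + nxt, p))
    return result

dist = ballot_distribution(TABLE)
for ballot, p in sorted(dist.items(), key=lambda item: item[1], reverse=True):
    print(ballot, f"{float(p):.2%}")
```

Run on the table above, it prints AB 30.77%, A 23.08%, CDAB 16.92%, CDA 12.69%, C 7.89%, ADC 6.15% and B 2.50%, in that order, agreeing with the list above to within rounding.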
A number of points arise from this example:
The conclusion so far is that the model characterises some aspects of voter behaviour but does not mirror others. However, from the point of view of preferential voting systems, we need to know whether the characterisation affects the results obtained by a variety of STV algorithms. This can be checked by comparing sets of ballot papers constructed by the above process with sets produced by random selection of ballot papers from the original data.
We take the ballot papers from a real election to select 7 candidates from 14, namely election R33 from the STV database. From these 194 ballot papers we construct two sets of 100 elections, each of 25 votes: (a) random subsets of the actual ballots, and (b) ballots generated by the process described above.
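The two ways of constructing test elections might be sketched as below. Everything here is an assumption for illustration: the function names, the fixed random seed, drawing the subsets without replacement, and the use of the 40-ballot example as stand-in data, since the R33 ballots are not reproduced in this paper. The markers 's' and 'e' are assumed not to clash with candidate names.

```python
import random
from collections import Counter

def transition_counts(ballots):
    """Pair counts with start marker 's' and end marker 'e', one per ballot paper."""
    table = Counter()
    for ballot in ballots:
        sequence = ["s", *ballot, "e"]
        table.update(zip(sequence, sequence[1:]))
    return table

def sample_ballot(table, candidates, rng):
    """Draw one ballot from the pair model, skipping candidates already used."""
    ballot, state = [], "s"
    while True:
        options = [c for c in candidates if c not in ballot] + ["e"]
        weights = [table[state, c] for c in options]
        if sum(weights) == 0:  # assumption: a dead end ends the ballot
            break
        choice = rng.choices(options, weights)[0]
        if choice == "e":
            break
        ballot.append(choice)
        state = choice
    return tuple(ballot)

def make_elections(ballots, n_elections=100, size=25, seed=0):
    """Return (subsets, generated): n_elections elections of `size` ballots,
    (a) drawn without replacement from the real ballots, (b) generated by the model."""
    rng = random.Random(seed)
    ballots = [tuple(b) for b in ballots]
    table = transition_counts(ballots)
    candidates = sorted({c for ballot in ballots for c in ballot})
    subsets = [rng.sample(ballots, size) for _ in range(n_elections)]
    generated = [[sample_ballot(table, candidates, rng) for _ in range(size)]
                 for _ in range(n_elections)]
    return subsets, generated

# Stand-in data: the 40-ballot example above, since R33 is not reproduced here.
example = ["AB"] * 20 + ["CDA"] * 15 + ["ADC"] * 4 + ["B"]
subsets, generated = make_elections(example)
```

Each of the 200 resulting elections would then be counted by the STV methods being compared.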
For each of the 200 elections we determine 4 properties as follows:
              Subset  Process  Number
Condorcet (G)     75       67     100
Meek (C)          42       34     100
ERS (N)           56       47     100
Tideman (E)       14       20      50

I believe that the four properties above are sufficiently independent of one another, and the elections themselves independent enough, to justify a chi-squared test of whether the two sets of elections could be regarded as coming from the same population. Passing this test would indicate that the statistical construction process is effective in providing 'election' data for research purposes.
The statistical testing is best done as a separate 2 × 2 table test for each row. The first row, for example, gives the table
Condorcet Analysis
           (G)  other  total
Subset      75     25    100
Process     67     33    100
-----------------------------
           142     58    200

The four tables give P = 0.28, 0.31, 0.26 and 0.29 respectively, using a two-tailed test. So, as far as this test goes, there is no significant difference between the two methods.
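Each 2 × 2 test can be sketched as follows. The text does not say exactly which form of the chi-squared test was used; this sketch assumes Pearson's statistic with Yates' continuity correction, which reproduces the four quoted P values to two decimal places. The function name yates_chi2_2x2 is hypothetical.

```python
from math import erfc, sqrt

def yates_chi2_2x2(a, b, c, d):
    """Pearson chi-squared with Yates' continuity correction for the table
    [[a, b], [c, d]]; returns (statistic, P value on 1 degree of freedom)."""
    n = a + b + c + d
    row_totals, col_totals = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(((a, b), (c, d))):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += max(0.0, abs(observed - expected) - 0.5) ** 2 / expected
    # Upper tail of chi-squared with 1 d.f.: P(X >= x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))

# (count in the first column, row total) for the Subset and Process rows.
results = {
    "Condorcet (G)": ((75, 100), (67, 100)),
    "Meek (C)": ((42, 100), (34, 100)),
    "ERS (N)": ((56, 100), (47, 100)),
    "Tideman (E)": ((14, 50), (20, 50)),
}
p_values = {}
for name, ((s, n_s), (p, n_p)) in results.items():
    _, p_values[name] = yates_chi2_2x2(s, n_s - s, p, n_p - p)
    print(name, round(p_values[name], 2))
```

It prints 0.28, 0.31, 0.26 and 0.29 for the four rows. Since a chi-squared statistic on one degree of freedom is the square of a normal deviate, its upper tail corresponds to the two-tailed test mentioned above.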