Voting matters - Issue 3, December 1994

Comparing the stability of two STV algorithms

B A Wichmann

Brian Wichmann is the Editor of Voting matters and a visiting professor of The Open University.

The problem of stability

This note does not consider the usual properties of STV algorithms that have been the subject of Woodall's analysis, but that of stability. For a mechanical system modelled by continuous variables, the analysis of stability is an application of differential calculus. We cannot use such an approach with STV algorithms since the system is discrete, and we know that some small changes are bound to produce a discrete change in those elected.

For an STV algorithm, we could have too much stability in that part of the ballot papers are simply ignored - for instance, by only using the first preference. On the other hand, we could have an algorithm which lacks stability in vital respects by changing the result for inconsequential changes to a ballot paper.

One change made to a ballot paper can be regarded as small, due to the nature of the preferential system. Since the usual means of balloting does not provide for the voter to give equal preference, when the ballot paper records ABC, this might be because A and B were regarded as equal, but the voter specified A first arbitrarily. Hence the voter could equally have written BAC instead. Hence given the ballot ABC, the voter's true intentions could perhaps have been expressed as BAC or ACB. In general, given n preferences, n-1 ballot papers constructed by interchanging neighbouring preferences could be regarded as small differences.

Now consider two algorithms for STV which have broadly similar properties (as do all serious contenders). Figures 1 and 2 represent graphically these two algorithms.

Figure 1

Figure 1 represents a stable algorithm since small changes are unlikely to change the result of an election, while Figure 2 represents an unstable algorithm. If we were operating in two dimensions, then the property of stability could be measured rather like the game of shove-halfpenny: one would measure the probability that a small circle placed at random on the figure crossed one of the dividing lines.

Figure 2

In the case of STV algorithms, we do not have a simple two-dimensional system, and hence the figures are a crude diagrammatic representation. To measure the probabilities we must conduct a suitably controlled experiment. Fortunately, we can use a computer to aid this process so that we can perform the equivalent of shove-halfpenny sufficiently often to obtain results which are likely to be meaningful.

The experimental method

We now specify the experimental method to compare the ERS hand counting rules versus the Meek algorithm. (Any two algorithms could be chosen, but this seems the most interesting pair.)

We select an actual election for which the ballot papers are available. We also choose a number, about 20, which is the number of ballot papers from the full set that is to be selected at random. (We return to the choice of this number n later.)

From each real election, we derive 100 mini-elections by randomly selecting n ballot papers. The experimental method is to analyse the effect of making small changes to these mini-elections. The analysis is as follows. Firstly, we compute the result of the two algorithms from the mini-election. (The result need not be the same for the two algorithms, nor the same as for the full election.) We now consider all the possible similar mini-elections derived by making one small change to one of the ballot papers. (This is potentially hundreds of elections - hence the computer.) This particular mini-election is on the edge if a specific criterion is met, say at least one of the small changes produces a different result.

The choice of n is important. If n is very small (say 1), then it is clear that the mini-election will not be representative of the real election. On the other hand, if n is large (say the full election), then the computation of the 'edge' becomes too large, and also the number of possible mini-elections becomes too small (in this case only 1). Care must be taken over the specific criterion for being on the edge. If one takes something like the ERS council elections (i.e., several posts to fill with no parties, so that small changes are likely to make a difference to the outcome), with the criterion that any small change resulting in a difference implies being on the edge, then there is a danger that all mini-elections are on the edge!

For the 100 random mini-elections we perform a different analysis in each of the three experiments given here. If one could assume statistical independence, then it would be a simple matter to undertake a chi-squared test to see if the result is significant. Unfortunately, we do not have elections with a large enough number of ballot papers to ensure the independence, and therefore we must be content with a non-statistical treatment.

The programs and test data

Two programs have been written, one for using the ERS algorithm and the other the Meek version. Apart from the STV algorithm in use, both work in an identical fashion. They read 100 mini-elections in the conventional format. Firstly, the result is computed for this election, then every possible small change is made, and for each such change, the number of changes to those elected is recorded.

The number of changes to those elected for one small change is usually 0 (no change), but is sometimes 1, rarely 2 and very rarely 3. Hence for each ballot paper in the mini-election, n-1 integers are output, representing the number of changes arising from each of the n-1 possible interchanges of adjacent preferences, where n is the number of preferences marked on the ballot paper. This implies that the output is of similar length to the input - an important consideration, since if complete results were printed for each election result computed, hundreds of pages of material would be produced.

The analysis is most easily seen by considering an example. A mini-election from election R038 is as follows:

 17 5
1 11 9 10 0
1 10 17 5 9 11 16 0
1 6 16 2 1 14 17 10 9 11 5 8 4 12 13 15 0
1 4 8 12 15 13 0
1 17 5 11 1 16 10 2 0
1 5 9 10 11 17 0
1 3 7 9 14 17 0
...
1 8 10 13 11 17 0
1 9 5 6 17 0
1 11 10 5 17 9 0
1 6 4 15 14 16 8 1 0
1 6 14 16 1 2 4 13 12 8 15 0
1 13 4 15 12 8 0
0
"A.1 " "B.2 " "C.3 " "D.4 "
"E.5 " "F.6 " "G.7 " "H.8 "
"I.9 " "J.10 " "K.11 " "L.12 "
"M.13 " "N.14 " "O.15 " "P.16 "
"Q.17 "
"1R038: H3H "

The above data is for an election with 17 candidates for 5 seats, in which the first ballot paper selects candidate 11 (K) as the first preference, then 9 and lastly 10. The names of the candidates are the letters A-Q, a convention used throughout.

The program computes the effect of making all possible interchanges of adjacent preferences, which for Meek gives:

v1 +F-L-B-O-P-H-G-N-E-M-C+I+Q-K+J+A-D 68
0 1 1 1 
2 1 1 1 1 1 1 0 0 
2 1 1 0 0 1 
0 0 1 1 
...
2 1 1 1 
0 1 0 1 
1 0 0 1 
1 0 0 0 0 0 
2 1 0 1 
1 1 0 2 0 0 1 1 0 1 0 0 1 1 
1 1 0 1 1 
0 1 m

The first line gives the result (with Meek) for this mini-election, where +F-L means F is elected and L is excluded, etc. (The v1 and 68 are not relevant.) Then, starting with the last ballot paper and working back towards the beginning, the number of differences to the result is printed for each possible interchange. Hence the last ballot paper has four possible interchanges, the first one giving no difference, but the last three each making a single difference. So in this case, interchanging the first two preferences makes no difference, but interchanging the 2nd and 3rd preferences does change the result by one candidate. The 'm' relates to the third experiment and is explained later.

One other program is needed which selects n ballot papers at random from a real election, and repeats this 100 times. This program is fast and straightforward.

For the main election data, six real elections have been chosen from the data already available (see Voting matters, Issue 2). The statistics from these elections are as follows:

Identifier Papers  Candidates  Seats  n
R006       239        9         2      20
R008       261       10         3      25
R010       270        9         5      27
R017       479        8         1      15
R033       196       14         7      25
R038       177       17         5      20

Unfortunately, none of the elections in the data base are from elections involving parties, and so such elections could not be selected for this study.

We can now summarise the results obtained by example. For election R017, 100 mini-elections are computed by selecting 15 ballot papers from the actual 479. For each of these mini-elections, we compute what difference (if any) would be made by a single transposition of a preference. This is repeated for each possible transposition, which in this case, involves the analysis of 4585 elections!

Experiment 1

We now consider the issue raised initially - that of the 'size' of the edge dividing the line between different election results. We therefore need to devise a criterion for being on the 'edge', and compare the results for the six elections with the two algorithms.

Criterion: Some change for any transposition

Election  ERS edge  Meek edge
R006       74        65
R008       80        74    
R010       95        87    
R017       69        74
R033       99        95    
R038      100       100

This table means, for instance, that for the 100 mini-elections derived from R006, 74 are on the 'edge' for ERS and 65 for Meek - which implies that there were 26 or 35 elections for which no change was made by any transpositions. Hence a very high proportion of the mini-elections are on the 'edge', over three quarters in almost all cases. However, even the most optimistic assumption shows that there is not much difference between the two algorithms.

We now change the criterion for being on the edge so that a lower proportion are on the edge.

Criterion: More than three transpositions make a change

Election  ERS edge  Meek edge
R006       41        38
R008       55        46    
R010       76        57    
R017       32        49
R033       91        75    
R038       94        91

We again conclude that there is not much difference between the two algorithms.

We need to look at aspects other than the actual size of the edge to see significant differences.

Experiment 2

In this experiment, conducted with the programs and data as before, we look at properties of the edges rather than their actual magnitude.

Given a mini-election which is on the edge, then we know at least some transpositions of the preferences will change the result. It is therefore natural to ask which specific transpositions can change the result. Clearly, it is more likely that transposing the first two preferences will alter the result, but what about the subsequent transpositions? We therefore analyse the number of times a transposition makes a change, against the position of the transposition (pi).

Combined results
      p1  p2  p3  p4  p5  p6  p7  p8  p9 p10 p11 p12

R006
 ERS 283  31  14   6   4   0...
Meek 310 147  61  39  23  16  14   0...

R008
 ERS 452  56  11   8   5   4   0   1   3   0...
Meek 393 161  70  42  25  13  12  11   0...

R010
 ERS 668 173  36  21   9   4   2   0...
Meek 423 174  82  34  21  18  13   0...

R017
 ERS 214  27   4   5   2   1   0...
Meek 279 210 123 119 104  94   0...

R033
 ERS 979 227  78  31  17   8   3   2   1   0   0   1
Meek1876 392 225 144 117  91  69  61  57  41  41  34  37

R038
 ERS 734 203  44  31  17   6   3   2   0   1   1   0
Meek 723 502 376 346 157 138 107  97  91  44  36  33

In the table above, for each of the six elections, the number of times a transposition makes (at least) one change to the result is tabulated against the preference position for all the 100 mini-elections. The difference between ERS and Meek is now obvious. The number of changes for the first preference between the two algorithms is similar and is surely not significant. However, in all subsequent preferences, many more changes arise from Meek than from ERS.

In examining the subsequent preferences, there is no natural scale to work to, since a change in preference n is more significant if there are n candidates than 2n candidates. The number of seats is also relevant to this scale. Hence in analysing the table above, both the number of candidates and seats must be considered.

We can add up the results from each election for those positions beyond the number of seats (s) for each election, giving the following results:

                 Position
       s+1   s+2   s+3   >s+3

ERS     61    21    15    11
Meek   530   364   293   596
ratio  8.7    17    19    54

Hence we conclude that transposing preferences beyond the number of seats has virtually no effect with ERS as compared with Meek.

Experiment 3

In a paper in this issue of Voting matters, Woodall defines the property mono-raise. For the elections analysed by the experiments undertaken here, we can determine the extent to which a weaker property than mono-raise is violated. Since our analysis determines the effect of a single interchange in the preferences, given a preference pair A,B which is replaced by B,A, the raising of the order of B should not disadvantage B. This implies that if the election with A,B elects B, then that with B,A should also elect B. If this condition is not satisfied, then mono-raise is violated, and is marked by 'm' in the output files.

We can now compare the violation rate for ERS and Meek, which is as follows:

Election ERS violations  Meek violations
R006         0             32
R008         2             29
R010         5              8
R017         5             78
R033         5             70
R038         0            141

Hence there is no question that Meek violates mono-raise much more than ERS. This is likely to be due to the increased sensitivity of Meek to the effects of late preferences.

Conclusions

The analysis undertaken in this paper has led to the following conclusions:

There is no evidence that the ERS and Meek algorithms are any different with respect to the size of the boundary between the election of different candidates.
Making small changes by transposing preferences later than the number of seats makes virtually no difference with ERS but a substantial difference with the Meek algorithm.
Meek violates mono-raise much more than ERS.

Point 2 indicates that the Meek algorithm is much more sensitive to the voter's wishes than ERS, and moreover this sensitivity is not at the expense of making the algorithm less stable. However, the fact that Meek violates mono-raise so much more than ERS might question the extra sensitivity of Meek. It would appear that an ideal algorithm would have the sensitivity of Meek, but would only violate mono-raise with the same frequency as ERS. I suspect that it is actually the extra sensitivity of Meek that gives rise to the mono-raise violations, so that the best of Meek and ERS is not possible.

It appears that the results presented here have some limitations. Firstly, the mini-elections necessarily have a small number of ballot papers and so the results need not apply to larger elections. Secondly, a consequence of the small number of ballot papers is that in many cases, random choices are made by both the ERS and Meek algorithms.

Up: Issue 3 Next: Paper 2