With STV, recounts are rarely undertaken because of the practical problems they would cause. In the Newland and Britton rules[1], both first and second edition, there was an instruction at the end of each stage: 'Ascertain that candidates and/or their agents are content', and a recount of the stage could be called for if they were not. The difficulty with this is that it may not become evident that an early stage needs checking until a later stage has been reached, so the only sure strategy for a candidate was to ask for a recount after every stage. In the latest edition of the rules those words have, in any case, been omitted.
However, when the count is conducted by computer, the computer itself can be used to assess the need for a 'recount'. The article is not concerned with the actual process of undertaking a recount (merely running the counting program again would be pointless), but with providing a tool to assess the risk of an incorrect result arising from a typing error made when the papers are entered manually.
This article describes a set of computer programs, developed for Electoral Reform Ballot Services (ERBS), which assess the need for a recount.
The stages are as follows:
1. model the pattern of errors made when the ballot papers are keyed in;
2. generate a number of simulated sets of ballot data by applying such errors to the papers actually cast;
3. count each simulated election using the same STV rules as the original count;
4. compare the results to see how often the set of candidates elected changes.
The data entry model is essentially one of key depressions, the ballot papers being ones on which the voter writes preference numbers. Since typing errors follow known patterns, a reasonable guess can be made of the likely errors in the keyed data. However, it is difficult to calibrate the error rate accurately. Such errors are naturally rare, say 1 in 5,000 characters, but at this rate one would need to double-check many thousands of characters to obtain a good estimate of the rate. In addition, the computer programs used for ballot entry already include some checks, and hence the error simulation program ensures that any error it introduces would pass those checks. Also, the staff of ERBS are naturally familiar with the requirements and appear to take special care with the first preference (not actually allowed for in the current program). There is some evidence that the staff may realise at the end of a ballot paper that they are 'out of step' and hence go back to correct an error. In view of the above, there is clearly some doubt as to the accuracy of the model of data errors, but the statistical nature of the problem makes some doubt inevitable.
After some experimentation, the data error rate was set at one error per 6,000 key depressions. However, if the error would be detected by the STV program, for instance because it produces a repeated preference, the corresponding change is not made.
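To make the error model concrete, the following is a minimal sketch, in Python rather than the language of the production programs. The representation of a paper as a comma-separated string of candidate numbers, and the names corrupt_paper and simulate_election, are illustrative assumptions only; they are not taken from the ERBS software.

    # Hypothetical sketch of injecting keying errors into ballot data.
    # A paper is assumed to be keyed as the candidate numbers in preference
    # order, separated by commas, e.g. "7,2,15".
    import random

    KEY_ERROR_RATE = 1 / 6000      # errors per key depression (character)
    PAPER_ERROR_RATE = 1 / 6000    # removal and duplication rate per paper

    def corrupt_paper(text, n_candidates, rng):
        """Attempt one character-level error of a randomly chosen kind, keeping
        it only if the result would still pass the entry checks (every
        preference a valid, unrepeated candidate number)."""
        chars = list(text)
        i = rng.randrange(len(chars))
        kind = rng.choice(["add_comma", "delete_comma", "interchange"])
        if kind == "add_comma":
            chars.insert(i, ",")
        elif kind == "delete_comma" and chars[i] == ",":
            del chars[i]
        elif kind == "interchange" and i + 1 < len(chars):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
        corrupted = "".join(chars)
        prefs = corrupted.split(",")
        valid = (all(p.isdigit() and 1 <= int(p) <= n_candidates for p in prefs)
                 and len(set(prefs)) == len(prefs))
        return corrupted if valid else text   # a rejected error leaves the paper unchanged

    def simulate_election(papers, n_candidates, seed):
        """Produce one simulated set of ballot data from the real papers."""
        rng = random.Random(seed)
        out = []
        for paper in papers:
            if rng.random() < PAPER_ERROR_RATE:        # paper lost
                continue
            if rng.random() < len(paper) * KEY_ERROR_RATE:
                paper = corrupt_paper(paper, n_candidates, rng)
            out.append(paper)
            if rng.random() < PAPER_ERROR_RATE:        # paper duplicated
                out.append(paper)
        return out

Generating, say, 100 such data sets from different seeds and counting each with the same STV rules gives the collection of simulated results that the analysis report below summarises.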
A typical report from the analysis program is shown below.

    Data error analysis program, version 1.01

    Basic data of original election:
      Title: R048: STV Selection Example 1
      To elect 10 from 29 candidates.
      Number of valid votes: 944
      Count according to Meek rules

    Data used to simulate input errors to count:
      Key errors taken as 1 in 6000 key depressions.
      Duplication and removal of papers taken as 1 in 6000 papers.
      Number of simulated elections produced: 100
      Seeds were initially: 16215, 15062 and 7213
                and finally: 17693, 15003 and 25920

    Some statistics from the generated election data:
      Average number of commas added for each election: 1
      Average number of commas deleted for each election: 1
      Average number of interchanges for each election: 2
      Average number of papers deleted for each election: 0
      Average number of papers duplicated for each election: 0
      Average number of papers changed for each election: 4
      Average number of papers changed at preference 1: 1

    Candidates elected in the original election and all simulated ones:
      Jane BENNETT        Robert BROWNING      Joan CRAWFORD
      Francis DRAKE       Mary-Ann EVANS       Kate GREENAWAY
      John MASEFIELD      Alfred TENNYSON      Sybil THORNDIKE

    Candidates not elected in the original election or any of the simulated ones:
      James BOSWELL       Emily BRONTE         George BYRON
      Eric COATES         Ella FITZGERALD      Stella GIBBONS
      Graham GREENE       Sherlock HOLMES      Samuel JOHNSON
      John KEATS          Alice LIDDELL        Harold PINTER
      Walter RALEIGH      Margaret RUTHERFORD  Will SHAKESPEARE
      Percy SHELLEY       John WESLEY          Virginia WOOLF

    Number of other candidates: 2

      Original Result   Simulated Result (95% conf. limits)   Name
      Elected           Elected     98% (93% to 100%)         Clara BOW
      Not Elected       Not Elected 98% (93% to 100%)         Benjamin FRANKLIN

    End of report

The program records the known details of the election, which include the type of count used: Meek in this case. Then statistics are recorded on the simulated elections. First comes the key depression error rate used, then the seeds used for the pseudo-random generator, so that the process can be re-run if required. A summary follows of the changes made to the papers. Note that one of the possible changes is the removal or duplication of a whole paper (both kinds of change are needed to reflect the checks made on the total number of papers). The commas indicate moving on to the next preference. Note that, of nearly 1,000 papers, typically one change is made at the first preference position.
Of course, the changes of most interest are those relating to which candidates are elected. The first two lists give the candidates who are always elected or never elected - there should be no doubt about the status of these.
The last table shows the position for those candidates whose status varied over the 101 elections performed (1 original and 100 simulated).
Here there are two such candidates. Clara Bow was elected in the original election and in 98% of the simulated ones, ie in two cases she was not elected. The case of Benjamin Franklin is exactly the opposite. However, merely knowing that percentage is not enough: what is required is an estimate of the probability of an incorrect result, that is, the value the percentage would take in the long run if infinitely many simulated elections were used. This long-run value is estimated to lie between 93% and 100% (with 95% probability).
In this particular case the result is not seriously in doubt. However, if the percentage range were to include the 50% figure, it is proposed that this would be sufficient to require a recount.
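As an illustration of this final step, the sketch below is hypothetical Python in the same spirit as the earlier one: the result of each count is assumed to be available as a set of elected candidates, and a Wilson score interval is used merely as a stand-in for the confidence-limit calculation (the article does not specify the method used in the Pascal code acknowledged below).

    from math import sqrt

    def wilson_interval(successes, trials, z=1.96):
        """Approximate 95% confidence limits for the long-run proportion."""
        p = successes / trials
        denom = 1 + z * z / trials
        centre = (p + z * z / (2 * trials)) / denom
        half = (z / denom) * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
        return max(0.0, centre - half), min(1.0, centre + half)

    def assess(original_elected, simulated_elected, candidates):
        """Report each candidate whose status varied across the simulations and
        say whether the proposed 50% test calls for a recount."""
        recount_needed = False
        n = len(simulated_elected)
        for c in candidates:
            wins = sum(c in result for result in simulated_elected)
            if wins == 0 or wins == n:
                continue                  # status never varied: no doubt here
            low, high = wilson_interval(wins, n)
            if low <= 0.5 <= high:
                recount_needed = True
            status = "Elected" if c in original_elected else "Not Elected"
            print(f"{c}: {status} originally; elected in {100 * wins / n:.0f}% "
                  f"of simulations (95% limits {100 * low:.0f}% to {100 * high:.0f}%)")
        return recount_needed

    # For example, a candidate elected in 98 of 100 simulated elections:
    print(wilson_interval(98, 100))       # roughly (0.93, 0.99)

The point of the confidence limits, rather than the raw percentage, is that with only 100 simulations the observed 98% could easily overstate or understate the long-run figure.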
The method could equally be applied to assess the impact of errors in mechanically produced data, provided the error rate is high enough to warrant its use.
I am grateful to David Hill, who provided some Pascal code which gives the 95% probability ranges - a vital part of the system.