Voting matters - Issue 2, September 1994
An STV Database
B A Wichmann
Since we know that no single algorithm for STV can have all the properties
one might like, it appears that some statistical analysis may be needed to
select an optimal algorithm. People do not vote at random and therefore any
effective analysis must take into account voting patterns. For instance, if
voters always voted strictly along party lines, proportional representation
among such parties would be an important factor.
Collections of ballot papers from real elections would be useful for any
practical analysis. There is a de facto standard for the
representation of ballot papers in a computer, namely the form used by the
Meek algorithm. Hence collection of such data is practical and useful.
David Hill, Nicholas Tideman and I each had such collections, accumulated
informally over several years. I have now put this material into a
consistent framework so that it can be provided to anybody who
would like it: merely post a floppy disc to me, and I will return the
disc with the data.
The data available has been classified into four classes, as follows:
- Real: Data from real elections, except that a statistical sample of
the total ballot papers is also acceptable. One reason for accepting a
sample is that it presents a means of providing 'real' data without
providing the total information. There are potential dangers in analysis of
real data, since an alternative algorithm could elect a different person,
giving rise to concerns about the election itself, rather than the
principles involved. Another reason for accepting a subset of all the votes
is that this is all that may be feasible for a large election. For the same
reasons, this data is provided in a form which precludes the identification
of the election involved. There are currently 46 data sets in this class.
- Mock: This is data from genuine elections, except that no
position or office is at stake. Mock elections are often used to educate
people in the principles of STV. There are currently 2 data sets in this
class.
- Semi: Elections in this class are not genuine elections, but are
clearly related to real elections. Examples in this class are 'ballot'
papers derived from published STV elections (from Northern Ireland),
elections from the Eurovision Song Contest and elections in which there was
no fixed number of 'seats'. There are currently 21 data sets in this class.
- Test: Data in this class are not derived from any election but
have been constructed to demonstrate the difference between some algorithms,
to show a bug in a computer algorithm, or for some similar purpose. There
are currently 129 data sets in this class.
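The classification above, and the idea of supplying only a statistical sample of the ballot papers, can be illustrated with a small sketch. This is only an illustration under stated assumptions: the names `DataClass`, `DATA_SET_COUNTS`, `total_data_sets` and `sample_ballots` are hypothetical, and each ballot is assumed to be a tuple of candidate numbers in order of preference, which is not necessarily the form in which the data is actually stored.

```python
import random
from enum import Enum


class DataClass(Enum):
    """The four classes of data set described in the text."""
    REAL = "Real"
    MOCK = "Mock"
    SEMI = "Semi"
    TEST = "Test"


# Number of data sets currently in each class, as quoted in the text.
DATA_SET_COUNTS = {
    DataClass.REAL: 46,
    DataClass.MOCK: 2,
    DataClass.SEMI: 21,
    DataClass.TEST: 129,
}


def total_data_sets(counts=DATA_SET_COUNTS):
    """Return the total number of data sets over all classes."""
    return sum(counts.values())


def sample_ballots(ballots, size, seed=None):
    """Return a simple random sample, without replacement, of ballot papers.

    Assumption: `ballots` is a list in which each ballot is a tuple of
    candidate numbers in order of preference (a hypothetical layout).
    A fixed `seed` makes the sample reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(ballots, size)


if __name__ == "__main__":
    print(total_data_sets())
    ballots = [(1, 2, 3), (2, 1), (3,), (1, 3), (2, 3, 1)]
    print(sample_ballots(ballots, 3, seed=1994))
```

Simple random sampling without replacement is only one possible way of drawing a subset; the text does not say how any particular sample was taken.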
I would very much welcome additional data, especially from real elections in
which some 'party' aspect is involved. The data can be provided in a form in
which the origin cannot be traced. I have analysed an Irish election to
produce a single data set in the Semi class, but this is very
time-consuming and requires a number of assumptions to produce anything like
the actual ballot papers. Hence real data is much superior.