The Exam Polynomial

When students write exams, they want to know their score as fast as possible. However, the system that is supposed to deliver this information to the students is usually slow, for a lot of reasons that I will not go into. Where I work, it has become quite usual to publish a pdf on the course website that lists the student ID together with the number of points they scored. Since student IDs are usually anonymous, most people don't see it as a problem. However, the more this practice is used, the less anonymous a student ID actually is: Once there are enough of these lists public, this data might theoretically be used to connect a student to his or her student ID only by knowing which courses he or she attended. I propose a much better solution. I did not come up with this myself by the way, this is due to a brillian colleague of mine, but I still wanted to share it with the world. Let us assume that the student IDs are numbers between $0$ and a prime number $p$, so we can view them as elements of the finite field $\mathbb{F}_p$. You can always do that by possibly hashing your student IDs if they are not numerical already. By possibly (but most likely not) making $p$ even bigger, we can also assume that the scores of the students are between $0$ and $p$. Let $s_1,\ldots,s_n\in\mathbb{F}_p$ be the student IDs and $p_1,\ldots,p_n\in\mathbb{F}_p$ the points, i.e. student $s_i$ scored $p_i$ points. Let $f\in\mathbb{F}_p[X]$ be an interpolating polynomial, say of degree $n-1$ and such that $f(s_i)=p_i$. If you do not know how that works, here's one way: * Let $f_i := \prod_{j\ne i} (X-s_j) \in \mathbb F_p[X]$ for each $1\le i\le n$. * Let $f := \sum_{i=1}^n p_i\cdot f_i(s_i)^{-1}\cdot f_i$. Note that $f_i(s_j)=0$ for all $i\ne j$, so \[ f(s_j) = \sum_{i=1}^n p_i \cdot f_i(s_i)^{-1} \cdot f_i(s_j) = p_j \cdot f_j(s_j)^{-1}\cdot f_j(s_j) = p_j. \] Now expand $f$ and write it as a Horner Scheme. Examples say more than a thousand words. I will use Mathematica for the polynomial interpolation, but feel free to use anything else.
data =
  {{100, 9},
   {111, 11},
   {93, 13},
   {102, 14},
   {94, 15},
   {95, 15},
   {133, 16}};
p = NextPrime[Max[First /@ data]];
f[X_] := HornerForm[
  PolynomialMod[InterpolatingPolynomial[data, X, Modulus -> p], p]]

 In := f[x]
Out := 83 + x (26 + x (102 + x (51 + x (90 + x (117 + 83 x)))))
This is, of course, a small example. In reality, there will be much more student IDs and $p$ will be larger. We now publish only this polynomial on our course website, e.g. like this:
alert("You have " 
 + ( (83 + x*(26 + x*(102 + x*(51 + x*(90 + x*(117 + 83*x)))))) % 137)
 + " points.");}
ID: and there you go, now all the information about who took part in the exam is hidden inside this polynomial. It is theoretically impossible to deduce the interpolation data from the polynomial, since more than one data set can lead to the same interpolating polynomial. I have not given this a lot of thought, though - maybe you can get there using clever heuristics. It's still better than just publishing the entire list.

Leave a Reply

Your email address will not be published. Required fields are marked *