Fast XOR in Python



As I [have hinted at before](/2017/09/20/just-some-friendly-advice/), the [PyCrypto library](https://www.dlitz.net/software/pycrypto/) [seems to be dead](https://github.com/dlitz/pycrypto/issues/173). The [PyCryptodome](https://www.pycryptodome.org/en/latest/) library is a fork that is promising because it is maintained and works in Python 3, but they have a bit of a finger-wagging attitude which sometimes means that you have to fight the library a bit:
>>> from Crypto.Cipher import ARC4
>>> cipher = ARC4.new(B'funk')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python37\lib\site-packages\Crypto\Cipher\ARC4.py", line 132, in new
    return ARC4Cipher(key, *args, **kwargs)
  File "C:\Python37\lib\site-packages\Crypto\Cipher\ARC4.py", line 57, in __init__
    len(key))
ValueError: Incorrect ARC4 key length (4 bytes)
>>> ARC4.key_size = range(1,257)
>>> ARC4.new(B'funk').decrypt( ARC4.new(B'funk').encrypt( B'Hello World' ))
b'Hello World'
They certainly mean well, but the library is no place to impose security standards, in my opinion. In malware research for example, we often have to verbatim copy the appalling use of certain ciphers, like ARC4 with a 4-byte key. It happens all the time! I have been particularly struggling with [the removal of the XOR cipher](https://pycryptodome.readthedocs.io/en/latest/src/vs_pycrypto.html). The XOR implementation of PyCrypto was very fast, and in this article I will both benchmark how fast exactly it was and give you a drop-in replacement which degrades gracefully based on your options. ### The Drop In Replacement Because this may be what most people care for, here it is:
from itertools import cycle, islice

try:
    from Crypto.Cipher import XOR as PyCrypto_XOR_Factory
except ImportError:
    PyCrypto_XOR_Factory = None

try:
    from Crypto.Util.strxor import strxor
    def _xor(key, data):
        return strxor(data, bytes(bytearray(islice(cycle(key), len(data)))))
except ImportError:
    def _xor(key, data):
        k = cycle(bytearray(key))
        return bytes(bytearray(b^next(k) for b in bytearray(data)))

class XOR_Cipher:
    def __init__(self, key):
        self.key = key 
    def encrypt(self, ciphertext):
        return _xor(self.key, ciphertext)
    def decrypt(self, plaintext):
        return _xor(self.key, plaintext)

class XOR:
  @staticmethod
  def new(key):
    try:
      return PyCrypto_XOR_Factory.new(key)
    except (AttributeError, ValueError):
      return XOR_Cipher(key)
This code gracefully falls back from PyCrypto's XOR to an implementation based on PyCryptodome's strxor and finally to my pure Python implementation. My implementation performes about 10 times worse than the original XOR, but I received other suggestions that managed to perform even worse. In case you're wondering, this code is Python 2 compatible. ### The Tests This was tested with Python 2, installing the packages pycryptodomex and pycrypto so both could be used at a time. I was simply unable to get this to work under Python 3, but feel very free to educate me in the comments. The hideous XOR implementations xor_5 and xor_6 are the most naive thing a battle-hardened C programmer could come up with, and they perform miserably. The variant xor_4 is something you see a lot, for example it is used in the [xor_string](https://pypi.org/project/xor_string/) module. However, I had a gut feeling that xor_3 would be slightly faster, and my tests do confirm that hypothesis. Finally, xor_2 is my take on using the PyCryptodome strxor routine, which turns out to be roughly two times slower than PyCrypto's original XOR, which is somewhat unsurprising considering the additional Python code overhead.
from itertools import cycle, islice, izip
from operator import xor 
from Crypto.Cipher import XOR 
from Cryptodome import Random
from Cryptodome.Util import strxor

import random
import sys
import time

class timer():
  def __init__(self):
    self.start = time.time()
  def __enter__(self):
    return self
  def __exit__(self, exc_type, exc_val, exc_tb):
    rt = time.time() - self.start
    sys.stdout.write('{:6.2f}'.format(rt))
    sys.stdout.flush()

def xor_1(data, key):
  return XOR.new(key).decrypt(data)

def xor_2(data, key):
  return strxor.strxor(data, bytearray(islice(cycle(key),len(data))))

def xor_3(data, key):
  k = cycle(bytearray(key))
  return bytearray(b^next(k) for b in bytearray(data))  

def xor_4(data, key):
  return b''.join(chr(ord(c)^ord(k)) for c,k in izip(data, cycle(key)))

def xor_5(data, key):
  d, n, i = [], len(key), 0
  for byte in data:
    d.append( chr(ord(byte) ^ ord(key[i % n])) )
    i += 1
  return b''.join(d)

def xor_6(data, key):
  d, n, i = b'', len(key), 0
  for byte in data:
    d += chr(ord(byte) ^ ord(key[i % n]))
    i += 1
  return d

key = Random.get_random_bytes(23)
rounds = 1 << 12

xors = [ xor_1, xor_2, xor_3, xor_4, xor_5, xor_6 ]

sys.stdout.write(''.join('XOR{}'.format(x+1).rjust(6) for x in range(len(xors))))
print('  BUFFERSIZE  EVALUATIONS')

for n in range(8, 24, 5):
  buffer = Random.get_random_bytes(1 << n)
  for function in xors:
    with timer():
      for k in range(rounds):
        function(buffer, key)
  sys.stdout.write('  0x{:08X}  {}\n'.format(1<<n, rounds))
  rounds >>= 4
Note that I _really_ wanted to test the [cxor](https://bitbucket.org/barbuza/cxor/) module because it looks to be _just_ the thing, but I gave up after about 30 minutes of trying to get it to compile.

2 Replies to “Fast XOR in Python”

  1. Oh yea, these are the results on my machine:
      XOR1  XOR2  XOR3  XOR4  XOR5  XOR6  BUFFERSIZE  EVALUATIONS
      0.05  0.21  0.42  0.46  0.39  0.33  0x00000100  4096
      0.02  0.03  0.24  0.37  0.48  0.59  0x00002000  256
      0.03  0.05  0.41  0.75  1.16  1.15  0x00040000  16
      0.07  0.12  0.85  1.74  2.11  5.63  0x00800000  1
    

Leave a Reply to T.D. Cancel reply

Your email address will not be published. Required fields are marked *