I recently played around with scraping public Telegram channels and finally want to do something with the data. In particular, I wanted to play around with Named Entity Recognition (NER) and Sentiment Analysis on the content of the Telegram posts. I think this will probably work much better when you do it for each language separately. So before we answer the question "what's this post about?" or "how angry is the person in the post?", let's answer the question "what language is the post in?" first. And we'll do it while warming up with that hole machine learning (ML) topic. Be aware: I have no idea what I'm doing. Do you want to know more?


I was writing some Python code that uses [ctypes][] for interfacing with the [Windows ToolHelp API](https://docs.microsoft.com/en-us/windows/win32/api/_toolhelp/). Specifically, I had to [define a Python equivalent][structs] for the [PROCESSENTRY32](https://docs.microsoft.com/en-us/windows/win32/api/tlhelp32/ns-tlhelp32-processentry32) struct. Now, the [ctypes][] [struct definitions][structs] are a little annoying because they do not give you type hints. You can add the type hints _on top_ of the _fields_ attribute but that looks a little silly because you _literally_ [write everything twice][wet]. In Python 3.7+ ((Starting in Python 3.7, dictionaries keep the insertion order.)), you can solve this using a metaclass, and one stigma of metaclasses is that they have very few applications, so I decided to blog about this one:
class FieldsFromTypeHints(type(ctypes.Structure)):
    def __new__(cls, name, bases, namespace):
        from typing import get_type_hints
        class AnnotationDummy:
            __annotations__ = namespace.get('__annotations__', {})
        annotations = get_type_hints(AnnotationDummy)
        namespace['_fields_'] = list(annotations.items())
        return type(ctypes.Structure).__new__(cls, name, bases, namespace)
and now you can write:
class PROCESSENTRY32(ctypes.Structure, metaclass=FieldsFromTypeHints):
    dwSize              : ctypes.c_uint32
    cntUsage            : ctypes.c_uint32
    th32ProcessID       : ctypes.c_uint32
    th32DefaultHeapID   : ctypes.POINTER(ctypes.c_ulong)
    th32ModuleID        : ctypes.c_uint32
    cntThreads          : ctypes.c_uint32
    th32ParentProcessID : ctypes.c_uint32
    pcPriClassBase      : ctypes.c_long
    dwFlags             : ctypes.c_uint32
    szExeFile           : ctypes.c_char * 260
The metaclass simply deduces the _fields_ attribute of the new class for you before the class is _even created_. It might make sense to think of metaclasses as "class creation hooks", i.e. you get to modify what you have written before the class is actually being defined. The following bit is just to be compatible with [PEP-563](https://www.python.org/dev/peps/pep-0563/):
        from typing import get_type_hints
        class AnnotationDummy:
            __annotations__ = namespace.get('__annotations__', {})
        annotations = get_type_hints(AnnotationDummy)
And then, the following is the actual magic line:
        namespace['_fields_'] = list(annotations.items())
This adds the attribute _fields_ before the class is created. [ctypes]: https://docs.python.org/3/library/ctypes.html [structs]: https://docs.python.org/3/library/ctypes.html#structures-and-unions [wet]: https://en.wikipedia.org/wiki/Don%27t_repeat_yourself Do you want to know what I was coding?


I often require a wrapping integer type in Python, by which I actually mean a subclass of int where all operations are performed modulo some constant number $N$. There are two main use cases for this: 1. Working in a finite field for some cryptographic stuff, or solving problems on [Project Euler](https://projecteuler.net/about). 2. Having Python integers behave like machine registers (8, 16, 32, 64, even 128 bits - you name it.) I decided to solve this once and for all and wrote the integer wrapper class to end all integer wrapper classes. I also managed to keep it rather compact by using *»gasp«* metaclassing. Do you want to know more?


As I [have hinted at before](/2017/09/20/just-some-friendly-advice/), the [PyCrypto library](https://www.dlitz.net/software/pycrypto/) [seems to be dead](https://github.com/dlitz/pycrypto/issues/173). The [PyCryptodome](https://www.pycryptodome.org/en/latest/) library is a fork that is promising because it is maintained and works in Python 3, but they have a bit of a finger-wagging attitude which sometimes means that you have to fight the library a bit:
>>> from Crypto.Cipher import ARC4
>>> cipher = ARC4.new(B'funk')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python37\lib\site-packages\Crypto\Cipher\ARC4.py", line 132, in new
    return ARC4Cipher(key, *args, **kwargs)
  File "C:\Python37\lib\site-packages\Crypto\Cipher\ARC4.py", line 57, in __init__
    len(key))
ValueError: Incorrect ARC4 key length (4 bytes)
>>> ARC4.key_size = range(1,257)
>>> ARC4.new(B'funk').decrypt( ARC4.new(B'funk').encrypt( B'Hello World' ))
b'Hello World'
They certainly mean well, but the library is no place to impose security standards, in my opinion. In malware research for example, we often have to verbatim copy the appalling use of certain ciphers, like ARC4 with a 4-byte key. It happens all the time! I have been particularly struggling with [the removal of the XOR cipher](https://pycryptodome.readthedocs.io/en/latest/src/vs_pycrypto.html). The XOR implementation of PyCrypto was very fast, and in this article I will both benchmark how fast exactly it was and give you a drop-in replacement which degrades gracefully based on your options. Do you want to know more?


I started to play around with ArangoDB and used Python to get some data into my first database. Long story short: if you want to set your own key for the documents, do it on the document, not on the initialization data. EDIT: this is only true for the most recent version 1.3.1 release on pypi by the time of writing ((See conversation on github for details)). Read the longer story!


For some reason, [Jira]'s [formatting] is not Markdown. Since you write everything in Markdown, you might be looking for a converter. If you furthermore hate node.js as much as yours truly, the search can easily claim your soul. Rest assured - I think [mistletoe] is the answer we are seeking. It is a pure Python Markdown parser which can render the parsed Markdown in any format, and one of them is Jira. It even comes with a [script] for this exact purpose. [Jira]: https://jira.atlassian.com/ [formatting]: https://jira.atlassian.com/secure/WikiRendererHelpAction.jspa?section=all [mistletoe]: https://github.com/miyuchina/mistletoe [script]: https://github.com/miyuchina/mistletoe/blob/dev/contrib/md2jira.py


Spoiler: My main point in this post is not given away by the title. But first things first: What are all those words? Would you like to know more?


Part of me wants to write about all the [horror](https://www.bleepingcomputer.com/news/security/ten-malicious-libraries-found-on-pypi-python-package-index/) and [glory](https://crates.io) there is to be seen in package management, but quite frankly it'll take too long. Instead, I will just leave you with a tiny piece of advice. Here comes. If you are on Windows and you want to install a *legitimate* Python package (like for example [PyCryptodome](https://pypi.python.org/pypi/pycryptodome), because naturally you are **fully aware** that [PyCrypto is dead](https://github.com/dlitz/pycrypto/issues/173).), which in reality is a bottomless pit, at the center of which there is a C library, straight from hell - then maybe get the [Microsoft Compiler for Python](http://www.microsoft.com/en-us/download/details.aspx?id=44266) instead of, who knows, wasting hours or even days looking for a less reasonable solution. Credit, as so very often, [goes to stackoverflow](https://stackoverflow.com/a/27327236/1578458).


The color that [PuTTY](http://www.putty.org/) uses for blue is simply too dark. The scrollback buffer, by default, is 200 lines. That's ridiculous, I have several gigabytes of RAM going to waste here. In case you have a lot of stored sessions, it's quite tiresome to use PuTTY to go through all of them and fix whatever settings you would like to change. You can edit them directly in the registry, though - or use this Python 3 script!
from winreg import OpenKey, EnumKey, QueryValueEx, SetValueEx, \
  KEY_WRITE, KEY_READ, HKEY_CURRENT_USER as HKCU

class callrange:       
    def __init__(self, call_function):
        self.call = call_function
    def __getitem__(self,index):
        try: return self.call(index)
        except OSError: raise IndexError

with OpenKey(HKCU,"SOFTWARE\SimonTatham\PuTTY\Sessions") as sessions:
    for s in callrange(lambda i: EnumKey(sessions,i)):
        with OpenKey(sessions,s,access=KEY_WRITE|KEY_READ) as key:
            for name,value in [
                ('Colour14','100,100,250'),
                ('Colour15','120,120,250'),
                ('TerminalType','xterm'),
                ('ScrollbackLines',6000)
            ]:
                type = QueryValueEx(key,name)[1]
                SetValueEx(key,name,0,type,value)
By the way, searching the tubes reveals [some useful suggestions to improve PuTTY's default settings](http://dag.wiee.rs/blog/content/improving-putty-settings-on-windows).


Member variables in python are horrible. They are not visible in the layout of the class which is instantiated, but instead the __init__ function of a class creates certain member variables for the instance. I have never liked this about python, to be honest. For a recent project, I devised the following solution. Assume you would want this behaviour:
>>> class test(Base):
...     # Variables
...     number = 4
...     string = "hodor"
...     # Functions
...     def stringmult(self):
...         return self.number * self.string
...
>>> test().stringmult()
'hodorhodorhodorhodor'
>>> test(number=2).stringmult()
'hodorhodor'
>>> test(string="Na",number=8).stringmult() + " - Batman!"
'NaNaNaNaNaNaNaNa - Batman!'
>>> 
>>> test(end="Batman!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ctypes.ArgumentError
>>>
In other words, any class that inherits from Base can be constructed with keyword arguments who must match exactly the correct class variables which you specify. This one does it:
from ctypes import ArgumentError
class Base(object):
    def __init__(self, **kwargs):
        # "given" is the list of keyword arguments passed to the constructor
        # of this object, "needed" is the list of class variables which belong to  
        # the base class of the object which is being created, which do not end 
        # with two underscores and which are not a function. Trust me, we do not 
        # want to meddle with those.
        given = list(kwargs.keys())
        needed = [attr for attr in dir(self.__class__) if attr[-2:] != '__' \
             and type(self.__class__.__dict__[attr])!=type(lambda:0) ]

        # Check if keyword arguments have been provided which are not among the
        # required arguments and throw an exception if so. Remove this check for
        # a less restrictive base class. I wouldn't recommend it.
        if not set(given) <= set(needed):
            raise ArgumentError()

        # First, initialize the attribute dictionary of the object being created
        # with a list of default values, indicated by the values of the class 
        # variables. Then, update the attribute dictionary again with the values
        # provided to this constructor.
        self.__dict__.update({k: self.__class__.__dict__[k] for k in needed})
        self.__dict__.update(kwargs)
I personally like this approach a lot and hereby dare you to tell me even a single reason not to do this, in the comments.


Nikolai asked me how to open a Python interpreter prompt from the console with a certain module already imported. For the record, the Python command line switches tell us that python -i -m module is the way to start the prompt and load module.py. That made me wonder whether I can stuff it all into one batch file, and I came up with the following test.bat:
REM = '''
@COPY %0.bat %0.py
@python %0.py
@DEL %0.py 
@goto :eof ::'''
del REM
for k in range(70):
    print(k)
That script will ignore the first line because it is a comment, then copy itself to test.py, then launch python with this argument. Afterwards, test.py is deleted and the script terminates without looking at any of the following lines. Note that :: is yet another way to comment in Batch. Python, however, will see a script where the variable REM is defined as a multi-line string and deleted right after that. After this little stub, you can put any python code you want. Well. I thought it was funky.


I was riding the backseat of a car, a pal of mine with a large Sudoku book on the seat beside me. I glared over at him and remarked that I find Sudokus utterly boring and would feel that my time is wasted on any of them. He looked up at me, clearly demanding an explanation for that statement. I continued to explain that a computer program could solve a Sudoku with such ease that there is no need for humans to do it. He replied that something similar could be said about chess, but still it's an interesting game. And it was then, that I realized why Sudoku is so horribly boring, and chess is not. It was the fact that I could code a Sudoku solver and solve the Sudoku he was puzzling about, and I would be able to do it faster than it would take him to solve the entire thing by hand. This does not apply to chess, evidently. Of course, I confidently explained this to him. »Prove it.«, he said. So I did. Do you want to know more?