Using signal as wires?

Learning from a book on Unix that signals are software interrupts, I thought: «hey, let's play with signals in python, since they are in the stdlib!»

The example proposed here is not what one should really do with signals; it is just for the purpose of studying them.
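As a warm-up, here is a minimal, self-contained sketch of the stdlib machinery we will play with (the handler and variable names are mine): a handler is attached with signal.signal, and the process sends a signal to itself with os.kill.

```python
import os
import signal

received = []

def handler(sig_no, frame):
    # keep the handler tiny: just record which signal fired
    received.append(sig_no)

# wire SIGUSR1 to our handler
signal.signal(signal.SIGUSR1, handler)

# the process signals itself; any other process could do it with kill(1)
os.kill(os.getpid(), signal.SIGUSR1)

print(received == [signal.SIGUSR1])  # True: the handler ran exactly once
```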

Well, remember this point, it will prove important later:
There is no way to “block” signals temporarily from critical sections (since this is not supported by all Unix flavors).

The idea: turning a process into a pseudo hardware component

Signals are like wires that normally carry a rising edge. In a low-level architecture you may use one wire to say «validate», when results are safe to propagate, and another wire to clear the results.
I will build a simple component that sets to 1 the bit at the nth position of a register according to which wire/signal fired (this could be used for multiplexing).
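The register part is plain bit twiddling; a tiny standalone sketch of the bookkeeping (set_bit is my name for it):

```python
def set_bit(value, offset):
    """Return value with the bit at position `offset` forced to 1."""
    return value | (1 << offset)

# pretend the wires at positions 0, 1 and 2 fired
register = 0
for offset in (0, 1, 2):
    register = set_bit(register, offset)

print(register)  # 7, i.e. 0b111
```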

Here is the code:

#!/usr/bin/env python3.3
import signal as s
from time import sleep
from time import asctime as _asctime
from random import randint
import sys

asctime = lambda: _asctime()[11:19]

class Processor(object):
    def __init__(self, signal_map,
            slow=False, clear_sig=s.SIGHUP, validate_sig=s.SIGCONT):
        self.signal_map = signal_map
        self.slow = slow
        self.clear_sig = clear_sig
        self.validate_sig = validate_sig
        self.value = 0
        self.cmd = 0
        self._help = [ "\nHow signal are wired"]
        self._signal_queue = []

        if validate_sig in signal_map or clear_sig in signal_map:
            raise Exception("Don't wire a signal twice")

        def top_half(sig_no, frame):
            ## BEGINNING OF CRITICAL SECTION
            self._signal_queue.append(sig_no)
            ## END OF CRITICAL SECTION

        for offset, sig_no in enumerate(signal_map):
            s.signal(sig_no, top_half)
            self._help += [ "sig(%d) sets v[%d]=%d" % (sig_no, offset, 1) ]

        self._help += [ "attaching clearing to %d" % clear_sig ]
        s.signal(clear_sig, top_half)
        self._help += [ "attaching validating to %d" % validate_sig ]
        s.signal(validate_sig, top_half)
        self._help = "\n".join(self._help)

    def bottom_half(self):
        sig_no = self._signal_queue.pop()
        now = asctime()
        seen = self.cmd
        self.cmd += 1
        treated = False

        if sig_no in self.signal_map:
            offset = self.signal_map.index(sig_no)
            beauty = randint(3, 10) if self.slow else 0
            if self.slow:
                print("[%d]%s:RCV: sig%d => [%d]=1 in (%d)s" % (
                    seen, now, sig_no, offset, beauty))
                sleep(beauty)
            self.value |= 1 << offset
            print("[%d]%s:ACK: sig%d => [%d]=1 (%d)" % (
                seen, now, sig_no, offset, beauty))
            treated = True

        if sig_no == self.clear_sig:
            self.value = 0
            print("[%d]%s:ACK clearing value" % (seen, now))
            treated = True

        if sig_no == self.validate_sig:
            print("[%d]%s:ACK READING val is %d" % (seen, now, self.value))
            treated = True

        if not treated:
            print("unhandled signal %d" % sig_no)

wired = Processor([ s.SIGUSR1, s.SIGUSR2, s.SIGBUS, s.SIGPWR ])
print(wired._help)

while True:
    if wired._signal_queue:
        wired.bottom_half()
    sleep(.1)

Now, let's do some shell experiment:
$ ./
[4] 9332
660 jul@faith:~/src/signal 19:04:03
How signal are wired
sig(10) sets v[0]=1
sig(12) sets v[1]=1
sig(7) sets v[2]=1
sig(30) sets v[3]=1
attaching clearing to 1
attaching validating to 18

660 jul@faith:~/src/signal 19:04:04
$ for i in 1 12 10 7 18 1 7 30 18; do sleep 1 && kill -$i %4; done
[0]19:04:31:ACK clearing value
[1]19:04:32:ACK: sig12 => [1]=1 (0)
[2]19:04:33:ACK: sig10 => [0]=1 (0)
[3]19:04:34:ACK: sig7 => [2]=1 (0)
[4]19:04:35:ACK READING val is 7
[5]19:04:36:ACK clearing value
[6]19:04:37:ACK: sig7 => [2]=1 (0)
[7]19:04:38:ACK: sig30 => [3]=1 (0)
[8]19:04:39:ACK READING val is 12

Everything works as it should, no? :) I have brilliantly used signals to transmit data asynchronously to a process, with 1 signal per bit \o/

What about that «not being able to block signals» warning?

$ for i in 1 12 10 7 18 1 7 30 18; do echo "kill -$i 9455; " ; done | sh
[0]22:27:06:ACK clearing value
[1]22:27:06:ACK clearing value
[2]22:27:06:ACK: sig7 => [2]=1 (0)
[3]22:27:06:ACK: sig30 => [3]=1 (0)
[4]22:27:06:ACK: sig10 => [0]=1 (0)
[5]22:27:06:ACK: sig12 => [1]=1 (0)
[6]22:27:06:ACK READING val is 15

Oh, a race condition: it already appears with the shell launching the kill commands sequentially, since the results are out of order. Plus, you can clearly see that my critical section is not small enough to be atomic. And I lost signals (9 kills sent, only 7 acknowledged) :/

Is python worthless?

Not being able to block signals makes even a top half/bottom half strategy risky. Okay, I should have used only atomic operations in the top half (which makes me wonder which operations actually are atomic in python), such as only setting one variable, and done the queuing in the while loop, but I fear it would have been worse.
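For what it is worth, here is a sketch of that stricter split: the top half does one single append (which, under CPython's GIL and with handlers running in the main thread between bytecodes, I believe is safe enough, though I would not bet my life on it), and the bottom half drains the queue in the main loop.

```python
import os
import signal
from collections import deque

pending = deque()               # the only state the top half touches

def top_half(sig_no, frame):
    pending.append(sig_no)      # one single operation, nothing else

signal.signal(signal.SIGUSR1, top_half)
signal.signal(signal.SIGUSR2, top_half)

# simulate two incoming "wires"
os.kill(os.getpid(), signal.SIGUSR1)
os.kill(os.getpid(), signal.SIGUSR2)

# bottom half: drain at our leisure, outside any handler
value = 0
wiring = [signal.SIGUSR1, signal.SIGUSR2]
while pending:
    value |= 1 << wiring.index(pending.popleft())

print(value)  # 3: both bits set
```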

Which actually means that with python you should not play with signals as defined in the stdlib: without blocking, you get systematic race conditions, and you risk losing signals if you expect them to be reliable.

I am playing with signals the way I would play with an m68k interrupt (except that I would still block signals before entering the critical section). To achieve the blocking and processing of pending signals, I would need POSIX.1 sigaction, sigset, sigprocmask, and sigpending.

Why doesn't python support them (in the stdlib)?

Well, python runs on multiple operating systems; some support POSIX.1, some don't. Since signals are not standardized the same way except among POSIX-compliant systems of the same POSIX version, this arguably should not be in the st(andar)dlib. And since it is *that* risky, I would advocate not allowing signal handlers to be set in the first place (except for alarm, maybe). But take your own risks accordingly :)

If you feel this is a problem, then just remember that binding C code to python is quite easy, and that on POSIX operating systems we have everything we need. The solution given on stackoverflow is funky, but less so than having an unprotected critical section.
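For the record, Python 3.3 did eventually grow wrappers for part of this POSIX machinery (signal.pthread_sigmask, signal.sigpending, signal.sigwait), so on a recent enough POSIX python the critical section can actually be protected. The sketch below also demonstrates the coalescing that made me lose signals above:

```python
import os
import signal

hits = []
signal.signal(signal.SIGUSR1, lambda sig_no, frame: hits.append(sig_no))

# --- critical section: SIGUSR1 is blocked, delivery is deferred ---
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
os.kill(os.getpid(), signal.SIGUSR1)   # arrives, stays pending
os.kill(os.getpid(), signal.SIGUSR1)   # coalesced: still only ONE pending
assert signal.SIGUSR1 in signal.sigpending()
# --- end of critical section: unblock, the pending signal is delivered ---
signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGUSR1})

print(len(hits))  # 1: plain signals do not queue, the second one is lost
```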

My problem with Computer Science pseudo code

I remember opening Computer Science books to learn algorithms when I began. My first reaction was: this seems unnecessarily complicated, like maths; but I figured I must be the idiot, since I had learnt physics.

So, 15 years later, I reopened Introduction To Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein, and I decided to see whether my coding experience would help me appreciate this book for all the knowledge it could bring me.

So I decided to take a look at the simple heap structure and algorithms:

Let's look at the pseudo code (see here for the whole chapter):

When I see this pseudo code, and think of actual production code, I am dazzled:

  • it has meaningless variable names;
  • the API and variable names (like the global heap_size) are quite hard to grok;
  • I wonder whether recursion is really important, knowing it is hard to read, hard to debug, and may hit a recursion limit;
  • I have a hard time understanding how it works.
So I thought: maybe I am just an idiot. And I tried to see if I could do a better job of writing pseudo code that helps understanding (I used python as a pseudo code checker).

Here is my heapify function (the whole code is here if you want to check the correctness):

SENTINEL = -1
# helpers for a 0-based binary heap laid out in an array
left_offset = lambda index: 2 * index + 1
right_offset = lambda index: 2 * index + 2

def heapify(_array, index=0, heap_size=SENTINEL):
    """bubbles the element down until it is correctly located.
    heap_size can be used internally for optimization purposes"""
    if heap_size == SENTINEL:
        heap_size = len(_array)
    a_child_is_bigger = True
    while a_child_is_bigger:
        largest = index
        left_pos, right_pos = left_offset(index), right_offset(index)

        #check left
        if left_pos < heap_size and _array[left_pos] > _array[index]:
            largest = left_pos

        #check right
        if right_pos < heap_size and _array[right_pos] > _array[largest]:
            largest = right_pos

        if largest == index:
            # we are finally the king of the hill, end of story
            a_child_is_bigger = False
        else:
            #swap with the bigger child // bubble down
            _array[index], _array[largest] = _array[largest], _array[index]
            index = largest

And coding it revealed what I feared most: CS academics and developers don't live in the same world: the way code is written matters.

  1. getting rid of the asymmetric test at line 5 greatly improves the understanding of the logic;
  2. by using full names for test conditions, you really help people (you included) understand, and thus maintain, your code;
  3. recursion gets in the way of understanding straightforward code;
  4. one-letter variable names are really unhelpful;
  5. their code typography is even worse than mine.
I have written python code to check the whole logic: writing slightly more readable code does not seem to prevent it from working (I may have a problem on the boundary, because this hidden heap_size variable is a pain to understand).

By the way, if you really want an efficient heap structure in python, please use the stdlib heapq module: it is just plain better, superbly documented, tested, and the source code is quite nice too.
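A taste of heapq, for the record (it maintains a min-heap invariant over a plain list):

```python
import heapq

data = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
heapq.heapify(data)        # O(n), in place: data[0] is now the minimum
print(data[0])             # 1

heapq.heappush(data, 0)    # push/pop maintain the invariant in O(log n)
smallest = [heapq.heappop(data) for _ in range(3)]
print(smallest)            # [0, 1, 2]
```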

Now, I am plain lost: when I check CS papers because I want to solve a problem, I often stumble on unreadable pseudo code and verbiage. Understanding the API, the variable names... takes an awful lot of time. When I write in python, I have the feeling that it is a pseudo code translator, so if we assume my implementation is visually and logically close to pseudo code, does it cost that much to improve pseudo code readability?

If I can do it, knowing that I am one of the most stupid developers among the ones I know, why or how can't they do the same? Pseudo code is for sharing knowledge; do we really need to waste time deciphering it to become real or better developers?

When you see equivalent code in an industry-grade application, you normally hate it: it is plain bad practice. Isn't computer science supposed to teach elite-grade developers?

And when I see such code, I sometimes wonder if the author understood what he wrote.

Meaningful variable names, clear test conditions, readability: this is what matters, because code is meant to be improved and fixed.

With all these questions in my head I came to a conclusion: I don't take for profound what I think is uselessly obscure, so my opinion of CS papers and academics has dramatically dropped.

EDIT: now I see that bubbling down and bubbling up can be extracted from the logic, and I guess this can be used to make a heap that supports efficient min/max insertion and extraction.

Unicode is tough

So today, I dared to open a bug on python. Which, in hindsight, should make me feel mortified, since it proved that I had misunderstood what a character is.

The point was, in python 3.2:

foo⋅bar = 42   # DOT OPERATOR, ord() == 8901
#  File "<stdin>", line 1
#    foo⋅bar=42
#            ^
# SyntaxError: invalid character in identifier
### This is another bug that is not in the scope of the post

foo·bar = 42   # MIDDLE DOT, ord() == 183: accepted!

A middle dot is a punctuation mark, no? And variable names shouldn't contain punctuation.
Plus, the two characters look the same; shouldn't they be considered the same?

So I opened a bug, and I was very nicely pointed to the fact that the unicode character MIDDLE DOT is indeed a punctuation mark, but it also has the unicode property Other_ID_Continue. And as stated in python's rules for identifiers, it is therefore totally legitimate.
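The stdlib unicodedata module makes the distinction easy to check, and str.isidentifier confirms the rule (the variable names here are mine):

```python
import unicodedata

middle_dot = "\u00b7"    # the character my bug report was about
dot_operator = "\u22c5"  # the lookalike that is (rightly) rejected

print(unicodedata.name(middle_dot), unicodedata.category(middle_dot))
# MIDDLE DOT Po: punctuation, yet valid in identifiers (Other_ID_Continue)
print(unicodedata.name(dot_operator), unicodedata.category(dot_operator))
# DOT OPERATOR Sm: a math symbol

print(("foo" + middle_dot + "bar").isidentifier())    # True
print(("foo" + dot_operator + "bar").isidentifier())  # False
```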

That is the point where you actively search for good documentation to understand what malfunctioned in your brain. Then a Perl coder pointed me to Perl Unicode Essentials by Tom Christiansen. Even if the first third is about Perl, it is the best presentation on unicode I have read so far.

And then I understood my mistakes:
  • I (visually) confused a glyph with a character: the same glyph can be used for different characters;
  • unicode is much more than simply extending the usable glyphs (that much I knew, but I had not grasped how little I knew).
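The first point is easy to demonstrate: here are three distinct characters that usually render with one and the same glyph:

```python
import unicodedata

# latin, greek and cyrillic capital "A": one glyph, three characters
lookalikes = ["\u0041", "\u0391", "\u0410"]
for char in lookalikes:
    print(char, hex(ord(char)), unicodedata.name(char))
# A 0x41 LATIN CAPITAL LETTER A
# Α 0x391 GREEK CAPITAL LETTER ALPHA
# А 0x410 CYRILLIC CAPITAL LETTER A
```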

By the way, if you need a reason to switch to the current production version (3.3.0), remember that Py3.3 is still improving its unicode support:

py3.2:

>>> "ß".upper()
'ß'

which is a wrong result, while in py3.3:

>>> "ß".upper()
'SS'
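You can check this on any recent interpreter; 3.3 also added str.casefold, which is the right tool for caseless comparison:

```python
print("ß".upper())     # SS: the full Unicode case mapping since 3.3
print("ß".casefold())  # ss
# caseless comparison done right:
print("STRASSE".casefold() == "straße".casefold())  # True
```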