Becoming a Nerd in Polynomial Time: programming

Thursday, June 7, 2012

Write your own Kernel: BootLoader stub

Since I was young, one of my dreams was to write my own operating system.
It was still the DOS era, and as a young boy I started studying a lot of tech stuff, trying to figure out what was really happening inside my computer.

At the time it was actually too much 'tech' stuff for me: I didn't have the knowledge needed to understand the subtle details behind low-level programming.

Then I took my Computer Engineering degree. The technical details are no longer obscure, but the old dream remained just... an old dream!

Yesterday, while messing around with a broken hard disk to recover some data, a funny idea came to my mind, and I launched:

dd if=/dev/sda of=MBR count=1 bs=512

This dumped the content of that disk's Master Boot Record (MBR) into a file. Of course, I couldn't resist, and I disassembled it:

objdump -D -b binary -mi386 -Maddr16,data16 MBR

While staring at that code, my old dream came back to me, so I started messing around a bit. The outcome was a small bootloader stub which actually works, and which I am describing here... In the future I will surely spend some time extending it, but it will remain just a project for fun!

And yes, this is somewhat reinventing the wheel, but for a low-level programmer it is really fun!

So, the work setting is this:

i386/x86_64 target machines
C and Assembly languages
standard GNU compilation toolchain (gcc, gas, ld, make, ...)
VirtualBox for early testing, then I will switch to an old legacy machine for testing on real hardware...

As for VirtualBox configuration, I have set a 40MiB virtual IDE Hard Disk, FAT32 partitioned, for future development (my legacy machine will mount a similar one), and a floppy device loading a binary raw image (the actual bootloader + early kernel).

So, let's now start analyzing the BootLoader's early stub. This is a 'stage 1' bootloader which simply prints a "Hello World" message, waits for a keystroke and then reboots the system. No actual loading of any stage 2 loader, nor mode change.

Given the hardware architecture we are targeting, we have some constraints about the bootloader to keep in mind for writing it successfully:

The bootloader is loaded by the BIOS (which is loaded into RAM at startup as well), given some preconditions
Both code and data must be placed within one sector of the booting disk (the MBR), which is 512B
The last two bytes of this sector must hold the 16-bit value 0xaa55 (stored little-endian, i.e. the bytes 0x55 and 0xaa), otherwise the sector will not be considered bootable
The BIOS will load the bootloader at address 0x0000:0x7c00, so the code must be linked to run at that address, otherwise it won't work.
The %dl register will contain the disk number, useful for reading additional data from the disk. Nevertheless, I am ignoring it at this stage.
Of course, we run in real mode, so we have 16-bit code and we can do almost everything! :)
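The size and signature constraints are easy to check on a candidate image before handing it to a virtual machine. What follows is just a minimal sketch in C: the helper name is_boot_sector is made up for illustration and is not part of any toolchain. Since the 16-bit value 0xaa55 is stored little-endian, the byte at offset 510 must be 0x55 and the byte at offset 511 must be 0xaa.

```c
#include <assert.h>
#include <stddef.h>

#define SECTOR_SIZE 512

/* Hypothetical helper: returns 1 if buf holds exactly one sector that
 * ends with the boot signature, 0 otherwise. */
int is_boot_sector(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len != SECTOR_SIZE)
        return 0;
    /* 0xAA55 as a little-endian word: low byte first. */
    return buf[SECTOR_SIZE - 2] == 0x55 && buf[SECTOR_SIZE - 1] == 0xAA;
}
```

Reading the first 512 bytes of an image into a buffer and passing them to this function is enough to catch a misplaced signature or a wrongly sized image.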

So, our MBR will contain both code and data. In the future it will contain a partition table as well, but since it is stored on a floppy disk, we must also provide a Disk Description Table (DDT, more commonly known as the BIOS Parameter Block) to make it a valid floppy. Additionally, the first byte of the MBR is part of the first instruction that gets executed, so we have to merge these things properly. The beginning of our boot1.S code is therefore this:

 .code16
 .text

.globl _start; _start:

 jmp stage1_start   # assembles to a 2-byte short jump over the table
 nop                # pads the jump to the 3 bytes the table layout expects
 
OEMLabel:  .ascii "BOOT    "   # exactly 8 bytes (.string would append a NUL)
BytesPerSector:  .short 512
SectorsPerCluster: .byte 1
ReservedForBoot: .short 1
NumberOfFats:  .byte 2
RootDirEntries:  .short 224
LogicalSectors:  .short 2880
MediumByte:  .byte 0x0F0
SectorsPerFat:  .short 9
SectorsPerTrack: .short 18
Sides:   .short 2
HiddenSectors:  .int 0
LargeSectors:  .int 0
DriveNo:  .short 0
Signature:  .byte 41   # 41 = 0x29, extended boot signature
VolumeID:  .int 0x00000000   # any value
VolumeLabel:  .ascii "myOS       "   # exactly 11 bytes
FileSystem:  .ascii "FAT12   "   # exactly 8 bytes

 stage1_start:

The .code16 and .text directives make the GNU assembler (gas) produce valid code. In particular, .code16 tells the assembler to emit 16-bit code (the default would be 32-bit, of course!) and .text ensures that everything ends up in one single section (we don't really mind which one, as sections will be stripped later on). _start is the symbol that normally marks the entry point of an executable, but we will be using it in a different way. The first instruction, jmp, tells the machine to skip over the DDT instead of "executing" it, so we get a correctly executing program while keeping the correct format for a floppy (of course, this bites some bytes out of the small 512B available).

The code must now set up the runtime stack, which allows using the "call" instruction to invoke subroutines. It looks like this:

 cli              
 movw $0x07C0, %ax
 addw $544, %ax  # 8K buffer
 mov %ax, %ss
 movw $4096, %sp
 sti

This creates a 4KiB stack above the bootloader. The stack segment is placed 544 paragraphs (of 16 bytes each, i.e. 8704 bytes) past the load address: 512 bytes for the bootloader itself plus a (hardcoded) 8KiB buffer. We wrap the sequence in a cli/sti pair to disable and then re-enable interrupts, since it is not safe to service an interrupt before the stack is correctly set up.
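To double-check these numbers, recall that in real mode a segment:offset pair maps to the linear address segment * 16 + offset. The following small C sketch makes the arithmetic explicit; the names linear_address and check_stack_layout are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address translation: linear = segment * 16 + offset. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Re-derives the figures used by the stack setup code. */
void check_stack_layout(void)
{
    uint16_t ss = 0x07C0 + 544;                      /* 0x09E0 */

    assert(linear_address(0x07C0, 0) == 0x7C00);     /* load address */
    assert(linear_address(ss, 0) == 0x7C00 + 8704);  /* past loader + 8KiB buffer */
    assert(linear_address(ss, 4096) == 0xAE00);      /* initial top of the stack */
}
```

So the stack segment starts at linear address 0x9E00, and since %sp starts at 4096 the stack grows downwards from 0xAE00.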

Then, we display our dummy message, "Hello World", and then reboot.

 cld
 movw $hello_s, %si
 call  print_string
 jmp reboot
 hlt

cld clears the direction flag, so that print_string reads the string pointed to by %si from beginning to end. The final hlt is never actually executed, but placing it there is good practice nevertheless. So, let's see how the print_string function and the reboot subroutine work.

1:
 movw $0x0001, %bx
 movb $0x0e, %ah
 int $0x10
print_string:
 lodsb
 cmpb $0, %al
 jne 1b
 ret

This routine is nicely optimized for space (remember, a bootloader is always short on space!). The funny part is that its entry point is in the middle of its code!
So, when we call it, a lodsb instruction gets executed, which loads one byte from the address in %si into %al and increments %si by one: we read one character of the string from memory!
cmpb checks whether the byte just read is 0, i.e. the terminating NUL character at the end of the string. If it is not, execution continues from the 1: label.
There we find the int $0x10 instruction, which raises a software interrupt: the CPU looks up entry 0x10 in the Interrupt Vector Table, which is associated with the BIOS video services. The value 0x0e stored in %ah selects the teletype output function, which displays on screen the character stored in %al, using the old-fashioned code page font (which is usually hard-coded within the BIOS itself).
This process goes on until a '\0' is found in the string, then a ret is executed.
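For readers more at ease with C than with assembly, the control flow of the routine can be modelled as a loop that tests before it prints. This is only a sketch under my own assumptions: print_string_model is a hypothetical name, the screen is replaced by a plain character buffer, and the function returns the number of characters written, which the assembly version does not do.

```c
#include <assert.h>
#include <stddef.h>

/* Model of print_string: the routine is entered at the load-and-test
 * step, and the "print" step runs only while the byte is not NUL. */
size_t print_string_model(const char *si, char *screen, size_t capacity)
{
    size_t written = 0;

    assert(si != NULL && screen != NULL && capacity > 0);
    for (;;) {
        char al = *si++;               /* lodsb: read a byte, advance %si */
        if (al == '\0')                /* cmpb $0, %al */
            break;                     /* jne not taken: fall through to ret */
        if (written + 1 < capacity)    /* 1: ... int $0x10 (teletype output) */
            screen[written++] = al;
    }
    screen[written] = '\0';
    return written;
}
```

Entering the assembly routine in the middle is exactly what lets the test come first without spending an extra jump instruction.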

Rebooting the system is far easier:

reboot:
 movw $reboot_s, %si
 call print_string
 movw $0,%ax
 int $0x16 # Wait for keystroke
 movw $0,%ax
 int $0x19 # Reboot the system

A nice string asking the user to press any key is shown, using the same function as before. Then we wait for a keystroke using the BIOS int $0x16 which, with %ah == 0 (here the whole %ax is zeroed), invokes the "get keystroke" BIOS function. The BIOS does not return from the interrupt routine until a key is pressed; then int $0x19 is executed, i.e. the "bootstrap loader" interrupt.

The end of stage1.S goes like this:

 hello_s: .string "Hello World!\n\r"
 reboot_s: .string "Press any key to reboot..."

 . = _start + 0x0200 - 2
 .short 0x0AA55 #Boot Sector signature

The two strings which we want to display are declared, so that we can load their addresses for the printing function.
Then, the line:
. = _start + 0x0200 - 2
tells gas, the GNU Assembler, to move the location counter (the '.' symbol) 512B after _start and then 2B backwards, i.e. 2 bytes before the end of the MBR. At this point we make gas emit the 0xAA55 signature, which tells the BIOS that this disk is bootable, and we are done: this is our bootloader stub!

Now, the last part is to actually build this code. The GNU toolchain targets 32/64-bit executables, so by default it produces ELF files, which we do not want here: we just need the stream of bytes representing the instructions to be executed.

gas alone cannot do this: it always emits headers and the rest of the object file format, and this cannot be disabled. But we can rely on ld, the GNU linker, asking it to produce a raw binary. So, the two steps are:

as boot.S -c -o boot.o

which creates an ELF object file containing our code. Then:

ld --oformat binary --Ttext 0x7C00 -o boot.bin boot.o

which, with some magic, strips every header (--oformat binary) and relocates the code starting from the address 0x7c00 (--Ttext 0x7C00) for the text section, which (considering our source) contains our whole bootloader, whose first instruction jumps to the actual initialization code.

Making VirtualBox boot our boot.bin image shows the "Hello World!" message, followed by "Press any key to reboot...".

So, now, the next step is to write a second stage bootloader, and make the first stage one be able to load it and transfer control to it!

Sunday, December 18, 2011

Code Obfuscation (Example 1)

Following through the series of posts about code obfuscation and unmaintainable code, I want to show here a C code snippet:

#define x(y,z) z , y
#define y(z,x) x z
#define a(b,c) c##b
#define x167 }
#define x879 x167;
#define x798 '\0'x879
#define x98 {
#define _ a(ain,m)
#define __ ()
#define b(c,d) a(c,d)
#define w a(ar,ch)
w ____[] = x98 x(0154, x(0145, 0110)), 108, 111, 30+02, 87, 0x6f, 0162, 0x6c, 0x64, x798
w ___[] = x98 '%', 's', x(x798, '\n')
_ __ x98 y((___, ____),a(ntf,pri)); x167

This is indeed a full working program, and the techniques used here to screw the code are simple enough. What's the actual purpose of this program?

Compiling and running it (or running gcc -E on it) are good ways to look at its internals.

No further explanation is being given here, as I consider this to be easy enough!

Monday, December 12, 2011

Converting PDF into Plain Text

At work I often receive PDF documents with no OCR information in them, and in order to modify or extract parts of them I need to convert them to plain text.

As a Unix user, I love the command line, and I have been searching for something like:

./pdf2text file.pdf

So this is what I have written to get such a command line tool. It is based on tesseract, which is now being developed by Google.
Remember that, to process non-English documents, the data files for the language must be installed separately, as most distributions ship only the English ones with the main package.

As tesseract is able to work with TIFF images, the first tool to be developed is pdf2tif. If none is already available on your machine, you can use the following script which is based on ghostscript:

pdf2tif:

#!/bin/sh
# Derived from pdf2ps.
# Convert PDF to TIFF file.

OPTIONS=""
while true
do
        case "$1" in
                -?*) OPTIONS="$OPTIONS $1" ;;
                *) break ;;
        esac
shift
done

if [ $# -eq 2 ]
then
        outfile=$2
elif [ $# -eq 1 ]
then
        outfile=`basename "$1" .pdf`-%02d.tif
else
        echo "Usage: `basename $0` [-dASCII85EncodePages=false]
        [-dLanguageLevel=1|2|3] input.pdf [output.tif]" 1>&2
        exit 1
fi

# Doing an initial 'save' helps keep fonts from being flushed between pages.
# We have to include the options twice because -I only takes effect if it
# appears before other options.
gs $OPTIONS -q -dNOPAUSE -dBATCH -dSAFER -r300x300 -sDEVICE=tiffg3 "-sOutputFile=$outfile" $OPTIONS -c save pop -f "$1"

This script will later be invoked by our pdf2text, which takes a pdf file as input, creates a temporary folder, converts every pdf page into a separate tif image, and then feeds them to tesseract.

pdf2text:

#!/bin/sh

# takes one parameter, the path to a pdf file to be processed.
# uses custom script 'pdf2tif' to generate the tif files,
# generates them at 300x300 dpi.
# then runs tesseract on them

# stop early if the working folder cannot be created or entered
mkdir "$1-dir" || exit 1
cp "$1" "$1-dir"
cd "$1-dir" || exit 1

pdf2tif "$1"

for j in *.tif
do
        x=`basename "$j" .tif`
        tesseract "$j" "$x"
        # these by-products may not exist with every tesseract version
        rm -f "$x.raw"
        rm -f "$x.map"
        #un-comment next line if you want to remove the .tif files when done.
        #rm "$j"
done

cat *.txt > "$1.txt"
mv "$1.txt" ..

cd ..
rm -rf "$1-dir"

After its execution, a .txt file named after the pdf (for example file.pdf.txt) is created, containing the OCR'd text!

Saturday, December 3, 2011

Latex in HTML (revisited)

Some time ago I published a post presenting an easy tool to be integrated into web pages, which renders LaTeX equations simply by enclosing the code in $ signs.

I never noticed how much I used the dollar sign until I developed this script!! It has texified so much text which wasn't meant to be that I decided to change the approach a bit. Now, instead of parsing the whole document, I'm just parsing the divs whose class is "latex".

So now, depending on the class of the actual container of the text I can have this behaviour:

$3 + 2 = 5$

or this one:

$3 + 2 = 5$

As easy as changing the div's class. The code in the previous post has been updated accordingly!

Saturday, November 26, 2011

Ithaca

When you set out on your journey to Ithaca

you must hope that the road is long,

rich in adventures and in experiences.

The Laestrygonians and the Cyclopes

or the fury of Neptune, do not fear them,

this will not be the kind of encounter you have

if your thought stays lofty and a steady

feeling guides your spirit and your body.

Cyclopes and Laestrygonians, certainly not,

nor angry Neptune will you run into

unless you carry them within you,

unless your soul sets them against you.

You must hope that the road is long.

That the summer mornings are many

when into harbors - at last, and with what joy -

you come ashore for the first time:

linger in the Phoenician markets and buy

mother-of-pearl, coral, ebony and amber,

all fine goods, and perfumes too,

heady ones of every kind; as many intoxicating perfumes as you can;

go to many Egyptian cities,

learn a wealth of things from the learned.

Always keep Ithaca in your mind -

let reaching it be your constant thought.

Above all, do not hurry the journey;

let it last a long time, for years, so that as an old man

you set foot on the island, rich

with the treasures gathered along the way,

without expecting riches from Ithaca.

Ithaca gave you the beautiful journey;

without her you would never have set out

on the road: what else do you expect?

And if you find her poor, Ithaca will not have let you down for that.

Wise by now, with all your experience upon you,

you will already have understood what Ithaca means.

- Kostantin Kavafis

Tuesday, November 8, 2011

Print Bits

I was handling a bitmap some time ago, and I needed to see the actual bit representation of the numbers I was using. So I wrote this function:

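#include <stdio.h>

void print_bits(int number) {
        /* only the most significant of the 32 bits is set */
        unsigned long mask = 0b10000000000000000000000000000000 ;
        char digit;

        while(mask) {
                /* test the current bit, then move to the next lower one */
                digit = ((mask & number) ? '1' : '0');
                putchar(digit);
                mask >>= 1;
        }
}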

This function accepts an integer and prints its actual binary representation on screen. Basically it takes a mask with only the most significant bit set, bitwise ANDs it with the number, and prints either 0 or 1 depending on that bit's value. Then it shifts the mask one position to the right and iterates, until all the bits are shown.

The 'b' in the mask is a GNU extension, which tells that the number is written directly in binary. If you are not using gcc, or don't like that series of 0's, simply replace it with the hex counterpart: 0x80000000.

This works on both 32-bit and 64-bit Intel x86, as an int is 32 bits wide on both. If you want to deal with a larger number representation, either change the initial mask, or set it to 0x1 and change the shift from >> to << (the bits are then printed starting from the least significant one).

You can use this code to see how numbers are represented. With some minor changes, you can use it to show any representation, be that a float or a more complex struct, for example by dereferencing a pointer.
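One possible way to generalize the idea, sketched here under my own assumptions, is to walk the object byte by byte through an unsigned char pointer. The name bits_to_string is hypothetical, and the result goes into a caller-supplied buffer instead of the screen. Keep in mind that bytes are visited in memory order, so on little-endian x86 a multi-byte integer shows its least significant byte first, unlike print_bits.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Writes the bits of the object at ptr (size bytes) into out, most
 * significant bit of each byte first, bytes in memory order.
 * out must have room for size * CHAR_BIT + 1 characters. */
void bits_to_string(const void *ptr, size_t size, char *out)
{
    const unsigned char *bytes = ptr;
    size_t pos = 0;

    assert(ptr != NULL && out != NULL);
    for (size_t i = 0; i < size; i++) {
        for (unsigned char mask = 1u << (CHAR_BIT - 1); mask; mask >>= 1)
            out[pos++] = (bytes[i] & mask) ? '1' : '0';
    }
    out[pos] = '\0';
}
```

For a float f, calling bits_to_string(&f, sizeof f, buffer) with a buffer of at least sizeof f * CHAR_BIT + 1 characters yields its raw representation.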

Monday, October 31, 2011

Multithread Barrier (yes, it's a join!)

This post doesn't aim to clarify every aspect of parallel programming. It's just an example to show how you have to think when writing parallel code!

When you deal with concurrent programming, everything gets screwed up very easily. Debugging is a pain in the ass, and everyone who has ever written multithreaded software, even something as simple as a concurrent server, has experienced such pain.

One fundamental operation in multithreaded programs is synchronization. In particular, one operation which I consider very important is a thread barrier, that is, a line of code in the program which guarantees that if some thread is still executing code before it, no thread is executing code after it, and if some thread is executing code after it, then every other thread has at least reached that line...
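To make the guarantee concrete, here is a minimal sketch that uses the ready-made POSIX pthread_barrier_t rather than a hand-written barrier, so it is not necessarily how the barrier discussed in this post is built. The names worker and run_barrier_demo and the thread count are assumptions of this example, which needs a platform providing POSIX barriers (build with -pthread).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_THREADS 4

static pthread_barrier_t barrier;
static atomic_int arrived;        /* threads that completed phase one */
static atomic_int early_starts;   /* phase-two entries seen too early */

static void *worker(void *arg)
{
    (void)arg;
    atomic_fetch_add(&arrived, 1);               /* phase one */
    pthread_barrier_wait(&barrier);              /* the barrier line */
    if (atomic_load(&arrived) != NUM_THREADS)    /* phase two */
        atomic_fetch_add(&early_starts, 1);
    return NULL;
}

/* Runs the experiment once and returns how many threads entered phase
 * two before every thread had completed phase one (expected: 0). */
int run_barrier_demo(void)
{
    pthread_t threads[NUM_THREADS];
    int rc;

    atomic_store(&arrived, 0);
    atomic_store(&early_starts, 0);
    rc = pthread_barrier_init(&barrier, NULL, NUM_THREADS);
    assert(rc == 0);
    for (int i = 0; i < NUM_THREADS; i++) {
        rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&barrier);
    (void)rc;
    return atomic_load(&early_starts);
}
```

Removing the pthread_barrier_wait call makes the check in phase two racy: some threads may then observe a counter lower than NUM_THREADS.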

Friday, October 7, 2011

LaTeX in HTML with Javascript and Google Charts

Using LaTeX in HTML documents is really useful. I'm myself using it in this blog.

Of course, creating an image for each formula and embedding it manually would be a pain. So, why don't we automate it? It would be really nice to use formulas as in LaTeX documents, simply writing them between dollar signs.

Well, several services exist on the Internet. I had been using the one offered by WatchMath painlessly for a while, until this summer, when what everyone can expect when relying on an external service happened: I found that the formulas were not rendering anymore. I visited their page and found out that they had changed the location of their script.

Not too bad: fixing it was as easy as pointing to the new location.

Nevertheless, today I came across the Mathematical Formulas page on Google Charts. This is a set of APIs Google has developed to support several things such as statistics and charts on Blogger, or equations in Google Documents. Luckily, they are exposing all of them to the public!

So the idea came to me immediately: why can't I code a script to exploit this service and replace the formulas in my blog?! Google is not likely to relocate its engine! So this approach should not cause any service interruption!

I felt lazy today, so I decided to go for JQuery instead of coding plain Javascript. As a first step, we have to "include" the JQuery library in our page. Since I am going to use this in my blog, I am unlikely to be given any space to store a copy of the library downloaded from the official site. So I went to Google to look for an online version to link to, and (guess what?!) I found it on Google APIs:

<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.min.js"></script>

Now, let's look at how to convert formulas enclosed in dollar signs all within an HTML page:

(Oct 20th 2011 update: as a great dumbass, I used code which contains dollar signs... therefore the script was trying to texify portions of this code. I only noticed this a few minutes ago and fixed it by replacing the dollar signs with a dollar-sign image... So be careful if you copy&paste this code, as you will have to re-insert the dollars, or JQuery won't understand a bit!)

(Dec 3rd 2011 update: I never noticed how much I used the dollar sign until I developed this script!! It has texified so much text which wasn't meant to be that I decided to change the approach a bit. Now, instead of parsing the whole document, I'm just parsing the divs whose class is "latex". The code in this post has been updated accordingly.)

The first thing I do is to specify some constants so that if I want to change the look&feel I can leave the code untouched. These are:

height (unspecified, which means: "Google, give me your default height", which is good for my blog);
background color (which is set as black, but completely transparent, this is what the extra 00's are there for);
text color (again, black).

Then I declare the convert_latex() function, which takes the <body>'s content and scans it using a regular expression, looking for all the occurrences of text enclosed in dollar signs. Every occurrence is replaced with the value returned by an ad-hoc function which takes the matched text (not including the dollars), formats it according to the Google Mathematical Formulas syntax together with our formatting constants, and wraps it in an <img> tag.
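The replacement step itself is independent of JQuery, so it can be sketched in plain C on a string. Take this as an illustration under stated assumptions, not as the script running on this blog: the endpoint and the cht, chf, chco and chl parameters are the ones I recall from the (since deprecated) Google Chart formula service, the helper names are made up, and a simple scan for pairs of dollar signs stands in for the regular expression.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Assumed endpoint and parameters, with '&' already escaped for use in
 * an HTML attribute: TeX rendering, transparent background, black text. */
#define CHART_URL "http://chart.apis.google.com/chart?cht=tx" \
                  "&amp;chf=bg,s,00000000&amp;chco=000000&amp;chl="

/* Appends n bytes of s to out, truncating if out (cap bytes) is full. */
static void append(char *out, size_t *len, size_t cap, const char *s, size_t n)
{
    for (size_t i = 0; i < n && *len + 1 < cap; i++)
        out[(*len)++] = s[i];
    out[*len] = '\0';
}

/* Appends n bytes of s, percent-encoding all but unreserved characters. */
static void append_encoded(char *out, size_t *len, size_t cap, const char *s, size_t n)
{
    static const char digits[] = "0123456789ABCDEF";

    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            append(out, len, cap, &s[i], 1);
        } else {
            char hex[3] = { '%', digits[c >> 4], digits[c & 15] };
            append(out, len, cap, hex, 3);
        }
    }
}

/* Copies text into out, replacing each $formula$ with an <img> tag.
 * A '$' without a matching closing '$' is copied through unchanged. */
void convert_latex(const char *text, char *out, size_t cap)
{
    size_t len = 0;

    assert(text != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    while (*text != '\0') {
        const char *close = (*text == '$') ? strchr(text + 1, '$') : NULL;
        if (close == NULL) {                 /* ordinary character */
            append(out, &len, cap, text++, 1);
            continue;
        }
        append(out, &len, cap, "<img src=\"", strlen("<img src=\""));
        append(out, &len, cap, CHART_URL, strlen(CHART_URL));
        append_encoded(out, &len, cap, text + 1, (size_t)(close - text - 1));
        append(out, &len, cap, "\">", 2);
        text = close + 1;
    }
}
```

Percent-encoding the formula matters because TeX is full of characters such as ^, + and \ that cannot be placed verbatim in a query string.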

The last touch is to insert the following line:

$(document).ready(function() {convert_latex();});
which calls our function when the document is completely loaded. Of course, the last two snippets must be enclosed within <script> tags.

The final result is that now I can write $\LaTeX$ using Google!

Monday, October 3, 2011

Lucky to Be a Programmer

This piece is dated "October 21, 2008, 1:38 AM" and first appeared on my first blog (by now destroyed by some system restyle). The commitments of these days and a long phone call a few days ago brought it back to my mind. I went looking for it, and here it is. I have only adapted the style here and there, from the one I used 3 years ago to today's. But the meaning definitely does not change...

Lucky to Be a Programmer
published by Alessandro Pellegrini on Tuesday, October 21, 2008 at 1:38 AM

Over the past few weeks I have been working on a project that required every kind of programming effort. It is now at a good point, so I can (almost!) get back to my normal commitments. Still, when people hear me talk about the "crazy hours" I have spent, they often tell me they are sorry. They shouldn't be!

It is not a kind of life I would lead often, or for long periods, or without suitable compensation, but the truth is that these concentrated bursts of programming are among the periods of my life I like best. Under the right conditions, writing software gives a pleasure so intense that it ought to be made illegal.

Thursday, September 22, 2011

Backwards compatibility of PNG

If you design website templates, being able to use transparent images is very useful. In particular, relying on PNG images is great, because a PNG can contain an alpha (transparency) channel, which tells the renderer how transparent every single pixel must be. Unlike GIF, this allows images to blend smoothly, no matter what the background color or image is.

Unfortunately, this format is not completely supported by every browser (especially by Internet Explorer prior to version 7), even though the more modern ones are in step with the times. This problem was (is!) so annoying that the W3C has a page dedicated to inline transparency testing.

Compatibility issues are an important aspect when dealing with websites, most of all if you're running commercial pages.

Several strategies have been proposed to address the alpha channel issue in PNGs. The one I am proposing here is a javascript-based one.

This technique relies on a filter included in Microsoft Internet Explorer since version 5.5, called AlphaImageLoader. It takes an image and displays it within the boundaries of a pre-existing image in the HTML DOM, giving support for PNG transparency.

The idea is now straightforward: a completely blank 1px GIF image is displayed, stretched to the desired size, and then the requested PNG is applied over it. This could be easily done with the following code:

Unfortunately, this is not as simple as we might think. In fact, this code is completely unportable: Chrome, Mozilla or Opera do not implement the AlphaImageLoader filter, so under those browsers we would get just a blank image.

Then, we have to find a workaround for the workaround. This can be done through Microsoft IE's Conditional Comments. In short, what we're going to do is write some javascript code, save it in a .js file, and include it in the html document as follows:

<!--[if (gte IE 5.5000)&(lte IE 7)]>
<script type="text/javascript" src="pngtransp.js"></script>
<![endif]-->

The full source code for pngtransp.js can be found here. Let's now have a closer look at the interesting snippets, starting from the last line:

window.attachEvent("onload", correctPNG);

This is put there just to "arm the code": when the document is fully loaded, correctPNG is invoked automatically.
That function scans all the images in the document and checks whether they are PNGs. This is achieved as follows:

for(var i=0; i<document.images.length; i++)
if (imgName.substring(imgName.length-3, imgName.length) == "PNG")

Later on in the code, we retrieve all information about the image, such as its title, alignment, class name and so on.
Then, we generate new code for the image, which is:


var strNewHTML = "<span " + imgID + imgClass + imgTitle;

strNewHTML += " style=\"" + "width:" + img.width + "px; height:" + img.height + "px;" + imgStyle + ";";
strNewHTML += "filter:progid:DXImageTransform.Microsoft.AlphaImageLoader";
strNewHTML += "(src=\'" + img.src + "\', sizingMethod='scale');\"";
strNewHTML += onMouseOver + onMouseOut + "></span>";

in which we create a span with all the previously retrieved information, but with the AlphaImageLoader filter specified. This way, when we invoke:

img.outerHTML = strNewHTML;

the current PNG will be correctly displayed. The procedure is repeated for each PNG image in the document.

The rest of the code just overrides the functions commonly generated by Dreamweaver to swap images on mouse over, and is not of deep interest for this article.

This way of handling PNGs leaves the images untouched if we're using a more "modern" browser, but corrects the alpha blending in more primitive ones.
It has some drawbacks, though:

It doesn't work in IE versions earlier than 5.5, since AlphaImageLoader is not supported. No cure for those versions.
Javascript must be enabled: you can't assume this on all machines!
CSS background images using PNGs are not supported at all!
If you are displaying a huge document, a while could pass before the onload event triggers, so at the beginning PNGs could be displayed with the unpleasant gray box around them!

The only solution to correctly handle PNGs is to switch definitively to another web browser!

Monday, April 11, 2011

Six Ways to Crash Linux. Because It's Fun and Useless!

This is one of those absolutely pointless posts that I nonetheless like. It's like playing with the Wii: it makes no sense, but you (we) like it!
Below are some ways to crash a Unix system. Some of them are well known, like the Fork Bomb, others are more obscure. Some are irreversible, while others last only "one session", and after a good reboot all their effects are gone.

The Fork Bomb: a classic that cannot be left out of this list. In fact, it even has a whole Wikipedia page dedicated to it!
:(){ :|: & };:
This command, pasted into a shell, will make the machine run out of all its resources, unless the number of processes per user is limited in /etc/security/limits.conf.
The following command will overwrite the MBR with (pseudo) random data. This way, you can be sure that your machine will not boot anymore (no, not even with Windows).
dd if=/dev/urandom of=/dev/sda bs=512 count=1
Reading from the I/O ports can have some amusing side effects. Try running this command, and you will see what I am talking about:
sudo less -f /dev/port
The result is that your machine will freeze. I have not yet looked into the reason behind this, but it is fun anyway!
What happens if the memory of a process gets overwritten? Usually, you run into a segfault. You can slip a bit of fun (mess!) into system memory:
cp /dev/zero /dev/mem
This last one has become a cult classic, since a lot of people have run into it (maybe you too). Somehow you get confused, and you remove every single file from the hard disk:
rm -rf /*

Finally, you can use the power of the 'find' command, with the 'exec' argument, which runs the command written right after it. This is exactly what can happen when you are in a hurry.
find . -type f -name * -exec rm -f {} \;
Luckily, many of these commands (the Fork Bomb excepted) will not harm your computer if launched as a user without administrator privileges, because the system will not let you access files for which you do not have permission.
Of course, if you try writing some kernel module, it is very likely that it will crash with your blessing! :) At that point, anything is possible, because you will have become the "Almighty"!

Saturday, March 12, 2011

How to Write Incomprehensible Code: Camouflage

Much of the skill behind writing incomprehensible code lies in the art of camouflage: hiding things, or making them look like what they are not. Many of these tricks rely on the fact that the compiler is better than the human eye at making fine distinctions.

Code that masquerades as a comment, and vice versa
An excellent idea is to include commented-out code that does not look commented out:

for(j = 0; j < array_len; j += 8) { 
   total += array[j+0]; 
   total += array[j+1]; 
   total += array[j+2]; /* The main body of the
   total += array[j+3]; * loop is unrolled
   total += array[j+4]; * to make it
   total += array[j+5]; * faster
   total += array[j+6]; */
   total += array[j+7]; 
}

Without syntax highlighting, would anyone ever notice that four lines of code are commented out?

Wednesday, March 9, 2011

Showing a Tree

In these GUIful days, where everything we do on a computer involves using the mouse pointer, there sometimes come situations where we want an old-fashioned visualization of data.

For example, these days I'm working hard on restyling an application which will be released soon. This entails cutting old functions, adding comments, rewriting Makefiles, and rearranging the directory tree. Yes, the boring part of programming.

For convenience, I decided to print the whole directory tree of my project, to better visualize the changes I have to apply. Thus, before coming across tree(1), I opted for the hard-coded solution and produced this command:

ls -R | grep ":" | sed -e 's/://' -e 's/[^-][^\/]*\//--/g' -e 's/^/   /' -e 's/-/|/'

which draws the current directory's tree listing. Surely, using tree is much easier and more efficient, but this old-fashioned approach is what I like most!
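To see what the sed expressions do to each directory header printed by ls -R, the transformation can be modelled in a few lines of C. This is a simplified sketch: the function name tree_line is hypothetical, and it assumes well-behaved paths (no ':' inside names, no empty components, no component starting with '-').

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Turns one "ls -R" directory header, e.g. "./src/lib:", into the line
 * drawn by the pipeline: three spaces, then "|" and dashes that grow
 * with the depth, then the last path component. */
void tree_line(const char *header, char *out, size_t cap)
{
    size_t len = strcspn(header, ":");   /* like s/://: drop the colon */
    size_t depth = 0, name = 0, pos = 0;

    for (size_t i = 0; i < len; i++) {
        if (header[i] == '/') {          /* each "component/" becomes "--" */
            depth++;
            name = i + 1;
        }
    }
    assert(cap >= 3 + 2 * depth + (len - name) + 1);

    memcpy(out, "   ", 3);               /* like s/^/   / */
    pos = 3;
    for (size_t i = 0; i < 2 * depth; i++)
        out[pos++] = (i == 0) ? '|' : '-';   /* like s/-/|/ on the first dash */
    memcpy(out + pos, header + name, len - name);
    out[pos + len - name] = '\0';
}
```

Feeding it the headers ".:", "./src:" and "./src/lib:" gives an indentation that grows by two dashes per directory level.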

Monday, February 21, 2011

Intercepting Keystrokes in any Application

This post is about one simple question: is Windows' software secure?

The answer is: no, it's not.

In the next few lines, I will show you how to easily write some code which will allow you to intercept text which is being typed into any other application! In particular, I will illustrate how to write a simple dll which will delete any lowercase vowel typed by the user...

And yes, this is one of the techniques behind keyloggers...

Since these are multimedia days, here is a video showing how this dll can interact with a common application (Microsoft Word):

Sunday, April 25, 2010

Pippo and the Metavariables

Sooner rather than later, every programmer runs into the need to explain some algorithm to a friend.

Leaving aside high-level cases (such as graph algorithms, flow algorithms, or dynamic programming), the best approach is almost always to start from the code.

And right away you bump into some temporary variable whose purpose is perfectly clear to you, but which you have no idea how to name...

And so every programmer falls back on the hopelessly overused pippo, pluto, ciao, mamma, or whatever else comes to mind (the Italian counterparts of foo and bar)...

These are called metasyntactic variables. A metasyntactic variable is a name used to denote a generic element within a given category; its use is analogous to the use in mathematics of the names x, y and z to denote three arbitrary values or variables, or, in many contexts, to the use of Tizio, Caio and Sempronio (the Italian equivalent of Tom, Dick and Harry) to denote three generic people.
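To make this concrete, here is a tiny sketch of the kind of snippet one ends up writing on a napkin. The function swap_first_last is a made-up example of mine; the only point is the temporary variable, which has an obvious role and no meaningful name, so it gets called pippo.

```python
def swap_first_last(items):
    """Return a copy of items with the first and last elements swapped."""
    result = list(items)
    if len(result) > 1:
        pippo = result[0]  # metasyntactic placeholder: any name would do
        result[0] = result[-1]
        result[-1] = pippo
    return result


if __name__ == "__main__":
    print(swap_first_last(["tizio", "caio", "sempronio"]))
```

In real code a name such as tmp or first would be kinder to the reader; when explaining an idea on the fly, pippo does the job.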

Wednesday, February 3, 2010

Lucky to Be a Programmer

Over the past few weeks I have been working on a project that demanded every kind of programming effort. It has now reached a good point, so I can almost go back to my normal commitments, but when people hear me talk about the "crazy hours" I put in, they often say they are sorry. They shouldn't be. I would never do this often, or for long stretches, or without proper compensation if it were for work, but the truth is that these "programming binges" are among the periods of my life I enjoy most. Under the right conditions, writing software is such an intense pleasure that it ought to be illegal.

Many programmers can relate, but others are surprised when they hear it. I think that is because institutions are good at squeezing the fun out of everything. It is appalling, for example, how schools can take the liveliest subjects and disfigure them into mediocre, repetitive drudgery. The same goes for programming. All too often a naturally rewarding experience is turned into something people can barely stand, in exchange for a grade or a paycheck.

What a pity. Few things are better than spending time in a creative state, consumed by ideas, watching your work come to life, going to bed late and eager to get up early to see whether things work. I am not saying that excessive hours are necessary or advisable; a healthy pace is a must, except for "short" intense periods. The point is that programming is an intense creative pleasure, a perfect blend of puzzles, writing and craftsmanship.

Programming offers intriguing challenges and plenty of room for inventiveness. Some problems are investigative and reductive: why is this code slow? What on earth is causing this bug? Others are constructive, such as designing an algorithm or an architecture. Both are a delight if you enjoy analytical work, immersed in a world full of "beasts" such as malware, routers, caches, protocols, databases, graphs, encodings, data representations and numbers.

This analytical side is what most people associate with programming. It makes it interesting, like a complex strategy game. But in most software the primary challenge is communication: with fellow programmers through code, and with users through interfaces. With few exceptions, writing code is an experiment rather than a puzzle. It means shaping your ideas and designs into a coherent body; it means striving for clarity, simplicity and conciseness. Both code and interfaces abound with the simple joy of creation.

Another source of pleasure is that, under certain circumstances, beauty arises from programming. It may sound like nonsense, but it is true: the kind of thing that makes a day better. Take, for example, Euclid's two-line proof that there are infinitely many primes. I think many would find it beautiful, such a succinct and fascinating result. This is the beauty of mathematics, cold and austere, and the same beauty pervades software. It is inherent in clever algorithms such as quicksort, in the source code of kernels and compilers, in elegant exploits and in the tricks we invent to solve everyday problems. When you look at these solutions, be they famous algorithms or everyday tricks, you smile and exclaim "brilliant!", and you feel good. Noble reasoning!

Programs also have a non-mathematical beauty, analogous to eloquence in speech. It is found in well-crafted software that does a lot with little code, in short and crisp methods, in well-made architectures. Some languages make this hard to achieve, and not every programmer is capable of it, but it is a pleasure to read and work on this kind of code. If you work with an expressive language, or on code you like, it often feels as if things light up.

Now, as for craftsmanship: in a sense software is abstract. Where does a program's behavior exist, if not in our minds? Yet we say that software is built for a very good reason: programs are given shape feature by feature, architectures start out as scaffolding and then grow, interfaces are assembled, bugs are fixed and hot spots are optimized to make things run faster. Software gives a deep, satisfying sense of making an artifact. Things are built out of pure ideas, and then you watch them solve real problems, making people a little better off. Or a lot better off, in some cases.

Take biology, for example. After roughly 400 years of scientific revolution, biology has been unable to make headway on crucial problems such as effective cures for viral infections or cancer. Some of the best advances, such as antibiotics, were due to chance or to random experimentation. You start a clinical trial for a hypertension drug and suddenly... Wow! All the subjects get an erection. And so Viagra was born. Chance certainly plays a fundamental role in every research project, but while physics and chemistry have a substantial theoretical basis, biology has been confined to improvised solutions. Want to treat cancer? Here, let's bombard the patient with radiation and poison, and hopefully the cancer dies first! These are improvised solutions, even brilliant ones, and I am glad we have them, but they are nowhere near the precision we have in other fields.

Software is changing this. Only 50 years ago the shape of DNA was discovered, but now anyone can browse the Internet and download hundreds of complete genome sequences. Or look up hundreds of genes (DLEC1, to take a random example), complete with nucleotide sequences, amino acid sequences for the proteins, and literature discussing the gene: just ask! Or search huge gene and protein databases, using nucleotide or amino acid sequences as keys, perhaps after obtaining them from some even cheaper instrument, and get back a complete report on the results. And it does not matter whether the information is exact, because the algorithm in BLAST, the standard sequence search tool, returns partial matches from the databases, ranked by relevance. These advances will allow medicine to achieve enormous results. Biology is about to enter a new era, like physics in the 18th century, driven forward by software.

Yes, of course, biologists play a minor role (:P), but we play an important one in enabling major developments in science, culture and the economy. When a child in the third world looks up a Wikipedia page, programmers deserve part of the credit! We are the ones who wrote the RFCs, the network stacks, the browsers, MediaWiki, the operating systems and the HTTP servers. Not to mention a large number of the Wikipedia pages themselves. The influence of programmers goes beyond bits and bytes: it was a programmer who invented wikis, and the computing community kick-started blogs. Henry Mencken rightly pointed out that "freedom of the press is limited to those who own one". It is a pity he is no longer around to watch our creations break the oppressive conformism and social servility of professional journalism. With less flair, but with great benefits, applications have brought growing productivity gains to economies. And these are just a few examples from what could be a long list.

Over my last three years at university, many experiences made me think badly of computers and everything around them, and every now and then (rarely!) even made me lose the will to continue down this road. Now I am glad I banged my head against it and reached a good level in writing software. Even though mom probably still thinks I write meaningless stuff, but oh well, what can I do! :)

If you find yourself in a situation that is about to kill your innate passion for technology, by all means, get moving! Do not stand still while your passion fades. Motivated people are hard to find, in any field! For everyone who thinks programming might be interesting: financially, the earnings will not necessarily be high. But I believe it is one of the right careers. It is not just about the cool job prospects you may have; it is because, as the role of software in society grows, we will see many more benefits for people. I am happy to be in the game, because this way I can constantly try to improve my art and my technique for an ideal.
