Becoming a Nerd in Polynomial Time: June 2012

Monday, June 18, 2012

Multilanguage Posts in Blogger

A common need is to publish the same blog content in more than one language.

Blogger itself does not provide such a facility, so here I present a JavaScript hack that lets you write posts in multiple languages and adds flags at the top of the page to switch between them.

First of all, within a single post you have to write a separate version for every language you want to support. To do this, you must first set up the right environment. When editing a post, switch to HTML view first:

Then, supposing you are writing content in Italian and English, add this skeleton:

<div class="lang:italian">
Questo è del contenuto in Italiano
</div>
<div class="lang:english">
This is English content
</div>

Then, switching back to Compose mode will show you the two sentences. Replace them with the actual content you want to display in your blog post.

Then, go to the Layout Editor of your blog and add a new HTML/JavaScript block above your blog posts:

Enter the following code there:

<script type="text/javascript">
function getElementsByClass(searchClass,node,tag) {
        var classElements = new Array();
        if ( node == null )
                node = document;
        if ( tag == null )
                tag = '*';
        var els = node.getElementsByTagName(tag);
        var elsLen = els.length;
        // '\\s' in the string literal becomes \s (whitespace) in the regular expression
        var pattern = new RegExp('(^|\\s)'+searchClass+'(\\s|$)');
        for (var i = 0, j = 0; i < elsLen; i++) {
                if ( pattern.test(els[i].className) ) {
                        classElements[j] = els[i];
                        j++;
                }
        }
        return classElements;
}

// Sets the CSS display property on every element in the list.
function set_display(elements, value) {
        for (var i = 0; i < elements.length; i++) {
                elements[i].style.display = value;
        }
}

function show_en() {
        set_display(getElementsByClass('lang:italian', null, null), 'none');
        set_display(getElementsByClass('lang:english', null, null), 'block');
}

function show_it() {
        set_display(getElementsByClass('lang:italian', null, null), 'block');
        set_display(getElementsByClass('lang:english', null, null), 'none');
}

</script>


<a href="#" onclick="show_en(); return false;"><img src="PATH-TO-EN-FLAG" alt="English" /></a>
<a href="#" onclick="show_it(); return false;"><img src="PATH-TO-IT-FLAG" alt="Italiano" /></a>

This adds two clickable flag images at the top of your posts, which show either the Italian or the English content. Now, the last bit of editing entails activating one of the languages as the default (otherwise, you will see both languages until one of the two flags is clicked).

To do so, open your blog's template editor and click "Edit HTML".

A popup telling you about the dangers of this operation will show up. Tell Blogger to take it easy, and start editing your template. You should find the <body> tag, and add the following attribute to it:

onload='show_en();'

Use show_en() or show_it() here, depending on which language you want to display by default.

For a live preview of how this works, point your browser to CranEntertainment's blog!

Sunday, June 17, 2012

Relative Date in LaTeX

When writing documents, it is common to use sentences like "I have been doing something since MM/YYYY". This poses no problem, since such a sentence never goes out of date.

If, on the other hand, you want to write a sentence like "I have been doing something for XX years and YY months", every time you update your document you have to remember to change that sentence.

When writing documents in LaTeX, you can solve this problem with a relatively simple command:

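\usepackage{datenumber}
\newcounter{dateone}
\newcounter{datetwo}
\newcommand{\difftoday}[3]{%
      \setmydatenumber{dateone}{\the\year}{\the\month}{\the\day}%
      \setmydatenumber{datetwo}{#1}{#2}{#3}%
      \addtocounter{dateone}{-\value{datetwo}}% dateone = days elapsed
      % \numexpr rounds divisions to the nearest integer, so
      % (2a - b + 1)/(2b) is used to obtain the truncated quotient a/b
      \setcounter{datetwo}{\numexpr(2*\value{dateone}-364)/730\relax}% whole years
      \addtocounter{dateone}{-\numexpr365*\value{datetwo}\relax}% leftover days
      \thedatetwo\space years and
      \the\numexpr(2*\value{dateone}-29)/60\relax\space months%
}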

The datenumber package lets us handle dates as day numbers, so we can use counters (namely, dateone and datetwo) to perform arithmetic on them. We set dateone to the current date and datetwo to the user-specified date; their difference in days is then converted using 365-day years and 30-day months, so the result is an approximation.

Then, in our document we can use:

\difftoday{2002}{01}{01}

And, say, if we are on March 3rd, 2012, it will render as "10 years and 2 months".
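
To double-check the arithmetic outside of LaTeX, here is a minimal Python sketch of the same computation. The function name diff_today is made up for this illustration, and it assumes the same simplification as the macro (365-day years, 30-day months, truncating divisions):

```python
from datetime import date
from typing import Tuple


def diff_today(start: date, today: date) -> Tuple[int, int]:
    """Return (years, months) elapsed between start and today.

    Mirrors the LaTeX macro: 365-day years and 30-day months,
    so the result is an approximation.
    """
    days = (today - start).days           # elapsed days
    years = days // 365                   # whole years
    months = (days - years * 365) // 30   # whole months in the remainder
    return years, months


# Same example as in the text: January 1st, 2002 seen from March 3rd, 2012.
print(diff_today(date(2002, 1, 1), date(2012, 3, 3)))  # (10, 2)
```

There are 3714 days between the two dates, which gives 10 whole years and a remainder of 64 days, i.e. 2 months.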

Multilanguage Documents with TeX

Today I worked on a LaTeX source that had to produce two versions of the same document, one in Italian and one in English. It was a technical report: a lot of material, but not much text.

The classic approach is to write the document in one language and then translate it into the second. This is fine until you have to make changes: editing two documents in parallel takes too much effort.

Luckily enough, LaTeX is a programming language before being an office automation tool, so we can use conditionals to accomplish this task. The skeleton of my source looks like this:

\documentclass[10pt]{article}

\newif\ifit
\newif\ifen

\newcommand{\langit}[1]{\ifit#1\fi}
\newcommand{\langen}[1]{\ifen#1\fi}

\entrue
%\ittrue


\ifen\usepackage[english]{babel}\fi
\ifit\usepackage[italian]{babel}\fi

\begin{document}

\langen{This is English text}
\langit{Questo è del testo in italiano}

\end{document}

Basically, I declare two conditionals, \ifit and \ifen, which select the language to be generated. By uncommenting either \entrue or \ittrue, the source will produce the output in English or in Italian.

To ease the task of writing the TeX source, two new commands are declared, \langit and \langen, which accept one parameter (i.e., the text in the corresponding language) and output it only if the corresponding conditional is set.

Additionally, depending on which conditional is set, the babel package is loaded with the corresponding language.

This allows working on one single TeX source (which reduces the maintenance effort) while still producing documents in multiple languages. Adding a new language is just a matter of creating a new conditional and a new command.

Another interesting thing, which does not appear in this example, is the input encoding. Different languages may need different encodings, so it would be useful to set it accordingly. For the Italian case, this entails adding the following code before the beginning of the document:

\ifit\usepackage[utf8]{inputenx}\fi

Friday, June 8, 2012

Transit of Venus

My small contribution to the 2012 Transit of Venus.

Thursday, June 7, 2012

Write your own Kernel: BootLoader stub

Since I was young, one of my dreams has been to write my own operating system.
It was still the DOS era, and as a young boy I started studying a lot of tech stuff, trying to figure out what was really happening inside my computer.

Actually, at the time, it was too much 'tech' stuff for me: I didn't have the knowledge needed to understand the subtle details behind a low-level programming approach.

Then I took my Computer Engineering degree, and the technical details are no longer obscure. But since then, the old dream remained just... an old dream!

Yesterday, while messing around with a broken hard disk trying to recover some data, a funny idea came to my mind, and I launched:

dd if=/dev/sda of=MBR count=1 bs=512

This dumped the content of that disk's Master Boot Record (MBR) into a file. Of course, I couldn't resist, and I disassembled it:

objdump -D -b binary -mi386 -Maddr16,data16 MBR

While staring at that code, my old dream came back to me, and so I started messing around a bit. The outcome was a small bootloader stub, which actually works, and which I am describing here... In the future, I will surely spend some time extending it, but it will remain just a project-for-fun!

And yes, this is somewhat reinventing the wheel, but for a low-level programmer, this is really fun!

So, the working setup is this:

i386/x86_64 target machines
C and Assembly languages
standard GNU compilation toolchain (gcc, gas, ld, make, ...)
VirtualBox for early testing, then I will switch to an old legacy machine for tests on real hardware...

As for the VirtualBox configuration, I have set up a 40MiB virtual IDE hard disk, FAT32 partitioned, for future development (my legacy machine will mount a similar one), and a floppy device loading a raw binary image (the actual bootloader + early kernel).

So, let's now start analyzing the BootLoader's early stub. This is a 'stage 1' bootloader which simply prints a "Hello World" message, waits for a keystroke and then reboots the system. No actual loading of any stage 2 loader, nor mode change.

Given the hardware architecture we are targeting, we have to keep some constraints in mind to write the bootloader successfully:

The bootloader is loaded by the BIOS (which is loaded into RAM at startup as well), given some preconditions
Both code and data must fit within one sector of the boot disk (the MBR), which is 512B
The last two bytes of this sector must hold the value 0xaa55 (stored little-endian, i.e. the bytes 0x55 and 0xaa, in this order), otherwise the sector will not be considered bootable
The BIOS will load the bootloader at address 0x0000:0x7c00, so the code must be linked to run at that address, otherwise it won't work.
The %dl register will contain the disk number, useful for reading additional data from it. Nevertheless, I am ignoring this at this stage.
Of course, we run in real mode, so we have 16-bit code and we can do almost everything! :)
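
The size and signature constraints are easy to check on a binary image. The following is a minimal Python sketch; the function name is_bootable_sector and the sample data are made up for this illustration:

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # the word 0xAA55 as stored on disk (little-endian)


def is_bootable_sector(sector: bytes) -> bool:
    """Check the two constraints: exactly 512 bytes, ending with 0x55 0xAA."""
    return len(sector) == SECTOR_SIZE and sector[-2:] == BOOT_SIGNATURE


# Hypothetical sector: 510 zero bytes followed by the signature.
sample = bytes(SECTOR_SIZE - 2) + BOOT_SIGNATURE

print(is_bootable_sector(sample))                     # True
print(hex(int.from_bytes(sample[-2:], "little")))     # 0xaa55
```

On a real image, the same check could be run on the first 512 bytes read from the file.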

So, our MBR will contain both code and data. In the future, it will contain a partition table as well, but since it is stored on a floppy disk, we must also provide a Disk Description Table (DDT), more commonly known as the BIOS Parameter Block (BPB), to make it a valid floppy. Additionally, the first byte of the MBR will be part of the first instruction which will get executed, so we have to properly merge these things. The beginning of our boot.S code is therefore this:

 .code16
 .text

.globl _start; _start:

 jmp stage1_start
 nop    # the short jump takes 2 bytes: pad to 3, as the table starts at offset 3
 
OEMLabel:  .ascii "BOOT    "   # exactly 8 bytes, no NUL terminator
BytesPerSector:  .short 512
SectorsPerCluster: .byte 1
ReservedForBoot: .short 1
NumberOfFats:  .byte 2
RootDirEntries:  .short 224
LogicalSectors:  .short 2880
MediumByte:  .byte 0x0F0
SectorsPerFat:  .short 9
SectorsPerTrack: .short 18
Sides:   .short 2
HiddenSectors:  .int 0
LargeSectors:  .int 0
DriveNo:  .short 0
Signature:  .byte 41   # 41 = 0x29, extended boot signature
VolumeID:  .int 0x00000000   # any value
VolumeLabel:  .ascii "myOS       "   # exactly 11 bytes, no NUL terminator
FileSystem:  .ascii "FAT12   "   # exactly 8 bytes, no NUL terminator

 stage1_start:

The .code16 and .text directives make the GNU assembler (gas) produce valid code. In particular, .code16 tells the assembler to produce 16-bit code (the default would be 32-bit, of course!) and .text ensures that everything ends up in one single section (we don't actually mind which one, as sections will be stripped later on). _start is the symbol which normally describes the entry point of an executable, but we will be using it in a different way. The first instruction, jmp, tells the machine to skip over the DDT rather than "executing" it, so we get a correctly executing program while keeping the correct format for a floppy (of course, this takes some bytes away from the small 512B of available space).
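
To see how many bytes the table takes, the same fields can be packed with Python's struct module. This is only a sketch: it assumes the standard FAT12 boot sector layout (a 3-byte jump, an 8-byte OEM label, fixed-width strings without terminators), and the names BPB_FORMAT and header are made up for this illustration:

```python
import struct

# "<" means little-endian with no padding between fields.
# s = raw bytes, B = 1 byte, H = 2 bytes, I = 4 bytes.
BPB_FORMAT = "<3s8sHBHBHHBHHHIIHBI11s8s"

header = struct.pack(
    BPB_FORMAT,
    b"\xeb\x3c\x90",   # jmp short to offset 62, then nop
    b"BOOT    ",       # OEM label, 8 bytes
    512,               # bytes per sector
    1,                 # sectors per cluster
    1,                 # reserved sectors (the boot sector itself)
    2,                 # number of FATs
    224,               # root directory entries
    2880,              # logical sectors (1.44MB floppy)
    0xF0,              # medium byte
    9,                 # sectors per FAT
    18,                # sectors per track
    2,                 # sides
    0,                 # hidden sectors
    0,                 # large sectors
    0,                 # drive number
    41,                # extended boot signature (0x29)
    0,                 # volume ID
    b"myOS       ",    # volume label, 11 bytes
    b"FAT12   ",       # file system type, 8 bytes
)

print(len(header))                                # 62
print(struct.unpack_from("<H", header, 11)[0])    # 512
```

Under these assumptions, the jump and the table take 62 of the 512 available bytes, so the actual code starts at offset 62.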

The code must now set up the runtime stack, which makes it possible to use the "call" instruction for calling subroutines. This looks like this:

 cli              
 movw $0x07C0, %ax
 addw $544, %ax  # 8K buffer
 mov %ax, %ss
 movw $4096, %sp
 sti

This creates a 4KiB stack above the bootloader (512B, i.e. 32 paragraphs of 16 bytes each) and a hardcoded 8KiB buffer (512 paragraphs), hence the 544 paragraphs added to the segment. We use the cli and sti pair to disable and then re-enable interrupts, since it is not safe to service any interrupt before the stack is correctly set up.
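
The resulting addresses can be verified with a bit of arithmetic. Here is a small Python sketch of the real-mode segment:offset translation; the helper name physical is made up for this illustration:

```python
PARAGRAPH = 16  # a real-mode segment value counts 16-byte paragraphs


def physical(segment: int, offset: int) -> int:
    """Translate a real-mode segment:offset pair into a physical address."""
    return segment * PARAGRAPH + offset


stack_segment = 0x07C0 + 544                 # value loaded into %ss
stack_bottom = physical(stack_segment, 0)    # lowest address of the stack segment
stack_top = physical(stack_segment, 4096)    # initial %ss:%sp

print(hex(stack_segment))          # 0x9e0
print(hex(stack_bottom))           # 0x9e00
print(hex(stack_top))              # 0xae00
print(stack_bottom - 0x7C00)       # 8704 = 512 (bootloader) + 8192 (buffer)
```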

Then, we display our dummy message, "Hello World", and then reboot.

 cld
 movw $hello_s, %si
 call  print_string
 jmp reboot
 hlt

cld clears the direction flag, so that print_string will read the string pointed to by %si from the beginning to the end. The final hlt is never actually executed, but placing it there is good practice nevertheless. So, let's see how the print_string function and the reboot subroutine work.

1:
 movw $0x0001, %bx
 movb $0x0e, %ah
 int $0x10
print_string:
 lodsb
 cmpb $0, %al
 jne 1b
 ret

This routine is nicely optimized for space (remember, a bootloader is always short on space!). The funny part is that its entry point is in the middle of its code!
So, when we call it, a lodsb instruction gets executed, which loads one byte from the address held in %si into %al and increments %si by one: we read one character of the string from memory!
cmpb checks whether the byte just read is 0, i.e., whether it is the terminating NUL character that marks the end of the string. If not, execution continues from the 1: label.
There, we find the int $0x10 instruction, which generates an interrupt and looks up entry 0x10 in the Interrupt Vector Table, which is associated with the BIOS video services. The value 0x0e stored into %ah selects the teletype output function, which displays on screen exactly the character stored into %al, using the old-fashioned codepage font (which is usually hard-coded within the BIOS itself).
This process goes on until a '\0' is found in the string, then a ret is executed.

Rebooting the system is far easier:

reboot:
 movw $reboot_s, %si
 call print_string
 movw $0,%ax
 int $0x16 # Wait for keystroke
 movw $0,%ax
 int $0x19 # Reboot the system

A nice string asking the user to press any key is shown, using the same function as before. Then we wait for a keystroke using the BIOS int $0x16 which, with %ah == 0 (we clear the whole %ax), activates the "get keystroke" BIOS function. The BIOS does not return from the interrupt routine until a key is pressed; then int $0x19 is executed, i.e. the "bootstrap loader" interrupt.

The end of boot.S goes like this:

 hello_s: .string "Hello World!\n\r"
 reboot_s: .string "Press any key to reboot..."

 . = _start + 0x0200 - 2
 .short 0x0AA55 #Boot Sector signature

The two strings which we want to display are declared, so that we can load their addresses for the printing function.
Then, the line:
. = _start + 0x0200 - 2
tells gas, the GNU Assembler, to move the location of the code being generated (the '.' variable) 512B after _start, and then 2B backwards, which is 2 bytes before the end of the MBR. At this point, we make gas emit the 0xAA55 signature, which tells the BIOS that the current disk is bootable, and we are done: this is our bootloader stub!

Now, the last part is to actually build this code. The GNU toolchain targets 32/64-bit executables, so by default it usually produces ELF programs, which we do not want here: we just need a stream of bytes representing the instructions to be executed.

gas alone cannot do this: it creates headers and everything, and this cannot be disabled. But we can rely on ld, the GNU linker, asking it to produce a raw binary. So, the two steps are:

as boot.S -o boot.o

which creates an ELF object file containing our code. Then:

ld --oformat binary -Ttext 0x7C00 -o boot.bin boot.o

which, with some magic, strips every header (--oformat binary) and places the text section at address 0x7c00 (-Ttext 0x7C00). Considering our source, the text section contains our whole bootloader, whose first instruction jumps to the actual initialization code.

Booting VirtualBox from our boot.bin image gives this outcome:

So, the next step is to write a second stage bootloader, and make the first stage able to load it and transfer control to it!

Wednesday, June 6, 2012

diff and patch in ten minutes

First situation: you are trying to build a package from source and you discover that someone has already done the work for you, modifying it a little so that it builds on your system. Their work is available as a "patch", but you are not sure how to use it. The answer is that you can apply the patch to the original sources with a command called, appropriately, patch.

Second situation: you have downloaded the sources of an open source package and, after an hour or so of small changes, you manage to build it on your system. You would like to make your work available to other programmers, or to the package's authors, without having to redistribute the whole modified package. You are therefore in a situation where you have to create a patch yourself, and the tool you need is diff.

This is a short guide to diff and patch, which will help you in these situations by describing these tools and how they are most commonly used. It will tell you enough to start using them right away. Afterwards, you can learn the various additional options on your own, at your leisure, from the manual pages.

Applying a patch with patch

To apply a patch to a single file, change to the directory where the file is located and invoke patch:

patch < foo.patch

These instructions assume that the patch is distributed in unified format, which identifies the file the patch should be applied to. If that is not the case, you can specify the file on the command line:

patch foo.txt < bar.patch

Applying patches to entire directories (perhaps the most common case) is similar, but you have to be careful to specify a "p-level". Inside the patch file, the files to be patched are identified by paths, and these may be different now that the files are located on your computer rather than on the one where the patch was created. The p-level tells patch to ignore portions of the path to the files, so that they can be identified correctly. In most cases a p-level of one works, so you can use:

patch -p1 < baz.patch

You should change to the top-level directory of the sources before running this command. If level one does not correctly identify any file to be patched, inspect the patch file to check the file names. If you find a name like:

/users/stephen/package/src/net/http.c

and you are working in the directory that contains net/http.c, use:

patch -p5 < baz.patch

In general, add one for each directory separator (the slash '/') that you want to remove from the beginning of the path, until what remains is a path that exists in your working directory. The value you reach is the correct p-level.
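
The counting rule can be sketched in a few lines of Python. The function name strip_components is made up for this illustration, and the sketch is a simplification: it treats every slash as a separator and does not handle runs of adjacent slashes specially.

```python
def strip_components(path: str, p: int) -> str:
    """Remove everything up to and including the p-th slash of path."""
    parts = path.split("/", p)
    if len(parts) <= p:
        raise ValueError("path has fewer than %d slashes: %r" % (p, path))
    return parts[p]


name = "/users/stephen/package/src/net/http.c"

print(strip_components(name, 1))  # users/stephen/package/src/net/http.c
print(strip_components(name, 5))  # net/http.c
```

With a level of five, what is left is net/http.c, which is exactly the path that exists in the working directory of the example above.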

To remove a patch, use the -R flag, for example:

patch -p5 -R < baz.patch

Creating patches with diff

Using diff is very simple, whether you are working with single files or with entire directories. To create a patch for a single file, use the form:

diff -u original.c new.c > original.patch
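
To get a feeling for what a unified diff looks like, the standard Python difflib module can generate one in memory. This is only an illustration of the format, not a replacement for diff; the two file contents below are made up:

```python
import difflib

# Hypothetical contents of original.c and new.c, as lists of lines.
original = ["int main(void) {\n", "    return 1;\n", "}\n"]
modified = ["int main(void) {\n", "    return 0;\n", "}\n"]

patch_lines = list(
    difflib.unified_diff(original, modified, fromfile="original.c", tofile="new.c")
)

print("".join(patch_lines), end="")
```

The output starts with two header lines naming the files (--- original.c and +++ new.c), followed by a hunk in which removed lines are prefixed with '-' and added lines with '+'. The header is what lets patch identify the file to modify.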

To create a patch for an entire directory tree, make a copy of it:

cp -R original new

Make all the changes you intend to make in the new/ directory. Then create a patch file with the following command:

diff -rupN original/ new/ > original.patch

That is all you need to get started with diff and patch. For more information, you can always use:

man diff
man patch

Freely translated from this original post, for my own convenience.