LibreOffice Development Talk at Triangle C++ Developer’s Group

It was a pleasure to have been given an opportunity to talk about LibreOffice development the other day at the Triangle C++ Developer’s Group. Looking back, what we went through was a mixture of hardship, accomplishments, and learning experience intertwined in such a unique fashion. It was great to be able to talk about it and hopefully it was entertaining enough to those of you who decided to show up to my talk.

Here is a link to the slides I used during my talk.

Thanks again, everyone!

Edit: Here is a PDF version of my slides for those of you who don’t have a program that can open odp files.

Ixion – threaded formula calculation library

I spent my entire last week on my personal project, by taking advantage of Novell’s HackWeek. Officially, HackWeek took place two weeks ago, but because I had to travel that week I postponed mine till the following week.

Ixion is the project I worked on as part of my HackWeek. This project is an experimental effort to develop a stand-alone library that supports parallel computation of formula expressions using threads. I’d been working on this on and off in my spare time, but when the opportunity came along to spend one week of my paid time on any project of my choice (personal or otherwise), I didn’t hesitate to pick Ixion.

Overview

So, what’s Ixion? Ixion aims to provide a library for calculating the results of formula expressions stored in multiple named targets, or “cells”. The cells can be referenced from each other, and the library takes care of resolving their dependencies automatically upon calculation. The caller can run the calculation routine either in a single-threaded mode, or a multi-threaded mode. The library also supports re-calculation where the contents of one or more cells have been modified since the last calculation, and a partial calculation of only the affected cells gets performed. It is written entirely in C++, and makes extensive use of the boost library to achieve portability across different platforms. It has currently been tested to build on Linux and Windows.

The goal is to eventually bring this library up to the level where it can serve as a full-featured calculation engine for spreadsheet applications. But right now, this project remains as an experimental, proof-of-concept project to help me understand what is required to build a threaded calculation engine capable of performing all sorts of tasks required in a typical spreadsheet app.

I consider this project a library project; however, building this project only creates a single stand-alone console application at the moment. I plan to separate it into a shared library and a front-end executable in the future, to allow external apps to dynamically link to it.

How it works

Building this project creates an executable called ixion-parser. Running it with a -h option displays the following help content:

Usage: ixion-parser [options] FILE1 FILE2 ...
 
The FILE must contain the definitions of cells according to the cell definition rule.
 
Allowed options:
  -h [ --help ]         print this help.
  -t [ --thread ] arg   specify the number of threads to use for calculation.  
                        Note that the number specified by this option 
                        corresponds with the number of calculation threads i.e.
                        those child threads that perform cell interpretations. 
                        The main thread does not perform any calculations; 
                        instead, it creates a new child thread to manage the 
                        calculation threads, the number of which is specified 
                        by the arg.  Therefore, the total number of threads 
                        used by this program will be arg + 2.

The parser expects one or more cell definition files as arguments. A cell definition file may look like this:

%mode init
A1=1
A2=A1+10
A3=A2+A1*30
A4=(10+20)*A2
A5=A1-A2+A3*A4
A6=A1+A3
A7=A7
A8=10/0
A9=A8
%calc
%mode result
A1=1
A2=11
A3=41
A4=330
A5=13520
A6=42
A7=#REF!
A8=#DIV/0!
A9=#DIV/0!
%check
%mode edit
A6=A1+A2
%recalc
%mode result
A1=1
A2=11
A3=41
A4=330
A5=13520
A6=12
A7=#REF!
A8=#DIV/0!
A9=#DIV/0!
%check
%mode edit
A1=10
%recalc

I hope the format of the cell definition rule is straightforward. The definitions are read from top down. I used the so-called A1 notation to name target cells, but it doesn’t have to be that way. You can use any naming scheme to name target cells as long as the lexer recognizes them as names. Also, the format supports a command construct; a line beginning with a ‘%’ is considered a command. Several commands are currently available. For instance the mode command lets you switch input modes. The parser currently supports three input modes:

  • init – initialize cells with specified contents.
  • result – pick up expected results for cells, for verification.
  • edit – modify cell contents.

In addition to the mode command, the following commands are also supported:

  • calc – perform full calculation, by resetting the cached results of all involved cells.
  • recalc – perform partial re-calculation of modified cells and cells that reference modified cells, either directly or indirectly.
  • check – verify the calculation results.

Given all this, let’s see what happens when you run the parser with the above cell definition file.

./ixion-parser -t 4 test/01-simple-arithmetic.txt 
Using 4 threads
Number of CPUS: 4
---------------------------------------------------------
parsing test/01-simple-arithmetic.txt
---------------------------------------------------------
A1: 1
A1: result = 1
---------------------------------------------------------
A2: A1+10
A2: result = 11
---------------------------------------------------------
A3: A2+A1*30
A3: result = 41
---------------------------------------------------------
A4: (10+20)*A2
A4: result = 330
---------------------------------------------------------
A5: A1-A2+A3*A4
A5: result = 13520
---------------------------------------------------------
A8: 10/0
result = #DIV/0!
---------------------------------------------------------
A6: A1+A3
A6: result = 42
---------------------------------------------------------
A9: 
result = #DIV/0!
---------------------------------------------------------
A7: result = #REF!
---------------------------------------------------------
checking results
---------------------------------------------------------
A2 : 11
A8 : #DIV/0!
A3 : 41
A9 : #DIV/0!
A4 : 330
A5 : 13520
A6 : 42
A7 : #REF!
A1 : 1
---------------------------------------------------------
recalculating
---------------------------------------------------------
A6: A1+A2
A6: result = 12
---------------------------------------------------------
checking results
---------------------------------------------------------
A2 : 11
A8 : #DIV/0!
A3 : 41
A9 : #DIV/0!
A4 : 330
A5 : 13520
A6 : 12
A7 : #REF!
A1 : 1
---------------------------------------------------------
recalculating
---------------------------------------------------------
A1: 10
A1: result = 10
---------------------------------------------------------
A2: A1+10
A2: result = 20
---------------------------------------------------------
A3: A2+A1*30
A3: result = 320
---------------------------------------------------------
A4: (10+20)*A2
A4: result = 600
---------------------------------------------------------
A5: A1-A2+A3*A4
A5: result = 191990
---------------------------------------------------------
A6: A1+A2
A6: result = 30
---------------------------------------------------------
(duration: 0.00113601 sec)
---------------------------------------------------------

Notice that at the beginning of the output, it displays the number of threads being used, and the number of “CPU”s it detected. Here, the “CPU” may refer to the number of physical CPUs, the number of cores, or the number of hyper-threading units. I’m well aware that I need to use a different term for this other than “CPU”, but anyways… The number of child threads to use to perform calculation can be specified at run-time via -t option. When running without the -t option, the parser will run in a single-threaded mode. Now, let me go over what the above output means.

The first calculation performed is a full calculation. Since no cells have been calculated yet, we need to calculate results for all defined cells. This is followed by a verification of the initial calculation. After this, we modify cell A6, and perform partial re-calculation. Since no other cells depend on the result of cell A6, the re-calc only calculates A6.

Now, the third calculation is also a partial re-calculation following the modification of cell A1. This time, because several other cells do depend on the result of A1, those cells also need to be re-calculated. The end result is that cells A1, A2, A3, A4, A5 and A6 all get re-calculated.

Under the hood

Cell dependency resolution

cell-dependency-graph
There are several technical aspects of the implementation of this library I’d like to cover. The first is cell dependency resolution. I use a well-known algorithm called topological sort to sort cells in order of dependency so that cells can be calculated one by one without being blocked by the calculation of precedent cells. Topological sort is typically used to manage scheduling of tasks that are inter-dependent with each other, and it was a perfect one to use to resolve cell dependencies. This algorithm is a by-product of depth first search of directed acyclic graph (DAG), and is well-documented. A quick google search should give you tons of pseudo code examples of this algorithm. This algorithm work well both for full calculation and partial re-calculation routines.

Managing threaded calculation

The heart of this project is to implement parallel evaluation of formula expressions, which has been my number 1 goal from the get-go. This is also the reason why I put my focus on designing the threaded calculation engine as my initial goal before I start putting my focus into other areas. Programming with threads was also very new to me, so I took extra care to ensure that I understand what I’m doing, and I’m designing it correctly. Also, designing a framework that uses multiple threads can easily get out-of-hand and out-of-control when it’s overdone. So, I made an extra effort to limit the area where multiple threads are used while keeping the rest of the code single-threaded, in order to keep the code simple and maintainable.

As I soon realized, even knowing the basics of programming with threads, you are not immune to falling into many pitfalls that may arise during the actual designing and debugging of concurrent code. You have to go extra miles ensuring that access to thread-global data are synchronized, and that one thread waits for another thread in case threads must be executed in certain order. These things may sound like common sense and probably are in every thread programming text book, but in reality they are very easy to overlook, especially to those who have not had substantial exposure to concurrency before. Parallelism seemed that un-orthodox to conventional minds like myself. Having said all that, once you go through enough pain dealing with concurrency, it does become less painful after a while. Your mind can simply adjust to “thinking in parallel”.

Back to the topic. I’ve picked the following pattern to manage threaded calculation.

thread-design

First, the main thread creates a new thread whose job is to manage cell queues, that is, receiving queues from the main thread and assigning them to idle threads to perform calculation. It is also responsible for keeping track of which threads are idle and ready to take on a cell assignment. Let’s call this thread a queue manager thread. When the queue manager thread is created, it spawns a specified number of child threads, and waits until they are all ready. These child threads are the ones that perform cell calculation, and we call them calc threads.

Each calc thread registers itself as an idle thread upon creation, then sleeps until the queue manager thread assigns it a cell to calculate and signals it to wake up. Once awake, it calculates the cell, registers itself as an idle thread once again and goes back to sleep. This cycle continues until the queue manager thread sends a termination request to it, after which it breaks out of the cycle and reaches the end of its execution path to terminate.

The role of the queue manager thread is to receive cell calculation requests from the main thread and pass them on to idle calc threads. It keeps doing it until it receives a termination request from the main thread. Once receiving the termination request from the main thread, it sends all the remaining cells in queue to the calc threads to finish up, then sends termination requests to the calc threads and wait until all of them terminate.

Thanks to the cells being sorted in topological order, the process of putting a cell in queue and having a calc thread perform calculation is entirely asynchronous. The only exception is that when referencing another cell during calculation, the result of that referenced cell may not be available at the time of the value query due to concurrency. In such cases, the calculating thread needs to block its execution until the result of the referenced cell becomes available. When running in a single-threaded mode, on the other hand, the result of a referenced cell is guaranteed to be available as long as cells are calculated in topological order and contain no circular references.

What I accomplished during HackWeek

During HackWeek, I was able to accomplish quite a few things. Before the HackWeek, the threaded calculation framework was not even there; the parser was only able to reliably perform calculation in a single-threaded mode. I had some test code to design the threaded queue management framework, but that code had yet to be integrated into the main formula interpreter code. A lot of work was still needed, but thanks to having an entire week devoted for this, I was able to

  • port the test threaded queue manager framework code into the formula interpreter code,
  • adopt the circular dependency detection code for the new threaded calculation framework,
  • test the new framework to squeeze lots of kinks out,
  • put some performance optimization in the cell definition parser and the formula lexer code,
  • implement result verification framework, and
  • implement partial re-calculation.

Had I had to do all this in my spare time alone, it would have easily taken months. So, I’m very thankful for the event, and I look forward to having another opportunity like this in hopefully not-so-distant future.

What lies ahead

So, what lies ahead for Ixion? You may ask. There are quite a few things to get done. Let me start first by saying that, this library is far from providing all the features that a typical spreadsheet application needs. So, there are still lots of work needed to make it even usable. Moreover, I’m not even sure whether this library will become usable enough for real-world spreadsheet use, or it will simply end up being just another interesting proof-of-concept. My hope is of course to see this library evolve into maturity, but at the same time I’m also aware that it would be hard to advance this project with only my scarce spare time to spend in.

With that said, here are some outstanding issues that I plan on addressing as time permits.

  • Add support for literal strings, and support textural formula results in addition to numerical results.
  • Add support for empty cells. Empty cells are those cells that are not defined in the model definition file but can still be referenced. Currently, referencing a cell that is not defined causes a reference error.
  • Add support for cell ranges. This implies that I need to make cell instances addressable by 3-dimensional coordinates rather than by pointer values.
  • Split the code into two parts: a shared library and an executable.
  • Use autoconf to make the build process configurable.
  • Make the expression parser feature-complete.
  • Implement more functions. Currently only MAX and MIN are implemented.
  • Support for localized numbers.
  • Lots and lots more.

Conclusion

This concludes my HackWeek report. Thank you very much, ladies and gentlemen.

Back from OOoCon 2007

I just came back from OOoCon 2007. It’s great to see those folks whom I had only interacted with on mailing lists and IRCs. And I have to say talking face-to-face is way, way, way different from talking with text only. I was particularly delighted to finally meet with Niklas Nebel of the Calc team. We’d been interacting in various places for ages, but had never met in person. Now we have. :-) It was a real shame to not see Eike Rathke and Daniel Rentz, but I’m sure we’ll have our chance some day.

Also pleased to meet and chat with Ricardo Cruz (IRC nick: blacksheep), who is a Google Summer of Code student working on VCL layout support mentored by Michael Meeks. It was such a joy to talk with him on various topics. His demo on VCL dialog resizing was mind-blowing!

I was pleasantly surprised to see the Mac port team (led by Eric Bachard et. al.) using the src-to-xml converter that I wrote during Novell’s Hack Week (which they referred to as the “Novell parser” during their talk). Glad to know it’s being useful to you all.

Of course, it’s always great to see my fellow Novell hackers, whom I don’t see on a daily basis because of our geographical constraint. The trip to the beach in Saturday afternoon, with Florian, Noel and Hubert was particularly fun. Water in the Mediterranean Sea was very cold, and the wines were fantastic. BTW, I’m always amazed by Fridrich’s language skill: you speak how many languages in total!? So far I’ve seen him speak in Czech, German, Catalan, French, and English.

Aside from meeting with various people, I was also able to hack during my stay in Barcelona on one data pilot feature I’d been working on. The hack continued from my flight to Barcelona, back from Barcelona, and I was finally able to make my cut while being stranded at the Philadelphia airport because of my flight delay. Isn’t modern technology great? You can be stranded at an airport and still being able to hack. :-)

Some pictures from the conference:

Mingling at breakfast on 1st day
Mingling at breakfast on 1st day

(from left) Back of Tor, Fong Lin, Michael, and Hubert (front)
(from left) Back of Tor, Fong Lin, Michael, and Hubert (front)

(from left) Michael, Noel, and Tor
(from left) Michael, Noel, and Tor

Eric Bachard giving talk on native Mac OS X port
Eric Bachard giving talk on native Mac OS X port

Petr Mladek talking about splitting OO.o source code into pieces
Petr talking about splitting OO.o source code into pieces

Kendy (Jan Holesovsky) on git
Kendy (Jan Holesovsky) on git (there is Jon’s follow-up to this talk).

Fong Lin and Tor posing for photo
Fong Lin and Tor posing for photo

I’ve uploaded all of my photos here.

Hack Week: Day 5 (Friday) – The last day

Well, today was the last day of Hack Week, but unfortunately I wasn’t able to squish the remaining 20% unconverted resource files like I planned to do yesterday. I squished only 5%. This brings my conversion success rate from yesterday’s 80% to 85%. I’m pretty happy with this result, however, considering that some of those resource files I tried to convert are not even dialog resource files.

Here is what I did today:

  • Fixed incorrect expansion of preprocessing macros. It just didn’t do the right thing when performing recursive macro expansion. This time I really got it right, but it consumed the majority of today’s hacking time. :-(
  • Reworked my expression evaluation code to fully support the reverse Polish notation (RPN). The absence of this feature caused a parse failure on some files because the position and the size of some widgets are given as a mathematical expression (e.g. (24 + 10)/2) instead of a single number. I got the RPN parser to work, but then I realized that I could have just used Python’s builtin eval function to evaluate a whole expression in one step. Well, duh! I learned how to code the RPN builder to evaluate an expression, though, which was fun exercise.

So, this concludes this week’s Novell Hack Week event. It was certainly fun, although I couldn’t do everything I wanted to do. I’ll be back to my normal hacking activities on next Monday.

Hack Week: Day 4 (Thursday) – The joy of preprocessing macros (not!)

Well, I didn’t have much huge achievement yesterday – day 4 of our Novell’s Hack Week. But here is a list of things I’ve done to improve the robustness of the converter script.

  • Added support to (semi-)correctly parse the preprocessing macros, both ones that take no arguments and ones that do take arguments, as well as ones that include other macros recursively.
  • Added support to parse header files, without which many preprocessing macros would be left undefined, thus causing a parse failure.
  • Added arithmetic support, again in the preprocessing macros.
  • Numerous bug fixes that were uncovered while working on the preprocessing macro support, as well as some re-write of the algorithms to make them work better.

My conclusion? Preprocessing macros are evil! Since macros are expanded before the source file is parsed, it has its own syntax rules that are different from the host language. A simple expansion is rather easy, but once they start taking arguments, recursively using other macros (or the combination of the two), things become a bit tricky. Anyway, the worst is over I hope…

With this improvement, I can now correctly convert 80% of all of the src files we have in our OO.o source tree. Hopefully I can squish the remaining 20% today.

To recap (for those who missed my previous Hack Week posts), I am working on writing a .src to .xml converter script to migrate the existing dialog resource files (which are statically designed) to new xml format that has layout information. The new xml files will be used as a starting point for re-designing all our existing dialogs for the new dialog layout engine in development.

Hack Week: .src converter to convert ~700 .src files

So, after some discussion with Ricardo, I have decided to take on the task of writing a converter script to convert ~700 .src files into xml files, which will be used as a starting point for re-designing each and every dialog for the new dialog layout engine. I was initially thinking about working on the dialog editor, but sounds like Ricardo has it under control. So better not mess with that. :-)

When writing a converter script, it of course involves parsing a source file in order to generate output. Typically there are two ways to go about this.

  1. Parse the source file partially for just the information you need using a flat search, and ignore the rest, or
  2. Parse the source file fully according to the syntax of the language, using a lexer-parser pattern.

The advantage of the first method is simplicity; it’s pretty easy to set up a simple regexp-based parser and start parsing. The disadvantage of it is that, once the parsing need grows, as you need to pick up more and more information, the parser code becomes complex with full of special case handling, and eventually requires a total re-write. Good luck with extending such code as the need grows even further.

The second method, while it takes a little upfront effort, is extensible once the framework is set up, and the code usually becomes better structured with only a minimum special case handling if designed correctly. This method is also well-suited for parsing a token-based language, where whitespace and linebreak characters are only for syntactic sugar and does not affect its semantics. For example, C/C++ and Java are token-based, while Python is not. Since the syntax of the src files is very similar to that of C, I’ve decided to use the second method for this task.

I spent yesterday and today writing this converter script from scratch (in Python), and I’ve come to a point where it parses a large number of src files and correctly generate their xml output files. Here is one example case.

The source file:

/*************************************************************************
 *
 *  OpenOffice.org - a multi-platform office productivity suite
 *
 *  $RCSfile: crnrdlg.src,v $
 *
 *  $Revision: 1.44 $
 *
 *  last change: $Author: ihi $ $Date: 2007/04/19 16:36:48 $
 *
 *  The Contents of this file are made available subject to
 *  the terms of GNU Lesser General Public License Version 2.1.
 *
 *
 *    GNU Lesser General Public License Version 2.1
 *    =============================================
 *    Copyright 2005 by Sun Microsystems, Inc.
 *    901 San Antonio Road, Palo Alto, CA 94303, USA
 *
 *    This library is free software; you can redistribute it and/or
 *    modify it under the terms of the GNU Lesser General Public
 *    License version 2.1, as published by the Free Software Foundation.
 *
 *    This library is distributed in the hope that it will be useful,
 *    but WITHOUT ANY WARRANTY; without even the implied warranty of
 *    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 *    Lesser General Public License for more details.
 *
 *    You should have received a copy of the GNU Lesser General Public
 *    License along with this library; if not, write to the Free Software
 *    Foundation, Inc., 59 Temple Place, Suite 330, Boston,
 *    MA  02111-1307  USA
 *
 ************************************************************************/
#include "crnrdlg.hrc"
ModelessDialog RID_SCDLG_COLROWNAMERANGES
{
    OutputSize = TRUE ;
    Hide = TRUE ;
    SVLook = TRUE ;
    Size = MAP_APPFONT ( 256 , 181 ) ;
    HelpId = HID_COLROWNAMERANGES ;
    Moveable = TRUE ;
     // Closeable = TRUE;   // Dieser Dialog hat einen Cancel-Button !
    FixedLine FL_ASSIGN
    {
        Pos = MAP_APPFONT ( 6 , 3 ) ;
        Size = MAP_APPFONT ( 188 , 8 ) ;
        Text [ en-US ] = "Range" ;
    };
    ListBox LB_RANGE
    {
        Pos = MAP_APPFONT ( 12 , 14 ) ;
        Size = MAP_APPFONT ( 179 , 85 ) ;
        TabStop = TRUE ;
        VScroll = TRUE ;
        Border = TRUE ;
    };
    Edit ED_AREA
    {
        Border = TRUE ;
        Pos = MAP_APPFONT ( 12 , 105 ) ;
        Size = MAP_APPFONT ( 165 , 12 ) ;
        TabStop = TRUE ;
    };
    ImageButton RB_AREA
    {
        Pos = MAP_APPFONT ( 179 , 104 ) ;
        Size = MAP_APPFONT ( 13 , 15 ) ;
        TabStop = FALSE ;
        QuickHelpText [ en-US ] = "Shrink" ;
    };
    RadioButton BTN_COLHEAD
    {
        Pos = MAP_APPFONT ( 20 , 121 ) ;
        Size = MAP_APPFONT ( 171 , 10 ) ;
        TabStop = TRUE ;
        Text [ en-US ] = "Contains ~column labels" ;
    };
    RadioButton BTN_ROWHEAD
    {
        Pos = MAP_APPFONT ( 20 , 135 ) ;
        Size = MAP_APPFONT ( 171 , 10 ) ;
        TabStop = TRUE ;
        Text [ en-US ] = "Contains ~row labels" ;
    };
    FixedText FT_DATA_LABEL
    {
        Pos = MAP_APPFONT ( 12 , 151 ) ;
        Size = MAP_APPFONT ( 179 , 8 ) ;
        Text [ en-US ] = "For ~data range" ;
    };
    Edit ED_DATA
    {
        Border = TRUE ;
        Pos = MAP_APPFONT ( 12 , 162 ) ;
        Size = MAP_APPFONT ( 165 , 12 ) ;
        TabStop = TRUE ;
    };
    ImageButton RB_DATA
    {
        Pos = MAP_APPFONT ( 179 , 161 ) ;
        Size = MAP_APPFONT ( 13 , 15 ) ;
        TabStop = FALSE ;
        QuickHelpText [ en-US ] = "Shrink" ;
    };
    OKButton BTN_OK
    {
        Pos = MAP_APPFONT ( 200 , 6 ) ;
        Size = MAP_APPFONT ( 50 , 14 ) ;
        TabStop = TRUE ;
    };
    CancelButton BTN_CANCEL
    {
        Pos = MAP_APPFONT ( 200 , 23 ) ;
        Size = MAP_APPFONT ( 50 , 14 ) ;
        TabStop = TRUE ;
    };
    PushButton BTN_ADD
    {
        Pos = MAP_APPFONT ( 200 , 104 ) ;
        Size = MAP_APPFONT ( 50 , 14 ) ;
        Text [ en-US ] = "~Add" ;
        TabStop = TRUE ;
        DefButton = TRUE ;
    };
    PushButton BTN_REMOVE
    {
        Pos = MAP_APPFONT ( 200 , 122 ) ;
        Size = MAP_APPFONT ( 50 , 14 ) ;
        Text [ en-US ] = "~Delete" ;
        TabStop = TRUE ;
    };
    HelpButton BTN_HELP
    {
        Pos = MAP_APPFONT ( 200 , 43 ) ;
        Size = MAP_APPFONT ( 50 , 14 ) ;
        TabStop = TRUE ;
    };
    Text [ en-US ] = "Define Label Range" ;
};

and here is the output after the conversion:

<modeless-dialog height="181" help-id="HID_COLROWNAMERANGES" hide="true" moveable="true" output-size="true" sv-look="true" text="Define Label Range" width="256" xmlns="http://openoffice.org/2007/layout" xmlns:cnt="http://openoffice.org/2007/layout/container">
    <vbox>
        <fixed-line id="FL_ASSIGN" height="8" text="Range" width="188" x="6" y="3"/>
        <ok-button id="BTN_OK" height="14" tab-stop="true" width="50" x="200" y="6"/>
        <list-box id="LB_RANGE" border="true" height="85" tab-stop="true" vscroll="true" width="179" x="12" y="14"/>
        <cancel-button id="BTN_CANCEL" height="14" tab-stop="true" width="50" x="200" y="23"/>
        <help-button id="BTN_HELP" height="14" tab-stop="true" width="50" x="200" y="43"/>
        <hbox>
            <image-button id="RB_AREA" height="15" quick-help-text="Shrink" tab-stop="false" width="13" x="179" y="104"/>
            <push-button id="BTN_ADD" def-button="true" height="14" tab-stop="true" text="~Add" width="50" x="200" y="104"/>
        </hbox>
        <edit id="ED_AREA" border="true" height="12" tab-stop="true" width="165" x="12" y="105"/>
        <radio-button id="BTN_COLHEAD" height="10" tab-stop="true" text="Contains ~column labels" width="171" x="20" y="121"/>
        <push-button id="BTN_REMOVE" height="14" tab-stop="true" text="~Delete" width="50" x="200" y="122"/>
        <radio-button id="BTN_ROWHEAD" height="10" tab-stop="true" text="Contains ~row labels" width="171" x="20" y="135"/>
        <fixed-text id="FT_DATA_LABEL" height="8" text="For ~data range" width="179" x="12" y="151"/>
        <image-button id="RB_DATA" height="15" quick-help-text="Shrink" tab-stop="false" width="13" x="179" y="161"/>
        <edit id="ED_DATA" border="true" height="12" tab-stop="true" width="165" x="12" y="162"/>
    </vbox>
</modeless-dialog>

These are the steps I take to convert each file. First, the source file is read character-by-character to get tokenized by the lexer class, and this is where the comments (both multi-line and single line) get stripped out and the preprocessing macros are defined. The tokens are then passed to the parser class to build a syntax tree (preprocessor macros are expanded here), which is then converted into an intermediate XML tree with names translated and some attribute types converted properly, such as the position and the size, which are originally given as MAP_APPFONT( a, b ) format. Also, some unnecessary information is discarded at this stage.

Once that’s done, it further translates the intermediate XML tree into another XML tree that has layout elements. The X and Y positions of each widget are used in order to layout the widgets properly by wrapping them with <vbox> and <hbox> elements as needed. The tree is then dumped into a stream of text, which is what you see above.

Unfortunately this task is not done yet. As it turns out, some src files even require inclusion of header files in order to be parsed correctly, which means I need to honor those #include "foo.hrc" header include directives. Right now, they are ignored. On top of that, there may also be cases where the #ifdef directives might need to be interpreted correctly, but so far ignoring them has not caused any side-effect.

I’m sure there are other problems I’ll encounter as I parse more src files, but I’d say the end is near. :-)

Hack Week: Helping make OO.o’s dialog resizable

So, this is day one for Novell’s Hack Week. This week, we, Novell hackers, are allowed to work on whatever project we like. And I chose to work on making VCL dialog resizable.

Michael Meeks already did the ground work, and all I’m trying to do is to do what I can in one week to expand on his work. This is also one of on-going GSoC tasks, so I’m also co-ordinating with the student who’s been assigned to work on this (his name is Ricardo Cruz) so that we won’t step on each other’s toes.

Here is what I did today. I added a wrapper code for a list box control so that I can actually use it in my resizable dialog and add items to it. Let’s show some screenshots here.

OO.o resizable dialog demo (small)

OO.o resizable dialog demo (large)

I posted two shots of the same, but differently-sized dialog just to show that it’s resizable. Pretty cool, huh? :-)

Oh, BTW, since I’m away from my normal business this week, I won’t be working on the OOXML filter. I’ll be back on my regular schedule on next Monday.

Trip to Prague

I just came back from a week-long trip to the City of Prague – the capital and largest city of the Chech Republic – to participate in Novell developer’s team summit. Just saying that I had such a good time is an understatement. It was a blast! The weird thing is that this was the first time I ever met with anybody who is involved in the OO.o project face-to-face, and instead of text-only communication that we normally conduct, talking with actual voice and seeing their physical face brings such a warm, pleasant feeling to the conversation.

Aside from the meetings we had in the office, we spent the evenings and the Saturday exploring the city. The most memorable moment of course is the “blackout” incident on Thursday.

Here is the story. We went by train to a small town outside of Prague on Thursday evening. The plan was to find a restaurant in that town to sit down, relax, eat and chat (the usual stuff). We got off the train, and headed for the first restaurant closest from the train station, but unfortunately it was closed. But hey, accidents happen all the time, so we immediately regrouped and headed for the second restaurant in town, thinking that the odds of two restaurants being closed were very low.

And guess what, the second restaurant was also closed! Jan, our trusted local guide, sensed that something was wrong, so he found someone local and asked him what was going on. We then found out that there was a power outage in that town, and as a result of that all their local restaurants were forced to close for the evening.

At that point, our only choices were either to wait one hour for the next train, or walk 6 kilometers to the next town and hope that the power outage didn’t reach there. We chose the latter.

Long story short, we ended up walking that long 6-kilometer trail through the woods to the next town, to finally find a restaurant! In retrospect, though, it was probably the best team-building exercise anyone could have come up with (plus a good exercise physically). :-) But I’d rather not go through that again. ;-)

This trip was actually my first visit to Europe, and I’m sure I’ll be back again. It’s sad that we don’t know when we will meet each other again the next time, but hopefully not too distant future.

Last day at SlickEdit

Today was officially my last day at SlickEdit. I was pretty busy all day trying to transfer knowledge to my coworkers so that they can pick up where I left off. I hope I did all I could to minimize disruption as a result of my departure. They were kind enough to do a farewell lunch for me, and even ice cream celebration in the evening! Thanks guys. I’ve really been my pleasure working with you all. I will certainly miss it.

So, starting tomorrow, I will be with Novell, working full-time on OpenOffice.org. I can’t thank Novell enough for giving me this opportunity, and I will do my best to help bring this wonderfull office suite to the next level! We have a lot of work to do, so let’s get started. :-)