June 2007 – Roundtrip to Shanghai via Tokyo

Hack Week: Day 5 (Friday) – The last day

June 30, 2007 | August 7, 2021 | Kohei Yoshida

Well, today was the last day of Hack Week, but unfortunately I wasn’t able to squish the remaining 20% of unconverted resource files as I had planned yesterday. I squished only 5%. This brings my conversion success rate from yesterday’s 80% to 85%. I’m pretty happy with this result, however, considering that some of the resource files I tried to convert are not even dialog resource files.

Here is what I did today:

Fixed incorrect expansion of preprocessing macros. It just didn’t do the right thing when performing recursive macro expansion. This time I really got it right, but it consumed the majority of today’s hacking time. :-(
Reworked my expression evaluation code to fully support reverse Polish notation (RPN). The absence of this feature caused a parse failure on some files because the position and the size of some widgets are given as a mathematical expression (e.g. (24 + 10)/2) instead of a single number. I got the RPN parser to work, but then I realized that I could have just used Python’s built-in eval function to evaluate a whole expression in one step. Well, duh! I did learn how to code an RPN builder to evaluate an expression, though, which was a fun exercise.
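To make the RPN item above concrete, here is a minimal sketch of how an expression such as (24 + 10)/2 can be turned into RPN and then evaluated. It is not the code from the converter script (which is not shown in this post); it uses the classic shunting-yard algorithm, and the names OPS, tokenize, to_rpn and eval_rpn are made up for illustration. It assumes only non-negative numbers, the four basic operators and parentheses.

```python
import operator
import re

# Hypothetical operator table: symbol -> (precedence, function).
OPS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}


def tokenize(expr):
    """Split an arithmetic expression into numbers, operators and parentheses."""
    return re.findall(r"\d+(?:\.\d+)?|[-+*/()]", expr)


def to_rpn(tokens):
    """Convert infix tokens to an RPN list using the shunting-yard algorithm."""
    output, stack = [], []
    for tok in tokens:
        if tok in OPS:
            # Pop operators of higher or equal precedence (all are left-associative).
            while stack and stack[-1] in OPS and OPS[stack[-1]][0] >= OPS[tok][0]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the matching "("
        else:
            output.append(float(tok))
    while stack:
        output.append(stack.pop())
    return output


def eval_rpn(rpn):
    """Evaluate an RPN list with a value stack."""
    values = []
    for tok in rpn:
        if isinstance(tok, str):  # an operator: combine the top two values
            right = values.pop()
            left = values.pop()
            values.append(OPS[tok][1](left, right))
        else:
            values.append(tok)
    return values[0]


rpn = to_rpn(tokenize("(24 + 10)/2"))
print(rpn)            # [24.0, 10.0, '+', 2.0, '/']
print(eval_rpn(rpn))  # 17.0
# The one-step shortcut mentioned above gives the same value.
print(eval_rpn(rpn) == eval("(24 + 10)/2"))
```

The eval shortcut is much shorter, but it executes arbitrary Python, so it is only reasonable for trusted input such as our own source tree.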

So, this concludes this week’s Novell Hack Week event. It was certainly fun, although I couldn’t do everything I wanted to do. I’ll be back to my normal hacking activities next Monday.

Hack Week: Day 4 (Thursday) – The joy of preprocessing macros (not!)

June 29, 2007 | August 7, 2021 | Kohei Yoshida | 2 Comments

Well, I didn’t achieve anything huge yesterday, day 4 of Novell’s Hack Week. But here is a list of things I’ve done to improve the robustness of the converter script.

Added support to (semi-)correctly parse the preprocessing macros, both ones that take no arguments and ones that do take arguments, as well as ones that include other macros recursively.
Added support to parse header files, without which many preprocessing macros would be left undefined, thus causing a parse failure.
Added arithmetic support, again in the preprocessing macros.
Fixed numerous bugs that were uncovered while working on the preprocessing macro support, and rewrote some of the algorithms to make them work better.

My conclusion? Preprocessing macros are evil! Since macros are expanded before the source file is parsed, they have their own syntax rules that are different from those of the host language. A simple expansion is rather easy, but once macros start taking arguments, recursively using other macros, or both, things become a bit tricky. Anyway, the worst is over, I hope…
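To illustrate why arguments combined with recursion get tricky, here is a minimal sketch of token-based macro expansion. It is not the converter’s actual code: the macro table MACROS, its entries (MARGIN, COL2, BTN_POS) and the helper names are all hypothetical. It handles macros without arguments, macros with arguments, and macros whose bodies use other macros, and it refuses to re-expand a macro inside its own expansion.

```python
import re

# Hypothetical macro table: name -> (parameter names or None, body text).
MACROS = {
    "MARGIN": (None, "6"),
    "COL2": (None, "(MARGIN + 194)"),
    "BTN_POS": (["row"], "MAP_APPFONT ( COL2 , row )"),
}

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def split_args(tokens, start):
    """Collect comma-separated arguments; tokens[start] must be '('."""
    depth, args, current = 0, [], []
    for i in range(start, len(tokens)):
        tok = tokens[i]
        if tok == "(":
            depth += 1
            if depth > 1:
                current.append(tok)
        elif tok == ")":
            depth -= 1
            if depth == 0:
                args.append(current)
                return args, i + 1
            current.append(tok)
        elif tok == "," and depth == 1:
            args.append(current)
            current = []
        else:
            current.append(tok)
    raise ValueError("unbalanced parentheses in macro call")


def expand(tokens, active=frozenset()):
    """Expand macros in a token list; 'active' guards against self-recursion."""
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok not in MACROS or tok in active:
            out.append(tok)
            i += 1
            continue
        params, body = MACROS[tok]
        body_tokens = TOKEN_RE.findall(body)
        if params is None:
            i += 1
        elif i + 1 < len(tokens) and tokens[i + 1] == "(":
            args, i = split_args(tokens, i + 1)
            # Expand the arguments first, then substitute them for the parameters.
            binding = dict(zip(params, (expand(a, active) for a in args)))
            body_tokens = [t for bt in body_tokens for t in binding.get(bt, [bt])]
        else:
            out.append(tok)  # function-like macro name used without a call
            i += 1
            continue
        # Rescan the body so that macros used inside it are expanded too.
        out.extend(expand(body_tokens, active | {tok}))
    return out


line = "Pos = BTN_POS ( 23 ) ;"
print(" ".join(expand(TOKEN_RE.findall(line))))
# Pos = MAP_APPFONT ( ( 6 + 194 ) , 23 ) ;
```

Note that the expanded position still contains an arithmetic expression, which is why arithmetic support is needed on top of the expansion itself.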

With this improvement, I can now correctly convert 80% of all of the src files we have in our OO.o source tree. Hopefully I can squish the remaining 20% today.

To recap (for those who missed my previous Hack Week posts), I am working on writing a .src to .xml converter script to migrate the existing dialog resource files (which are statically designed) to a new XML format that has layout information. The new XML files will be used as a starting point for re-designing all our existing dialogs for the new dialog layout engine in development.

Hack Week: .src converter to convert ~700 .src files

June 28, 2007 | August 7, 2021 | Kohei Yoshida

So, after some discussion with Ricardo, I have decided to take on the task of writing a converter script to convert ~700 .src files into xml files, which will be used as a starting point for re-designing each and every dialog for the new dialog layout engine. I was initially thinking about working on the dialog editor, but sounds like Ricardo has it under control. So better not mess with that. :-)

Writing a converter script of course involves parsing a source file in order to generate output. Typically there are two ways to go about this:

Parse the source file partially for just the information you need using a flat search, and ignore the rest, or
Parse the source file fully according to the syntax of the language, using a lexer-parser pattern.

The advantage of the first method is simplicity; it’s pretty easy to set up a simple regexp-based parser and start parsing. The disadvantage is that, as the parsing needs grow and you have to pick up more and more information, the parser code becomes complex and full of special-case handling, and eventually requires a total re-write. Good luck with extending such code as the needs grow even further.

The second method, while it takes a little upfront effort, is extensible once the framework is set up, and the code usually becomes better structured, with only minimal special-case handling if designed correctly. This method is also well-suited for parsing a token-based language, where whitespace and linebreak characters serve only to separate tokens and do not affect the semantics. For example, C/C++ and Java are token-based, while Python is not (its indentation is significant). Since the syntax of the src files is very similar to that of C, I’ve decided to use the second method for this task.
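As a rough illustration of the lexer-parser pattern, here is a minimal sketch that tokenizes a small src-like snippet and builds a nested tree from the tokens. It is not the converter script itself: the token classes, the function names lex and parse_block, and the sample dialog RID_DEMO are assumptions made for this example, and the lexer uses regular expressions per token for brevity rather than reading character by character.

```python
import re

# Hypothetical token classes for a src-like syntax.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),
    ("STRING", r'"[^"\n]*"'),
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_][\w-]*"),
    ("PUNCT", r"[{}()\[\];,=]"),
    ("SPACE", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC), re.DOTALL)


def lex(text):
    """Return (kind, value) tokens, dropping comments and whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup not in ("COMMENT", "SPACE"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_block(tokens, i):
    """Parse 'Type ID { ... } ;' starting at tokens[i]; return (node, next index)."""
    node = {"type": tokens[i][1], "id": tokens[i + 1][1], "props": {}, "children": []}
    i += 3  # skip the type, the id and "{"
    while tokens[i][1] != "}":
        if tokens[i + 2][1] == "{":  # nested widget block
            child, i = parse_block(tokens, i)
            node["children"].append(child)
            continue
        key = tokens[i][1]  # property: key [ lang ] = value ;
        i += 1
        if tokens[i][1] == "[":
            i += 3  # skip "[ en-US ]"
        i += 1  # skip "="
        value = []
        while tokens[i][1] != ";":
            value.append(tokens[i][1])
            i += 1
        node["props"][key] = " ".join(value)
        i += 1  # skip ";"
    return node, i + 2  # skip "}" and ";"


SAMPLE = """
ModelessDialog RID_DEMO
{
    Size = MAP_APPFONT ( 256 , 181 ) ;
    /* Closeable = TRUE; */
    FixedLine FL_ASSIGN
    {
        Pos = MAP_APPFONT ( 6 , 3 ) ;
        Text [ en-US ] = "Range" ;
    };
};
"""

tree, _ = parse_block(lex(SAMPLE), 0)
print(tree["props"]["Size"])                # MAP_APPFONT ( 256 , 181 )
print(tree["children"][0]["props"]["Text"]) # "Range"

# Whitespace and linebreaks only separate tokens: the same text squeezed
# onto one line parses to the same tree.
flat_tree, _ = parse_block(lex(" ".join(SAMPLE.split())), 0)
print(tree == flat_tree)                    # True
```

Adding a new construct in this style usually means adding a token class or a branch in the parser, rather than another special-case regular expression over raw text.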

I spent yesterday and today writing this converter script from scratch (in Python), and I’ve come to a point where it parses a large number of src files and correctly generates their xml output files. Here is one example case.

The source file:

/*************************************************************************
 *
 *  OpenOffice.org - a multi-platform office productivity suite
 *
 *  $RCSfile: crnrdlg.src,v $
 *
 *  $Revision: 1.44 $
 *
 *  last change: $Author: ihi $ $Date: 2007/04/19 16:36:48 $
 *
 *  The Contents of this file are made available subject to
 *  the terms of GNU Lesser General Public License Version 2.1.
 *
 *
 *    GNU Lesser General Public License Version 2.1
 *    =============================================
 *    Copyright 2005 by Sun Microsystems, Inc.
 *    901 San Antonio Road, Palo Alto, CA 94303, USA
 *
 *    This library is free software; you can redistribute it and/or
 *    modify it under the terms of the GNU Lesser General Public
 *    License version 2.1, as published by the Free Software Foundation.
 *
 *    This library is distributed in the hope that it will be useful,
 *    but WITHOUT ANY WARRANTY; without even the implied warranty of
 *    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 *    Lesser General Public License for more details.
 *
 *    You should have received a copy of the GNU Lesser General Public
 *    License along with this library; if not, write to the Free Software
 *    Foundation, Inc., 59 Temple Place, Suite 330, Boston,
 *    MA  02111-1307  USA
 *
 ************************************************************************/
#include "crnrdlg.hrc"
ModelessDialog RID_SCDLG_COLROWNAMERANGES
{
    OutputSize = TRUE ;
    Hide = TRUE ;
    SVLook = TRUE ;
    Size = MAP_APPFONT ( 256 , 181 ) ;
    HelpId = HID_COLROWNAMERANGES ;
    Moveable = TRUE ;
     // Closeable = TRUE;   // Dieser Dialog hat einen Cancel-Button !
    FixedLine FL_ASSIGN
    {
        Pos = MAP_APPFONT ( 6 , 3 ) ;
        Size = MAP_APPFONT ( 188 , 8 ) ;
        Text [ en-US ] = "Range" ;
    };
    ListBox LB_RANGE
    {
        Pos = MAP_APPFONT ( 12 , 14 ) ;
        Size = MAP_APPFONT ( 179 , 85 ) ;
        TabStop = TRUE ;
        VScroll = TRUE ;
        Border = TRUE ;
    };
    Edit ED_AREA
    {
        Border = TRUE ;
        Pos = MAP_APPFONT ( 12 , 105 ) ;
        Size = MAP_APPFONT ( 165 , 12 ) ;
        TabStop = TRUE ;
    };
    ImageButton RB_AREA
    {
        Pos = MAP_APPFONT ( 179 , 104 ) ;
        Size = MAP_APPFONT ( 13 , 15 ) ;
        TabStop = FALSE ;
        QuickHelpText [ en-US ] = "Shrink" ;
    };
    RadioButton BTN_COLHEAD
    {
        Pos = MAP_APPFONT ( 20 , 121 ) ;
        Size = MAP_APPFONT ( 171 , 10 ) ;
        TabStop = TRUE ;
        Text [ en-US ] = "Contains ~column labels" ;
    };
    RadioButton BTN_ROWHEAD
    {
        Pos = MAP_APPFONT ( 20 , 135 ) ;
        Size = MAP_APPFONT ( 171 , 10 ) ;
        TabStop = TRUE ;
        Text [ en-US ] = "Contains ~row labels" ;
    };
    FixedText FT_DATA_LABEL
    {
        Pos = MAP_APPFONT ( 12 , 151 ) ;
        Size = MAP_APPFONT ( 179 , 8 ) ;
        Text [ en-US ] = "For ~data range" ;
    };
    Edit ED_DATA
    {
        Border = TRUE ;
        Pos = MAP_APPFONT ( 12 , 162 ) ;
        Size = MAP_APPFONT ( 165 , 12 ) ;
        TabStop = TRUE ;
    };
    ImageButton RB_DATA
    {
        Pos = MAP_APPFONT ( 179 , 161 ) ;
        Size = MAP_APPFONT ( 13 , 15 ) ;
        TabStop = FALSE ;
        QuickHelpText [ en-US ] = "Shrink" ;
    };
    OKButton BTN_OK
    {
        Pos = MAP_APPFONT ( 200 , 6 ) ;
        Size = MAP_APPFONT ( 50 , 14 ) ;
        TabStop = TRUE ;
    };
    CancelButton BTN_CANCEL
    {
        Pos = MAP_APPFONT ( 200 , 23 ) ;
        Size = MAP_APPFONT ( 50 , 14 ) ;
        TabStop = TRUE ;
    };
    PushButton BTN_ADD
    {
        Pos = MAP_APPFONT ( 200 , 104 ) ;
        Size = MAP_APPFONT ( 50 , 14 ) ;
        Text [ en-US ] = "~Add" ;
        TabStop = TRUE ;
        DefButton = TRUE ;
    };
    PushButton BTN_REMOVE
    {
        Pos = MAP_APPFONT ( 200 , 122 ) ;
        Size = MAP_APPFONT ( 50 , 14 ) ;
        Text [ en-US ] = "~Delete" ;
        TabStop = TRUE ;
    };
    HelpButton BTN_HELP
    {
        Pos = MAP_APPFONT ( 200 , 43 ) ;
        Size = MAP_APPFONT ( 50 , 14 ) ;
        TabStop = TRUE ;
    };
    Text [ en-US ] = "Define Label Range" ;
};

and here is the output after the conversion:

<modeless-dialog height="181" help-id="HID_COLROWNAMERANGES" hide="true" moveable="true" output-size="true" sv-look="true" text="Define Label Range" width="256" xmlns="http://openoffice.org/2007/layout" xmlns:cnt="http://openoffice.org/2007/layout/container">
    <vbox>
        <fixed-line id="FL_ASSIGN" height="8" text="Range" width="188" x="6" y="3"/>
        <ok-button id="BTN_OK" height="14" tab-stop="true" width="50" x="200" y="6"/>
        <list-box id="LB_RANGE" border="true" height="85" tab-stop="true" vscroll="true" width="179" x="12" y="14"/>
        <cancel-button id="BTN_CANCEL" height="14" tab-stop="true" width="50" x="200" y="23"/>
        <help-button id="BTN_HELP" height="14" tab-stop="true" width="50" x="200" y="43"/>
        <hbox>
            <image-button id="RB_AREA" height="15" quick-help-text="Shrink" tab-stop="false" width="13" x="179" y="104"/>
            <push-button id="BTN_ADD" def-button="true" height="14" tab-stop="true" text="~Add" width="50" x="200" y="104"/>
        </hbox>
        <edit id="ED_AREA" border="true" height="12" tab-stop="true" width="165" x="12" y="105"/>
        <radio-button id="BTN_COLHEAD" height="10" tab-stop="true" text="Contains ~column labels" width="171" x="20" y="121"/>
        <push-button id="BTN_REMOVE" height="14" tab-stop="true" text="~Delete" width="50" x="200" y="122"/>
        <radio-button id="BTN_ROWHEAD" height="10" tab-stop="true" text="Contains ~row labels" width="171" x="20" y="135"/>
        <fixed-text id="FT_DATA_LABEL" height="8" text="For ~data range" width="179" x="12" y="151"/>
        <image-button id="RB_DATA" height="15" quick-help-text="Shrink" tab-stop="false" width="13" x="179" y="161"/>
        <edit id="ED_DATA" border="true" height="12" tab-stop="true" width="165" x="12" y="162"/>
    </vbox>
</modeless-dialog>

These are the steps I take to convert each file. First, the source file is read character by character and tokenized by the lexer class; this is where the comments (both multi-line and single-line) get stripped out and the preprocessing macros are defined. The tokens are then passed to the parser class to build a syntax tree (preprocessor macros are expanded here), which is then converted into an intermediate XML tree with names translated and some attribute types converted properly, such as the position and the size, which are originally given in the MAP_APPFONT( a, b ) format. Also, some unnecessary information is discarded at this stage.
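The name translation and the MAP_APPFONT conversion can be sketched as follows. This is a guess at the mapping based only on the example input and output above, not the converter’s real code: the helper names xml_name and to_element, the OVERRIDES table and the CamelCase-to-hyphen rule are assumptions (VScroll, for instance, becomes vscroll in the output, so it is treated as an explicit exception here).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical exceptions to the general CamelCase -> hyphenated rule.
OVERRIDES = {"VScroll": "vscroll"}

MAP_RE = re.compile(r"MAP_APPFONT\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def xml_name(name):
    """Translate a src name such as 'TabStop' or 'SVLook' into 'tab-stop' or 'sv-look'."""
    if name in OVERRIDES:
        return OVERRIDES[name]
    # Insert a hyphen before a capital that follows a lowercase letter or digit,
    # and between an acronym and the capitalized word after it.
    hyphenated = re.sub(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", "-", name)
    return hyphenated.lower()


def to_element(widget_type, widget_id, props):
    """Build one XML element from a widget's type, id and property strings."""
    elem = ET.Element(xml_name(widget_type), {"id": widget_id})
    for key, value in props.items():
        pair = MAP_RE.fullmatch(value)
        if pair and key == "Pos":
            elem.set("x", pair.group(1))
            elem.set("y", pair.group(2))
        elif pair and key == "Size":
            elem.set("width", pair.group(1))
            elem.set("height", pair.group(2))
        elif value in ("TRUE", "FALSE"):
            elem.set(xml_name(key), value.lower())
        else:
            elem.set(xml_name(key), value.strip('"'))
    return elem


list_box = to_element(
    "ListBox",
    "LB_RANGE",
    {
        "Pos": "MAP_APPFONT ( 12 , 14 )",
        "Size": "MAP_APPFONT ( 179 , 85 )",
        "TabStop": "TRUE",
        "VScroll": "TRUE",
        "Border": "TRUE",
    },
)
print(ET.tostring(list_box, encoding="unicode"))
```

The printed element carries the same attributes as the list-box line in the output above, although the attribute order differs.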

Once that’s done, the script further translates the intermediate XML tree into another XML tree that has layout elements. The X and Y positions of each widget are used to lay out the widgets properly by wrapping them with <vbox> and <hbox> elements as needed. The tree is then dumped into a stream of text, which is what you see above.
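One simple way to get the grouping seen in the output above is to sort the widgets by Y and then X, and wrap widgets that share the same Y in an <hbox>. The sketch below does exactly that; it is inferred from the sample output only, so the function name layout and the exact-equality test on Y are assumptions, and the real script may well use a different rule.

```python
from itertools import groupby
import xml.etree.ElementTree as ET


def layout(widgets):
    """Arrange widget elements in a <vbox>; widgets sharing a Y go into an <hbox>."""
    vbox = ET.Element("vbox")
    ordered = sorted(widgets, key=lambda w: (int(w.get("y")), int(w.get("x"))))
    for _, row in groupby(ordered, key=lambda w: int(w.get("y"))):
        row = list(row)
        parent = ET.SubElement(vbox, "hbox") if len(row) > 1 else vbox
        parent.extend(row)
    return vbox


def widget(tag, widget_id, x, y):
    """Small helper for building test input (hypothetical)."""
    return ET.Element(tag, {"id": widget_id, "x": str(x), "y": str(y)})


# A few widgets from the example dialog, in their original source order.
widgets = [
    widget("fixed-line", "FL_ASSIGN", 6, 3),
    widget("edit", "ED_AREA", 12, 105),
    widget("image-button", "RB_AREA", 179, 104),
    widget("ok-button", "BTN_OK", 200, 6),
    widget("push-button", "BTN_ADD", 200, 104),
]

box = layout(widgets)
print([child.tag for child in box])
# ['fixed-line', 'ok-button', 'hbox', 'edit']
print([child.get("id") for child in box.find("hbox")])
# ['RB_AREA', 'BTN_ADD']
```

Exact equality on Y is fragile: in the example, RB_AREA (Y=104) and ED_AREA (Y=105) clearly belong to the same visual row but end up in different ones, so a tolerance would probably be a sensible refinement.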

Unfortunately this task is not done yet. As it turns out, some src files even require inclusion of header files in order to be parsed correctly, which means I need to honor those #include "foo.hrc" header include directives. Right now, they are ignored. On top of that, there may also be cases where the #ifdef directives might need to be interpreted correctly, but so far ignoring them has not caused any side-effect.

I’m sure there are other problems I’ll encounter as I parse more src files, but I’d say the end is near. :-)

Hack Week: Helping make OO.o’s dialog resizable

June 26, 2007 | August 7, 2021 | Kohei Yoshida | 8 Comments

So, this is day one for Novell’s Hack Week. This week, we, Novell hackers, are allowed to work on whatever project we like. And I chose to work on making VCL dialogs resizable.

Michael Meeks already did the groundwork, and all I’m trying to do is to expand on his work as much as I can in one week. This is also one of the ongoing GSoC tasks, so I’m also coordinating with the student who’s been assigned to work on this (his name is Ricardo Cruz) so that we won’t step on each other’s toes.

Here is what I did today. I added wrapper code for a list box control so that I can actually use it in my resizable dialog and add items to it. Let’s show some screenshots here.

OO.o resizable dialog demo (small)

OO.o resizable dialog demo (large)

I posted two shots of the same dialog at different sizes just to show that it’s resizable. Pretty cool, huh? :-)

Oh, BTW, since I’m away from my normal business this week, I won’t be working on the OOXML filter. I’ll be back on my regular schedule next Monday.

Importing Excel 2007 files

June 20, 2007 | August 7, 2021 | Kohei Yoshida | 8 Comments

The Excel 2007 filter for Calc is still on its way, but perhaps now is a good time to show the progress of this new Excel 2007 import filter work by Daniel Rentz and myself.

Here is a screenshot of a file created by Excel 2007 (left), and one for the same file opened in Calc (right).

There is still a lot to be done, however. Formula import is still to be completed, which blocks other features that rely on the formula parser. Charts, text boxes, and other graphic objects are not imported yet. There is also a performance issue with importing large xlsx files, which needs to be addressed at some point.

But all in all, things are coming along very nicely.

Extending Calc’s autofilter

June 13, 2007 | August 7, 2021 | Kohei Yoshida | 1 Comment

As I work on implementing the OOXML import filter for Calc, I notice quite a few features that are in Excel 2007 but are not in Calc. While not all of them deserve special attention, one particular feature has caught my eye, which is the ability to filter rows based on a set of multiple string values instead of just one.

To see the benefit of this feature, let’s take a look at the current autofilter implementation in Calc as of OO.o 2.1.

Current autofilter implementation (2.1)

As you can see, if you want to filter by the cell content, you can only specify one value. If you want to specify “show either Bruce or David”, you will have to use the Standard filter with the regular expression ^(Bruce|David)$ to achieve the effect. Alternatively, if the filtering criterion involves only one column field, you could use two filter conditions, each one specifying textual equality to one text value, and connect them with OR, but this still will not work if more than one field is involved because Calc doesn’t allow nested AND/ORs between filter conditions.
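To show what that regular expression does, here is a small Python sketch that filters a made-up list of rows in two ways: with the anchored pattern ^(Bruce|David)$, and with a plain set of wanted values, which is essentially what a multi-string filter amounts to. The row data is invented for illustration, and Python’s regular expression dialect is not the one Calc uses, although this particular pattern means the same thing in both.

```python
import re

# Invented sample rows: (name, amount).
rows = [
    ("Andy", 10),
    ("Bruce", 20),
    ("Charlie", 30),
    ("David", 40),
    ("Bruce Lee", 50),
]

# Workaround: one anchored regular expression that matches either whole value.
pattern = re.compile(r"^(Bruce|David)$")
regex_filtered = [row for row in rows if pattern.search(row[0])]

# Multi-string filter: keep rows whose value is in a set of checked items.
wanted = {"Bruce", "David"}
set_filtered = [row for row in rows if row[0] in wanted]

print(regex_filtered)                  # [('Bruce', 20), ('David', 40)]
print(regex_filtered == set_filtered)  # True
```

The ^ and $ anchors matter: without them, "Bruce Lee" would also pass the filter.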

Excel 2007 does this quite nicely. It allows a user to specify multiple text values in a single filtering criterion by presenting a list of check boxes like this:

Multi-string autofilter in Excel 2007

This allows a user to quickly filter his/her data, without resorting to something more complex, like regular expressions.

I believe our OO.o users will benefit enormously if we implement something similar in Calc, and I’ve already started some work toward implementing this. But to implement this feature in Calc requires a change in the ODF file format specification. The ODF spec, as of version 1.1, does not allow a clean storage of multiple text values in a single filter condition. An effort is on-going, however, to change the ODF spec in order to accommodate this feature, so there is hope.

SSE2 Instructions

June 4, 2007 | August 7, 2021 | Kohei Yoshida | 1 Comment

For the past several weeks I have been studying X86 assembly language, mainly because I wanted to update my knowledge of assembly language to match the latest CPU technology. I had previously taken an X86 assembly language course at NCSU roughly a year ago, but the course only covered the 8086 instruction set, and used MASM version 6.0 as the assembler, which is only good for writing MS-DOS applications. I wanted to at least learn how to do floating-point calculations in assembly, and do it in GNU assembly so that my apps would run on Linux.

There are quite a few extensions to the core X86 instructions, such as FPU, MMX, SSE, and SSE2. The FPU (originally a separate x87 coprocessor, on-chip since the 80486DX) takes care of normal floating-point calculations, MMX performs multiple integer calculations in a single instruction, SSE does the same for multiple single-precision calculations, and SSE2 for multiple double-precision calculations (again, in a single instruction). Since software these days, and OO.o in particular, seems to do almost all floating-point calculations in double precision, I decided to give SSE2 a little benchmark test.

Here is how I did it. I wrote a simple mathematical routine in C and compiled it normally with gcc with -O1 optimization. Then I had gcc generate the assembly code for that routine, cleaned it up a bit, replaced several instructions with SSE2 instructions, and reassembled it into an executable to run the benchmark.

Here is the original C code for the routine:

void array_multiply(double* array1, double* array2, unsigned int size)
{
    // Make nloop a multiple of 2.
    unsigned int nloop = size/2;
    nloop += nloop;
 
    unsigned int i = 0;
    for (; i < nloop; i += 2)
    {
        array1[i]   *= array2[i] * array2[i] * array2[i];
        array1[i+1] *= array2[i+1] * array2[i+1] * array2[i+1];
    }
}

and these are the assembly instructions that gcc generated (with -O1):

    .text
    .align 2
.globl array_multiply
    .type   array_multiply, @function
array_multiply:
.LFB13:
    pushl   %ebp
.LCFI0:
    movl    %esp, %ebp
.LCFI1:
    pushl   %ebx
.LCFI2:
    movl    8(%ebp), %edx
    movl    12(%ebp), %ebx
    movl    16(%ebp), %ecx
    andl    $-2, %ecx
    je      .L5
    movl    $0, %eax
.L4:
    fldl    (%ebx,%eax,8)
    fld     %st(0)
    fmul    %st(1), %st
    fmulp   %st, %st(1)
    fmull   (%edx,%eax,8)
    fstpl   (%edx,%eax,8)
    fldl    8(%ebx,%eax,8)
    fld     %st(0)
    fmul    %st(1), %st
    fmulp   %st, %st(1)
    fmull   8(%edx,%eax,8)
    fstpl   8(%edx,%eax,8)
    addl    $2, %eax
    cmpl    %eax, %ecx
    ja      .L4
.L5:
    popl    %ebx
    popl    %ebp
    ret

It does all the calculations using FPU instructions. And here is the assembly code after I replaced the FPU instructions with SSE2 ones:

.section .text
.align 16
.globl array_multiply
    .type   array_multiply, @function
array_multiply:
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %ebx
    movl    8(%ebp), %edx           # pointer to array1
    movl    12(%ebp), %ebx          # pointer to array2
    movl    16(%ebp), %ecx          # size
    andl    $-2, %ecx               # make the size a multiple of 2
    je  .L5
    movl    $0, %eax                # i = 0
.L4:
    movapd  (%edx,%eax,8), %xmm0    # SSE2
    movupd  (%ebx,%eax,8), %xmm1    # SSE2
    mulpd   %xmm1, %xmm0            # SSE2
    mulpd   %xmm1, %xmm0            # SSE2
    mulpd   %xmm1, %xmm0            # SSE2
    movapd  %xmm0, (%edx,%eax,8)    # SSE2
    addl    $2, %eax                # i += 2
    cmpl    %eax, %ecx
    ja  .L4
.L5:
    popl    %ebx
    popl    %ebp
    ret

I then used the following main C++ code

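#include <iostream>

using namespace std;

// array_multiply is defined in the separate C or assembly file;
// extern "C" keeps the symbol name unmangled so that the linker finds it.
extern "C" void array_multiply(double* array1, double* array2, unsigned int size);

void print_array(double* array, unsigned int size)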
{
    cout << "{ ";
    for (unsigned int i = 0; i < size; ++i)
    {
        if (i)
            cout << ", ";
        cout << array[i];
    }
    cout << " }" << endl;
}
 
int main()
{
    double myarray1[] = {10.5, 50.0, 25.0, 10.0, 2345.4848, 594.23, 0.4, 87.2};
    double myarray2[] = {1.2, 50.0, 1.5, 10.0, 120.9, 44.09, 874.234, 233.333};
    const unsigned int array_size = 8;
 
    cout << "myarray1 = ";
    print_array(myarray1, array_size);
    cout << "myarray2 = ";
    print_array(myarray2, array_size);
 
    double myarray[array_size];
 
    for (long counter = 0; counter < 99999999; ++counter)
    {
        for (unsigned int i = 0; i < array_size; ++i)
            // To prevent calculation results from being cached.
            myarray[i] = myarray1[i] + 0.000000001*counter;
        array_multiply(myarray, myarray2, array_size);
    }
 
    for (unsigned int i = 0; i < array_size; ++i)
        cout << i << "\t" << myarray[i] << endl;
}

to call both the C version and the assembly with SSE2 version to compare performance. The executables with the original C version and the SSE2 version are named test_c and test_sse, respectively. Here is the result (on my machine):

$ time ./test_c
myarray1 = { 10.5, 50, 25, 10, 2345.48, 594.23, 0.4, 87.2 }
myarray2 = { 1.2, 50, 1.5, 10, 120.9, 44.09, 874.234, 233.333 }
0        18.3168
1        6.2625e+06
2        84.7125
3        10100
4        4.14505e+09
5        5.09387e+07
6        3.34082e+08
7        1.10903e+09
 
real    0m3.308s
user    0m3.292s
sys     0m0.012s
 
$ time ./test_sse
myarray1 = { 10.5, 50, 25, 10, 2345.48, 594.23, 0.4, 87.2 }
myarray2 = { 1.2, 50, 1.5, 10, 120.9, 44.09, 874.234, 233.333 }
0        18.3168
1        6.2625e+06
2        84.7125
3        10100
4        4.14505e+09
5        5.09387e+07
6        3.34082e+08
7        1.10903e+09
 
real    0m2.451s
user    0m2.436s
sys     0m0.000s

Indeed, the SSE2 version performs better on this test: 2.451 seconds of wall-clock time versus 3.308 seconds, or roughly 26% less. I also compared my SSE2 version against the -O3 optimized C code, but there was not much difference from the -O1 optimized code for this particular algorithm. But of course, YMMV.

Does this mean we should start writing assembly code for performance optimization? Probably not. I still think it’s much better to write a better algorithm in the first place than to re-write code in assembly language, because re-writing in assembly is itself no guarantee of better performance. For serious performance work, however, knowing what kind of assembly code the compiler generates from C/C++ code under various optimization flags will undoubtedly help. You never know; in a few extreme cases it may even make sense to write parts of the application code in assembly, even if that means having to write that part in assembly for every platform the app needs to support. OO.o has some parts of the UNO bridge written in assembly, and I’ve seen some assembly code in the Firefox codebase as well.

Oh, by the way, for anyone looking for a good study guide on GNU assembly for the X86 family of chips, “Professional Assembly Language” by Richard Blum (Wiley Publishing) would be a pretty good place to start.