Windows clipboard dumper

Inspired by this bug report, I just wrote a small, quick and dirty utility to dump the current clipboard content on Windows. Windows development to me is still pretty much an uncharted territory, so even a utility as simple as this took me some time. Anyway, you can download the binary from here: clipdump.exe. Note that this is a console utility, so you need to run this from the console window.

Here is the source code.

#include <Windows.h>
 
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <vector>
 
using namespace std;
 
size_t char_per_line = 16;
typedef vector<WORD> line_store_type;
 
void dump_line(const line_store_type& line)
{
    if (line.empty())
        return;
 
    size_t fill_size = char_per_line - line.size();
 
    line_store_type::const_iterator i = line.begin(), iend = line.end();
    for (; i != iend; ++i)
        printf("%04X ", *i);
 
    while (fill_size--)
        cout << "     ";
 
    cout << ' ';
    i = line.begin();
    for (; i != iend; ++i)
    {
        WORD c = *i;
        if (32 <= c && c <= 126)
            // ASCII printable range
            cout << static_cast<char>(c);
        else
            // non-printable range
            cout << '.';
    }
 
    cout << endl;
}
 
void dump_clip(HANDLE hdl)
{
    if (!hdl)
        return;
 
    LPTSTR buf = static_cast<LPTSTR>(GlobalLock(hdl));
    if (!buf)
        return;
 
    line_store_type line;
    line.reserve(char_per_line);
    for (size_t i = 0, n = GlobalSize(hdl); i < n; ++i)
    {
        line.push_back(buf[i]);
        if (line.size() == char_per_line)
        {
            dump_line(line);
            line.clear();
        }
    }
    dump_line(line);
 
    GlobalUnlock(hdl);
}
 
int main()
{
    if (!OpenClipboard(NULL))
        return EXIT_FAILURE;
 
    UINT fmt = 0;
    for (fmt = EnumClipboardFormats(fmt); fmt; fmt = EnumClipboardFormats(fmt))
    {
        char name[100];
        int len = GetClipboardFormatName(fmt, name, 100);
        if (!len)
            continue;
 
        cout << "---" << endl;
        cout << "format code: " << fmt << endl;
        cout << "name: " << name << endl << endl;
 
        HANDLE hdl = GetClipboardData(fmt);
        dump_clip(hdl);
    }
 
    CloseClipboard();
    return EXIT_SUCCESS;
}

It’s nothing sophisticated, and it could probably use more polishing and perhaps some GUI (since it’s a Windows app). But for now it serves the purpose for me.

Update:
Tor has submitted his version in the comment section. Much more sophisticated than mine (and it’s C not C++).

Working with a branch using git-new-workdir

Introduction

Git package contains a script named git-new-workdir, which allows you to work in a branch in a separate directory on the file system. This differs from cloning a repository in that git-new-workdir doesn’t duplicate the git history from the original repository and shares it instead, and that when you commit something to the branch that commit goes directly into the history of the original repository without explicitly pushing to the original repository. On top of that, creating a new branch work directory happens very much instantly. It’s fast, and it’s efficient. It’s an absolute time saver for those of us who work on many branches at any given moment without bloating the disk space.

As wonderful as this script can be, not all distros package this script with their git package. If your distro doesn’t package it, you can always download the source packages of git and find the script there, under the contrib directory. Also, if you have the build repository of libreoffice cloned, you can find it in bin/git-new-workdir too.

Now, I’m going to talk about how I make use of this script to work on the 3.3 branch of LibreOffice.

Creating a branch work directory

If you’ve followed this page to build the master branch of libreoffice, then you should have in your clone of the build repository a directory named clone. Under this directory are your local clones of the 19 repositories comprising the whole libreoffice source tree. If you are like me, you have followed the above page and built your libreoffice build in the rawbuild directory.

The next step is to create a separate directory just for the 3.3 branch which named libreoffice-3-3 and set things up so that you can build it normally as you did in the rawbuild. I’ve written the following bash script (named create-branch-build.sh) to do this in one single step.

#!/usr/bin/env bash
 
GIT_NEW_WORKDIR=~/bin/git-new-workdir
REPOS=clone
 
print_help() {
    echo Usage: $1 [bootstrap dir] [dest dir] [branch name]
}
 
die() {
    echo $1
    exit 1
}
 
BOOTSTRAP_DIR="$1"
DEST_DIR="$2"
BRANCH="$3"
 
if [ "$BOOTSTRAP_DIR" = "" ]; then
    echo bootstrap repo is missing.
    print_help $0
    exit 1
fi
 
if [ "$DEST_DIR" = "" ]; then
    echo destination directory is missing.
    print_help $0
    exit 1
fi
 
if [ "$BRANCH" = "" ]; then
    echo branch name is missing.
    print_help $0
    exit 1
fi
 
if [ -e "$DEST_DIR/$BRANCH" ]; then
    die "$DEST_DIR/$BRANCH already exists."
fi
 
# Clone bootstrap first.
$GIT_NEW_WORKDIR "$BOOTSTRAP_DIR" "$DEST_DIR/$BRANCH" "$BRANCH" || die "failed to clone bootstrap repo."
 
# First, check out the branches.
echo "creating directory $DEST_DIR/$BRANCH/$REPOS"
mkdir -p "$DEST_DIR/$BRANCH/$REPOS" || die "failed to create $DEST_DIR/$BRANCH/$REPOS"
for repo in `ls "$BOOTSTRAP_DIR/clone"`; do
    repo_path="$BOOTSTRAP_DIR/clone/$repo"
    if [ ! -d $repo_path ]; then
        # we only care about directories.
        continue
    fi
    echo ===== $repo =====
    $GIT_NEW_WORKDIR $repo_path "$DEST_DIR/$BRANCH/$REPOS/$repo" $BRANCH
done
 
# Set symbolic links to the root directory.
cd "$DEST_DIR/$BRANCH"
for repo in `ls $REPOS`; do
    repo_path=$REPOS/$repo
    if [ ! -d $repo_path ]; then
        # skip if not directory.
        continue
    fi
    ln -s -t . $repo_path/*
done

The only thing you need to do before running this script is to set the GIT_NEW_WORKDIR variable to point to the location of the git-new-workdir script on your file system.

With this script in place, you can simply

cd ..  # move out of the build directory
create-branch-build.sh ./build/clone . libreoffice-3-3

and you now have a new directory named libreoffice-3-3 (same as the branch name), where all modules and top-level files are properly symlinked to their original locations, while the actual repo branches are under the _repos directory. All you have left to do is to start building. :-)

Note that there is no need to manually create a local branch named libreoffice-3-3 that tracks the remote libreoffice-3-3 branch in the original repository before running this script; git-new-workdir takes care of that for you provided that the remote branch of the same name exists.

Updating the branch work directory

In general, when you are in a branch work directory (I call it this because it sounds about right), updating the branch from the branch in the remote repo consists of two steps. First, fetch the latest history in the original repository by git fetch, move back to the branch work directory and run git pull -r.

But doing this manually in all the 19 repositories can be very tedious. So I wrote another script (named g.sh) to ease this pain a little.

#!/usr/bin/env bash
 
REPOS=clone
 
die() {
    echo $1
    exit 1
}
 
if [ ! -d $REPOS ]; then
    die "$REPOS directory not found in cwd."
fi
 
echo ===== main repository =====
git $@
 
for repo in `ls $REPOS`; do
    echo ===== $repo =====
    repo_path=$REPOS/$repo
    if [ ! -d $repo_path ]; then
        # Not a directory.  Skip it.
        continue
    fi
    pushd . > /dev/null
    cd $repo_path
    git $@
    popd > /dev/null
done

With this, updating the branch build directory is done:

g.sh pull -r

That’s all there is to it.

A few more words…

As with any methods in life, this method has limitations. If you build libreoffice with the old-fashioned way of applying patches on top of the raw source tree, this method doesn’t help you; you would still need to clone the repo, and manually switch to the branch in the cloned repo.

But if you build, hack and debug in rawbuild almost exclusively (like me), then this method will help you save time and disk space. You can also adopt this method for any feature branches, as long as all the 19 repos (20 if you count l10n repo) have the same branch name. So, it’s worth a look! :-)

Thank you, ladies and gentlemen.

P.S. I’ve updated the scripts to adopt to the new bootstrap based build scheme.

STL container performance on data insertion

I just ran a quick analysis on the performance of various STL containers on simple data insertion. The result was not exactly what I expected so I’d like to share it with you.

What was performed was sequential insertions of 50,000,000 (50 million) unique pointer values into various STL containers, either by push_back or insert, depending on which method is supported by the container. I ran the test on openSUSE 11.2, with g++ 4.4.1, with the compiler options of -std=c++0x -Os -g. The -std=c++0x flag is necessary in order to use std::unordered_set.

Anyway, here is the result I observed:

stl-perf

I was fully aware of the set containers being slower than list and vector on insertion, due to the internal structure of set being more elaborate than those of list or vector, and this test confirms my knowledge. However, I was not aware of such wide gap between list and vector. Also, the difference between unreserved and reserved vector was not as wide as I would have expected. (For the sake of completeness, a reserved vector is an instance of vector whose internal array size is pre-allocated in advance in order to avoid re-allocation.) My belief has always been that reserving vector in advance improves performance on data insertion, which it does, but I was expecting a wider gap between the two. So, the result I see here is a bit unexpected.

In case you want to re-run this test on your own environment, here is the code I used to measure the containers’ performance:

#include <vector>
#include <unordered_set>
#include <set>
#include <list>
 
#include <stdio.h>
#include <string>
#include <sys/time.h>
 
using namespace std;
 
namespace {
 
class StackPrinter
{
public:
    explicit StackPrinter(const char* msg) :
        msMsg(msg)
    {
        fprintf(stdout, "%s: --begin\n", msMsg.c_str());
        mfStartTime = getTime();
    }
 
    ~StackPrinter()
    {
        double fEndTime = getTime();
        fprintf(stdout, "%s: --end (duration: %g sec)\n", msMsg.c_str(), (fEndTime-mfStartTime));
    }
 
    void printTime(int line) const
    {
        double fEndTime = getTime();
        fprintf(stdout, "%s: --(%d) (duration: %g sec)\n", msMsg.c_str(), line, (fEndTime-mfStartTime));
    }
 
private:
    double getTime() const
    {
        timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1000000.0;
    }
 
    ::std::string msMsg;
    double mfStartTime;
};
 
}
 
int main()
{
    size_t store_size = 50000000;
    {
        StackPrinter __stack_printer__("vector non-reserved");
        string* ptr = 0x00000000;
        vector<void*> store;
        for (size_t i = 0; i < store_size; ++i)
            store.push_back(ptr++);
    }
 
    {
        StackPrinter __stack_printer__("vector reserved");
        string* ptr = 0x00000000;
        vector<void*> store;
        store.reserve(store_size);
        for (size_t i = 0; i < store_size; ++i)
            store.push_back(ptr++);
    }
 
    {
        StackPrinter __stack_printer__("list");
        string* ptr = 0x00000000;
        list<void*> store;
        for (size_t i = 0; i < store_size; ++i)
            store.push_back(ptr++);
    }
 
    {
        StackPrinter __stack_printer__("set");
        string* ptr = 0x00000000;
        set<void*> store;   
        for (size_t i = 0; i < store_size; ++i)
            store.insert(ptr++);
    }
 
    {
        StackPrinter __stack_printer__("unordered set");
        string* ptr = 0x00000000;
        unordered_set<void*> store;
        for (size_t i = 0; i < store_size; ++i)
            store.insert(ptr++);
    }
}

This is valid C++ code?

My compiler reported a build error in the following code block today.

long ScDPOutput::GetHeaderDim( const ScAddress& rPos, USHORT& rOrient )
{
    SCCOL nCol = rPos.Col();
    SCROW nRow = rPos.Row();
    SCTAB nTab = rPos.Tab();
    if ( nTab != aStartPos.Tab() )
        return -1;                                      // wrong sheet
 
    //  calculate output positions and sizes
 
    CalcSizes();
 
    //  test for column header
 
    if ( nRow == nTabStartRow && nCol >= nDataStartCol && nCol < nDataStartCol + nColFieldCount )
    {
        rOrient = sheet::DataPilotFieldOrientation_COLUMN;
        long nField = nCol - nDataStartCol;
        return pColFields[nField].nDim;
    }
 
    //  test for row header
 
    if ( nRow+1 == nDataStartRow && nCol >= nTabStartCol == nCol < nTabStartCol + nRowFieldCount )
    {
        rOrient = sheet::DataPilotFieldOrientation_ROW;
        long nField = nCol - nTabStartCol;
        return pRowFields[nField].nDim;
    }
 
    //  test for page field
 
    SCROW nPageStartRow = aStartPos.Row() + ( bDoFilter ? 1 : 0 );
    if ( nCol == aStartPos.Col() && nRow >= nPageStartRow && nRow < nPageStartRow + nPageFieldCount )
    {
        rOrient = sheet::DataPilotFieldOrientation_PAGE;
        long nField = nRow - nPageStartRow;
        return pPageFields[nField].nDim;
    }
 
    //! single data field (?)
 
    rOrient = sheet::DataPilotFieldOrientation_HIDDEN;
    return -1;      // invalid
}

In particular, my compiler didn’t like the == (equality operator) in nTabStartCol == nCol in the 3rd if statement block from the top. Looking at the if statement before and after that, you’ll probably say “yeah, looks like that ‘==’ was supposed to be &&, so what’s the surprise?” Well, the thing is, this piece of code has not changed for at least a few years, which means it was compiling just fine up until today (though it may have caused a bug somewhere…). And even today, it compiled fine before I made a few changes that were not related to this method, and I didn’t modify this method itself at all.

I have to wonder, why this code block compiled fine up till today, and what change of mine triggered the compiler to complain all of a sudden if the method itself is unchanged…. :-/

Excel sheet protection password hash

When you protect either your workbook or one of your worksheets with a password in Excel, Excel internally generates a 16-bit hash of your password and stores it instead of the original password text. The hashing algorithm used for that was previously unknown, but thanks to the infamous Office Open XML specification it is now documented for the world to see (take a look at Part 4, Section 3.3.1.81 – Sheet Protection Options for the details). Thankfully, the algorithm is identical for all recent versions of Excel including XP, 2003 and 2007, so you can simply reuse the documented algorithm for the older versions of Excel too.

But alas! the documented algorithm is incorrect; it does not produce correct hash values. Being determined to find out the correct algorithm, however, I started to analyze the hashes that the documented algorithm produces, and compare them with the real hash values that Excel generates, in order to decipher the correct algorithm.

In the end, the documented algorithm was, although not accurate, pretty close enough that I was able to make a few changes and derive the algorithm that generates correct values. The following code:

#include <stdio.h>
 
using namespace std;
 
typedef unsigned char sal_uInt8;
typedef unsigned short sal_uInt16;
 
sal_uInt16 getPasswordHash(const char* szPassword)
{
    sal_uInt16 cchPassword = strlen(szPassword);
    sal_uInt16 wPasswordHash = 0;
    if (!cchPassword)
        return wPasswordHash;
 
    const char* pch = &szPassword[cchPassword];
    while (pch-- != szPassword)
    {
        wPasswordHash = ((wPasswordHash >> 14) & 0x01) | 
                        ((wPasswordHash << 1) & 0x7fff);
        wPasswordHash ^= *pch;
    }
 
    wPasswordHash = ((wPasswordHash >> 14) & 0x01) | 
                    ((wPasswordHash << 1) & 0x7fff);
 
    wPasswordHash ^= (0x8000 | ('N' << 8) | 'K');
    wPasswordHash ^= cchPassword;
 
    return wPasswordHash;
}
 
int main (int argc, char** argv)
{
    if (argc < 2)
        exit(1);
 
    printf("input password = %s\n", argv[1]);
    sal_uInt16 hash = getPasswordHash(argv[1]);
    printf("hash = %4.4X\n", hash);
 
    return 0;
}

produces the right hash value from an arbitrary password. One caveat: this algorithm takes an 8-bit char array, so if the input value consists of 16-bit unicode characters, it needs to be first converted into 8-bit character array. The conversion algorithm is also documented in the OOXML specification. I have not tested it yet, but I hope that algorithm is correct. ;-)

Missing vcl resource

At one point in the past, I started getting this annoying error message dialog

VCL resource warning dialog on startup

on startup, and OO.o simply shuts itself down after that. It happened whenever I installed the trunk version of ooo-build with ooinstall (with an -l option for linking), or the upstream build with linkoo. These two commands are two, totally separate scripts, but they both create symbolic links to the shared libraries and resources in the installation directory, to their respective location in the build (actually ooinstall makes use of linkoo for the -l functionality). This setting allows a fast iteration of code change, compilation and testing without having to manually swap the shared libraries each time they get modified in the build. But because of this problem I was not able to use linkoo with the upstream build, or ooinstall -l with the trunk ooo-build, and was forced to manually set symlink to the modules I was working on. (Somehow, the 2.2 and 2.1 branches of ooo-build didn’t have this problem.)

But today, after not having been able to use an automatically symlinked installation for a long, long time, I got tired of it and decided to jump into the code, to see what causes this problem.

After a little tracing of the code, I finally found the offending code block (tools/source/rc/resmgr.cxx#244):

void ResMgrContainer::init()
{
    // get resource path
    std::list< OUString > aDirs;
    sal_Int32 nIndex = 0;
 
    // 1. relative to current module (<installation>/program/resource)
    OUString libraryFileUrl;
    if( Module::getUrlFromAddress(
            reinterpret_cast< oslGenericFunction >(ResMgrContainer::release),
            libraryFileUrl) )
        nIndex = libraryFileUrl.lastIndexOf( '/' );
    DBG_ASSERT( nIndex > 0, "module resolution failed" );
    if( nIndex > 0 )
    {
        OUStringBuffer aBuf( libraryFileUrl.getLength() + 16 );
        aBuf.append( libraryFileUrl.getStr(), nIndex+1 ); // copy inclusive '/'
        aBuf.appendAscii( "resource" );
        aDirs.push_back( aBuf.makeStringAndClear() );
    }
 
    // 2. in STAR_RESOURCEPATH
    ....

Here is what the code does. It tries to locate all of the resources (.res) files and put their locations into an internal data structure. To do it, it needs to know where to find the resource files. It cleverly determines the resource file directory by first getting the absolute path of the module where the code resides (libtl680li.so), and move down to the resource file directory from that location. In the normal installation, the modules are located in the [installation root]/program/, and the resources directory is only one level down.

However, when the shared library in question is a symbolic link (symlink) to another file, the code ends up getting the path of the actual file the symlink points to, instead of the path of that symlink (via dladdr call), and this causes the above problem.

There is an easy workaround. Since it’s only the shared library where the ResMgrContainer class is (which is libtl680li.so as mentioned) needs to be the actual file, you can simply delete the symlink that points to libtl680li.so, and put the original file in its place. Then OO.o launches just fine. You can leave all the other symlinked shared libraries alone. The only problem with this workaround is, if you want to hack at the tools code, you would need to manually swap the shared library on each module re-build, but for me, that’s not a major problem (I don’t hack at the tools code).

SSE2 Instructions

For the past several weeks I have been studying X86 assembly language, mainly because I wanted to update my knowledge on the assembly language to match the latest CPU technology. I had previosly taken an X86 assembly language course at NCSU roughly a year ago, but the course only covered 8086 instruction set, and used the MASM version 6.0 as the assembler which is only good for writing MS-DOS applications. I wanted to at least learn how to do floating-point calculations in assembly, and do it in GNU assembly so that my apps would run on Linux.

There are quite a few extensions to the core X86 instructions, such as FPU, MMX, SSE, and SSE2. The FPU takes care of normal floating point calculations since 80386, MMX for operating multiple integer calculations in a single CPU cycle, SSE for multiple single-precision calculations, and SSE2 for multiple double-precision calculations (again, in a single CPU cycle). Since software these days, and OO.o in particular, seem to do almost all of floating point calculations in double-precision, I decided to give SSE2 a little benchmark test.

Here is how I did it. I wrote some simple mathematical routines in C, compiled it normally with gcc with -O1 optimization. Then I had gcc generate an assembly code of that routine, cleaned it up a bit and replaced several instructions with SSE2 instructions, reassembled it into an executable to run benchmark.

Here is the original C code for the routine:

void array_multiply(double* array1, double* array2, unsigned int size)
{
    // Make nloop a multiple of 2.
    unsigned int nloop = size/2;
    nloop += nloop;
 
    unsigned int i = 0;
    for (; i < nloop; i += 2)
    {
        array1[i]   *= array2[i] * array2[i] * array2[i];
        array1[i+1] *= array2[i+1] * array2[i+1] * array2[i+1];
    }
}

and this is the assembly instructions that gcc generated (with -O1):

    .text
    .align 2
.globl array_multiply
    .type   array_multiply, @function
array_multiply:
.LFB13:
    pushl   %ebp
.LCFI0:
    movl    %esp, %ebp
.LCFI1:
    pushl   %ebx
.LCFI2:
    movl    8(%ebp), %edx
    movl    12(%ebp), %ebx
    movl    16(%ebp), %ecx
    andl    $-2, %ecx
    je      .L5
    movl    $0, %eax
.L4:
    fldl    (%ebx,%eax,8)
    fld     %st(0)
    fmul    %st(1), %st
    fmulp   %st, %st(1)
    fmull   (%edx,%eax,8)
    fstpl   (%edx,%eax,8)
    fldl    8(%ebx,%eax,8)
    fld     %st(0)
    fmul    %st(1), %st
    fmulp   %st, %st(1)
    fmull   8(%edx,%eax,8)
    fstpl   8(%edx,%eax,8)
    addl    $2, %eax
    cmpl    %eax, %ecx
    ja      .L4
.L5:
    popl    %ebx
    popl    %ebp
    ret

It does all the calculations using FPU instructions. And here is the assembly code after I replaced the FPU instructions with SSE2 ones:

.section .text
.align 16
.globl array_multiply
    .type   array_multiply, @function
array_multiply:
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %ebx
    movl    8(%ebp), %edx           # pointer to array1
    movl    12(%ebp), %ebx          # pointer to array2
    movl    16(%ebp), %ecx          # size
    andl    $-2, %ecx               # make the size a multiple of 2
    je  .L5
    movl    $0, %eax                # i = 0
.L4:
    movapd  (%edx,%eax,8), %xmm0    # SSE2
    movupd  (%ebx,%eax,8), %xmm1    # SSE2
    mulpd   %xmm1, %xmm0            # SSE2
    mulpd   %xmm1, %xmm0            # SSE2
    mulpd   %xmm1, %xmm0            # SSE2
    movapd  %xmm0, (%edx,%eax,8)    # SSE2
    addl    $2, %eax                # i += 2
    cmpl    %eax, %ecx
    ja  .L4
.L5:
    popl    %ebx
    popl    %ebp
    ret

I then used the following main C++ code

void print_array(double* array, unsigned int size)
{
    cout << "{ ";
    for (unsigned int i = 0; i < size; ++i)
    {
        if (i)
            cout << ", ";
        cout << array[i];
    }
    cout << " }" << endl;
}
 
int main()
{
    double myarray1[] = {10.5, 50.0, 25.0, 10.0, 2345.4848, 594.23, 0.4, 87.2};
    double myarray2[] = {1.2, 50.0, 1.5, 10.0, 120.9, 44.09, 874.234, 233.333};
    unsigned int array_size = 8;
 
    cout << "myarray1 = ";
    print_array(myarray1, array_size);
    cout << "myarray2 = ";
    print_array(myarray2, array_size);
 
    double myarray[array_size];
 
    for (long counter = 0; counter < 99999999; ++counter)
    {
        for (unsigned int i = 0; i < array_size; ++i)
            // To prevent calculation results from being cached.
            myarray[i] = myarray1[i] + 0.000000001*counter;
        array_multiply(myarray, myarray2, array_size);
    }
 
    for (unsigned int i = 0; i < array_size; ++i)
        cout << i << " t " << myarray[i] << endl;
}

to call both the C version and the assembly with SSE2 version to compare performance. The executables with the original C version and the SSE2 version are named test_c and test_sse, respectively. Here is the result (on my machine):

$ time ./test_c
myarray1 = { 10.5, 50, 25, 10, 2345.48, 594.23, 0.4, 87.2 }
myarray2 = { 1.2, 50, 1.5, 10, 120.9, 44.09, 874.234, 233.333 }
0        18.3168
1        6.2625e+06
2        84.7125
3        10100
4        4.14505e+09
5        5.09387e+07
6        3.34082e+08
7        1.10903e+09
 
real    0m3.308s
user    0m3.292s
sys     0m0.012s
 
$ time ./test_sse
myarray1 = { 10.5, 50, 25, 10, 2345.48, 594.23, 0.4, 87.2 }
myarray2 = { 1.2, 50, 1.5, 10, 120.9, 44.09, 874.234, 233.333 }
0        18.3168
1        6.2625e+06
2        84.7125
3        10100
4        4.14505e+09
5        5.09387e+07
6        3.34082e+08
7        1.10903e+09
 
real    0m2.451s
user    0m2.436s
sys     0m0.000s

Indeed, the SSE2 version seems to perform better! I also compared my SSE2 version against the -O3 optimized C-code, but there was not much difference from the -O1 optimized code for this particular algorithm. But of course, YMMV.

Does this mean we should start writing assembly code for performance optimization? Probably not. I still think it’s much better to write a better algorithm in the first place than re-write code in assembly language, because re-writing in assembly is itself not a guarantee for a better performance. But for serious performance work, however, knowing what type of assembly code that the compiler generates from a C/C++ code, for various compiler optimization flags, will undoubtedly benefit. You never know, in a few extreme cases, it may even make sense to write parts of the application code in assembly, even if that means having to write that part in assembly for every platform that app needs to support. OO.o has some parts of UNO bridge written in assembly, and I’ve seen some assembly code in the FireFox codebase as well.

Oh, by the way, for anyone looking for a good study guide on GNU assembly for X86 family of chips, the “Professional Assembly Language” by Richard Blum (Wiley Publishing) would be a pretty good place to start.

How to (pretend to) write an export filter

It turns out that pretending to write an export filter, at least adding a new entry to the Export dialog, is quite easy. In fact, you don’t even have to write a single line of code. Here is what to do.

Suppose you do your own build, and you have installed the OO.o that you have built. Now, go back to your build tree, and change directory into the following location

filter/source/config/fragments

and add the following two new files relative to this location:

./filters/calc_Kohei_SDF_Filter.xcu
./filters/calc_Kohei_SDF_ui.xcu

You can name your files anyway you want, of course. ;-) Anyway, put the following XML fragments into these files:

<!-- calc_Kohei_SDF_Filter.xcu -->
<node oor:name="calc_Kohei_SDF_Filter" oor:op="replace">
  <prop oor:name="Flags"><value>EXPORT ALIEN 3RDPARTYFILTER</value></prop>
  <prop oor:name="UIComponent"/>
  <prop oor:name="FilterService"><value>com.sun.star.comp.packages.KoheiSuperDuperFileExporter</value></prop>
  <prop oor:name="UserData"/>
  <prop oor:name="FileFormatVersion"/>
  <prop oor:name="Type"><value>Kohei_SDF</value></prop>
  <prop oor:name="TemplateName"/>
  <prop oor:name="DocumentService"><value>com.sun.star.sheet.SpreadsheetDocument</value></prop>
</node>
 
<!-- calc_Kohei_SDF_Filter_ui.xcu -->
<node oor:name="calc_Kohei_SDF_Filter">
  <prop oor:name="UIName"><value xml:lang="x-default">Kohei Super Duper File Format</value>
    <value xml:lang="en-US">Kohei Super Duper File Format</value>
    <value xml:lang="de">Kohei Super Duper File Format</value>
  </prop>
</node>

Likewise, create another file:

./types/Kohei_SDF.xcu

with the following content

<!-- Kohei_SDF.xcu -->
<node oor:name="Kohei_SDF" oor:op="replace" >
  <prop oor:name="DetectService"/>
  <prop oor:name="URLPattern"/>
  <prop oor:name="Extensions"><value>koheisdf</value></prop>
  <prop oor:name="MediaType"/>
  <prop oor:name="Preferred"><value>false</value></prop>
  <prop oor:name="PreferredFilter"><value>calc_Kohei_SDF_Filter</value></prop>
  <prop oor:name="UIName"><value xml:lang="x-default">Kohei Super Duper File Format</value></prop>
  <prop oor:name="ClipboardFormat"><value>doctype:Workbook</value></prop>
</node>

Once these new files are in place, add these files to fcfg_calc.mk so that the build process can find them. To add, open fcfg_calc.mk and add Kohei_SDF to the end of T4_CALC, calc_Kohei_SDF_Filter to F4_CALC, and calc_Kohei_SDF_Filter_ui to F4_UI_CALC. Save the file and rebuild the module. This should rebuild the following configuration files (build done on Linux):

./unxlngi6.pro/misc/filters/modulepacks/fcfg_calc_types.xcu
./unxlngi6.pro/misc/filters/modulepacks/fcfg_calc_filters.xcu
./unxlngi6.pro/bin/fcfg_langpack_en-US.zip

One note: the language pack zip package should contain the file named Filter.xcu with the new UI string you just put in. If you don’t see that, remove the whole unxlngi6.pro directory and build the module again.

Now it’s time to update your installation. You need to update the following files:

<install_dir>/share/registry/modules/org/openoffice/TypeDetection/Filter/fcfg_calc_filters.xcu
<install_dir>/share/registry/modules/org/openoffice/TypeDetection/Types/fcfg_calc_types.xcu

with the new ones you just rebuilt. Next, unpack the langpack zip file and extract Filter.xcu. Place this file in

<install_dir>/share/registry/res/en-US/org/openoffice/TypeDetection/Filter.xcu

to replace the old one.

Ok so far? There is one more thing you need to do to complete the process. Since these configuration files are cached, in order for the updated configuration files to take effect, the cached data must be removed. The cached data is in the user configuration directory, so you need to locate and delete the following directory:

rm -rf <user_config_dir>/user/registry/cache

That’s it! Now, fire up Calc and launch the Export dialog. You see the new file format entry you just put in. :-)

Export dialog with new export filter entry

Just try not to export your file using this new filter for real, because that will utterly fail. ;-)