Posts Tagged ‘code snippet’

STL container performance on data insertion

March 31st, 2010

I just ran a quick analysis on the performance of various STL containers on simple data insertion. The result was not exactly what I expected so I’d like to share it with you.

What was performed was sequential insertions of 50,000,000 (50 million) unique pointer values into various STL containers, either by push_back or insert, depending on which method is supported by the container. I ran the test on openSUSE 11.2, with g++ 4.4.1, with the compiler options of -std=c++0x -Os -g. The -std=c++0x flag is necessary in order to use std::unordered_set.

Anyway, here is the result I observed:

stl-perf

I was fully aware of the set containers being slower than list and vector on insertion, due to the internal structure of set being more elaborate than those of list or vector, and this test confirms my knowledge. However, I was not aware of such wide gap between list and vector. Also, the difference between unreserved and reserved vector was not as wide as I would have expected. (For the sake of completeness, a reserved vector is an instance of vector whose internal array size is pre-allocated in advance in order to avoid re-allocation.) My belief has always been that reserving vector in advance improves performance on data insertion, which it does, but I was expecting a wider gap between the two. So, the result I see here is a bit unexpected.

In case you want to re-run this test on your own environment, here is the code I used to measure the containers’ performance:

#include <vector>
#include <unordered_set>
#include <set>
#include <list>
 
#include <stdio.h>
#include <string>
#include <sys/time.h>
 
using namespace std;
 
namespace {
 
class StackPrinter
{
public:
    explicit StackPrinter(const char* msg) :
        msMsg(msg)
    {
        fprintf(stdout, "%s: --begin\n", msMsg.c_str());
        mfStartTime = getTime();
    }
 
    ~StackPrinter()
    {
        double fEndTime = getTime();
        fprintf(stdout, "%s: --end (duration: %g sec)\n", msMsg.c_str(), (fEndTime-mfStartTime));
    }
 
    void printTime(int line) const
    {
        double fEndTime = getTime();
        fprintf(stdout, "%s: --(%d) (duration: %g sec)\n", msMsg.c_str(), line, (fEndTime-mfStartTime));
    }
 
private:
    double getTime() const
    {
        timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1000000.0;
    }
 
    ::std::string msMsg;
    double mfStartTime;
};
 
}
 
int main()
{
    size_t store_size = 50000000;
    {
        StackPrinter __stack_printer__("vector non-reserved");
        string* ptr = 0x00000000;
        vector<void*> store;
        for (size_t i = 0; i < store_size; ++i)
            store.push_back(ptr++);
    }
 
    {
        StackPrinter __stack_printer__("vector reserved");
        string* ptr = 0x00000000;
        vector<void*> store;
        store.reserve(store_size);
        for (size_t i = 0; i < store_size; ++i)
            store.push_back(ptr++);
    }
 
    {
        StackPrinter __stack_printer__("list");
        string* ptr = 0x00000000;
        list<void*> store;
        for (size_t i = 0; i < store_size; ++i)
            store.push_back(ptr++);
    }
 
    {
        StackPrinter __stack_printer__("set");
        string* ptr = 0x00000000;
        set<void*> store;   
        for (size_t i = 0; i < store_size; ++i)
            store.insert(ptr++);
    }
 
    {
        StackPrinter __stack_printer__("unordered set");
        string* ptr = 0x00000000;
        unordered_set<void*> store;
        for (size_t i = 0; i < store_size; ++i)
            store.insert(ptr++);
    }
}

This is valid C++ code?

November 13th, 2008

My compiler reported a build error in the following code block today.

long ScDPOutput::GetHeaderDim( const ScAddress& rPos, USHORT& rOrient )
{
    SCCOL nCol = rPos.Col();
    SCROW nRow = rPos.Row();
    SCTAB nTab = rPos.Tab();
    if ( nTab != aStartPos.Tab() )
        return -1;                                      // wrong sheet
 
    //  calculate output positions and sizes
 
    CalcSizes();
 
    //  test for column header
 
    if ( nRow == nTabStartRow && nCol >= nDataStartCol && nCol < nDataStartCol + nColFieldCount )
    {
        rOrient = sheet::DataPilotFieldOrientation_COLUMN;
        long nField = nCol - nDataStartCol;
        return pColFields[nField].nDim;
    }
 
    //  test for row header
 
    if ( nRow+1 == nDataStartRow && nCol >= nTabStartCol == nCol < nTabStartCol + nRowFieldCount )
    {
        rOrient = sheet::DataPilotFieldOrientation_ROW;
        long nField = nCol - nTabStartCol;
        return pRowFields[nField].nDim;
    }
 
    //  test for page field
 
    SCROW nPageStartRow = aStartPos.Row() + ( bDoFilter ? 1 : 0 );
    if ( nCol == aStartPos.Col() && nRow >= nPageStartRow && nRow < nPageStartRow + nPageFieldCount )
    {
        rOrient = sheet::DataPilotFieldOrientation_PAGE;
        long nField = nRow - nPageStartRow;
        return pPageFields[nField].nDim;
    }
 
    //! single data field (?)
 
    rOrient = sheet::DataPilotFieldOrientation_HIDDEN;
    return -1;      // invalid
}

In particular, my compiler didn’t like the == (equality operator) in nTabStartCol == nCol in the 3rd if statement block from the top. Looking at the if statement before and after that, you’ll probably say “yeah, looks like that ‘==’ was supposed to be &&, so what’s the surprise?” Well, the thing is, this piece of code has not changed for at least a few years, which means it was compiling just fine up until today (though it may have caused a bug somewhere…). And even today, it compiled fine before I made a few changes that were not related to this method, and I didn’t modify this method itself at all.

I have to wonder, why this code block compiled fine up till today, and what change of mine triggered the compiler to complain all of a sudden if the method itself is unchanged…. :-/

Excel sheet protection password hash

January 18th, 2008

When you protect either your workbook or one of your worksheets with a password in Excel, Excel internally generates a 16-bit hash of your password and stores it instead of the original password text. The hashing algorithm used for that was previously unknown, but thanks to the infamous Office Open XML specification it is now documented for the world to see (take a look at Part 4, Section 3.3.1.81 – Sheet Protection Options for the details). Thankfully, the algorithm is identical for all recent versions of Excel including XP, 2003 and 2007, so you can simply reuse the documented algorithm for the older versions of Excel too.

But alas! the documented algorithm is incorrect; it does not produce correct hash values. Being determined to find out the correct algorithm, however, I started to analyze the hashes that the documented algorithm produces, and compare them with the real hash values that Excel generates, in order to decipher the correct algorithm.

In the end, the documented algorithm was, although not accurate, pretty close enough that I was able to make a few changes and derive the algorithm that generates correct values. The following code:

#include <stdio.h>
 
using namespace std;
 
typedef unsigned char sal_uInt8;
typedef unsigned short sal_uInt16;
 
sal_uInt16 getPasswordHash(const char* szPassword)
{
    sal_uInt16 cchPassword = strlen(szPassword);
    sal_uInt16 wPasswordHash = 0;
    if (!cchPassword)
        return wPasswordHash;
 
    const char* pch = &szPassword[cchPassword];
    while (pch-- != szPassword)
    {
        wPasswordHash = ((wPasswordHash >> 14) & 0x01) | 
                        ((wPasswordHash << 1) & 0x7fff);
        wPasswordHash ^= *pch;
    }
 
    wPasswordHash = ((wPasswordHash >> 14) & 0x01) | 
                    ((wPasswordHash << 1) & 0x7fff);
 
    wPasswordHash ^= (0x8000 | ('N' << 8) | 'K');
    wPasswordHash ^= cchPassword;
 
    return wPasswordHash;
}
 
int main (int argc, char** argv)
{
    if (argc < 2)
        exit(1);
 
    printf("input password = %s\n", argv[1]);
    sal_uInt16 hash = getPasswordHash(argv[1]);
    printf("hash = %4.4X\n", hash);
 
    return 0;
}

produces the right hash value from an arbitrary password. One caveat: this algorithm takes an 8-bit char array, so if the input value consists of 16-bit unicode characters, it needs to be first converted into 8-bit character array. The conversion algorithm is also documented in the OOXML specification. I have not tested it yet, but I hope that algorithm is correct. ;-)

Missing vcl resource

July 11th, 2007

At one point in the past, I started getting this annoying error message dialog

VCL resource warning dialog on startup

on startup, and OO.o simply shuts itself down after that. It happened whenever I installed the trunk version of ooo-build with ooinstall (with an -l option for linking), or the upstream build with linkoo. These two commands are two, totally separate scripts, but they both create symbolic links to the shared libraries and resources in the installation directory, to their respective location in the build (actually ooinstall makes use of linkoo for the -l functionality). This setting allows a fast iteration of code change, compilation and testing without having to manually swap the shared libraries each time they get modified in the build. But because of this problem I was not able to use linkoo with the upstream build, or ooinstall -l with the trunk ooo-build, and was forced to manually set symlink to the modules I was working on. (Somehow, the 2.2 and 2.1 branches of ooo-build didn’t have this problem.)

But today, after not having been able to use an automatically symlinked installation for a long, long time, I got tired of it and decided to jump into the code, to see what causes this problem.

After a little tracing of the code, I finally found the offending code block (tools/source/rc/resmgr.cxx#244):

void ResMgrContainer::init()
{
    // get resource path
    std::list< OUString > aDirs;
    sal_Int32 nIndex = 0;
 
    // 1. relative to current module (<installation>/program/resource)
    OUString libraryFileUrl;
    if( Module::getUrlFromAddress(
            reinterpret_cast< oslGenericFunction >(ResMgrContainer::release),
            libraryFileUrl) )
        nIndex = libraryFileUrl.lastIndexOf( '/' );
    DBG_ASSERT( nIndex > 0, "module resolution failed" );
    if( nIndex > 0 )
    {
        OUStringBuffer aBuf( libraryFileUrl.getLength() + 16 );
        aBuf.append( libraryFileUrl.getStr(), nIndex+1 ); // copy inclusive '/'
        aBuf.appendAscii( "resource" );
        aDirs.push_back( aBuf.makeStringAndClear() );
    }
 
    // 2. in STAR_RESOURCEPATH
    ....

Here is what the code does. It tries to locate all of the resources (.res) files and put their locations into an internal data structure. To do it, it needs to know where to find the resource files. It cleverly determines the resource file directory by first getting the absolute path of the module where the code resides (libtl680li.so), and move down to the resource file directory from that location. In the normal installation, the modules are located in the [installation root]/program/, and the resources directory is only one level down.

However, when the shared library in question is a symbolic link (symlink) to another file, the code ends up getting the path of the actual file the symlink points to, instead of the path of that symlink (via dladdr call), and this causes the above problem.

There is an easy workaround. Since it’s only the shared library where the ResMgrContainer class is (which is libtl680li.so as mentioned) needs to be the actual file, you can simply delete the symlink that points to libtl680li.so, and put the original file in its place. Then OO.o launches just fine. You can leave all the other symlinked shared libraries alone. The only problem with this workaround is, if you want to hack at the tools code, you would need to manually swap the shared library on each module re-build, but for me, that’s not a major problem (I don’t hack at the tools code).

SSE2 Instructions

June 4th, 2007

For the past several weeks I have been studying X86 assembly language, mainly because I wanted to update my knowledge on the assembly language to match the latest CPU technology. I had previosly taken an X86 assembly language course at NCSU roughly a year ago, but the course only covered 8086 instruction set, and used the MASM version 6.0 as the assembler which is only good for writing MS-DOS applications. I wanted to at least learn how to do floating-point calculations in assembly, and do it in GNU assembly so that my apps would run on Linux.

There are quite a few extensions to the core X86 instructions, such as FPU, MMX, SSE, and SSE2. The FPU takes care of normal floating point calculations since 80386, MMX for operating multiple integer calculations in a single CPU cycle, SSE for multiple single-precision calculations, and SSE2 for multiple double-precision calculations (again, in a single CPU cycle). Since software these days, and OO.o in particular, seem to do almost all of floating point calculations in double-precision, I decided to give SSE2 a little benchmark test.

Here is how I did it. I wrote some simple mathematical routines in C, compiled it normally with gcc with -O1 optimization. Then I had gcc generate an assembly code of that routine, cleaned it up a bit and replaced several instructions with SSE2 instructions, reassembled it into an executable to run benchmark.

Here is the original C code for the routine:

void array_multiply(double* array1, double* array2, unsigned int size)
{
    // Make nloop a multiple of 2.
    unsigned int nloop = size/2;
    nloop += nloop;
 
    unsigned int i = 0;
    for (; i < nloop; i += 2)
    {
        array1[i]   *= array2[i] * array2[i] * array2[i];
        array1[i+1] *= array2[i+1] * array2[i+1] * array2[i+1];
    }
}

and this is the assembly instructions that gcc generated (with -O1):

    .text
    .align 2
.globl array_multiply
    .type   array_multiply, @function
array_multiply:
.LFB13:
    pushl   %ebp
.LCFI0:
    movl    %esp, %ebp
.LCFI1:
    pushl   %ebx
.LCFI2:
    movl    8(%ebp), %edx
    movl    12(%ebp), %ebx
    movl    16(%ebp), %ecx
    andl    $-2, %ecx
    je      .L5
    movl    $0, %eax
.L4:
    fldl    (%ebx,%eax,8)
    fld     %st(0)
    fmul    %st(1), %st
    fmulp   %st, %st(1)
    fmull   (%edx,%eax,8)
    fstpl   (%edx,%eax,8)
    fldl    8(%ebx,%eax,8)
    fld     %st(0)
    fmul    %st(1), %st
    fmulp   %st, %st(1)
    fmull   8(%edx,%eax,8)
    fstpl   8(%edx,%eax,8)
    addl    $2, %eax
    cmpl    %eax, %ecx
    ja      .L4
.L5:
    popl    %ebx
    popl    %ebp
    ret

It does all the calculations using FPU instructions. And here is the assembly code after I replaced the FPU instructions with SSE2 ones:

.section .text
.align 16
.globl array_multiply
    .type   array_multiply, @function
array_multiply:
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %ebx
    movl    8(%ebp), %edx           # pointer to array1
    movl    12(%ebp), %ebx          # pointer to array2
    movl    16(%ebp), %ecx          # size
    andl    $-2, %ecx               # make the size a multiple of 2
    je  .L5
    movl    $0, %eax                # i = 0
.L4:
    movapd  (%edx,%eax,8), %xmm0    # SSE2
    movupd  (%ebx,%eax,8), %xmm1    # SSE2
    mulpd   %xmm1, %xmm0            # SSE2
    mulpd   %xmm1, %xmm0            # SSE2
    mulpd   %xmm1, %xmm0            # SSE2
    movapd  %xmm0, (%edx,%eax,8)    # SSE2
    addl    $2, %eax                # i += 2
    cmpl    %eax, %ecx
    ja  .L4
.L5:
    popl    %ebx
    popl    %ebp
    ret

I then used the following main C++ code

void print_array(double* array, unsigned int size)
{
    cout << "{ ";
    for (unsigned int i = 0; i < size; ++i)
    {
        if (i)
            cout << ", ";
        cout << array[i];
    }
    cout << " }" << endl;
}
 
int main()
{
    double myarray1[] = {10.5, 50.0, 25.0, 10.0, 2345.4848, 594.23, 0.4, 87.2};
    double myarray2[] = {1.2, 50.0, 1.5, 10.0, 120.9, 44.09, 874.234, 233.333};
    unsigned int array_size = 8;
 
    cout << "myarray1 = ";
    print_array(myarray1, array_size);
    cout << "myarray2 = ";
    print_array(myarray2, array_size);
 
    double myarray[array_size];
 
    for (long counter = 0; counter < 99999999; ++counter)
    {
        for (unsigned int i = 0; i < array_size; ++i)
            // To prevent calculation results from being cached.
            myarray[i] = myarray1[i] + 0.000000001*counter;
        array_multiply(myarray, myarray2, array_size);
    }
 
    for (unsigned int i = 0; i < array_size; ++i)
        cout << i << " t " << myarray[i] << endl;
}

to call both the C version and the assembly with SSE2 version to compare performance. The executables with the original C version and the SSE2 version are named test_c and test_sse, respectively. Here is the result (on my machine):

$ time ./test_c
myarray1 = { 10.5, 50, 25, 10, 2345.48, 594.23, 0.4, 87.2 }
myarray2 = { 1.2, 50, 1.5, 10, 120.9, 44.09, 874.234, 233.333 }
0        18.3168
1        6.2625e+06
2        84.7125
3        10100
4        4.14505e+09
5        5.09387e+07
6        3.34082e+08
7        1.10903e+09
 
real    0m3.308s
user    0m3.292s
sys     0m0.012s
 
$ time ./test_sse
myarray1 = { 10.5, 50, 25, 10, 2345.48, 594.23, 0.4, 87.2 }
myarray2 = { 1.2, 50, 1.5, 10, 120.9, 44.09, 874.234, 233.333 }
0        18.3168
1        6.2625e+06
2        84.7125
3        10100
4        4.14505e+09
5        5.09387e+07
6        3.34082e+08
7        1.10903e+09
 
real    0m2.451s
user    0m2.436s
sys     0m0.000s

Indeed, the SSE2 version seems to perform better! I also compared my SSE2 version against the -O3 optimized C-code, but there was not much difference from the -O1 optimized code for this particular algorithm. But of course, YMMV.

Does this mean we should start writing assembly code for performance optimization? Probably not. I still think it’s much better to write a better algorithm in the first place than re-write code in assembly language, because re-writing in assembly is itself not a guarantee for a better performance. But for serious performance work, however, knowing what type of assembly code that the compiler generates from a C/C++ code, for various compiler optimization flags, will undoubtedly benefit. You never know, in a few extreme cases, it may even make sense to write parts of the application code in assembly, even if that means having to write that part in assembly for every platform that app needs to support. OO.o has some parts of UNO bridge written in assembly, and I’ve seen some assembly code in the FireFox codebase as well.

Oh, by the way, for anyone looking for a good study guide on GNU assembly for X86 family of chips, the “Professional Assembly Language” by Richard Blum (Wiley Publishing) would be a pretty good place to start.

How to (pretend to) write an export filter

May 8th, 2007

It turns out that pretending to write an export filter, at least adding a new entry to the Export dialog, is quite easy. In fact, you don’t even have to write a single line of code. Here is what to do.

Suppose you do your own build, and you have installed the OO.o that you have built. Now, go back to your build tree, and change directory into the following location

filter/source/config/fragments

and add the following two new files relative to this location:

./filters/calc_Kohei_SDF_Filter.xcu
./filters/calc_Kohei_SDF_ui.xcu

You can name your files anyway you want, of course. ;-) Anyway, put the following XML fragments into these files:

<!-- calc_Kohei_SDF_Filter.xcu -->
<node oor:name="calc_Kohei_SDF_Filter" oor:op="replace">
  <prop oor:name="Flags"><value>EXPORT ALIEN 3RDPARTYFILTER</value></prop>
  <prop oor:name="UIComponent"/>
  <prop oor:name="FilterService"><value>com.sun.star.comp.packages.KoheiSuperDuperFileExporter</value></prop>
  <prop oor:name="UserData"/>
  <prop oor:name="FileFormatVersion"/>
  <prop oor:name="Type"><value>Kohei_SDF</value></prop>
  <prop oor:name="TemplateName"/>
  <prop oor:name="DocumentService"><value>com.sun.star.sheet.SpreadsheetDocument</value></prop>
</node>
 
<!-- calc_Kohei_SDF_Filter_ui.xcu -->
<node oor:name="calc_Kohei_SDF_Filter">
  <prop oor:name="UIName"><value xml:lang="x-default">Kohei Super Duper File Format</value>
    <value xml:lang="en-US">Kohei Super Duper File Format</value>
    <value xml:lang="de">Kohei Super Duper File Format</value>
  </prop>
</node>

Likewise, create another file:

./types/Kohei_SDF.xcu

with the following content

<!-- Kohei_SDF.xcu -->
<node oor:name="Kohei_SDF" oor:op="replace" >
  <prop oor:name="DetectService"/>
  <prop oor:name="URLPattern"/>
  <prop oor:name="Extensions"><value>koheisdf</value></prop>
  <prop oor:name="MediaType"/>
  <prop oor:name="Preferred"><value>false</value></prop>
  <prop oor:name="PreferredFilter"><value>calc_Kohei_SDF_Filter</value></prop>
  <prop oor:name="UIName"><value xml:lang="x-default">Kohei Super Duper File Format</value></prop>
  <prop oor:name="ClipboardFormat"><value>doctype:Workbook</value></prop>
</node>

Once these new files are in place, add these files to fcfg_calc.mk so that the build process can find them. To add, open fcfg_calc.mk and add Kohei_SDF to the end of T4_CALC, calc_Kohei_SDF_Filter to F4_CALC, and calc_Kohei_SDF_Filter_ui to F4_UI_CALC. Save the file and rebuild the module. This should rebuild the following configuration files (build done on Linux):

./unxlngi6.pro/misc/filters/modulepacks/fcfg_calc_types.xcu
./unxlngi6.pro/misc/filters/modulepacks/fcfg_calc_filters.xcu
./unxlngi6.pro/bin/fcfg_langpack_en-US.zip

One note: the language pack zip package should contain the file named Filter.xcu with the new UI string you just put in. If you don’t see that, remove the whole unxlngi6.pro directory and build the module again.

Now it’s time to update your installation. You need to update the following files:

<install_dir>/share/registry/modules/org/openoffice/TypeDetection/Filter/fcfg_calc_filters.xcu
<install_dir>/share/registry/modules/org/openoffice/TypeDetection/Types/fcfg_calc_types.xcu

with the new ones you just rebuilt. Next, unpack the langpack zip file and extract Filter.xcu. Place this file in

<install_dir>/share/registry/res/en-US/org/openoffice/TypeDetection/Filter.xcu

to replace the old one.

Ok so far? There is one more thing you need to do to complete the process. Since these configuration files are cached, in order for the updated configuration files to take effect, the cached data must be removed. The cached data is in the user configuration directory, so you need to locate and delete the following directory:

rm -rf <user_config_dir>/user/registry/cache

That’s it! Now, fire up Calc and launch the Export dialog. You see the new file format entry you just put in. :-)

Export dialog with new export filter entry

Just try not to export your file using this new filter for real, because that will utterly fail. ;-)