DBF import performance

Posted on December 11, 2009August 7, 2021 by Kohei Yoshida

Here is another performance win! Importing dbf files into Calc is now quicker by 80%. You will probably notice the difference especially when importing a large dbf file. The test document I used had roughly 24000 rows, and importing that took 57 seconds on my machine. Having 24000 rows in a database file (or even in a spreadsheet file) is very common by today’s standard, so this wasn’t good at all.

I had done quite a bit of performance work over the years, but this one was somewhat difficult to tackle. The bottlenecks were fragmented all over the place which required different solutions to different areas. Roughly speaking, the following are the areas I tackled to reduce the total import time for dbf files (module name in parentheses):

speedup in parsing of dbf file content (connectivity)
disabled property change notification during dbf import (dbaccess)
more efficient string interning and unicode conversion (sal)
reduction in column array re-allocation during import (sc)
removal of unnecessary column and row size adjustments post-import (sc)

With all of this, the file that originally took 57 seconds to load now loads in 12 seconds on the same hardware, which roughly translates to 80% reduction of the total import time!

This itself is pretty impressive; however, I was hoping to get it at least under 10 seconds since Excel can load the same file less than 5 seconds on the same hardware, even through wine emulation (!). But that’s probably for a future project. For now, I’m content with what I’ve done.

Published by Kohei Yoshida

LibreOffice contributor, spreadsheet nerd, and machine learning beginner. View all posts by Kohei Yoshida

7 thoughts on “DBF import performance”

ervin December 12, 20094:48 am

awesome.
Jim Callahan December 16, 200910:51 am

Terrific!
How hard would it be to modify the Calc import module to serve as the Base import module?

Some Base tutorials and forums suggest importing CSV files by importing into Calc and copying and pasting to base! Not bad for small tables, but what if more than 64,000 rows?

http://sheepdogguides.com/fdb/fdb1imp1.htm
http://www.oooforum.org/forum/viewtopic.phtml?t=25434

or raw SQL
http://user.services.openoffice.org/en/forum/viewtopic.php?f=83&t=23260

Some likely issues:
1. The modified import module would have to be callable from Base (is there a separate import module in Calc?).
2. The modified import module would have to output to Hyper SQL (HSQLDB) or whatever database Base is using.
http://www.hsqldb.org/
a. Database interface
http://wiki.services.openoffice.org/wiki/HSQLDB:Tips_and_Tricks
http://wiki.services.openoffice.org/wiki/HSQLDB_Integration
b. Would have to guess at field types, especially numeric (integer or real?).
3. The modified import module would have to be able to process more than 64,000 rows.

I assume this is non-trivial or it would have been done years ago.

Thanks,
Jim Callahan
Orlando, FL
1. Kohei Yoshida December 16, 200911:39 pm
  
  Hi Jim,
  
  As I also said in my reply to your email, having Base support CSV file as an editable data source will not be trivial, to say the least. But I’m not really that much familiar with Base (yet), so you are best advised to ask this on the dev@dba.openoffice.org mailing list.
  
  Good luck!
  1. Andreas Säger January 7, 20114:57 pm
    
    Hello Kohei,
    You wrote:
    > having Base support CSV file as an editable data source will not be trivial,
    
    You can use Base and its underlying HSQLDB as csv editor: http://user.services.openoffice.org/en/forum/viewtopic.php?f=83&t=23260
Jody Goldberg December 23, 20099:39 am

Is the file publicly available?
I’d like to see how Gnumeric stacks up.
Kohei Yoshida December 23, 200910:19 am

Hi Jody, good to hear from you. :-)

Unfortunately the file is confidential, but it’s just a regular, ordinary-looking dbf file with 24000 rows from column A to CN. About half of the records are string records. Any file that resembles that will do, I think.

I just opened up the same file in Gnumeric on my machine, and Gnumeric can open the file in ~11 seconds. So, it does slightly better than Calc even after all my optimizations.
allbombs October 24, 201010:54 pm

doh..wish the file was public

=(

Comments are closed.