Tag Archives: performance

DBF import performance

Share Button

dbf-import-perfHere is another performance win! Importing dbf files into Calc is now quicker by 80%. You will probably notice the difference especially when importing a large dbf file. The test document I used had roughly 24000 rows, and importing that took 57 seconds on my machine. Having 24000 rows in a database file (or even in a spreadsheet file) is very common by today’s standard, so this wasn’t good at all.

I had done quite a bit of performance work over the years, but this one was somewhat difficult to tackle. The bottlenecks were fragmented all over the place which required different solutions to different areas. Roughly speaking, the following are the areas I tackled to reduce the total import time for dbf files (module name in parentheses):

  • speedup in parsing of dbf file content (connectivity)
  • disabled property change notification during dbf import (dbaccess)
  • more efficient string interning and unicode conversion (sal)
  • reduction in column array re-allocation during import (sc)
  • removal of unnecessary column and row size adjustments post-import (sc)

With all of this, the file that originally took 57 seconds to load now loads in 12 seconds on the same hardware, which roughly translates to 80% reduction of the total import time!

This itself is pretty impressive; however, I was hoping to get it at least under 10 seconds since Excel can load the same file less than 5 seconds on the same hardware, even through wine emulation (!). But that’s probably for a future project. For now, I’m content with what I’ve done.

Updates on various stuff

Share Button

Ok. Here is some updates on some of the stuff I’ve been doing lately. I picked the ones that are particularly worth mentioning.

Saving documents

There are two changes related to the document-saving functionality that I’d like to mention. The first one is the new icon in the document modified status window. As I blogged before, I had made a minor polish to the existing document modified status window, to show the status graphically instead of simply showing ‘*’ when the document is modified. The only problem was that the icon I used to fill that space was pretty lame and ugly. But thanks to jimmac, we now have a much better icon to show the modified status (see below).
doc-modified

The second thing is with the save icon itself. It has been known to us that some users want the ability to always save the document even when the document is not considered “modified”, while others want the save action disabled when the document is not “modified”. I quote the term modified here because even when the content of the document has not changed, some peripheral data may have changed, such as the zoom level, cursor position, active sheet and so on and so forth. These peripheral data (that we call the “view data”) are still stored with the document, but changes in these data do not set a document modified status. So, if you wanted to save your document with the cursor at a particular location, a certain sheet activated and the zoom level set to a certain level, you had to make a fake change to the content to be able to save the document with the view data change. The solution we had employed previously was to always enable this only for Calc, where the request for this behavior was greatest. However, some users still found it confusing that only Calc enables the save all the time while the rest of the applications didn’t. Also, a lot of users used the save icon itself to check whether their document has been modified or not even in Calc.

So, I’ve decided to make it a configuration option. That way we can keep both camps happy. :-) Here is the new check box to toggle this behavior:
always-save-option

Anyway, I hope some of you guys will find this useful, or at least will not find it annoying.

Performance improvement

pagenation-perf-chart
Another stuff worth mentioning is the improvement I made on Calc’s pagination performance. Pagination refers to the action of calculating appropriate positions to set page borders over the entire sheet based on the current page size, row/column sizes, presence of manual page breaks and several other factors. I had previously worked on optimizing this when we increased Calc’s row limit to 1 million rows (as I also mentioned during my talk in Orvieto), but apparently that optimization still had massive room for improvement; the test document I had took 7 minutes to perform pagination during printing! Granted, the document had 98 pages to print, but I bet that no one wants to wait that long to print even if the document has that many pages.

Long story short, I have reduced the duration from 7 minutes to roughly 35 seconds. Though I’m very happy with the result, it required a large amount of refactoring to get to that point, and when a large amount of code changes, the chance of introducing regressions unfortunately goes up. So, please pay special attention to Calc’s pagination behavior and its handling of row heights, and if you notice any problems, I’d like to hear from you, preferably with a test document or two.

DataPilot field popup window

Last but not least, I’d like to mention this one. The DataPilot field popup window has been in the works for quite some time since 3.1. I have blogged about the initial version and the 2nd incarnation. Now the 3rd incarnation is on the horizon. As they say, a picture is worth a thousand words. So without further ado, let’s take a look at the screenshot:
dp-popup-window
This version has a “toggle all” check box to quickly turn on and off all field members, “select only current” button to only select currently selected member, and “unselect only current” button to select all but the current member. Also not visible on this screenshot is the support for Gnome accessibility framework, which is also new in this version.

So…

These are the highlights of some of the stuff I’ve been doing recently. There are more things on the horizon, so stay tuned.

Current status

Share Button

First off, my coding activity on solver is progressing very slow these days. Mostly because, besides my day job as a software engineer, I’ve been taking a class at a nearby college since January, and its class work (homework and programming assignments etc.) is taking up the bulk of my free time. This class finishes in May, so hopefully I will have more time to spend for the solver project once this is all done.

Also, for those who haven’t noticed, there is now a Windows binary available thanks to Kami’s contribution, so to those of you who have been waiting for a Windows version, now is your chance. :-) It’s available at the usual place.

Ok, so…, here is the update on my solver. I’m pretty much done with my revised simplex code, and now I’m shifting my focus to improving the (upper-and-lower) bounded revised simplex code. After a couple of test runs I’ve realized that the bounded simplex’s initial solution search is still pretty weak, so, that code needs to be improved first before anything else.

As an aside, here is an interesting page on Calc’s performance optimization bits by Michael and his gang.