Performance improvement in opening ODS documents

I have great news to share with you. Calc’s ODS import filter in 3.5 should be substantially faster when you have documents with a large number of named ranges. Read on if you want to know more details.

What happened?

Laurent Godard, Markus Mohrhard, and myself have been working pretty hard in the past month to bring the performance of ODS import filter to a reasonable level, especially with documents containing a large number of named ranges.

Here is the background. Laurent uses LibreOffice as a platform for his professional extension, which makes heavy use of named ranges. It programmatically generates ODS documents and inserts hundred’s or thousand’s of named ranges as intermediary storage to further process the data. The problem was, however, our import performance with that kind of documents was so suboptimal that this process was taking a prohibitively long time. In order for his extension to perform optimally, our ODS import filter needed to be optimized, and optimized heavily.

During the Paris conference, we got our heads together in order to come up with a strategy to make that happen. Laurent was more than willing to participate this effort, and in the end, he did substantial amount of work profiling, analyzing code, coming up with optimization strategy and putting it altogether. Markus and I provided mentorship, code pointers, as well as occasional coding to accelerate this effort.

Our hope was to make it all happen in time for our first 3.5 release. And I’m very happy to say that we made it.

Benchmark

Since we are talking about performance, it won’t be complete without the actual numbers. So here goes.

Test document 1


Here is the first test document global500.ods. It contains 500 sheets, 12,500 global named ranges, and 12,500 formulas that reference them.

On my development machine, the last stable release 3.4.4 takes 14 seconds to open this document. While 14 seconds may not seem that slow, keep in mind that this machine is somewhat unfairly fast tailored for the abusive developer use, so the real world performance is likely much less impressive (you can probably multiply that number by 3 to get a rough idea of the real world performance). Anyhow, using the latest master branch on the same machine, this document opens roughly in 2 and a half seconds. That’s roughly 86% reduction in import time.

Test document 2


Here is the second, somewhat larger document global1000.ods. This document contains 1000 sheets, 25,000 named ranges and 25,000 formulas that reference them.

According to my benchmark performed in the same condition as the first document, 3.4.4 opens this document in 50 seconds, whereas in 3.5.0 it opens under 5 seconds. That’s about 90% reduction in import time. Pretty impressive!

Real power of open source

This story shows another aspect of this remarkable achievement worth mentioning. If you use an open source product such as LibreOffice in your business, and if it doesn’t perform the way you need it to, you can actually join the project as a developer and coordinate the effort with the upstream developers to make it happen. And depending on the nature of the change you want to see happen, it can happen very quickly as this story demonstrates.

I wanted to emphasize this because, while more and more businesses and institutions are embracing open source software, many of them tend to focus too much on the cost-saving aspect of it, thereby developing the wrong mindset that that’s what open source is all about. It isn’t. The real power of using open source software in your deployment is it gives you the ability to join and contribute to the project to influence the direction of its development. That gives you real flexibility in planning, and in my opinion the best way to harness the power of using open source software. The monetary cost-saving side of the benefit comes as a side effect but should be thought of only as an added bonus, not the primary reason for deploying open source software.

Redesigned autofilter popup

I’m happy to announce that I’ve managed to squeeze this new feature in just in time for the 3.5 code freeze.

What’s new?

As I’ve mentioned briefly in G+, I’ve been working on brushing up the age-old autofilter popup window in the past few weeks. I have no idea how old the old one is, but it’s been there for as long as I remember. In case anyone needs a reminder as to what the old one looks like, here it is.

It’s functional, yet very basic. While this has served us for many years since the last century, it was also clear that the world has since moved on, and the people has started craving for modern looks and eye candies even in the office productivity applications. Clearly, it was time for a change.

In contrast to the old, here is how the new one looks:

I don’t know about you, but I really like the new one better. :-)

Motivation

Aside from updating the aged look of the old popup, I was also motivated to introduce the new popup for its ability to allow selection of multiple values from the selection list.

As you can see in the screenshot, the old one allowed only one value to be selected for each given column, which was not only very limiting but also caused interoperability issues with Excel documents, especially with those created in Excel 2007 and newer. In fact, adding this feature has been my long-term goal, ever since I began working on OpenOffice.org code base professionally. Because of this background, I had my personal attachment to fulfill that goal, and I’m really glad to have finally landed this feature 4 years and one name change (OOo to LibreOffice) later!

Laying the foundation

You may think that this new popup looks somewhat familiar. That’s because the same popup is also used as the pivot table (formerly data pilot) field member selection popup. I’ve touched on this previously on my blog, and you’ll probably notice the similarity when comparing the screenshot of the new popup with the screenshot of the pivot table popup included in that post.

Internally these two use the same code. In fact, when I developed that feature for the pivot table, I intentionally designed it to be re-usable, precisely so that I could use it for the autofilter popup at a later time.

So, the hard part of implementing the new popup had already been finished. All I had to do was to put the autofilter functionality into the popup and launch it instead of the ugly old one, which is precisely what I did to bring the new popup into reality. I also had to refactor the code that performs the filtering to allow multi-value matching, which was, while invisible to the users, not a trivial task.

Going forward

The work is not totally done yet. As of this writing, the xlsx filter has not been fully adopted to take advantage of the new multi-selection capability, but that’s my next task, and I expect that to be done in time for 3.5.

Also, the menu still looks very basic, and contains only the same set of options that the old popup had. This was done deliberately in order for us to ship it in time for 3.5, by avoiding the rather expensive process of re-designing the menu part of the popup. But I expect we work on the re-design post-3.5, to make it even better and more usable. Note that the new popup is fully capable of doing sub menus, which gives us all sorts of possibilities.

Anyhow, that’s all I have to say about this at the moment. I hope you guys will enjoy the new and shiny autofilter popup! :-)

Notes for testing

As with any new features, this one needs lots of testing. I’ve written new unit test to cover some parts of it, but unit test can’t cover all corners of use cases (especially those involving UI interactions), and manual testing from real users is always appreciated. Some of the affected areas I can think of are:

  • Built-in functions MATCH, LOOKUP, HLOOKUP and VLOOKUP that use the core filtering code which I’ve heavily refactored.
  • Import and export of the existing filtering rules, with ods, xls, and xlsx.
  • Filtering with pivot tables, which shares parts of the filtering code that has been refactored.
  • Standard and advanced filter dialogs

So, watch out for the next daily build that includes this feature!