This is another performance win.
I’ve just pushed changes to the master branch to improve the import performance of binary Excel documents containing tons of form controls. The test document I used had an upward of 500 form controls which, prior to the change, Calc would spend at least several minutes to load. I don’t know the exact amount of time it took to open the document because each time I tried to open it, I had to kill the app after I became too impatient to wait.
Long story short, the same document now opens under 6 seconds on my machine.
The poor performance in this particular case consisted of several different bottlenecks. They are
- inefficient algorithm in registering event listeners for VBA events,
- inefficient algorithm in querying the code name from the parent application,
- unnecessary VBA event registration for form controls, and
- sending unnecessary notifications to property value change listeners during import for each and every property value insertion.
Registering event listeners for VBA events
When each control is inserted, we register several VBA events for it in order to handle events from the VBA code. For each event, we would register by passing the target and listener pair to the handler that handles event notification. As it turned out, however, each time that happens, the handler has to introspect the type of the target because it is passed as UNO’s Any object. While each instance of that may take only a fraction of a second to complete, when calling it literally millions of times it adds up not to mention the fact that the target remains the same for 12 or so listeners that are being registered for each control.
To solve this, I added a new method to register multiple event listeners for an identical target in a single call, to avoid repeated and unnecessary introspection of the target type. This alone has resulted in reducing the load time significantly (66% load-time reduction with my test document). However, this was still not enough with a larger number of controls since, as the number of controls grew, the load time would increase almost quadratically.
Querying the code name from the parent application
Another issue was the algorithm responsible for looking up the “code name” of the VBA module that the control belongs to. The code name is the name associated with each VBA module that Excel creates for each sheet. The name of the module does not necessarily equal the name of the sheet, and is unique to each sheet. The old algorithm would go through all existing form control instances in order to find a match, then backtrack the sheet it is on in order to determine the correct code name. But because it had to iterate through all existing controls, as the number of the controls grew, so would the time it takes to find a match.
Since the code name is identical for each sheet, there was no reason to check every single control. So I added a new method to get the code name directly from the parent container of the controls. Since we only create one container per sheet at most, this has resulted in making the code name lookup independent of the number of controls, and has resulted in quasi-constant time lookup since the number of sheets doesn’t grow during the import.
Unnecessary VBA event registration for form controls
There are two types of controls that Excel supports. One is the older form controls that you can insert via Forms toolbar, while the other is the newer, OLE controls that you can insert via Control Toolbox toolbar. As luck would have it, Excel doesn’t support bindings to VBA with the form controls, so it was not necessary to register events for these guys when we import them (as Noel told me). Turning off event registration for form control import has surely cut down the load time significantly. Many thanks to Noel for giving me a patch to turn this off for form controls.
Property value change listeners
Even after all these performance bottlenecks squashed, the load time still didn’t feel as fast as it should be. So, I did another round of profiling. It indicated that, every time we set a new property value to a control via XPropertySet, we would notify all property value change listeners to allow them to react to the change or veto the change, and this happened unconditionally for every single property value insertion for every single control.
Since the likelihood of having to veto or change other property values based on a new property value insertion during file import is close to nil if not zero, I added a new API to temporarily turn off this notification. This has cut down the last few seconds off the overall load time, down to 6 seconds in total. This notification is turned back on after the loading is complete.
There are several opportunities for future work. For one thing, the code name lookup also applies to the VBA event support in Writer. But because I wasn’t aware of how Writer organizes form controls, I didn’t touch its lookup algorithm. So, if the same inefficiency applies to Writer (which I’m not sure it does), then there may be a way to improve performance in that area.
Another area to consider is reducing the number of VBA events to register. As Noel told me, we currently register 12 or so events unconditionally for all controls imported from Excel documents. But technically we only have to register events that are actually needed. So, if we can find a way to determine what events we need to register by either parsing the VBA code or any other ways, we can reduce the number of VBA event registrations during the import.
This is all I can think of at the moment. Thank you ladies and gentlemen.