The highlight of this release is mostly with the segment_tree data structure, where its value type previously only supported pointer types. Markus Mohrhard worked on removing that constraint from segment_tree so that you can now store values of arbitrary types just like you would expect from a template container.
Aside from that, there are some minor bug and build fixes. Users of the previous versions are encouraged to update to this version.
Today is my last day with Collabora, and also my last day as a full-time engineer working on the LibreOffice (and formerly OpenOffice.org) code base. It’s been 8 long years of adventure. Lots of things happened, and we’ve achieved great many things. I’m certainly very proud of having been a part of it.
From this point on, I’ll participate the project purely as a volunteer. I have not yet figured out what I want to do nor how much I can do, and figuring that out will probably be my first task as a newly volunteer contributor.
Thank you for being patient with me in the last 8 years. You guys have been great, and, even though I’ll have much less time to devote to LibreOffice going forward, I still hope to see you guys around from time to time!
Today I’d like to talk about the LibreOffice Hackfest (LibreFest) that we did in Seattle on October 26th. This hackfest happens to be the very first hackfest event that I have participated outside of those held at the annual LibreOffice conferences, and the first one ever in the United States. Quite frankly, I didn’t really know what to expect going into this event. But despite that, I’m pleased to say that the event went quite well, with 32 participants joining the event in total, which was much more than what we had anticipated.
Hackfest took place inside the Communications Building at University of Washington, located in downtown Seattle. We borrowed a small-size class room to host the event, and later brought in extra chairs to accommodate everyone.
Four of us were there from the LibreOffice project – Robinson Tryon, Norbert Thiebaud, Bjoern Michaelsen and myself, though Bjoern had to leave early to catch his flight. Some of us came to the venue around 9 AM to set things up, and people started showing up around 9:30. Once the event officially started at 10 AM, we split into 2 tracks: the hackfest track where people work on building LibreOffice from the git repository & making changes, and the QA track where people test LibreOffice to report bugs. Robinson assisted those in the QA track, and the rest of us helped those in the hackfest track.
We spent much of the morning setting people up and getting their builds going, which was quite a challenge in and of itself. We eventually got everyone building one way or another, and the availability of a virtual machine environment was quite helpful for some of the participants. Others opted to use their own machines to build it on.
Some participants came late and joined in the afternoon session, while others only joined the morning session and had to leave in the afternoon. About half of us stayed there until late evening. Overall, it was great to see so much interest in our project, and pleased to see that many decided to stay until late to get things done.
Overall, we had a very successful hackfest event. I would like to thank Robinson for working hard to organize this hackfest, and Lee Fisher who was very helpful in organizing the event especially in handling matters on the Seattle side.
Two things I’ve learned from this event are: 1) access to a very fast virtual build environment can be quite helpful, and 2) Slackware is still very much alive! With regard to the first one, I feel that we should put more emphasis on having the participants use virtual machines to build LibreOffice for future hackfest events, and have mentors adequately trained to set it up for them. With regard to the popularity of Slackware, well, we need to encourage more participation from Slackware users and encourage them to share tips on building LibreOffice on Slackware in our wiki.
I hope those who came to the event learned something worthwhile (I certainly did), and I hope to see them again in the LibreOffice project!
Some of you have asked me previously whether or not we can share any test documents to demonstrate Calc’s new OpenCL-based formula engine. Thanks to AMD, we can now make available 3 test documents that showcase the performance of the new engine, and how it compares to Calc’s existing engine as well as Excel’s.
These files are intentionally in Excel format so that they can be used both in Calc and Excel. They also contain VBA script to automate the execution of formula cell recalculation and measure the recalculation time with a single button click.
All you have to do is to open one of these files, click “Recalculate” and wait for it to finish. It should give you the number that represents the duration of the recalculation in milliseconds.
Note that the 64-bit version of Excel requires different VBA syntax for calling native function in DLL, which is why we have a separate set of documents just for that version. You should not use these documents unless you want to test them specifically in the 64-bit version of Excel. Use the other one for all the rest.
On Linux, you need to use a reasonably recent build from the master branch in order for the VBA macro to be able to call the native DLL function. If you decide to run them on Linux, make sure your build is recent enough to contain this commit.
Once again, huge thanks to AMD for allowing us to share these documents with everyone!
I’d like to share the slides I used for my talk at LibreOffice Conference 2014 in Bern, Switzerland.
During my talk, I hinted that the number of unit tests for Calc have dramatically increased during the 4.2 bug fix cycle alone. Since I did not have the opportunity to count the actual number of unit test cases to include in my slides, let me give you the numbers now.
The numbers represent the number of top level test functions in each test category. Since sometimes we add assertions to existing test case rather than adding a new function when testing a new bug fix, these numbers are somewhat conservative representation of how much test case we’ve accumulated for Calc. Even then, it is clear from this data set that the number has spiked since the branch-off of the 4.2 stable branch.
Now, I’ll be the first to admit that the 4.2 releases were quite rough in terms of Calc due to the huge refactoring done in the cell storage structure. That said, I’m quite confident that as long as we diligently add tests for the fixes we do, we can recover from this sooner rather than later, and eventually come out stronger than ever before.
Just a quick update to my last post on getting Calc’s border line situation sorted out.
As of last post, the border lines were pretty in good shape as far as printing to paper, but it was still less than satisfactory when rendered on screen. Lines looked generally fatter and the dashes line were unevenly positioned. I had some ideas that I wanted to try out in order to make the border lines look prettier on screen. So I went ahead and spent a few extra days to give that a try, and I’m happy to report that that effort paid off.
To recap, this is what the border lines looked like as of last Friday.
and this is what they look like now:
The lines are skinnier, which in my opinion make them look slicker, and the dashes lines are now evenly spaced and look much better.
I spent this past week on investigating a collection of various problems surrounding how Calc draws cell borders. The problem is very hard to define and can become very subjective depending on who you talk to. Having said that, if you ever imported an Excel document that makes elaborate use of cell borders into Calc, you may often have seen that the borders were printed somewhat differently than what you would have expected.
When you open this document in Calc and print it, you probably get something like this:
You’ll immediately notice that some of the lines (hair, dashed and double lines to be precise) are not printed at all! Not only that, thin, medium and thick lines are a little skinner than those of Excel’s, the dotted line is barely visible, the medium dashed line looks a lot different, and the rest of the dashed lines all became solid lines.
Therefore, it was time for action.
I’ll spare you the details, but the good news is that after spending a week in various parts of the code base, I’ve been able to fix most of the major issues. Here is what Calc prints now using the build from the latest master branch:
There are still some minor discrepancies from Excel’s borders, such as the double line being a bit too thinner, the dotted line being not as dense as Excel’s etc. But I consider this a step in the right direction. The dashed and medium dashed lines look much better to my eye, and the thicknesses of these lines are more comparable to Excel’s.
The dash-dot and dash-dot-dot lines still become solid lines since we don’t yet support those line types, but that can be worked on at a later time.
So, this is all good, right?
Not quite. One of the reasons why the cell borders became such a big issue was that we previously focused too much on getting them to display correctly on screen. Unfortunately, the resolution of a typical PC monitor is not high enough to accurately depict the content of your document, so what you see on screen is a pixelized approximation of the actual content. When printing to a paper, on the other hand, the content gets depicted much more accurately simply because you get much higher resolution when printing.
I’ll give you a side-by-side comparison of how the content of the same document gets displayed in Excel (2010), Calc 4.2 (before my change), and Calc master (with my change) all at 100% zoom level.
First up is Excel:
The lines all look correct, unsurprisingly. One thing to note is that when displaying Excel approximates a hairline with a very thin, densely dotted line to differentiate it from a thin line both of which are one pixel high. But make no mistake; hairline by definition is a solid line. This is just a trick Excel employs in order to make the hairline look thinner than the thin line counterpart.
Then comes Calc as of 4.2 (before my change):
The hairline became a finely-dashed line both on display and in internal representation. Aside from that, both dashed and medium dashed lines look a bit too far apart. Also, the double line looks very much single. In terms of the line thicknesses, however, they do look very much comparable to Excel’s. Let me also remind you that Excel’s dash-dot and dash-dot-dot lines currently become solid lines in Calc because we don’t support these line types yet.
Now here is what Calc displays after my change:
The hair line is a solid line since we don’t use the same hair line trick that Excel uses. The dotted and dashes lines look much denser and in my opinion look better. The double line is now really double. The line thicknesses, however, are a bit off even though they are internally more comparable to Excel’s (as you saw in the printout above). This is due to the loss of precision during rasterization of the border lines, and for some reason they get fatter. We previosly tried to “fix” this by making the lines thinner internally, but that was a wrong approach since that also made the lines thinner even when printed, which was not a good thing. So, for now, this is a compromise we’ll have to live with.
But is there really nothing we can do about this? Well, we could try to apply some correction to make the lines look thinner on screen, and on screen only. I have some ideas how we may be able to achieve that, and I might give that a try during my next visit.
That, and we should also support those missing dash-dot, and dash-dot-dot line types at some point.
Here is the slides for my talk at the LibreOffice conference in Milan, Italy.
I did spend several slides with code examples in an attempt to explain how to use multi_type_vector in a performance-sensitive way. I realize it was not easy to digest all of that with my talk alone, so hopefully these slides will help you review it a bit for those of you who need it.
I will try to find some time to write separate blog articles to cover this topic properly.
Here is another performance improvement that just landed on master.
It was brought to our attention that the performance of saving documents to ODF spreadsheet format had been degrading quite noticeably. This was especially true when the document contained lots of what we call rich text cells. Rich text cells are those cells that contain text with mixed format spans, or text that consists of multiple lines. These cells are handled differently from simple strings internally, and have slightly more overhead than the simple string counterparts. Because of this, saving a document full of such texts was always slower than saving one with just numbers and simple strings.
However, even with this unavoidable overhead, the performance of saving rich text cells was clearly going in the wrong direction. Therefore it was time to act.
Long story short, after many days of code reading and writing, I brought it to a state where I can share some numbers.
Measuring export performance
I measured the performance of exporting rich text cells in the following steps.
Create a new spreadsheet document.
Type in cell A1 3 lines of ‘libreoffice’. Here, you can hit Ctrl-Enter to move to the next line within the same cell.
Copy A1, select A1:N1000 and paste, to replicate the content of A1 to all cells in the range.
Save the document as ODF spreadsheet document, and measure its duration.
I performed the above measurement with 3.5, 3.6, 4.0, 4.1, and the latest master (slated to become 4.2) builds, and these are the numbers.
It is clear from this chart that the performance started to suffer first in version 3.6, then gradually worsened over 4.0 and 4.1. The good news is that we have managed to bring the number back down in the master build, even lower than that of 3.5 which I used as the point of reference. Not just slightly lower, but much, much lower.
I don’t know about you, but I’m quite happy with this result.
This week I have finally finished implementing a true shared formula framework in Calc core which allows Calc to share token array instances between adjacent formula cells if they contain identical set of formula tokens. Since one of the major benefits of sharing formula token arrays is reduced memory footprint, I decided to measure the trend in Calc’s memory usages since 4.0 all the way up to the latest master, to see how much impact this shared formula work has made in Calc’s overall memory footprint.
Here is the test document I used to measure Calc’s memory usage
This ODF spreadsheet document contains 100000 rows of cells in 4 columns of which 399999 are formula cells. Column A contains a series of integers that grow linearly down the column. Here, only the first cell (A1) is a numeric cell while the rest are all formula cells that reference their respective immediate upper cell. Cells in Column B all reference their immediate left in Column A, cells in Column C all reference their immediate left in Column B, and so on. References used in this document are all relative references; no absolute references are used.
I’ve tested a total of 4 builds. One is the 4.0.1 build packaged for openSUSE 11.4 (x64) from the openSUSE repository, one is the 4.0.6 build built from the 4.0 branch, one is the 4.1.1 build built from the 4.1 branch, and the last one is the latest from the master branch. With the exception of the packaged 4.0.1 build, all builds are built locally on my machine running openSUSE 11.4 (x64). Also on the master build, I’ve tested memory usage both with and without shared formulas.
In each tested build, the memory usage was measured by directly opening the test document from the command line and recording the virtual memory usage in GNOME system monitor. After the document was loaded, I allowed for the virtual memory reading to stabilize by waiting several seconds before recording the number. The results are presented graphically in the following chart.
The following table shows the actual numbers recorded.
4.0.1 (packaged by openSUSE)
master (no shared formula)
master (shared formula)
Additionally, I’ve also measured the number of token array instances between the two master builds (one with shared formula and one without), and the build without shared formula created 399999 token array instances (exactly 4 x 100000 – 1) upon file load, whereas the build with shared formula created only 4 token array instances. This likely accounts for the difference of 78.3 MiB in virtual memory usage between the two builds.
Effect of cell storage rework
One thing worth noting here is that, even without shared formulas, the numbers clearly show a steady decline of Calc’s memory usage from 4.0 to 4.1, and to the current master. While we can’t clearly infer from these numbers alone what caused the memory usage to shrink, I can say with reasonable confidence that the cell storage rework we did during the same period is a significant factor in such memory footprint shrinkage. I won’t go into the details of the cell storage rework here; I’ll reserve that topic for another blog post.
Oh by the way, I have absolutely no idea why the 4.0.1 build packaged from the openSUSE repository shows such high memory usage. To me this looks more like an anomaly, indicative of earlier memory leaks we had later fixed, different custom allocator that only the distro packaged version uses that favors large up-front memory allocation, or anything else I haven’t thought of. Either way, I’m not counting this as something that resulted from any of our improvements we did in Calc core.