Shared formula to reduce memory usage

This week I have finally finished implementing a true shared formula framework in Calc core which allows Calc to share token array instances between adjacent formula cells if they contain identical set of formula tokens. Since one of the major benefits of sharing formula token arrays is reduced memory footprint, I decided to measure the trend in Calc’s memory usages since 4.0 all the way up to the latest master, to see how much impact this shared formula work has made in Calc’s overall memory footprint.

Test document

Here is the test document I used to measure Calc’s memory usage

shared-formula-memory-test.ods

This ODF spreadsheet document contains 100000 rows of cells in 4 columns of which 399999 are formula cells. Column A contains a series of integers that grow linearly down the column. Here, only the first cell (A1) is a numeric cell while the rest are all formula cells that reference their respective immediate upper cell. Cells in Column B all reference their immediate left in Column A, cells in Column C all reference their immediate left in Column B, and so on. References used in this document are all relative references; no absolute references are used.

Tested builds

I’ve tested a total of 4 builds. One is the 4.0.1 build packaged for openSUSE 11.4 (x64) from the openSUSE repository, one is the 4.0.6 build built from the 4.0 branch, one is the 4.1.1 build built from the 4.1 branch, and the last one is the latest from the master branch. With the exception of the packaged 4.0.1 build, all builds are built locally on my machine running openSUSE 11.4 (x64). Also on the master build, I’ve tested memory usage both with and without shared formulas.

Results

In each tested build, the memory usage was measured by directly opening the test document from the command line and recording the virtual memory usage in GNOME system monitor. After the document was loaded, I allowed for the virtual memory reading to stabilize by waiting several seconds before recording the number. The results are presented graphically in the following chart.

shared-memory-graph

The following table shows the actual numbers recorded.

Build Virtual memory
4.0.1 (packaged by openSUSE) 4.0 GiB
4.0.6 892.1 MiB
4.1.1 858.4 MiB
master (no shared formula) 842.2 MiB
master (shared formula) 763.9 MiB

Additionally, I’ve also measured the number of token array instances between the two master builds (one with shared formula and one without), and the build without shared formula created 399999 token array instances (exactly 4 x 100000 – 1) upon file load, whereas the build with shared formula created only 4 token array instances. This likely accounts for the difference of 78.3 MiB in virtual memory usage between the two builds.

Effect of cell storage rework

One thing worth noting here is that, even without shared formulas, the numbers clearly show a steady decline of Calc’s memory usage from 4.0 to 4.1, and to the current master. While we can’t clearly infer from these numbers alone what caused the memory usage to shrink, I can say with reasonable confidence that the cell storage rework we did during the same period is a significant factor in such memory footprint shrinkage. I won’t go into the details of the cell storage rework here; I’ll reserve that topic for another blog post.

Oh by the way, I have absolutely no idea why the 4.0.1 build packaged from the openSUSE repository shows such high memory usage. To me this looks more like an anomaly, indicative of earlier memory leaks we had later fixed, different custom allocator that only the distro packaged version uses that favors large up-front memory allocation, or anything else I haven’t thought of. Either way, I’m not counting this as something that resulted from any of our improvements we did in Calc core.

14 thoughts on “Shared formula to reduce memory usage”

  1. The document hangs LibreOffice 4.1.0 on Windows :(
    Using master, the memory usage for the document is 132 Mb according to the task manager.

    1. You might give 4.1.1 a try. There was one major perf bug on ods import that I fixed for 4.1.1. That may be causing the hang, assuming that the document does eventually open, after a few minutes of load.

      OTOH, the number for the master build looks pretty good. Thanks for the quick test!

  2. 208,50MB on win7 64bit, LO 4.1.1 RC1, first start, according to task manager for soffice.bin.

  3. LibreOffice 4.1.0.4 (Release), Windows 7 Ultimate SP1 64 bit, file opened from the link, LibreOffice started by the document
    soffice.bin memory according to Resource Monitor:
    Commit 211.908KB, Working Set 251.204KB, Shareable 45.252KB, Private 205.952KB

  4. FWIW, I tried to test MSO Excel 2007 with this file. Only calculated results were imported, so I recreated the full test file in Excel, saved as .ods and reopened.
    EXCEL.EXE memory according to Resource Monitor:
    Commit 46.916KB, Working Set 63.012KB, Shareable 21.896, Private 41.116KB
    Does this seem right? There may be something wrong, see bug #68176.

    1. Excel has a number of other memory optimizations in place, aside from just shared formulas. You can’t compare Calc with Excel yet. They are still miles ahead of us.

    1. Make sure that your master build contains shared formula. Shared formula was turned on for ods just this Thursday my time, one day before you posted your comment. It’s possible that your tested build didn’t have shared formula turned on.

  5. Results on my machine (Win7): Calc 4.1 -> 205MB, Calc 4.2 -> 48,6MB and Excel -> 28MB.
    Wow, that’s a great work!!!! Thank you so much :D

  6. Great work Kohei :-) Measuring memory consumption on Linux is really hard – none of the RSS-share type algorithms will work too well for LibreOffice; most likely it’s best to do:

    $ pmap `pidof soffice.bin` | grep heap
    09938000 39956K 39804K 13180K 39804K 0K rw-p [heap]

    eg. and use those numbers. OTOH there is a lot of memory usage that is nothing to do with calc, so perhaps using simple Virtual memory mapped, and subtracting the value for an empty calc sheet would be better ? :-)

    Nice work anyhow – and thanks for blogging about it.

  7. I don’t think this may be of many use given my case of unwieldy spreadsheet. Very few formulas in it, massive data (almost half redundant) yet crashes very often when edited then saved.

Comments are closed.