Two more enhancements are in

Today, I’d like to talk about two minor enhancements I just checked in to ooo-build master. They are not really earth-shuttering per se, but still worth mentioning & may be interesting to some users.

Insert new sheet tab

insert-sheet-shot

Here is the first enhancement. In Calc, you’ll see a new tab at the right end of the sheet tabs, to allow quick insertion of new sheets. Each time you click this tab, a new sheet gets inserted to the right end. The sheet names are automatically assigned.

Previously, inserting a new sheet has to be done by opening the Insert sheet dialog, selecting the position of the new sheet and how many new sheets are to be inserted etc. But if you always append a single sheet at the right end and don’t care to name the new sheet (or name it after the sheet is inserted), this enhancement will save you a few clicks. Implementing this was actually not that hard since I was able to re-use the existing code for most of its functionality. I personally wanted to give it a little more visual appeal, but that will be a future project.

Anyway, I hope some of you will find this useful.

English function names in non-English locale

The second enhancement is related to cell functions. If you use a localized version of OOo, you probably know that the function names are localized. But there has been quite a few requests to support English function names even if the UI is localized. This is where this enhancement comes in.

First, there is now an additional check box in the Formula options page:
english-func-option
By default, the check box is off, which means the localized function names are used. Checking this check box will swap localized function names with the English ones across the board. You can of course uncheck it to go back to the localized function names.

For example, in French locale, the name of the function that calculates a summation of a cell range is called SOMME, but when the English function name option is enabled, this becomes SUM as you can see in the following screenshot:
english-func-displayed

This change takes effect in all of the following areas:

  • formula input and display,
  • function wizard, and
  • formula tips.

As always, please test this thoroughly, and report any bugs. Thanks!

Distributed text justification

What’s new?

Here is something I’ve been working on for the past few weeks. Since I just checked in the first version of this feature into ooo-build master, it’s probably a good time for me to talk about this.

This feature introduces a new justification option for cell text known as the “distributed justification”, where the left and right edges of the text are aligned with the left and right edges of the bounding box by adjusting space between characters (inter-character spacing), rather than space between words (inter-word spacing), across the entire width of the bounding box. This type of distributed text justification makes little sense for Latin-based languages such as English, French and German, but makes a big difference for Asian languages such as Japanese. The reason the normal justification doesn’t work for Asian languages is because, in those languages, you don’t put spaces between individual word boundaries, and the normal justification relies on presence of spaces at word boundaries. This is where the distributed justification comes into play.

This distributed justification method is commonly known as ?????? in Japanese, and is said to be one of the blockers when attempting to migrate users away from Excel to Calc.

Horizontal justification

First and foremost, I’d like to cover the horizontal justification. The following screenshot shows the difference between the three horizontal alignment modes:

calc-text-hor-align

As you can see, in the normal left-aligned text, the right edges of the lines are not aligned. When the text is justified, the right edges of the lines are now aligned by adjusting the inter-character spacing, except for the last line, which remains left-aligned. When the text is distributed, even the right edge of the last line becomes aligned with the right edge of the bounding box by equally distributing the characters on that line.

To allow this new justification type, I added a new justification type Distributed to the existing Cell Formatting dialog.

calc-text-align-dlg

For the vertical alignment setting, I’ve added two new options Justified and Distributed, to support justification in the vertical direction.

Justifying Asian text mixed with Latin text

While working on this feature, I have decided to also tweak the normal justification algorithm to make it work slightly better for Asian text mixed with Latin text such as English. As I mentioned earlier, distributed justification is not really ideal for Latin text. But with the society becoming more and more global, we are seeing more and more Asian text intermixed with Latin text, and vise versa. And correctly justifying a text having mixed script types requires using different justification methods for their respective script types. After a bit of trial and error, I think I got it right. You can see the result in the following screenshot:

mixed-script-justification

The English portion of the text is justified by inter-word justification, whereas the Japanese portion is justified by inter-character justification. The spaces between the English and Japanese text portions are also slightly adjusted in this scheme.

Vertical justification

Now, let’s move on to the vertical justification. When you justify a text in the vertical direction, that is, in the direction perpendicular to the direction of text flow, the spacing between the lines gets adjusted so that the top and bottom lines get aligned with their respective edges of the bounding box, like so:
vertically-justified
The top cell shows text with default justification, while the bottom cell shows text with vertical justification.

The Cell Format dialog itself provides both Justified and Distributed options for the vertical justification setting, but they do exactly the same thing for horizontally-flowing text. For vertically-flowing text, on the other hand, they do different things, but more on that in the next section.

Justifying vertically flowing text

Now, you can also justify text even when the text is flowing vertically. There are three ways you can make the text flow vertically. You can either

  1. rotate 90 degrees to the right (bottom-to-top),
  2. rotate 90 degrees to the left (top-to-bottom), or
  3. switch to Asian layout mode, which flows text in the top-to-bottom, right-to-left direction.

In these modes, the Justified and Distributed vertical justification options do have different effects. The following screenshot demonstrates different vertical alignment settings in three different vertical flow modes.

vertially-flowing-paragraph

As an added bonus…

The code responsible for the text layout, the code where I made my modification to support this feature, is actually shared between Calc, Draw and Impress. Calc uses it to render complex cell text, while Draw and Impress use it for their text box objects. This means that, any improvement I make in this area will automatically be made available for all three applications. All that needs to be done is to simply adjust the UI in each app and add hooks in their respective import/export filters. Whether or not I’ll work on that during this cycle is another question. Having said that, I’d like to eventually get that done, and I’d like to do it sooner rather than later. But we’ll see how that goes.

But even without making the extra code change in the Draw/Impress code, my change so far was enough to fix this bug which I didn’t even know existed. :-)

Lastly…

As of this writing, I’m not entirely done with this feature yet. I still have to cover some corner cases, and I still need to fix some bugs which I unfortunately discovered while taking screenshots for this post. So, stay tuned for further fine-tuning!

DBF import performance

dbf-import-perfHere is another performance win! Importing dbf files into Calc is now quicker by 80%. You will probably notice the difference especially when importing a large dbf file. The test document I used had roughly 24000 rows, and importing that took 57 seconds on my machine. Having 24000 rows in a database file (or even in a spreadsheet file) is very common by today’s standard, so this wasn’t good at all.

I had done quite a bit of performance work over the years, but this one was somewhat difficult to tackle. The bottlenecks were fragmented all over the place which required different solutions to different areas. Roughly speaking, the following are the areas I tackled to reduce the total import time for dbf files (module name in parentheses):

  • speedup in parsing of dbf file content (connectivity)
  • disabled property change notification during dbf import (dbaccess)
  • more efficient string interning and unicode conversion (sal)
  • reduction in column array re-allocation during import (sc)
  • removal of unnecessary column and row size adjustments post-import (sc)

With all of this, the file that originally took 57 seconds to load now loads in 12 seconds on the same hardware, which roughly translates to 80% reduction of the total import time!

This itself is pretty impressive; however, I was hoping to get it at least under 10 seconds since Excel can load the same file less than 5 seconds on the same hardware, even through wine emulation (!). But that’s probably for a future project. For now, I’m content with what I’ve done.

Updates on various stuff

Ok. Here is some updates on some of the stuff I’ve been doing lately. I picked the ones that are particularly worth mentioning.

Saving documents

There are two changes related to the document-saving functionality that I’d like to mention. The first one is the new icon in the document modified status window. As I blogged before, I had made a minor polish to the existing document modified status window, to show the status graphically instead of simply showing ‘*’ when the document is modified. The only problem was that the icon I used to fill that space was pretty lame and ugly. But thanks to jimmac, we now have a much better icon to show the modified status (see below).
doc-modified

The second thing is with the save icon itself. It has been known to us that some users want the ability to always save the document even when the document is not considered “modified”, while others want the save action disabled when the document is not “modified”. I quote the term modified here because even when the content of the document has not changed, some peripheral data may have changed, such as the zoom level, cursor position, active sheet and so on and so forth. These peripheral data (that we call the “view data”) are still stored with the document, but changes in these data do not set a document modified status. So, if you wanted to save your document with the cursor at a particular location, a certain sheet activated and the zoom level set to a certain level, you had to make a fake change to the content to be able to save the document with the view data change. The solution we had employed previously was to always enable this only for Calc, where the request for this behavior was greatest. However, some users still found it confusing that only Calc enables the save all the time while the rest of the applications didn’t. Also, a lot of users used the save icon itself to check whether their document has been modified or not even in Calc.

So, I’ve decided to make it a configuration option. That way we can keep both camps happy. :-) Here is the new check box to toggle this behavior:
always-save-option

Anyway, I hope some of you guys will find this useful, or at least will not find it annoying.

Performance improvement

pagenation-perf-chart
Another stuff worth mentioning is the improvement I made on Calc’s pagination performance. Pagination refers to the action of calculating appropriate positions to set page borders over the entire sheet based on the current page size, row/column sizes, presence of manual page breaks and several other factors. I had previously worked on optimizing this when we increased Calc’s row limit to 1 million rows (as I also mentioned during my talk in Orvieto), but apparently that optimization still had massive room for improvement; the test document I had took 7 minutes to perform pagination during printing! Granted, the document had 98 pages to print, but I bet that no one wants to wait that long to print even if the document has that many pages.

Long story short, I have reduced the duration from 7 minutes to roughly 35 seconds. Though I’m very happy with the result, it required a large amount of refactoring to get to that point, and when a large amount of code changes, the chance of introducing regressions unfortunately goes up. So, please pay special attention to Calc’s pagination behavior and its handling of row heights, and if you notice any problems, I’d like to hear from you, preferably with a test document or two.

DataPilot field popup window

Last but not least, I’d like to mention this one. The DataPilot field popup window has been in the works for quite some time since 3.1. I have blogged about the initial version and the 2nd incarnation. Now the 3rd incarnation is on the horizon. As they say, a picture is worth a thousand words. So without further ado, let’s take a look at the screenshot:
dp-popup-window
This version has a “toggle all” check box to quickly turn on and off all field members, “select only current” button to only select currently selected member, and “unselect only current” button to select all but the current member. Also not visible on this screenshot is the support for Gnome accessibility framework, which is also new in this version.

So…

These are the highlights of some of the stuff I’ve been doing recently. There are more things on the horizon, so stay tuned.

Automatic decimal place adjustment by column width

Adjusting decimals by column width

Here is what I’m working on at the moment. I’m working on changing Calc’s behavior so that when a value is entered into a cell, and the cell width is not wide enough to show all its significant digits, it will truncate it to fit the available column width when the number format of that cell is General.

Let me demonstrate this using the value of PI entered into a cell. I have made the column wide enough to show all available significant digits of the PI value. This is what it looks like first:
auto-decimal-1

Then I’ve decided that the column is too wide for my liking, and dragged the column border to make it narrower:
auto-decimal-2
Notice that the displayed value now has less digits to fit the new column width. Now, I have decided to make the column even more narrow. See what happens when I do that:
auto-decimal-3
The cell now only displays “3.14”. But as I said, this automatic decimal place adjustment takes place only when the cell’s number format is General. If the number format specifies some fixed decimal places for that cell, Calc won’t adjust decimals automatically, and gladly displays “###” when the value doesn’t fit the current column width.

Default decimal places

Some of you may notice that, using the current version of Calc, a cell with the value of PI only shows 3.14, or typing any number into a cell only shows up to 2 decimal places unless you manually specify decimal places for that cell. That’s because Calc by default only shows 2 decimal places for cells with General number format. You can change that by increasing or decreasing the default number of decimal places in the Options dialog (in the Calculate page). However, that behavior is a bit confusing, especially when you type in a number such as 3.01234 and the cell only displays 3.01 even though there is enough space to show the whole value. That’s another thing I’m working on to change.

The new Calculate page now has an additional check box at the bottom. You can check or uncheck this check box to either limit the number of decimal places for cells with General number format, or leave it unlimited.
auto-decimal-option
What the default behavior should be is still under discussion, but I’m pretty sure that we will agree on leaving it unlimited by default.

HackWeek – Minor polish

As some of us already blogged, the last week was a Hack Week inside Novell, where we the Novell engineers are allowed to work on whatever project we are pleased to work on. Given the opportunity, I decided to work on some UI polish work for OOo that I had always wanted to work on but could not due to other priorities. These are the results of my Hack Week effort.

First, I wanted to implement animated border to outline copied ranges. Currently, copied ranges are outlined with static solid borders, but it was not always obvious to the users what those borders were for. Excel and Gnumeric, for instance, use animated dashed borders, which look more intuitive than static borders to depict copied ranges. Long story short, we now have animated dashed borders in Calc as well.

It’s not obvious in the above screenshot since it’s a static image, but trust me, it does animate. ;-) I consider this a natural extension of the previous work that Jon Pryor did for pasting on ENTER key.

The second work I did was to brush up the document modified status window, to display disk image to indicate whether the document is modified or not. Previously OOo displayed ‘*’ when the current document is modified, or none if it is not modified. I wanted to make it a little fancier so that it would catch more attention of the users. Anyway, here is the result.


This is what the status bar looks like when the document is modified. The image I used here is basically a reduced version of the save icon in Tango icon theme. However, I am not an artist, and I don’t consider this image to be a final version. So the final image is still subject to change without notice.


This is what the status bar looks like when the document is not modified. Basically a black & white version of the document-modified image, with some translucency applied.

That’s all the work I did during Hack Week. I couldn’t spend as much time as I would have liked since I still had to take care of other tasks even during Hack Week, but hopefully you guys like what I did.

CSV import enhancement

As I mentioned in my previous post, I had introduced two new HTML import options to control how numbers in cells are detected & converted during import. I had also hinted at the end of the post that I might add the same import options to the CSV import dialog. I’m writing this post to inform you guys that that’s exactly what I just did.

Here is the new CSV import options dialog:

which highlights the changes I’ve made. At the top of the dialog is the new Language list box, to select what language to use for the import. If it’s set to Default, it uses the language that OOo uses globally. Similar to the HTML import, this option affects how the numbers are parsed, based on the decimal and thousands separators for the selected language, as well as how the special numbers are detected & converted. I’ll talk more on the special number detection later.

In addition to the language option, located in the middle of the dialog are two check boxes to further control how cell values are to be interpreted.

  • Option Quoted field as text, when set, always imports quoted cell values as text, even if they are numbers. This option existed before this change, but was used only for the separator-based CSV imports. It is now used for the fixed width imports as well.
  • Option Detect special numbers, similar to the one in the HTML import option, controls whether or not to detect specially-formatted numbers, such as dates, scientific notations etc. When this option is set, Calc will try to detect special numbers and convert them into appropriate format. When this option is not set, Calc will only convert the simple decimal numbers. By default, this option is not set.

Anyway, that’s all there is to it. Hopefully this will solve, or at least make it easier to handle importing of CSV documents, which was previously hampered by Calc’s aggressive date detection and lack of support for alternative number separators. I doubt that this will make your CSV import experience a perfect one, but it will hopefully make it a much better one.

Multi-range selection copy & paste

New in Go-OO Calc is the ability to copy and paste multi-range selection, that is, you can select separate cell ranges that are not connected with each other, and copy and paste them in one single action. This was not possible previously; when you tried to copy multiple ranges, you would get an error message telling you that copying of multi-range selection is not supported (or something to that effect).

When pasting a multi-range selection, all the copied ranges get consolidated into a single range when pasted into a destination location. For example, copying this multi-range selection

and pasting it into another sheet will paste the data as follows

You can’t just copy a random set of ranges of varying sizes, however. Because the data gets pasted as a single range, all copied ranges must have either equal column size, or equal row size, or else you’ll be greeted with an error message.

Range selection and cursor

There is a slight tweak I’ve made in how Calc makes a range selection and how it places the cursor during selection. The old behavior was that the cursor always moves with the tip of the selection, as you can see in the following screenshot:

In the new selection behavior, the cursor stays at its original position while the selection is being made, as follows:

Aesthetically, the difference between the old and new behaviors is subtle, and probably is subject to personal preference. What makes a big difference is when editing multiple cells by first making a selection via shift-arrow, then editing them one by one within that selection by using the ENTER or TAB key to navigate through them. Or when copying & pasting a range of cells via keyboard, by first making a selection to copy, then moving the cursor to a destination cell to paste. Power users tend to do this, and they are being put off by how Calc places the cursor always at the “end” of the selection. In their mind, the current cursor placement is “wrong”.

Competitively speaking, the new behavior is what most other spreadsheet applications behave (Excel, Gnumeric and GoogleDoc, in particular). So, there is also a benefit in keeping it consistent with the rest of the spreadsheet applications, to make it easier to migrate power users to Calc.

HTML import option

What’s new?

I just checked in to the Go-OO master branch an enhancement to Calc’s HTML table import feature. With this enhancement, you can select a custom language to use when you import an HTML table, which affects how the numbers are interpreted during the import.

What’s this good for?

Each language adheres to a different set of conventions, such as date formats, thousands and decimal separators, and other special number formats. Previously, Calc simply used the system’s language setting when importing an HTML in order to decide whether a cell is a nubmer or a text, and if it’s a number, what numbering format to apply. Although you could change the system language in the global options dialog, that was not always convenient especially when all you need to do is quickly import an HTML table with different language conventions, and you don’t want to bother changing the global language setting (and changing it back after the import is done). That’s where this enhancement will come in.

What can you expect with this change?

Well, you’ll get the following dialog

when you are importing an HTML table, to either select the system’s language (Automatic), or a custom language from the list of available languages. Just remember that, if you don’t know or don’t care about what language to choose, select Automatic and move on. If you do care, then select Custom and pick a language from the list.

Special number detection (especially the dates)

There is an additional option below, to toggle automatic number format detection. With this option checked, Calc will try to automatically detect special number formats, such as dates, scientific notations etc. If unchecked, Calc will only detect the simplest numbers i.e. numbers that only consist of digits, decimal separators and thousands separators (and a sign if there is one), while all the other numbers are imported as text. By default, this option is NOT checked, which means Calc will only detect the simple numbers.

BTW, I’m thinking of adding a similar option to the csv import, where automatic date conversion has been quite problematic in the past and driving lots of users crazy. Adding this option also to the csv import would IMO make sense.

Well, this is it. It’s actually a pretty minor enhancement, but I hope someone will find it useful. :-)