Excel workbook and worksheet protection

I didn’t mean to keep this a secret, but I’ve been working on two tasks since December to further enhance Calc’s interoperability with Excel. The first one is to support correct import of an Excel document that includes password-protected worksheets, and the second one is to support exporting of a Calc document into an encrypted Excel format with password protection.

Workbook and worksheet protection with password

Currently, when Calc imports an Excel document with password-protected sheet, it retains the protected state but discards the password. What I’m trying to do is to retain the password if the sheet is password-protected and also export the sheet password when that document is exported back into Excel.

The difficulty of retaining a sheet password from Excel all lies in finding out the correct password hashing function that Excel uses, then the rest will be easy. This is because when you type in your password to protect a worksheet, Excel doesn’t store the actual password text, but immediately hashes it and stores that hash value in the document instead of the original text (BTW, Calc does the same). So, if you don’t know the correct hashing algorithm that generates an identical hash value given the same password, all bets are off.

As I covered in my previous blog post, however, this has already been taken care of. On top of that, Excel uses the same hashing function for both worksheet and workbook protections, so we can take care of both types of protections with just this one hashing function mentioned in my previous blog.

There is one important thing to note here. This worksheet-level protection is primarily for preventing accidental editing of protected worksheet during run-time, and this still does not prevent the content of such “protected” worksheet from being modified by directly altering the content of the file itself. This is because the content of the protected worksheet is not encrypted inside the file, so this level of password “protection” can be easily circumvented.

Workbook-level protection with password

Now, Excel also offers a workbook-level protection with password that actually involves encryption of document’s content, and that’s the second task I’ve been working on. There are two instances where Excel encrypts the workbook content:

  1. when you protect workbook’s structure (you can either protect the workbook structure, windows or both for workbook-level protection) or
  2. when you set a password from the Save As dialog (click on Tools near the upper-right corner and select General Options).

Currently, Calc can open an encrypted Excel document, but cannot save it back with the content still encrypted. So, when you open such a document and save it, it loses its content encryption (though a warning dialog is displayed before it writes the content back to the file unencrypted). What I’m trying to do is to allow the user to save the content of an already encrypted document back encrypted, or save a new Calc document to an encrypted Excel document with password protection.

This work is still on-going, but I’m pretty confident that the hardest part is already over. What makes this task really hard is the fact that Excel doesn’t encrypt the entire file stream, but encrypts only the record contents and leaves the record headers unencrypted. Also, certain records, and even parts of certain records, are not encrypted. If the entire file stream was encrypted, it would be relatively simpler to put an encryption layer on top of the normal record stream. But because some parts of the stream are not encrypted, it is necessary to check every single record that we write to make sure that we are encrypting only the right parts of the stream, or else Excel will not load the document at all. But like I said, the worst is over. I can now correctly generate an encrypted Excel file with various content types: even those documents with form controls, charts, cell comments, autoshapes, and OLE objects.

One interesting trivia. The encryption algorithm that Excel uses to encrypt a document requires a password, but Excel allows document encryption without a password. So, what does Excel do if no password is given? It uses the default password VelvetSweatshop. :-)

Anyway, both of these tasks are on-going in the scsheetprotection01 CWS. The original IZ issues are i60305 and i84766. There is still some work ahead before I can get this CWS integrated, but I’m making progress. Thank you, ladies and gentlemen.

Drilling down data in Data Pilot

Ok. Here is the feature I’ve been working on while I was in Barcelona, and I think it’s got to a point where I can show some progress to the world.

This feature is about supporting a drill-down of a data field cell in Calc’s Data Pilot feature. Excel already has an equivalent feature in its Pivot Table functionality, and what I’m doing initially came out of a request from current and former Excel users who rely on that feature.

Here is what this feature does. Let’s say you have a data pilot box somewhere in your Calc document like the one below.

data pilot view

You then decide to double-click on one cell within the data field to see the original data rows that comprise that cell. Calc then inserts a new sheet, containing the appropriate subset of the original data table (see below).

drill down data on new sheet

At this point, it works pretty reliably for me, but that’s not to say that it’s ready to be integrated into the main branch. There are still a few things to take care of.

  • The drill down sheet is not decorated with auto formatting. Excel formats the drill down sheet nicely, but currently it’s not implemented. But I have to wonder whether that is really essential (I think not).
  • Currently this feature works only on internal data source i.e. data source within the same document. But Calc’s Data Pilot supports two other data source backends (registered database and data pilot UNO component), and I need to add support for them too.

Once that is done, I’ll be happy to see this piece integrated upstream (under JCA, of course), since this feature is also requested in Issue 57030. While I’m in the data pilot code, it may be a good idea to review other data pilot related issues (if there is any) and tackle them at the same time. If you know of any hot data pilot related issues, please let me know.

Extending Calc’s autofilter

As I work on implementing the OOXML import filter for Calc, I notice quite a few features that are in Excel 2007 but are not in Calc. While not all of them deserve special attention, one particular feature has caught my eye, which is the ability to filter rows based on a set of multiple string values instead of just one.

To see the benefit of this feature, let’s take a look at the current autofilter implementation in Calc as of OO.o 2.1.

Current autofilter implementation (2.1)

As you can see, if you want to filter by the cell content, you can only specify one value. If you want to specify “show either Bruce or David”, you will have to use the Standard filter and use this regular expression ^(Bruce|David)$ to accomplish the effect. Alternatively, if the filtering criteria involves only one column field, you could use two filter conditions, each one specifying textual equality to one text value, and connect them with OR, but this still will not work if more than one fields are involved because Calc doesn’t allow nested AND/OR’s between filter conditions.

Excel 2007 does this quite nicely. It allows a user to specify multiple text values in a single filtering criteria by presenting a list of check boxes like this:

Multi-string autofilter in Excel 2007

This allows a user to quickly filter his/her data, without resorting to something more complex, like regular expressions.

I believe our OO.o users will benefit enormously if we implement something similar in Calc, and I’ve already started some work toward implementing this. But to implement this feature in Calc requires a change in the ODF file format specification. The ODF spec, as of version 1.1, does not allow a clean storage of multiple text values in a single filter condition. An effort is on-going, however, to change the ODF spec in order to accommodate this feature, so there is hope.