Posts Tagged ‘excel’

mso-dumper now packaged in OBS

March 20th, 2010

I’m happy to announce that the mso-dumper tool is now packaged in the openSUSE build service under my home repository. This tool is written in Python, and allows you to dump the contents of MS Office documents stored in the BIFF-structured binary file format in a more human readable fashion. It is an indispensable tool when dealing with importing from and/or exporting to the Office documents. Right now, only Excel and Power Point formats are supported.

This package provides two new commands xls-dump and ppt-dump. If you wish to dump the content of an Excel document, all you have to do is

xls-dump ./path/to/mydoc.xls

and it dumps its content to standard output. What the output looks like depends on what’s stored with the document, but it will look something like this:

...
0085h: =============================================================
0085h: BOUNDSHEET - Sheet Information (0085h)
0085h:   size = 14
0085h: -------------------------------------------------------------
0085h: B4 09 00 00 00 00 06 00 53 68 65 65 74 31
0085h: -------------------------------------------------------------
0085h: BOF position in this stream: 2484
0085h: sheet name: Sheet1
0085h: hidden state: visible
0085h: sheet type: worksheet or dialog sheet

008Ch: =============================================================
008Ch: COUNTRY - Default Country and WIN.INI Country (008Ch)
008Ch:   size = 4
008Ch: -------------------------------------------------------------
008Ch: 01 00 01 00 

00EBh: =============================================================
00EBh: MSODRAWINGGROUP - Microsoft Office Drawing Group (00EBh)
00EBh:   size = 90
00EBh: -------------------------------------------------------------
00EBh: 0F 00 00 F0 52 00 00 00 00 00 06 F0 18 00 00 00
00EBh: 02 04 00 00 02 00 00 00 02 00 00 00 01 00 00 00
00EBh: 01 00 00 00 02 00 00 00 33 00 0B F0 12 00 00 00
00EBh: BF 00 08 00 08 00 81 01 09 00 00 08 C0 01 40 00
00EBh: 00 08 40 00 1E F1 10 00 00 00 0D 00 00 08 0C 00
00EBh: 00 08 17 00 00 08 F7 00 00 10 

00FCh: =============================================================
00FCh: SST - Shared String Table (00FCh)
00FCh:   size = 8
00FCh: -------------------------------------------------------------
00FCh: 00 00 00 00 00 00 00 00
00FCh: -------------------------------------------------------------
00FCh: total number of references: 0
00FCh: total number of unique strings: 0
...

I have originally written this tool to deal with the Excel import and export part of Calc’s development, and continue to develop it further. Thorsten Behrens has later joined forces and added support for the Power Point format. Right now, I’m working on adding an XML output format option to make it easier to compare outputs, which is important for regression testing.

ODF Support in MS Office 2007

April 29th, 2009

I guess it’s all over the news right now, that the latest service pack (SP2) for MS Office 2007 will enable Office to import and export ODF natively. This blog piece by Doug Mahugh touches on the word processor part of their ODF support. I haven’t yet tried it myself, but judging by Doug’s blog article it looks pretty impressive.

But, being more of a spreadsheet person, I’m personally more interested in how it fares in Excel to Calc interoperability. Since their ODF support is on ODF 1.1, which predates the on-going OpenFormula specification work, I’d be interested to see how compatible the formulas are. Technically speaking, as of ODF 1.1, interpreting formula expressions was pretty much application-specific, so I would not be surprised even if they are not compatible at all. But we’ll just have to see.

Either way, I find this news very encouraging. This is undoubtedly a big step toward proliferation of ODF as a practical document exchange format.

Encrypting an Excel document with password from Calc

July 21st, 2008

The feature of exporting an Excel document encrypted with a password has landed! Now you can save a spreadsheet document as an Excel document with password protection and open it in Excel with that password. The proof is in the following screenshot (the colored emphasis is mine):

Saving a Calc document as Excel with password protection (with encryption)

This feature is now available in the go-oo version of OOo. This also means that, when you edit & save an existing Excel document that has already been encrypted, it will get saved encrypted. Previously when you did this, Calc would save it un-encrypted and you would lose your password protection, which was not good and was a deal breaker for a certain segment of users.

The upstream effort of this feature is underway in the scsheetprotection02 CWS. That CWS also contains another enhancement for sheet and document structure protection (sounds similar but totally unrelated to this document encryption feature), and as soon as I take care of upstreaming that enhancement, I can push the whole CWS for upstream integration. The bad news is that, because that requires a change in the ODF file format & UI change, it will probably take some time before it can be integrated upstream. But I’m making slow but steady progress there, so I’ll keep you guys updated.

Meanwhile, please test the go-oo version for this encryption feature to see if there is any document that Calc fails to encrypt properly (that means Excel can’t open it), and report any bugs to us. I’ve done my own testing, but it’s never as good as many other users testing it. So, thanks!

P.S. Actually this feature has been available for at least two months in the 3.0 branch, but I wanted to see this bug fixed before writing a blog about this so that I could take a pretty screenshot with all texts displayed.

New sheet protection dialog

May 8th, 2008

I’ve just finished designing a new dialog for Calc’s sheet protection functionality to allow optional sheet protection options. This was actually my first time designing a dialog from scratch instead of modifying an existing one, so I had to dig around and figure out how to add a dialog. It turns out that it is actually very simple once you know what to do. After several hours of creative designing process, I’ve come up with something I can show to people. So here it is:

sheet protection dialog screenshot

One thing to note: obviously this dialog is inspired by the similar functionality offered by Excel, and Excel provides many more options for sheet protection than just the two I’m showing here. The reason I only have two at the moment is because I’ve only implemented support for those two options in Calc core. When we support more options in the core, we can easily add them to the dialog.

This work is on-going in scsheetprotection02 CWS. Aside from the new dialog and sheet protection options, this CWS contains my other work on the binary Excel export encryption as well as sheet and document password interoperability between Excel and Calc. I’m trying to wrap this up, so hopefully I can come up with something that people can try out soon.

Excel workbook and worksheet protection

February 3rd, 2008

I didn’t mean to keep this a secret, but I’ve been working on two tasks since December to further enhance Calc’s interoperability with Excel. The first one is to support correct import of an Excel document that includes password-protected worksheets, and the second one is to support exporting of a Calc document into an encrypted Excel format with password protection.

Workbook and worksheet protection with password

Currently, when Calc imports an Excel document with password-protected sheet, it retains the protected state but discards the password. What I’m trying to do is to retain the password if the sheet is password-protected and also export the sheet password when that document is exported back into Excel.

The difficulty of retaining a sheet password from Excel all lies in finding out the correct password hashing function that Excel uses, then the rest will be easy. This is because when you type in your password to protect a worksheet, Excel doesn’t store the actual password text, but immediately hashes it and stores that hash value in the document instead of the original text (BTW, Calc does the same). So, if you don’t know the correct hashing algorithm that generates an identical hash value given the same password, all bets are off.

As I covered in my previous blog post, however, this has already been taken care of. On top of that, Excel uses the same hashing function for both worksheet and workbook protections, so we can take care of both types of protections with just this one hashing function mentioned in my previous blog.

There is one important thing to note here. This worksheet-level protection is primarily for preventing accidental editing of protected worksheet during run-time, and this still does not prevent the content of such “protected” worksheet from being modified by directly altering the content of the file itself. This is because the content of the protected worksheet is not encrypted inside the file, so this level of password “protection” can be easily circumvented.

Workbook-level protection with password

Now, Excel also offers a workbook-level protection with password that actually involves encryption of document’s content, and that’s the second task I’ve been working on. There are two instances where Excel encrypts the workbook content:

  1. when you protect workbook’s structure (you can either protect the workbook structure, windows or both for workbook-level protection) or
  2. when you set a password from the Save As dialog (click on Tools near the upper-right corner and select General Options).

Currently, Calc can open an encrypted Excel document, but cannot save it back with the content still encrypted. So, when you open such a document and save it, it loses its content encryption (though a warning dialog is displayed before it writes the content back to the file unencrypted). What I’m trying to do is to allow the user to save the content of an already encrypted document back encrypted, or save a new Calc document to an encrypted Excel document with password protection.

This work is still on-going, but I’m pretty confident that the hardest part is already over. What makes this task really hard is the fact that Excel doesn’t encrypt the entire file stream, but encrypts only the record contents and leaves the record headers unencrypted. Also, certain records, and even parts of certain records, are not encrypted. If the entire file stream was encrypted, it would be relatively simpler to put an encryption layer on top of the normal record stream. But because some parts of the stream are not encrypted, it is necessary to check every single record that we write to make sure that we are encrypting only the right parts of the stream, or else Excel will not load the document at all. But like I said, the worst is over. I can now correctly generate an encrypted Excel file with various content types: even those documents with form controls, charts, cell comments, autoshapes, and OLE objects.

One interesting trivia. The encryption algorithm that Excel uses to encrypt a document requires a password, but Excel allows document encryption without a password. So, what does Excel do if no password is given? It uses the default password VelvetSweatshop. :-)

Anyway, both of these tasks are on-going in the scsheetprotection01 CWS. The original IZ issues are i60305 and i84766. There is still some work ahead before I can get this CWS integrated, but I’m making progress. Thank you, ladies and gentlemen.

Excel sheet protection password hash

January 18th, 2008

When you protect either your workbook or one of your worksheets with a password in Excel, Excel internally generates a 16-bit hash of your password and stores it instead of the original password text. The hashing algorithm used for that was previously unknown, but thanks to the infamous Office Open XML specification it is now documented for the world to see (take a look at Part 4, Section 3.3.1.81 – Sheet Protection Options for the details). Thankfully, the algorithm is identical for all recent versions of Excel including XP, 2003 and 2007, so you can simply reuse the documented algorithm for the older versions of Excel too.

But alas! the documented algorithm is incorrect; it does not produce correct hash values. Being determined to find out the correct algorithm, however, I started to analyze the hashes that the documented algorithm produces, and compare them with the real hash values that Excel generates, in order to decipher the correct algorithm.

In the end, the documented algorithm was, although not accurate, pretty close enough that I was able to make a few changes and derive the algorithm that generates correct values. The following code:

#include <stdio.h>
 
using namespace std;
 
typedef unsigned char sal_uInt8;
typedef unsigned short sal_uInt16;
 
sal_uInt16 getPasswordHash(const char* szPassword)
{
    sal_uInt16 cchPassword = strlen(szPassword);
    sal_uInt16 wPasswordHash = 0;
    if (!cchPassword)
        return wPasswordHash;
 
    const char* pch = &szPassword[cchPassword];
    while (pch-- != szPassword)
    {
        wPasswordHash = ((wPasswordHash >> 14) & 0x01) | 
                        ((wPasswordHash << 1) & 0x7fff);
        wPasswordHash ^= *pch;
    }
 
    wPasswordHash = ((wPasswordHash >> 14) & 0x01) | 
                    ((wPasswordHash << 1) & 0x7fff);
 
    wPasswordHash ^= (0x8000 | ('N' << 8) | 'K');
    wPasswordHash ^= cchPassword;
 
    return wPasswordHash;
}
 
int main (int argc, char** argv)
{
    if (argc < 2)
        exit(1);
 
    printf("input password = %s\n", argv[1]);
    sal_uInt16 hash = getPasswordHash(argv[1]);
    printf("hash = %4.4X\n", hash);
 
    return 0;
}

produces the right hash value from an arbitrary password. One caveat: this algorithm takes an 8-bit char array, so if the input value consists of 16-bit unicode characters, it needs to be first converted into 8-bit character array. The conversion algorithm is also documented in the OOXML specification. I have not tested it yet, but I hope that algorithm is correct. ;-)

Importing Excel 2007 files

June 20th, 2007

The Excel 2007 filter for Calc is still on its way, but perhaps now is a good time to show the progress of this new Excel 2007 import filter work by Daniel Rentz and myself.

Here is a screenshot of a file created by Excel 2007 (left), and one for the same file opened in Calc (right).

Excel 2007 screenshot Excel 2007 file opened in Calc

There are still a lot to be done, however. Formula import is still to be completed, which blocks other features that rely on the formula parser. Charts, text boxes, and other graphic objects are still not imported yet. There is also a performance issue of a large xlsx file import, which needs to be addressed at some point.

But all in all, things are coming along very nicely.