Excel sheet protection password hash

When you protect either your workbook or one of your worksheets with a password in Excel, Excel internally generates a 16-bit hash of your password and stores it instead of the original password text. The hashing algorithm used for that was previously unknown, but thanks to the infamous Office Open XML specification it is now documented for the world to see (take a look at Part 4, Section 3.3.1.81 – Sheet Protection Options for the details). Thankfully, the algorithm is identical for all recent versions of Excel including XP, 2003 and 2007, so you can simply reuse the documented algorithm for the older versions of Excel too.

But alas, the documented algorithm is incorrect; it does not produce correct hash values. Determined to find the correct algorithm, I started to analyze the hashes that the documented algorithm produces and compare them with the real hash values that Excel generates.

In the end, the documented algorithm, although not accurate, was close enough that I was able to make a few changes and derive an algorithm that generates correct values. The following code:

#include <stdio.h>
#include <stdlib.h>   /* exit */
#include <string.h>   /* strlen */
 
typedef unsigned char sal_uInt8;
typedef unsigned short sal_uInt16;
 
sal_uInt16 getPasswordHash(const char* szPassword)
{
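    sal_uInt16 cchPassword = (sal_uInt16)strlen(szPassword);
    sal_uInt16 wPasswordHash = 0;
    if (!cchPassword)
        return wPasswordHash;   /* an empty password hashes to 0 */
 
    /* Walk the password from the last character back to the first. */
    const char* pch = &szPassword[cchPassword];
    while (pch-- != szPassword)
    {
        /* Rotate the low 15 bits left by one, then mix in the character. */
        wPasswordHash = ((wPasswordHash >> 14) & 0x01) |
                        ((wPasswordHash << 1) & 0x7fff);
        /* Cast so that bytes above 0x7F are not sign-extended. */
        wPasswordHash ^= (sal_uInt8)*pch;
    }
 
    /* One more rotation after the last character has been mixed in. */
    wPasswordHash = ((wPasswordHash >> 14) & 0x01) |
                    ((wPasswordHash << 1) & 0x7fff);
 
    /* Mix in the fixed constant (0xCE4B) and the password length. */
    wPasswordHash ^= (0x8000 | ('N' << 8) | 'K');
    wPasswordHash ^= cchPassword;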
 
    return wPasswordHash;
}
 
int main (int argc, char** argv)
{
    if (argc < 2)
        exit(1);
 
    printf("input password = %s\n", argv[1]);
    sal_uInt16 hash = getPasswordHash(argv[1]);
    printf("hash = %4.4X\n", hash);
 
    return 0;
}

produces the right hash value from an arbitrary password. One caveat: this algorithm takes an 8-bit char array, so if the input consists of 16-bit Unicode characters, it first needs to be converted into an 8-bit character array. The conversion algorithm is also documented in the OOXML specification. I have not tested it yet, but I hope that algorithm is correct. ;-)
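
As a quick sanity check, the steps of the listing can be traced by hand for a short password. The sketch below is a standalone restatement of the same rotate-and-XOR steps; the names `rotl15`, `excelPasswordHash` and `checkKnownValue`, and the use of `std::string` and `uint16_t`, are my own choices for illustration and are not part of the listing above. It assumes the password is already an 8-bit string.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Rotate the low 15 bits of v left by one position (hypothetical helper).
static uint16_t rotl15(uint16_t v)
{
    return static_cast<uint16_t>(((v >> 14) & 0x01) | ((v << 1) & 0x7fff));
}

// Hypothetical standalone version of getPasswordHash() from the listing.
uint16_t excelPasswordHash(const std::string& password)
{
    if (password.empty())
        return 0;

    uint16_t hash = 0;
    // Walk the password from the last byte to the first.
    for (auto it = password.rbegin(); it != password.rend(); ++it)
    {
        hash = rotl15(hash);
        hash ^= static_cast<unsigned char>(*it);
    }
    hash = rotl15(hash);
    hash ^= 0xCE4B;  // same value as 0x8000 | ('N' << 8) | 'K'
    hash ^= static_cast<uint16_t>(password.size());
    return hash;
}

// Hand trace for "test" (bytes 74 65 73 74, processed right to left):
//   after the loop: 0x0074 -> 0x009B -> 0x0153 -> 0x02D2
//   final rotation: 0x05A4
//   XOR 0xCE4B:     0xCBEF
//   XOR length 4:   0xCBEB
void checkKnownValue()
{
    assert(excelPasswordHash("test") == 0xCBEB);
}
```

Calling `checkKnownValue()` from a `main` function should pass silently; running the original program with `test` as its argument should likewise print `hash = CBEB`.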

8 thoughts on “Excel sheet protection password hash”

  1. Can you submit a “bug” against the standard? Let the standards people know MS submitted the wrong spec?

  2. Hi Daniel,

    Yeah, I noticed it in your doc after I came up with mine. Good to know that my algorithm is in fact the correct one, since it’s identical to your pseudo-code. ;-)

    BTW, how did you come up with yours?

  3. Hi there – I work for MS on Excel, just to put my cards on the table. The above PDF link didn’t work for me, but there’s also a copy of the correct algorithm on Wouter van Vugt’s page (http://blogs.infosupport.com/wouterv/archive/2006/11/21/Hashing-password-for-use-in-SpreadsheetML.aspx). There’s actually a small bug in the one Kohei posted here – the XOR with *pch after the while() loop is a buffer underrun. By the time the while() loop finishes, pch points to the character before the first character of the password, which is an unknown area of memory. If it contains zero then this algorithm will work (because the XOR doesn’t do anything) but if it contains something else it won’t.

  4. Hi Chris, thanks for the info about Wouter’s blog post, and the bug fix in my code. ;-) BTW, the above PDF link should work now; the comma at the end of the URL was preventing the link from working.

    I just edited my blog entry to remove my bug. Better to do it sooner, before people start to see my code or, worse, use it in their programs. ;-)

  5. @Mattew,

    Well, I’m not really qualified to comment on the quality of hashing algorithms since it’s not my specialty. But since the hash value is only 16 bits long, the chance of two different passwords generating an identical hash value is higher than one would normally want for document security such as encryption. That said, this level of security may be adequate for the purpose it serves in Excel, since Excel’s sheet protection does not in fact protect the content of the sheet but simply prevents accidental editing of the content.

    That’s probably all I would have to say about the quality of this hashing algorithm.
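
To make the collision point above concrete: a 16-bit hash has at most 65,536 distinct values, so a different password with the same hash is easy to find by brute force. The following is only a rough sketch under my own assumptions: `passwordHash` restates the algorithm from the post, and `findCollision`, with its lowercase-only search and `maxLen` parameter, is a hypothetical helper written for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Restatement of the hash from the post, for 8-bit passwords.
uint16_t passwordHash(const std::string& password)
{
    if (password.empty())
        return 0;

    uint16_t hash = 0;
    for (auto it = password.rbegin(); it != password.rend(); ++it)
    {
        // Rotate the low 15 bits left by one, then mix in the byte.
        hash = static_cast<uint16_t>(((hash >> 14) & 0x01) | ((hash << 1) & 0x7fff));
        hash ^= static_cast<unsigned char>(*it);
    }
    hash = static_cast<uint16_t>(((hash >> 14) & 0x01) | ((hash << 1) & 0x7fff));
    hash ^= 0xCE4B;
    hash ^= static_cast<uint16_t>(password.size());
    return hash;
}

// Hypothetical helper: try every lowercase password of length 1..maxLen and
// return the first one that differs from `password` but has the same hash.
// Returns an empty string if no such candidate exists in that range.
std::string findCollision(const std::string& password, std::size_t maxLen = 4)
{
    const uint16_t target = passwordHash(password);
    for (std::size_t len = 1; len <= maxLen; ++len)
    {
        std::string candidate(len, 'a');
        while (true)
        {
            if (candidate != password && passwordHash(candidate) == target)
                return candidate;

            // Advance the candidate like an odometer over 'a'..'z'.
            std::size_t i = len;
            while (i > 0 && candidate[i - 1] == 'z')
            {
                candidate[i - 1] = 'a';
                --i;
            }
            if (i == 0)
                break;  // wrapped around: every candidate of this length tried
            ++candidate[i - 1];
        }
    }
    return std::string();
}
```

For instance, `vdst` hashes to the same value as `test` under this algorithm, so `findCollision("test")` is guaranteed to return some match within four lowercase letters (not necessarily `vdst`, since the search returns the first one it meets).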
