We Can Build AI Systems, But Can We Handle Names With Umlauts?
My name includes three umlauts. Written correctly, it takes six plus eleven Unicode characters (first name and last name). In the early 1990s, I never considered using my name as an identifier on a computer, because most systems back then still lived in a US-ASCII world or in the world of the unfortunate ISO/IEC 8859 8-bit extensions, where the meaning of a code point depends on which part of the standard (Latin-1, Latin-2, Cyrillic, and so on) is assumed.
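To make that context dependence concrete, here is a small Python sketch. It is only an illustration: the byte 0xE4 is picked arbitrarily and decoded under three parts of ISO/IEC 8859, and the name "Müller" is a hypothetical stand-in. The second half shows that even in Unicode the character count of a name depends on normalization: NFC uses one precomposed code point per umlaut, while NFD uses a base letter plus a combining diaeresis.

```python
import unicodedata

# One byte from an 8-bit ISO/IEC 8859 encoding (0xE4 chosen for illustration).
raw = bytes([0xE4])

# The same byte decodes to different characters depending on which part
# of ISO/IEC 8859 the reader assumes.
interpretations = {
    codec: raw.decode(codec)
    for codec in ("iso8859-1", "iso8859-5", "iso8859-7")
}
for codec, char in interpretations.items():
    print(f"{codec}: {char!r} (U+{ord(char):04X})")
# iso8859-1 -> 'ä' (U+00E4), iso8859-5 -> 'ф' (U+0444), iso8859-7 -> 'δ' (U+03B4)

# In Unicode, the number of code points depends on normalization.
name = "Müller"  # hypothetical example name
nfc = unicodedata.normalize("NFC", name)  # "ü" as one precomposed code point
nfd = unicodedata.normalize("NFD", name)  # "u" + COMBINING DIAERESIS
print(len(nfc), len(nfd), len(nfc.encode("utf-8")))  # 6 7 7
```

The UTF-8 length is larger than the character count because each umlaut occupies two bytes, which matters as soon as a field is limited by bytes rather than characters.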
Back in the 1990s, when I went to a bank to get a printout of my account status (yes, that is how banks worked back then; they had printers, and you had to visit them in person to check your account), I was always amazed at what came out of the printer. Umlauts were often replaced by curly braces and other fancy symbols. Back then, I was an optimist, and I believed that these problems would be overcome by the beginning of the 21st century. After all, the Unicode Consortium was established in 1991, and Unicode 1.0 was released in the same year. In 1996, the initial specification of the UTF-8 transformation format was published as RFC 2044. So things looked promising. Sure, banks run mainframes, but give them 20 years and they might well have made the transition to Unicode.
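One plausible explanation for the curly braces (an assumption about those bank systems, not a confirmed fact) is the German national variant of 7-bit ISO 646, DIN 66003, which reused the code points of characters such as `{`, `|`, `}` and `~` for `ä`, `ö`, `ü` and `ß`. Text stored in that variant and printed by a device assuming US-ASCII comes out with braces. The sketch below hand-codes that mapping as a translation table; the function name is hypothetical.

```python
# German national variant of ISO 646 (DIN 66003): these US-ASCII code
# points were reassigned to German letters.
DIN_66003 = {
    "@": "§", "[": "Ä", "\\": "Ö", "]": "Ü",
    "{": "ä", "|": "ö", "}": "ü", "~": "ß",
}

# Reverse mapping: German letter -> the 7-bit code point it was stored as.
TO_7BIT = str.maketrans({german: ascii_char for ascii_char, german in DIN_66003.items()})


def as_seen_by_ascii_device(text: str) -> str:
    """Store text in the German 7-bit variant and return what a device
    interpreting the same code points as US-ASCII would print."""
    return text.translate(TO_7BIT)


print(as_seen_by_ascii_device("Grüße aus Köln"))  # Gr}~e aus K|ln
```

The bytes themselves are never "wrong" here; the sender and the printer simply disagree about which character set they share.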
So here I am in 2025. Most banks have turned into online businesses, and I have not used their printers for a long time (perhaps I should give them a try to see what they print nowadays). The user-facing software provided by the banks appears to handle my name well. Banks now also validate the payee name against the account number, and although this is not perfect, it works reasonably well for me. Apparently they have managed the transition to a Unicode world (or their modern frontends manage to hide how names are really represented on their mainframes).
While this is positive news, I am facing problems in the corporate world, where some widely used commercial office software apparently still struggles with Unicode, with the length of my name, or with both. Well, perhaps it is not the software per se; perhaps it is the specific setup of the software, where identifiers have to interoperate with other "legacy" systems. So I see umlauts replaced by other characters, and the result is at times truncated as if we were still living in the 1980s. To top it all off, I am given several email addresses with different spellings of my name, some of them aliases, and which address counts as my primary identifier keeps changing. This can confuse third-party services that rely on single sign-on about who I am and which access control policies to apply. As a result, my name keeps IT professionals busy even 25 years into the 21st century.
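The replacements and truncations described above are easy to reproduce. The Python sketch below uses a hypothetical name and hypothetical helper functions, not the behavior of any particular product. It shows three things a system might do with an umlaut: transliterate it the German way (ü becomes ue), strip the diacritic (ü becomes u), or cut the UTF-8 byte string at a fixed length. The first two already produce two different spellings of the same name, which is one way several email addresses for one person can come about.

```python
import unicodedata

# Conventional German transliteration; which rule a given system applies
# (if any) is an assumption, and real systems differ.
GERMAN_TRANSLIT = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss",
})


def transliterate(text: str) -> str:
    """Replace umlauts and ß with their two-letter German spellings."""
    return text.translate(GERMAN_TRANSLIT)


def strip_marks(text: str) -> str:
    """Decompose (NFD) and drop everything that is not ASCII, e.g. ü -> u."""
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut to at most max_bytes of UTF-8 without leaving half a character."""
    cut = text.encode("utf-8")[:max_bytes]
    # The input was valid UTF-8, so the only invalid bytes can be a partial
    # sequence at the very end; errors="ignore" drops it.
    return cut.decode("utf-8", errors="ignore")


name = "Müller"  # hypothetical example name
print(transliterate(name))     # Mueller
print(strip_marks(name))       # Muller
print(truncate_utf8(name, 2))  # M (the two-byte "ü" does not fit)
```

A naive `cut.decode("utf-8")` without `errors="ignore"` would raise `UnicodeDecodeError` on the dangling first byte of "ü"; some systems instead keep the broken byte, which is where garbled trailing characters can come from.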
But to close this note with a positive twist: today I am celebrating the publication of RFC 9911, which carries my name written with proper umlauts. This shows that the evolution of formats and internationalization is indeed possible; it just takes time, often more time than one would expect.