Unicode and Lisp

I finally managed to prove to myself that Lisp (SBCL anyway) supports Unicode. It took some perseverance, especially with a borked ebuild for Emacs that failed to install LEIM. This is how I managed to demonstrate Unicode in the Lisp REPL within Slime.

Using Unicode in the Lisp REPL requires a couple of things to start: a Unicode-aware Lisp implementation such as CLISP or SBCL, and Emacs version 21.4 or greater. I won't go into setting those up here, but in Gentoo you'll need to be sure SBCL is emerged with the "unicode" USE flag and Emacs with the "leim" USE flag, possibly with some tweaking to the ebuild.
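One quick way to check whether a given Lisp image is Unicode aware is to look at char-code-limit, the standard upper bound on character codes. This is only a minimal sketch: the function name unicode-capable-p is made up for this post, and the threshold is simply the highest Unicode code point.

```lisp
;; Hypothetical helper, not part of any library.
;; CHAR-CODE-LIMIT is standard Common Lisp: every character code is below it.
(defun unicode-capable-p ()
  "Return T if this Lisp can represent every Unicode code point."
  (> char-code-limit #x10FFFF))

(unicode-capable-p)
```

An SBCL built with the "unicode" USE flag should return T here.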

To get started you need to tell Emacs and Slime that you want to use UTF-8. For Emacs this can be done in the Options > MULE > Set Language Environment menu or with the command set-language-environment (M-x set-language-environment). UTF-8 is a good value.

You'll also need Slime. To get Slime set up for Unicode you need to set its slime-net-coding-system Emacs variable. This can be done in either your .emacs file or in the *scratch* buffer. Just evaluate or add:

(setq slime-net-coding-system 'utf-8-unix)

The utf-8-unix is the coding system Slime uses for Unicode. The Emacs variable slime-net-valid-coding-systems lists other valid options for Slime's coding system, but for Unicode the above value is what you want.

Once Emacs and Slime know that you want stuff done in Unicode you can start Slime up. Now you'll be wondering if anything has changed. Now for the fun.

Getting Unicode characters entered into Slime is a real PITA. Slime choked on the characters I punched in with the Japanese input method. Even pasting from some applications only produced a question mark. Firefox is friendly though. You can copy a Unicode string and in Emacs it'll insert the hexadecimal value of each character prefixed with \u. That comes in handy, but it could be better.

So to get your first sign of Unicode support you can copy a character such as ∑ from Firefox and paste it into Emacs. You should see \u2211. To get that printed evaluate:

(code-char #x2211)
; => #\∑

Take note of the #x instead of the \u: that tells Lisp the number is hexadecimal. You may only see a box in the output, though, which means your selected font can't render that character. You can try another one if you want, like ɥ, which has a hexadecimal value of 0x0265.
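Since Firefox hands you the \u form and Lisp wants #x, the conversion can be done in Lisp itself. Here is a small sketch, assuming the pasted text is exactly a backslash, a u, and then hex digits; parse-u-escape is a name invented for this post.

```lisp
;; Hypothetical helper: turn a pasted escape such as \u2211 into a character.
;; Inside a Lisp string literal the backslash must itself be escaped.
(defun parse-u-escape (text)
  "Return the character for TEXT, a string of the form \\uXXXX."
  ;; Skip the two-character prefix and read the rest as base 16.
  (code-char (parse-integer text :start 2 :radix 16)))

(parse-u-escape "\\u2211")   ; same character as (code-char #x2211)
(char-code (parse-u-escape "\\u0265"))
; => 613
```

It does no validation of the prefix, so anything that isn't in that exact form will either signal an error or give the wrong character.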

If you're using CLISP then your output might be different. CLISP is a bit better than SBCL about Unicode. You can specify the Unicode character above by using either #\u2211 or #\N-ARY_SUMMATION. The latter may also be what CLISP prints out. If that's the case, then you'll have to coerce it into a string before going on to the next step, because you'll need an actual character to copy and paste, and character names don't work.

(coerce '(#\u2211) 'string)
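For a version that does not depend on how a given implementation spells character names, the same string can be built from the code point with standard functions only. This is a sketch, and code-point-string is a made-up name.

```lisp
;; Build a one-character string from a code point using only standard
;; Common Lisp, so no implementation-specific character name is needed.
(defun code-point-string (code)
  "Return a string containing the single character with code CODE."
  (string (code-char code)))

(code-point-string #x2211)
```

The result prints between double quotes, which gives you the bare character to copy.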

Now we can define a function named ∑ with some copy and paste. You'll need to copy the character after #\ in SBCL or the one between the quotes in CLISP to create:

(defun ∑ (list)
   (if list
       (+ (first list) (∑ (rest list)))
       0))
; => ∑
(∑ '(1 2 3 4 5))
; => 15
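
If pasting the character is the sticking point, the function can also be installed without ever typing ∑. This is a sketch of an alternative, not the approach used above: it interns a symbol whose name is built from the code point, and it sums with reduce instead of recursion. The variable name *sigma-symbol* is made up.

```lisp
;; Hypothetical alternative: name the function from its code point.
(defvar *sigma-symbol* (intern (string (code-char #x2211)))
  "The symbol whose name is the single character U+2211.")

;; Attach a summing function to that symbol.
(setf (fdefinition *sigma-symbol*)
      (lambda (list) (reduce #'+ list)))

(funcall *sigma-symbol* '(1 2 3 4 5))
; => 15
```

Once the symbol exists, the REPL should print it as ∑, which gives you something to copy for later calls.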

So that's the pain that is Emacs, Lisp, and Unicode. Perhaps in the future this will be easier and we can just punch in a bunch of Japanese and math characters to create some funky programs.

Update (2006-08-16): An article describing how to convert between character encodings and such is available on CLiki titled CloserLookAtCharacters.
