UnicodeReader and UnicodeInputStream

UTF-BOM reference

JDK Bug 4508058

Java's InputStreamReader recognizes the BOM in UTF-16 files, but for some reason it does not recognize the UTF-8 BOM (the bytes EF BB BF). This is unfortunate for Windows users (Windows 2000 and later) who save text files from Notepad in UTF-8 format: Notepad writes the BOM bytes at the start of the file, and Java's InputStreamReader does not skip them.
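A minimal sketch of the problem, assuming nothing beyond the standard library. The class name BomDemo and the helper firstChar are made up for this illustration. It decodes a UTF-8 byte sequence that starts with a BOM and shows that the first character handed back is U+FEFF rather than the first letter of the text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the attachment.
class BomDemo {
    // Decodes the bytes as UTF-8 and returns the first character read (-1 if empty).
    static int firstChar(byte[] bytes) {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            return r.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "hi" as Notepad would save it in UTF-8: BOM bytes, then the text.
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        // The BOM is decoded as an ordinary character instead of being skipped.
        System.out.println(Integer.toHexString(firstChar(withBom))); // prints "feff"
    }
}
```

When that character is later written out in an encoding that cannot represent it, it typically shows up as a stray ? at the start of the output.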

The UnicodeInputStream.java class helps you recognize and skip BOMs automatically, including the UTF-8 BOM.

The UnicodeReader.java class does the same even more transparently: just instantiate it and read text.
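The idea behind both helpers can be sketched roughly as follows. This is not the code from the attachment: the class BomSkipper and its methods open and readAll are hypothetical names, and the approach (peek at up to four bytes through a PushbackInputStream, pick the charset from the BOM, push back whatever was not part of it) is only one plausible way to do it. Note that the UTF-32 checks must come before the UTF-16 ones, because the UTF-32LE BOM starts with the same two bytes as the UTF-16LE BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the UnicodeReader from the attachment.
class BomSkipper {
    private static final int MAX_BOM = 4; // the longest BOM (UTF-32) is 4 bytes

    // Returns a Reader positioned after any BOM; falls back to defaultCharset.
    static Reader open(InputStream in, Charset defaultCharset) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, MAX_BOM);
        byte[] b = new byte[MAX_BOM];
        int n = 0;
        while (n < MAX_BOM) {                 // read up to 4 bytes, tolerate short reads
            int r = pin.read(b, n, MAX_BOM - n);
            if (r < 0) break;
            n += r;
        }

        Charset cs = defaultCharset;
        int bomLen = 0;
        // UTF-32 first: FF FE 00 00 would otherwise be taken for UTF-16LE.
        // The UTF-32 charsets are not guaranteed by the Java spec, so they are
        // looked up by name only when such a BOM is actually found.
        if (n >= 4 && b[0] == 0 && b[1] == 0 && b[2] == (byte) 0xFE && b[3] == (byte) 0xFF) {
            cs = Charset.forName("UTF-32BE"); bomLen = 4;
        } else if (n >= 4 && b[0] == (byte) 0xFF && b[1] == (byte) 0xFE && b[2] == 0 && b[3] == 0) {
            cs = Charset.forName("UTF-32LE"); bomLen = 4;
        } else if (n >= 3 && b[0] == (byte) 0xEF && b[1] == (byte) 0xBB && b[2] == (byte) 0xBF) {
            cs = StandardCharsets.UTF_8; bomLen = 3;
        } else if (n >= 2 && b[0] == (byte) 0xFE && b[1] == (byte) 0xFF) {
            cs = StandardCharsets.UTF_16BE; bomLen = 2;
        } else if (n >= 2 && b[0] == (byte) 0xFF && b[1] == (byte) 0xFE) {
            cs = StandardCharsets.UTF_16LE; bomLen = 2;
        }

        // Push back everything that was read but is not part of the BOM.
        if (n > bomLen) pin.unread(b, bomLen, n - bomLen);
        return new InputStreamReader(pin, cs);
    }

    // Convenience wrapper for the demo: decode a whole byte array to a String.
    static String readAll(byte[] bytes, Charset defaultCharset) {
        try (Reader r = open(new ByteArrayInputStream(bytes), defaultCharset)) {
            StringBuilder sb = new StringBuilder();
            for (int c; (c = r.read()) != -1; ) sb.append((char) c);
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(readAll(utf8, StandardCharsets.ISO_8859_1)); // prints "hi"
    }
}
```

The default charset is used only when no BOM is found, so plain files without a BOM keep working as before.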

Java UTF8_with_BOM XML example
Java UTF8_with_BOM
See this little example page and application for how to handle Unicode text files properly. If you use the UTF-8 with BOM format for text files, you may want to use the UnicodeReader helper class.

Velocity's new FileResourceLoader
Velocity website
Apache Velocity is a nice template engine with a flexible ResourceLoader interface. Unfortunately it still wants an InputStream passed back to the engine, although a Reader would be cleaner in a Java 2 environment.

But anyway, here is a tweak to support Win2K Notepad's UTF-8 text files without an extra ? character at the start of the output. You won't see it if you generate "text/html" files, as web browsers skip the BOM.

But try generating "text/plain" output and you will have a ? character at the start.

The problem is not Velocity but the Java core, which does not support the BOM in UTF-8 files. InputStreamReader(in, enc) is a nice stream reader, but UTF-8 BOM support is still missing. It does support BOMs in UTF-16 files.

See the attachment (full source + test program), which contains a new FileResourceLoader implementation for Velocity. Give it the new "unicode=true" parameter in the velocity.properties file and it will skip known Unicode BOMs.
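As a rough illustration, the same settings can be built in code with java.util.Properties instead of a velocity.properties file. Two things here are assumptions rather than facts taken from the attachment: that the "unicode=true" parameter is spelled with the usual file.resource.loader. prefix, and the loader class name, which should be replaced with the class shipped in the attachment. The class VelocityUnicodeConfig is a made-up name for this sketch.

```java
import java.util.Properties;

// Hypothetical helper that mirrors what velocity.properties would contain.
class VelocityUnicodeConfig {
    static Properties build(String templateDir) {
        Properties p = new Properties();
        p.setProperty("resource.loader", "file");
        // Assumption: substitute the loader class from the attachment here.
        p.setProperty("file.resource.loader.class",
                "org.apache.velocity.runtime.resource.loader.FileResourceLoader");
        p.setProperty("file.resource.loader.path", templateDir);
        // Assumption: "unicode=true" follows the standard loader key prefix.
        p.setProperty("file.resource.loader.unicode", "true");
        return p;
    }

    public static void main(String[] args) {
        // Print the key=value pairs; they could be passed to Velocity at init time.
        build("templates").list(System.out);
    }
}
```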

I compiled the Velocity sources from the trunk, so the jar is v1.6.

Updated: 2007-02-10 / updated velocity example
Updated: 2007-02-04 / velocity 1.5 info
Updated: 2007-01-25 / fixed BOM ordering in stream/reader