UTF-BOM reference
JDK Bug 4508058
Java's InputStreamReader recognizes the BOM marker in UTF-16 files, but for some
reason it does not recognize the UTF-8 BOM. This is unfortunate for all
Windows (Win2K and later) users whose text files are saved with Notepad in UTF-8 format.
Notepad adds BOM bytes at the start of the file, but Java's InputStreamReader
does not skip them.
The UnicodeInputStream.java class helps you recognize and skip BOMs automatically, including the UTF-8 BOM.
The UnicodeReader.java class does everything even more transparently: just instantiate it and read text.
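The general idea behind such a BOM-skipping reader can be sketched as follows. This is only a minimal illustration, not the source of UnicodeInputStream.java or UnicodeReader.java: the class name BomSkippingReaderSketch and its open method are hypothetical, and the sketch assumes only the UTF-8 and UTF-16 BOMs need handling (UTF-32 is ignored). It peeks at the first three bytes, picks the charset the BOM indicates, and pushes back whatever was not part of a BOM.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not the UnicodeReader.java source.
final class BomSkippingReaderSketch {

    // Returns a Reader positioned after a leading UTF-8 or UTF-16 BOM, if any.
    // When no BOM is found, the stream is decoded with defaultCharset.
    static Reader open(InputStream in, Charset defaultCharset) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = readUpTo(pb, head);

        Charset charset = defaultCharset;
        int bomLength = 0;
        if (n >= 3 && u(head[0]) == 0xEF && u(head[1]) == 0xBB && u(head[2]) == 0xBF) {
            charset = StandardCharsets.UTF_8;      // EF BB BF
            bomLength = 3;
        } else if (n >= 2 && u(head[0]) == 0xFE && u(head[1]) == 0xFF) {
            charset = StandardCharsets.UTF_16BE;   // FE FF
            bomLength = 2;
        } else if (n >= 2 && u(head[0]) == 0xFF && u(head[1]) == 0xFE) {
            charset = StandardCharsets.UTF_16LE;   // FF FE
            bomLength = 2;
        }

        // Push back every byte that was read but is not part of the BOM.
        if (n > bomLength) {
            pb.unread(head, bomLength, n - bomLength);
        }
        return new InputStreamReader(pb, charset);
    }

    // Reads until buf is full or the stream ends; returns the number of bytes read.
    private static int readUpTo(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int r = in.read(buf, total, buf.length - total);
            if (r < 0) {
                break;
            }
            total += r;
        }
        return total;
    }

    // Converts a signed byte to its unsigned value.
    private static int u(byte b) {
        return b & 0xFF;
    }
}
```

Under these assumptions, a call such as BomSkippingReaderSketch.open(new FileInputStream(file), StandardCharsets.ISO_8859_1) would return a reader whose first character is the first real character of the text, whether or not Notepad wrote a BOM.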
Velocity's new FileResourceLoader
Velocity website
Apache Velocity is a nice template engine with a flexible ResourceLoader interface.
Unfortunately it still wants an InputStream passed back to the engine, although
a Reader would be cleaner in a Java 2 environment.
Anyway, here is a tweak to support Win2K Notepad's UTF-8 text files without
an extra ? character at the start of the output. You won't see it if you generate
"text/html" files, as web browsers skip the BOM.
But try generating "text/plain" output and you will have a ? character at the start.
The problem is not Velocity but the Java core, which does not handle the BOM marker in UTF-8 files.
InputStreamReader(in, enc) is a nice stream reader, but UTF-8 BOM support
is still missing. It does support BOMs in UTF-16 Unicode files.
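The difference between the two encodings is easy to observe with a few bytes in memory. The following is a small sketch, with a hypothetical class name BomDecodeDemo, that returns the first character InputStreamReader delivers for a given byte sequence and charset name. With "UTF-8" the BOM comes through as the character U+FEFF, while with "UTF-16" the decoder consumes the BOM and delivers the first real character.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical demo class for illustration only.
final class BomDecodeDemo {

    // Decodes bytes with the named charset and returns the first char read,
    // or -1 if the input is empty.
    static int firstChar(byte[] bytes, String charsetName) throws IOException {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charsetName)) {
            return reader.read();
        }
    }

    // "A" saved as UTF-8 with a BOM: EF BB BF 41
    static byte[] utf8WithBom() {
        return new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x41};
    }

    // "A" saved as big-endian UTF-16 with a BOM: FE FF 00 41
    static byte[] utf16WithBom() {
        return new byte[] {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41};
    }
}
```

Calling BomDecodeDemo.firstChar(BomDecodeDemo.utf8WithBom(), "UTF-8") yields 0xFEFF rather than 'A'. When that character is later written in an encoding that cannot represent it, it typically shows up as the stray ? described above.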
See the attachment (full source + test program), which contains a new
FileResourceLoader implementation for Velocity. Give it the new "unicode=true" parameter
in the velocity.properties file and it will skip known Unicode BOM markers.
UnicodeLoaderForVelocity.zip
I compiled the Velocity sources from the trunk, so the jar is v1.6.