JDK Bug 4508058
Java's InputStreamReader recognizes the byte order mark (BOM) in UTF-16 files, but for some reason it does not recognize the UTF-8 BOM. This is unfortunate for Windows users (Win2K and later) whose text files are saved with Notepad in UTF-8 format: Notepad writes BOM bytes at the start of the file, and Java's InputStreamReader does not skip them.
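The behaviour described above can be reproduced with a few in-memory bytes. The following is only a small sketch; the class name BomDemo and the helper readAll are made up for this illustration and are not part of the attachment.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

class BomDemo {
    // Illustrative helper: decode all bytes with the given encoding name.
    static String readAll(byte[] data, String enc) throws IOException {
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), enc)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // "hi" preceded by the UTF-8 BOM (EF BB BF), as Notepad would save it.
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        // "hi" preceded by the UTF-16 big-endian BOM (FE FF).
        byte[] utf16 = {(byte) 0xFE, (byte) 0xFF, 0, 'h', 0, 'i'};

        String fromUtf8 = readAll(utf8, "UTF-8");
        String fromUtf16 = readAll(utf16, "UTF-16");

        // The UTF-8 decoder passes the BOM through as the character U+FEFF.
        System.out.println(fromUtf8.length());            // 3
        System.out.println(fromUtf8.charAt(0) == '\uFEFF'); // true
        // The UTF-16 decoder consumes its BOM.
        System.out.println(fromUtf16.length());           // 2
    }
}
```

The leftover U+FEFF is what later shows up as a stray ? when the text is written out in an encoding that cannot represent it.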
The UnicodeInputStream.java class helps you recognize and skip BOMs automatically, including the UTF-8 BOM.
The UnicodeReader.java class does everything even more transparently. Just instantiate it and read text.
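To give an idea of the technique, here is a minimal sketch of BOM detection with a PushbackInputStream. It is not the code of UnicodeInputStream.java or UnicodeReader.java; the class BomSkip and its methods open and readAll are hypothetical names, and only the UTF-8 and UTF-16 BOMs are handled. The idea: read the first three bytes, compare them with the known marks, push back whatever was not part of a BOM, and decode the rest with the detected encoding (or the caller's default if no BOM is found).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;

class BomSkip {
    // Hypothetical helper: returns a Reader positioned after any UTF-8/UTF-16 BOM.
    static Reader open(InputStream in, String defaultEnc) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = readUpTo(pb, head); // may be fewer than 3 bytes for short input

        String enc = defaultEnc;
        int skip = 0;
        if (n >= 3 && head[0] == (byte) 0xEF && head[1] == (byte) 0xBB && head[2] == (byte) 0xBF) {
            enc = "UTF-8";
            skip = 3;
        } else if (n >= 2 && head[0] == (byte) 0xFE && head[1] == (byte) 0xFF) {
            enc = "UTF-16BE";
            skip = 2;
        } else if (n >= 2 && head[0] == (byte) 0xFF && head[1] == (byte) 0xFE) {
            enc = "UTF-16LE";
            skip = 2;
        }
        // Push back every byte that was read but is not part of the BOM.
        if (n > skip) {
            pb.unread(head, skip, n - skip);
        }
        return new InputStreamReader(pb, enc);
    }

    // Fill buf as far as possible; returns the number of bytes actually read.
    private static int readUpTo(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int r = in.read(buf, total, buf.length - total);
            if (r < 0) {
                break;
            }
            total += r;
        }
        return total;
    }

    // Convenience for the demo below: drain a Reader into a String.
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] notepad = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        String text = readAll(open(new ByteArrayInputStream(notepad), "ISO-8859-1"));
        System.out.println(text.length()); // 2
    }
}
```

Passing the default encoding explicitly keeps files without a BOM readable in whatever encoding the application already expects.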
Velocity's new FileResourceLoader
Apache Velocity is a nice template engine with a flexible ResourceLoader interface. Unfortunately it still wants an InputStream passed back to the engine, although a Reader would be cleaner in a Java 2 environment.
But anyway, here is a tweak to support Win2K Notepad's UTF-8 text files without an extra ? character at the start of the output. You won't see it if you generate "text/html" files, as web browsers skip the BOM.
But try generating "text/plain" output and you will get a ? character at the start.
The problem is not Velocity but the Java core, which does not handle the BOM in UTF-8 files. InputStreamReader(in, enc) is a nice stream reader, but UTF-8 BOM support is still missing. It does support BOMs in UTF-16 files.
See the attachment (full source + test program), which contains a new FileResourceLoader implementation for Velocity. Give it the new "unicode=true" parameter in the velocity.properties file and it will skip known Unicode BOMs.
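As a rough illustration, the setting might look like this in velocity.properties. This is a guess at the layout, not a copy from the attachment: it assumes the parameter follows Velocity's usual "file.resource.loader." prefix for loader properties, and the class name com.example.UnicodeFileResourceLoader and the path value are placeholders.

```properties
resource.loader = file
# Placeholder class name; use the loader class from the attachment.
file.resource.loader.class = com.example.UnicodeFileResourceLoader
# Placeholder template directory.
file.resource.loader.path = templates
# Assumed key: the "unicode" parameter with the usual loader prefix.
file.resource.loader.unicode = true
```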
I compiled the Velocity sources from the trunk, so the jar is v1.6.