UnicodeReader and UnicodeInputStream
JDK Bug 4508058
Java's InputStreamReader supports the byte order mark (BOM) for UTF-16 files, but
for some reason it does not recognize the UTF-8 BOM. This is unfortunate for all
Windows users (Windows 2000 and later) who save text files in Notepad using the
UTF-8 format: Notepad adds the BOM bytes (EF BB BF) at the start of the file, but
Java's InputStreamReader does not skip them.
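The difference is easy to reproduce. The small program below (the class name BomDemo is illustrative) decodes "hi" preceded by a UTF-8 BOM and "hi" preceded by a big-endian UTF-16 BOM. It is a sketch that relies only on standard JDK behavior: the UTF-8 decoder keeps U+FEFF as the first character, while the "UTF-16" decoder consumes the BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomDemo {
    // Decodes the bytes through InputStreamReader, as a file reader would.
    static String decode(byte[] bytes, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), cs)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "hi" preceded by the UTF-8 BOM (EF BB BF), as Notepad saves it.
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        // "hi" preceded by the big-endian UTF-16 BOM (FE FF).
        byte[] utf16 = {(byte) 0xFE, (byte) 0xFF, 0, 'h', 0, 'i'};

        String a = decode(utf8, StandardCharsets.UTF_8);
        String b = decode(utf16, StandardCharsets.UTF_16);
        System.out.println(a.length() + " " + (a.charAt(0) == '\uFEFF')); // 3 true
        System.out.println(b.length() + " " + b);                          // 2 hi
    }
}
```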
The UnicodeInputStream.java class helps you automatically recognize and skip
BOMs, including the UTF-8 BOM.
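A stream-level BOM check can be sketched with the JDK's PushbackInputStream. The class below (BomSniffer, a hypothetical name, not the attached UnicodeInputStream) reads up to four bytes, matches them against the standard BOM byte sequences, and pushes back every byte that is not part of the BOM. The 4-byte UTF-32 patterns are tested first because the UTF-32LE BOM begins with the UTF-16LE one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

/**
 * Hypothetical helper: peeks at up to four leading bytes, reports which BOM
 * (if any) was found, and leaves the stream positioned just after it.
 */
final class BomSniffer {
    /** Encoding name of the detected BOM, or null when none was found. */
    final String encoding;
    /** Stream positioned after the BOM; all non-BOM bytes are preserved. */
    final InputStream stream;

    private BomSniffer(String encoding, InputStream stream) {
        this.encoding = encoding;
        this.stream = stream;
    }

    static BomSniffer sniff(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 4);
        byte[] b = new byte[4];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 4 or EOF.
        while (n < 4) {
            int r = pin.read(b, n, 4 - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        String enc = null;
        int bomLen = 0;
        // Order matters: FF FE 00 00 (UTF-32LE) starts with FF FE (UTF-16LE).
        if (n >= 4 && b[0] == 0 && b[1] == 0 && b[2] == (byte) 0xFE && b[3] == (byte) 0xFF) {
            enc = "UTF-32BE";
            bomLen = 4;
        } else if (n >= 4 && b[0] == (byte) 0xFF && b[1] == (byte) 0xFE && b[2] == 0 && b[3] == 0) {
            enc = "UTF-32LE";
            bomLen = 4;
        } else if (n >= 3 && b[0] == (byte) 0xEF && b[1] == (byte) 0xBB && b[2] == (byte) 0xBF) {
            enc = "UTF-8";
            bomLen = 3;
        } else if (n >= 2 && b[0] == (byte) 0xFE && b[1] == (byte) 0xFF) {
            enc = "UTF-16BE";
            bomLen = 2;
        } else if (n >= 2 && b[0] == (byte) 0xFF && b[1] == (byte) 0xFE) {
            enc = "UTF-16LE";
            bomLen = 2;
        }
        // Push back whatever followed the BOM so no content byte is lost.
        if (n > bomLen) {
            pin.unread(b, bomLen, n - bomLen);
        }
        return new BomSniffer(enc, pin);
    }
}
```

The detected name can then be passed to `new InputStreamReader(s.stream, s.encoding)`, with a default encoding as the fallback when it is null. One caveat of matching byte patterns: a UTF-16LE file whose first character is U+0000 looks exactly like a UTF-32LE BOM.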
The UnicodeReader.java class does all of this even more transparently: just
instantiate it and read text.
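When the encoding is already known, a simpler alternative to detecting it from the BOM is to drop a decoded U+FEFF from the front of the text. The sketch below (BomStrippingReader is a hypothetical name; it is not the attached UnicodeReader and it does not guess the encoding) wraps InputStreamReader in a PushbackReader and discards the first character only if it is U+FEFF.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.Charset;

/**
 * Hypothetical helper: decodes with a caller-supplied charset and removes a
 * single leading U+FEFF character if one is present.
 */
final class BomStrippingReader {
    private BomStrippingReader() {}

    static Reader open(InputStream in, Charset cs) throws IOException {
        PushbackReader r = new PushbackReader(new InputStreamReader(in, cs), 1);
        int first = r.read();
        // Keep the first character unless it is the byte order mark.
        if (first != -1 && first != 0xFEFF) {
            r.unread(first);
        }
        return r;
    }
}
```

Note that open() reads the first character eagerly, so on a slow stream the call blocks until one character is available.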
Java UTF8_with_BOM XML example
See this small example page and application showing how to handle Unicode text files properly.
If you store text files as UTF-8 with a BOM, you may want to use the UnicodeReader helper class.
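To create such files from Java rather than Notepad (for example, to test a BOM-aware reader), it is enough to write U+FEFF as the first character through a UTF-8 writer; the encoder turns it into EF BB BF. A minimal sketch with a hypothetical helper name:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class Utf8BomWriter {
    // Encodes text as UTF-8 preceded by the BOM, the same leading bytes
    // Notepad writes when saving in its UTF-8 format.
    static byte[] encodeWithBom(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write('\uFEFF'); // the UTF-8 encoder emits EF BB BF for this char
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Writing the same bytes to a FileOutputStream instead of a ByteArrayOutputStream produces a file on disk.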
Velocity's new FileResourceLoader
Apache Velocity is a nice template engine with a flexible ResourceLoader interface.
Unfortunately, it still expects an InputStream to be passed back to the engine,
although a Reader would be cleaner in a Java 2 environment.
Anyway, here is a tweak to support Win2K Notepad's UTF-8 text files without an
extra "?" character at the start of the output. You won't see it if you generate
"text/html" files, because web browsers skip the BOM. But generate "text/plain"
output and you will find a "?" character at the start.
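One plausible explanation for the "?" (an assumption; it depends on how the output is encoded): a template read without BOM handling starts with U+FEFF, and if the rendered text is then encoded in a charset that cannot represent that character, such as ISO-8859-1, String.getBytes substitutes the charset's replacement byte, which is "?". A small sketch with a hypothetical class name:

```java
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Encodes text as ISO-8859-1 and decodes it again, exposing any
    // characters the charset had to replace.
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Output rendered from a template whose BOM was never skipped.
        String rendered = "\uFEFFHello";
        System.out.println(roundTripLatin1(rendered)); // ?Hello
    }
}
```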
The problem is not Velocity but the Java core library, which cannot handle the BOM
in UTF-8 files. InputStreamReader(in, enc) is a nice stream reader, but UTF-8 BOM
support is still missing, even though it does handle BOMs in UTF-16 files.
See the attachment (full source plus test program) for a new FileResourceLoader
implementation for Velocity. Set the new "unicode=true" parameter in the
velocity.properties file and it will skip known Unicode BOMs.
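For illustration only, a velocity.properties fragment might look like the one below. It assumes the loader is registered under the name file, so the flag follows Velocity 1.x's <name>.resource.loader.<property> key pattern; the templates path is a placeholder, and if the attached loader uses its own class name, that name belongs on the class line instead.

```properties
# Sketch only: loader name "file" and path "templates" are assumptions.
resource.loader = file
file.resource.loader.class = org.apache.velocity.runtime.resource.loader.FileResourceLoader
file.resource.loader.path = templates
# Ask the loader to skip known Unicode BOMs at the start of each template.
file.resource.loader.unicode = true
```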
I compiled Velocity from the trunk sources, so the jar is v1.6.
Updated: 2007-02-10 / updated Velocity example
Updated: 2007-02-04 / Velocity 1.5 info
Updated: 2007-01-25 / fixed BOM ordering in stream/reader