Tuesday, January 13, 2009

Unicode newline character in Java string

The other day I was trying to write a String using Unicode escapes.

String s = new String("\u0041 \u000A");

What I wanted was "A \n". What I got instead was a COMPILE ERROR:

String literal is not properly closed by a double-quote

What the hell! I have used Unicode escapes for characters in my Java code before. So what was wrong here? It seems the compiler did not like the Unicode newline character I had added. Here's why...

The compiler translates Unicode escapes at the very beginning of the compile cycle, before it splits the source into tokens. That means the source above first gets converted to

String s = new String("A 
");

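before the rest of compilation starts. The string literal now has a real line break in the middle of it, so it is quite obvious why compilation fails. Check out section 3.2, Lexical Translations, of the Java Language Specification to understand what exactly happens in the translation phase of lexical analysis.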
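If you want to see this early translation at work without breaking the build, here is a small sketch. The class and method names are made up for illustration. It relies on two escapes that are harmless where a newline is not: the escape for the letter a used as a variable name, and the escape for the double quote used inside what looks like one string literal. Note that the comments spell code points as U+XXXX on purpose, since an escape written inside a comment would be translated too.

```java
class EarlyTranslationDemo {
    // The escape for 'a' (U+0061) is translated before tokenizing,
    // so the compiler sees an ordinary local variable named a.
    static int identifierDemo() {
        int \u0061 = 41;
        return a + 1;
    }

    // U+0022 is the double quote. After translation the expression reads
    // "a".length() + "b", which is two short literals, not one long one.
    static String quoteDemo() {
        return "a\u0022.length() + \u0022b";
    }

    public static void main(String[] args) {
        System.out.println(identifierDemo()); // prints 42
        System.out.println(quoteDemo());      // prints 1b
    }
}
```

The second method is the same effect as the newline problem, just with an escape that happens to leave the source compilable.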

You might also enjoy reading this issue of the Java Specialists newsletter.

If you are trying to represent newline or carriage return characters as Unicode escapes (\u000A or \u000D) in your Strings, don't bother. It will not work. Use "\n" and "\r" instead.
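For completeness, a minimal sketch of the working version (class and method names are mine, purely for illustration). It builds the string I originally wanted in two ways: with the "\n" escape sequence, and by casting the numeric code point to a char for the case where you really want to spell out the number.

```java
class NewlineInStrings {
    // Escape sequences such as \n are handled when the literal itself is
    // processed, long after Unicode escapes were translated, so this is safe.
    static String viaEscapeSequence() {
        return "\u0041 \n";
    }

    // Alternative: keep the numeric value (U+000A) but as a char cast,
    // so no line break ever appears in the source text.
    static String viaCharCast() {
        return "A " + (char) 0x000A;
    }

    public static void main(String[] args) {
        String s = viaEscapeSequence();
        System.out.println(s.equals(viaCharCast())); // prints true
        System.out.println((int) s.charAt(2));       // prints 10
    }
}
```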


mik said...

thanks for the post -- it resolved my problem

Anonymous said...

This is exactly what I was looking for! Thanks!!

Luke Hutchison said...

Just ran into this issue myself. This is so frigging ridiculous.