Encoding problem between a C# TCP server and a Java TCP client

I'm facing an encoding issue for which I haven't been able to find the correct solution.

I have a C# TCP server, running as a Windows service, which receives and responds with XML. The problem appears when the output contains special characters, such as Spanish characters with accents (á, é, í and others).

The server response is encoded as UTF-8, and the Java client reads it as UTF-8. But when I print the output, the character is completely different.

This problem only happens in the Java client (the C# TCP client works as expected).

The following is a snippet of the server code that shows the encoding issue:

C# server:

    // Encode the test character as UTF-8 and send it to the client.
    byte[] destBytes = System.Text.Encoding.UTF8.GetBytes("á");
    try
    {
        clientStream.Write(destBytes, 0, destBytes.Length);
        clientStream.Flush();
    }
    catch (Exception ex)
    {
        LogErrorMessage("Error en SendResponseToClient: Detalle::", ex);
    }

Java client:

    socket.connect(new InetSocketAddress(param.getServerIp(), param.getPort()), 20000);
    InputStream sockInp = socket.getInputStream();
    // Decode the incoming bytes explicitly as UTF-8.
    InputStreamReader streamReader = new InputStreamReader(sockInp, Charset.forName("UTF-8"));
    sockReader = new BufferedReader(streamReader);
    String tmp = null;
    while ((tmp = sockReader.readLine()) != null) {
        // System.out uses the platform's console/default encoding, not necessarily UTF-8.
        System.out.println(tmp);
    }

For this simple test, the output shown is:

ß

I did some testing by printing out the byte[] in each language. In C#, á is output as: 195, 161

In Java, the byte[] read prints as: -61, -95

Could this have to do with the byte type being signed in Java and unsigned in C#?
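
On the signed/unsigned question: the two printouts describe the same bytes. Java's byte is signed, so 0xC3 and 0xA1 print as -61 and -95, while C# prints them as 195 and 161. A minimal sketch to confirm this (the class name SignedByteDemo and the helper toUnsigned are made up for illustration and are not part of the original client):

```java
import java.nio.charset.StandardCharsets;

public class SignedByteDemo {
    // Maps Java's signed byte (-128..127) to the unsigned value (0..255) that C# prints.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte[] received = { -61, -95 }; // the values printed by the Java client
        System.out.println(toUnsigned(received[0]) + ", " + toUnsigned(received[1])); // 195, 161

        // Decoding the same bytes as UTF-8 yields U+00E1 (á), so the data on the wire is fine.
        String decoded = new String(received, StandardCharsets.UTF_8);
        System.out.println(decoded.equals("\u00E1")); // true
    }
}
```

Since the bytes decode correctly, the signedness of the byte type only affects how the values are printed, not the decoded text.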

Any comments are much appreciated.

asked Aug 28 '11 at 00:08

Not an answer, but a data point anyway: Python does decode the C# version as you intended: print ''.join(chr(x) for x in [195, 161]).decode('utf-8') -> á. The Java one apparently is not valid UTF-8 if I try to preserve that order. -

Thanks, I'm still experimenting (no luck so far). -

I made a mistake in the example above (I have already edited it). In Java the byte[] prints as: -61, -95, which is a valid UTF-8 character. The problem seems to lie in the OS (Windows) itself. I don't know what weird setting it has that makes it print the wrong character. -

2 Answers

To me this seems like an endianness problem... you can check that by reversing the bytes in Java before printing the string...

which would usually be solved by including a BOM... see http://de.wikipedia.org/wiki/Byte_Order_Mark

answered Aug 28 '11 at 04:08

I'm under the same impression, after reading about endianness in C# and Java. - jcgarciam

If it's UTF-8, then a BOM is not needed and will not change anything. UTF-8 encoding always has the same representation, on both little- and big-endian machines. (unicode.org/faq/utf_bom.html#bom5) - viraptor
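
To illustrate the point about byte order, here is a small sketch comparing the encoded bytes of á in UTF-8 and in the two UTF-16 byte orders. The class name Utf8EndiannessDemo and the helper encode are hypothetical names used only for this example:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8EndiannessDemo {
    // Returns the encoded bytes of the text as a printable list of signed values.
    static String encode(String text, Charset charset) {
        return Arrays.toString(text.getBytes(charset));
    }

    public static void main(String[] args) {
        String text = "\u00E1"; // á

        // UTF-8 is defined as a byte sequence, so there is only one possible order.
        System.out.println(encode(text, StandardCharsets.UTF_8));    // [-61, -95]

        // UTF-16 uses 16-bit units, so the two byte orders differ.
        System.out.println(encode(text, StandardCharsets.UTF_16BE)); // [0, -31]
        System.out.println(encode(text, StandardCharsets.UTF_16LE)); // [-31, 0]
    }
}
```

Only the UTF-16 variants change with byte order, which is why a BOM matters there and not for UTF-8.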

I think the problem may be in the OS where the server is running. A simple Java program that should print á prints the weird character there as well, while on another OS (Linux) it correctly prints the expected character. So I have ruled out the socket and the end-to-end encoding. - jcgarciam

If the OS has some weird settings, this could happen :-( - Yahia

Any suggestion on where I should look in the OS settings? Regional Settings? - jcgarciam
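
One possible explanation, offered as an assumption since the thread does not state the machine's settings: ß is what appears when the byte 0xE1 (á in windows-1252) is displayed by a console using code page 850 or 437. The sketch below reproduces that mismatch and shows printing through a stream with an explicit encoding. The class name ConsoleMojibakeDemo, the helper misdecode, and the two charsets chosen are all illustrative guesses:

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class ConsoleMojibakeDemo {
    // Encodes text with one charset and decodes the bytes with another,
    // mimicking a program and a console that disagree about the encoding.
    static String misdecode(String text, String writtenAs, String shownAs) {
        byte[] bytes = text.getBytes(Charset.forName(writtenAs));
        return new String(bytes, Charset.forName(shownAs));
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assumed charsets: windows-1252 for the JVM default, IBM850 for the console.
        String shown = misdecode("\u00E1", "windows-1252", "IBM850");
        System.out.println(shown.equals("\u00DF")); // true: 0xE1 is ß in code page 850

        // Printing through a stream with an explicit encoding avoids relying on the default.
        // The console itself must use the same encoding (on Windows, for example, "chcp 65001").
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("\u00E1");
    }
}
```

If this is the cause, the places to check would be the console code page and the JVM's default charset (Charset.defaultCharset()), rather than the socket code.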

Are you sure that's not a Unicode character you are attempting to encode to bytes as UTF-8 data?

I found the question below to be a useful way of testing whether the data in that string is correct UTF-8 before you send it.

How to test an application for correct encoding (e.g. UTF-8)
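
As a rough sketch of such a check in Java (not necessarily the approach described in the linked question), a strict decoder can report whether a byte array is well-formed UTF-8. The class name Utf8Validator and the method isValidUtf8 are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Validator {
    // Returns true only if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The bytes from the question, in the order they were received.
        System.out.println(isValidUtf8(new byte[] { -61, -95 })); // true
        // Reversed order: the sequence starts with a continuation byte.
        System.out.println(isValidUtf8(new byte[] { -95, -61 })); // false
    }
}
```

The default String constructor silently replaces malformed input, so the decoder is configured with CodingErrorAction.REPORT to surface errors instead.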

answered May 23 '17 at 15:05

I don't quite understand your statement. In my example above, I'm getting the UTF-8 byte[] of just á to test the encoding. - jcgarciam
