File upload and character sets / encoding

By Gert-Jan van de Streek / 1 min

When handling file uploads from a browser, it is good to realize that you don't know what is coming, character set wise. You simply do not get a hint from the browser that says: here is a UTF-8 encoded Unicode text file. Nor does it warn you that the document it is sending was created on a Windows machine using the windows-1252 character set.

Why don't browsers do this? The answer is rather simple: they don't have a clue either. The file is read from disk and most file systems don't store meta information on character set or encoding.
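To make the problem concrete, here is a minimal Python sketch of what the receiving side sees. The sample word "café" is only an illustration; the point is that the bytes on disk carry no label, so the same text yields different bytes per encoding, and decoding with the wrong charset either garbles the text or fails.

```python
# The same text produces different bytes depending on the encoding used to save it.
text = "café"  # illustrative sample, not from the original post
utf8_bytes = text.encode("utf-8")
cp1252_bytes = text.encode("windows-1252")

# Decoding UTF-8 bytes as windows-1252 "works" but silently garbles the text.
garbled = utf8_bytes.decode("windows-1252")

# Decoding windows-1252 bytes as UTF-8 fails, because the byte for "é"
# is not a valid UTF-8 sequence on its own.
try:
    cp1252_bytes.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

print(utf8_bytes, cp1252_bytes, repr(garbled), failed)
```

Nothing in either byte string says which encoding produced it, which is exactly the information the browser lacks as well.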

How do we deal with that correctly? There is only one valid option: the person uploading the file must tell us what it is. If you have a form with a file upload, put a drop-down next to it with a list of character sets and let the user indicate what they are sending. If a system is sending in files via REST, make sure you know what it is sending, or give it a parameter to indicate the character set used.
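On the server side, the declared character set can then be applied when decoding the uploaded bytes. The following is a rough sketch using only the Python standard library. The function name `decode_upload` and the `SUPPORTED_CHARSETS` list are hypothetical and stand in for whatever your form or REST parameter actually offers.

```python
import codecs

# Hypothetical allow-list backing the drop-down (or REST parameter) values.
SUPPORTED_CHARSETS = ("utf-8", "windows-1252", "iso-8859-1", "utf-16")


def decode_upload(data: bytes, declared_charset: str) -> str:
    """Decode uploaded bytes using the charset the uploader declared."""
    try:
        # Normalize aliases such as "UTF8" or "cp1252" to one codec name.
        codec_name = codecs.lookup(declared_charset).name
    except LookupError:
        raise ValueError(f"unknown charset: {declared_charset!r}") from None

    allowed = {codecs.lookup(name).name for name in SUPPORTED_CHARSETS}
    if codec_name not in allowed:
        raise ValueError(f"unsupported charset: {declared_charset!r}")

    # Strict decoding rejects bytes that are invalid in the declared charset.
    # It cannot catch every wrong declaration: single-byte charsets such as
    # iso-8859-1 accept any byte sequence.
    return data.decode(codec_name, errors="strict")
```

For example, `decode_upload(b"caf\xe9", "windows-1252")` returns `"café"`, while the same bytes declared as `"utf-8"` raise a `UnicodeDecodeError`, which is a useful signal that the declaration and the file do not match.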

That is the only solution that is 100% guaranteed. If you want to try something more advanced, look at IBM's ICU project (Java / C / C++). It has functionality that detects the charset or encoding of character data in an unknown format, but its results cannot be guaranteed to always be correct.
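To show why detection can only ever be a guess, here is a deliberately naive stand-in written with the Python standard library. It is not ICU and does not use ICU's statistical approach. It only checks for a byte order mark, tries strict UTF-8, and otherwise falls back to windows-1252. The function name `guess_charset` and the choice of fallback are assumptions made for this illustration.

```python
import codecs


def guess_charset(data: bytes) -> str:
    """Best-effort guess at the charset; the result is a hint, not a guarantee."""
    # A byte order mark, when present, is the strongest clue available.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        # Text in other encodings rarely happens to be valid UTF-8
        # once it contains non-ASCII characters.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback: single-byte charsets are indistinguishable at
        # the byte level, so this branch is where wrong guesses happen.
        return "windows-1252"
```

The fallback is where this goes wrong. A euro sign saved as ISO-8859-15 is the single byte `0xA4`; `guess_charset(b"\xa4")` answers `"windows-1252"`, and decoding with that guess yields `"¤"` instead of `"€"`. A real detector is much smarter, but it faces the same fundamental ambiguity.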


By Gert-Jan van de Streek / Oct 2024
