File upload and character sets / encoding

By Gert-Jan van de Streek / 1 min read

When handling file uploads from a browser, keep in mind that you do not know what you are receiving, at least as far as the character set is concerned. The browser gives you no hint that says "here is a UTF-8 encoded Unicode text file", or "beware, this document was created on a Windows machine using the windows-1252 character set".

Why don't browsers do this? The answer is rather simple: they don't have a clue either. The file is read from disk as raw bytes, and most file systems don't store metadata about character set or encoding.
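To make the problem concrete, here is a small Python sketch. The sample word is an arbitrary choice for illustration. It shows that the same text is stored as different bytes depending on the encoding, and that decoding with the wrong assumption gives garbage without any error.

```python
# The same text, saved with two different encodings.
text = "café"

utf8_bytes = text.encode("utf-8")            # é is stored as two bytes: 0xC3 0xA9
cp1252_bytes = text.encode("windows-1252")   # é is stored as one byte: 0xE9

# Nothing in the bytes themselves says which encoding was used.
# Reading the UTF-8 bytes as windows-1252 does not fail, it just yields mojibake.
mojibake = utf8_bytes.decode("windows-1252")

print(utf8_bytes)    # b'caf\xc3\xa9'
print(cp1252_bytes)  # b'caf\xe9'
print(mojibake)      # cafÃ©
```

The receiving side only ever sees the bytes, which is why it needs to be told the encoding separately.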

How do we deal with that correctly? There is only one valid option: whoever uploads the file must tell us what it is. If you have a form with a file upload, put a drop-down next to it with a list of character sets and let users indicate what they are sending. If a system is sending in files via REST, make sure you know what it is sending, or give it a parameter to indicate the character set used.
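On the receiving side, this could look roughly like the Python sketch below. The function name `decode_upload` and the `ALLOWED_CHARSETS` list are hypothetical and only serve the illustration; the list would mirror whatever your drop-down or REST parameter offers. The decoding is strict, so a declaration that does not match the bytes is rejected instead of silently producing broken text.

```python
# Hypothetical list of charsets offered in the drop-down or REST parameter.
ALLOWED_CHARSETS = ("utf-8", "windows-1252", "iso-8859-1")


def decode_upload(data: bytes, declared_charset: str) -> str:
    """Decode uploaded bytes using the charset the uploader declared."""
    name = declared_charset.strip().lower()
    if name not in ALLOWED_CHARSETS:
        raise ValueError(f"unsupported charset: {declared_charset!r}")
    # errors="strict" raises UnicodeDecodeError when the bytes are not
    # valid in the declared charset, rather than guessing or replacing.
    return data.decode(name, errors="strict")
```

For example, `decode_upload(b"caf\xe9", "windows-1252")` returns the expected text, while declaring the same bytes as `utf-8` raises `UnicodeDecodeError`, which you can report back to the uploader.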

That is the only solution that does not depend on guesswork. If you want to try something more advanced, look at IBM's ICU project (Java, C and C++). It has functionality that detects the charset or encoding of character data in an unknown format, but the results cannot be guaranteed to always be correct.
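Why detection can only ever be a best guess can be shown with a few lines of Python. This is not ICU's API or algorithm, just a crude stand-in using the standard library; the function name `plausible_charsets` and the candidate list are assumptions made for the example. It checks which candidate charsets can decode the bytes without error, and often more than one can.

```python
def plausible_charsets(data: bytes,
                       candidates=("utf-8", "windows-1252", "iso-8859-1")):
    """Return the candidate charsets that decode the data without error."""
    plausible = []
    for name in candidates:
        try:
            data.decode(name)  # strict by default
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this charset
        plausible.append(name)
    return plausible


# UTF-8 bytes for "café" are also valid windows-1252 and ISO-8859-1.
print(plausible_charsets(b"caf\xc3\xa9"))
# windows-1252 bytes for "café" are not valid UTF-8, but two candidates remain.
print(plausible_charsets(b"caf\xe9"))
```

Ruling out charsets this way narrows the options but rarely leaves exactly one, so a detector has to rank the remaining candidates by likelihood, and that ranking can be wrong.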

By Gert-Jan van de Streek / Apr 2024