Encoding

By Druss , 2 April, 2015

Here I was simply creating a MySQL (5.5) table when suddenly up pops the following error:

#1071 - Specified key was too long; max key length is 767 bytes

After a little trial and error, I found that because one of my VARCHAR fields was being used for a UNIQUE index, MySQL was telling me that the indexed column took up too much space. When I reduced the length of this field from its initial 512 to 256 and then to 255, it still complained. However, reducing it further to 128 fixed the issue!
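The limit in the error is counted in bytes, not characters, so the usable VARCHAR length depends on the column's character set. Here is a rough sketch of that arithmetic. It assumes the usual worst-case widths of 1, 3 and 4 bytes per character for latin1, utf8 and utf8mb4; the post does not say which character set the table used, and `max_indexable_chars` is a hypothetical helper name, not anything from MySQL.

```python
# Worst-case bytes per character for a few MySQL character sets.
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}

KEY_LIMIT_BYTES = 767  # the limit quoted in error #1071


def max_indexable_chars(charset: str, limit: int = KEY_LIMIT_BYTES) -> int:
    """Longest VARCHAR length whose worst-case byte size fits in the key limit."""
    return limit // BYTES_PER_CHAR[charset]


for charset in BYTES_PER_CHAR:
    print(charset, max_indexable_chars(charset))
```

If the column was utf8mb4 (an assumption), the ceiling works out to 191 characters, which would be consistent with 255 still failing and 128 succeeding.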

By Druss , 2 June, 2013

While performing a CSV import recently, I ran into the following error messages:

Warning (Code 1366): Incorrect string value: '\xE9, a <...' for column 'body' at row 3
Warning (Code 1366): Incorrect string value: '\xE6. He ...' for column 'body' at row 24
Warning (Code 1366): Incorrect string value: '\xE9, and...' for column 'body' at row 26

The first message was triggered by the accented é in the word protegé in the input; the rest of that field was not imported. The other warnings were triggered in the same way.
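The byte in the first warning, \xE9, is how Latin-1 (and Windows-1252) encodes é; in UTF-8 the same character is the two bytes \xC3\xA9. A small sketch of why such a byte gets rejected follows. It assumes the CSV was saved as Latin-1 while the import expected UTF-8, which the post does not confirm, and the sample bytes are illustrative rather than taken from the actual file.

```python
# Illustrative bytes resembling the value in the first warning.
raw = b"proteg\xe9, a"

# A lone 0xE9 is not valid UTF-8: it announces a 3-byte sequence,
# but the bytes that follow are not continuation bytes.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Decoding as Latin-1 recovers the intended text, which can then be
# re-encoded as UTF-8 before importing.
as_text = raw.decode("latin-1")
as_utf8 = as_text.encode("utf-8")

print(valid_utf8)
print(as_text)
print(as_utf8)
```

Converting the whole file to UTF-8 first, or declaring the file's real character set when importing, are the two usual ways out under that assumption.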

By Druss , 27 September, 2011

I work extensively on a Windows desktop, but I often SSH into Linux servers using PuTTY, a free and open source client. Everything works peachy. Recently, however, I had occasion to work extensively with some Unicode source data, and there were times when I suspected encoding issues with the data because it was not being displayed correctly on my screen.

By Druss , 27 September, 2011

It is often important, especially when dealing with databases and such, that files are stored in the correct character set. Failure to do so can result in illegible displays or even data corruption. Checking the character set of a file in Linux can be accomplished using the file command:
Jubal@Stranger:$ file migrate1.csv
migrate1.csv: Little-endian UTF-16 Unicode English text, with CRLF, LF line terminators
Jubal@Stranger:$ file migrate2.csv
migrate2.csv: UTF-8 Unicode (with BOM) English text, with CRLF line terminators
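
Where the file command is not available, a byte order mark (BOM) at the start of a file can be checked by hand. The following is a minimal sketch, not a replacement for file: it only recognises files that actually begin with a BOM, and `sniff_bom` and `sniff_file` are hypothetical helper names.

```python
import codecs

# Longer BOMs are listed before shorter ones that share a prefix
# (the UTF-32 LE BOM begins with the UTF-16 LE BOM).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(head: bytes):
    """Return an encoding name if the bytes start with a known BOM, else None."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


def sniff_file(path: str):
    """Read the first four bytes of a file and check them for a BOM."""
    with open(path, "rb") as fh:
        return sniff_bom(fh.read(4))
```

A result of None only means that no BOM was found; plain UTF-8 and Latin-1 files usually carry none, so this cannot tell them apart.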

By Druss , 27 September, 2011

Earlier today, I was banging my head against the wall trying to import some data in a CSV file into MySQL. While my imports have gone well thus far, this time around I was dealing with data involving lots of strange diacritics, runic squiggles and all manner of other gibberish that make the world as fun as it can be. In other words, I was dealing with Unicode.

All times are UTC. All content licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.