Unicode

Internally everything SHOULD be a unicode object. The big advantage of a unicode object is that it represents text independently of any byte encoding and can be converted to any desired encoding; a str instance does not know what its bytes represent.

The relevant callbacks (mostly msg, query, command, noticed) receive the input string as a unicode object, and they SHOULD call sendmsg and sendme with unicode messages.

In the future we MAY have an assert type(msg)==unicode there.

Transitions

Some Python modules do not work with unicode, and file reads and writes do not know the encoding, so you need to convert input strings with:

input_unicode=unicode(input, "my-encoding-name", errors="replace")

Read help(unicode) about the errors argument if you want to try encoding fallbacks or anything similar.
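As a rough illustration of what the errors argument does, the sketch below decodes a byte string that is not valid UTF-8 with each of the three common handlers. It uses the decode method, which accepts the same errors values as unicode(), so that it runs on both Python 2 and Python 3; the sample bytes are made up for this example.

```python
# "cafe" with an e-acute, encoded as ISO-8859-1; this is not valid UTF-8
raw = b"caf\xe9"

# "replace" substitutes U+FFFD for every byte sequence that cannot be decoded
replaced = raw.decode("utf-8", "replace")

# "ignore" silently drops the undecodable bytes
ignored = raw.decode("utf-8", "ignore")

# "strict" (the default) raises UnicodeDecodeError instead
try:
    raw.decode("utf-8", "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

"replace" keeps the bot running on bad input, while "strict" is the one to use when you want to detect a wrong encoding and try another.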

To write unicode strings to a file, you need to convert them to an encoded str instance, e.g.:

file.write(unicode_string.encode("UTF-8"))
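A minimal round-trip sketch, assuming a temporary file and a made-up sample string: it writes with an explicit encode as above, then reads the text back with io.open, which takes an encoding argument and returns unicode directly. io.open is part of the standard library in Python 2.6+ and Python 3; whether otfbot itself uses it is not stated here.

```python
import io
import os
import tempfile

text = u"caf\xe9 ok"
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Explicit encode: the file object only ever sees bytes
with io.open(path, "wb") as handle:
    handle.write(text.encode("UTF-8"))

# io.open with an encoding decodes while reading and returns a unicode object
with io.open(path, "r", encoding="UTF-8") as handle:
    read_back = handle.read()
```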

Set debugUnicode: true in otfbot.yaml to debug where strings are converted to unicode. This helps to find all places that still need to be modified to use unicode objects.

Internal stuff you might want to know

  • The callbacks try to convert from the configured channel encoding to unicode, with a fallback to a fallback encoding.
  • The config service converts unicode strings that contain only ASCII characters to str before writing. Otherwise yaml writes them as !!python/unicode 'string', which is ugly in a config file.
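The fallback decoding described in the first point could be sketched roughly as follows. This is not otfbot's actual code: the helper name decode_with_fallback and the encoding names are assumptions chosen for illustration.

```python
def decode_with_fallback(raw, encodings=("utf-8", "iso-8859-15")):
    """Decode a byte string by trying each encoding in order (hypothetical helper)."""
    for name in encodings:
        try:
            # "strict" raises UnicodeDecodeError, so a wrong guess is detected
            return raw.decode(name, "strict")
        except UnicodeDecodeError:
            continue
    # No encoding matched: fall back to the first one and mark bad bytes with U+FFFD
    return raw.decode(encodings[0], "replace")


utf8_bytes = u"gr\xfc\xdf".encode("utf-8")
latin_bytes = u"gr\xfc\xdf".encode("iso-8859-15")

from_utf8 = decode_with_fallback(utf8_bytes)
from_latin = decode_with_fallback(latin_bytes)
```

The order matters: a strict multi-byte encoding such as UTF-8 should come first, because single-byte encodings like ISO-8859-15 accept almost any input and would never fail over.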

Conversion needed

  • urllib.quote_plus needs a UTF-8 encoded str as input.
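For example, a unicode query string can be encoded right before quoting. The sample text is made up, and the try/except import is only there so that the sketch also runs on Python 3, where the function lives in urllib.parse.

```python
try:
    from urllib import quote_plus        # Python 2, as referred to above
except ImportError:
    from urllib.parse import quote_plus  # Python 3 location of the same function

query = u"k\xe4se brot"

# Encode to UTF-8 bytes first, then percent-encode; spaces become "+"
quoted = quote_plus(query.encode("utf-8"))
```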