Discussion:
Lynx (not) mapping of Unicode chars into lower ascii?
Tim Chase
2014-05-18 22:09:44 UTC
I noticed that lynx doesn't seem to make certain semi-obvious
transformations from fancy Unicode characters down to standard ascii
characters. For an example, check out my tweet on this page in both
a GUI browser such as Firefox/Chrome/Chromium and in Lynx:

https://twitter.com/gumnos/status/468150283592167424

It should have a whole bunch of various-font "a" characters in the
first set of brackets (only the ascii ones show up) and a variety of
fonted numbers from 0-5 in the second set of brackets. The
translation from these code-points down into lower-ascii should be
pretty straightforward, but I don't know how lynx currently handles
such translations (I've seen it succeed on some other characters like
turning "©" into "(c)").
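
To illustrate the kind of translation described above, here is a minimal Python sketch. It is not how Lynx does it (Lynx uses its own translation tables); it assumes Unicode compatibility decomposition (NFKD) is an acceptable way to reduce styled letters and digits to their ASCII base characters. The `FALLBACKS` table and the `to_ascii` name are hypothetical, added only for this example; the table covers characters such as "©" that have no decomposition.

```python
import unicodedata

# Hypothetical hand-written fallbacks for characters that have no
# compatibility decomposition (an assumption for this sketch, not Lynx's table).
FALLBACKS = {"\u00a9": "(c)", "\u00ae": "(R)"}


def to_ascii(text, placeholder="?"):
    """Reduce text to ASCII using NFKD plus a small fallback table."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # already ASCII
            continue
        if ch in FALLBACKS:
            out.append(FALLBACKS[ch])
            continue
        # NFKD maps e.g. MATHEMATICAL BOLD SMALL A (U+1D41A) to plain "a"
        # and splits accented letters into base letter + combining mark.
        decomposed = unicodedata.normalize("NFKD", ch)
        ascii_part = "".join(c for c in decomposed if ord(c) < 128)
        out.append(ascii_part or placeholder)
    return "".join(out)


if __name__ == "__main__":
    # Plain "a", bold "a", italic "a", then bold digit zero.
    print(to_ascii("[a\U0001d41a\U0001d44e] [\U0001d7ce]"))
    print(to_ascii("\u00a9 2014"))
```

Characters with neither a decomposition nor a fallback entry come out as the placeholder, which is where a hand-maintained table would still be needed.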

It's not that grave an issue for me, but just wanted to mention
stumbling on it.

-tkc
Thorsten Glaser
2014-05-18 22:18:23 UTC
Post by Tim Chase
I noticed that lynx doesn't seem to make certain semi-obvious
transformations from fancy Unicode characters down to standard ascii
characters. For an example, check out my tweet on this page in both
I require Lynx to not do this, because I use Unicode excessively.

On the other hand, selecting a nōn-UTF8 charset might want to do
that… but the tables are apparently hand-generated and do not cover
full Unicode, only the obvious stuff related to most legacy charsets.
Covering all of Unicode is going to be a crazy job anyway…

(… although a downconversion from SMP to BMP unicode, for us users
of uxterm, would be welcome…)

bye,
//mirabilos
--
ah, that reminds me, thanks for the stellar entertainment that you and certain
other people provide on the Debian mailing lists │ sole reason I subscribed to
them (I'm not using Debian anywhere) is the entertainment factor │ Debian does
not strike me as a place for good humour, much less German admin-style humour
Tim Chase
2014-05-19 02:09:54 UTC
Post by Thorsten Glaser
Post by Tim Chase
I noticed that lynx doesn't seem to make certain semi-obvious
transformations from fancy Unicode characters down to standard
ascii characters. For an example, check out my tweet on this page
in both
I require Lynx to not do this, because I use Unicode excessively.
On the other hand, selecting a nōn-UTF8 charset might want to do
that… but the tables are apparently hand-generated and do not cover
full Unicode, only the obvious stuff related to most legacy
charsets. Covering all of Unicode is going to be a crazy job anyway…
I agree that, if the containing terminal supports it, lynx should
just present the character as-is. I don't know if there's an easy
way to detect whether a terminal supports smarter encodings like
UTF-{8,16,32} where it can/should present the characters as-is, but on
dumber terminals, fall back to a character-remapping.

-tkc
Thomas Dickey
2014-05-21 00:05:21 UTC
Post by Tim Chase
Post by Thorsten Glaser
Post by Tim Chase
I noticed that lynx doesn't seem to make certain semi-obvious
transformations from fancy Unicode characters down to standard
ascii characters. For an example, check out my tweet on this page
in both
I require Lynx to not do this, because I use Unicode excessively.
On the other hand, selecting a nōn-UTF8 charset might want to do
that… but the tables are apparently hand-generated and do not cover
full Unicode, only the obvious stuff related to most legacy
charsets. Covering all of Unicode is going to be a crazy job anyway…
I agree that, if the containing terminal supports it, lynx should
just present the character as-is. I don't know if there's an easy
way to detect whether a terminal supports smarter encodings like
UTF-{8,16,32} where it can/should present the characters as-is, but on
dumber terminals, fall back to a character-remapping.
There isn't a reliable way to "detect" the encodings supported by a terminal.
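
The closest common approximation is to read the locale environment, which reports what the user has configured rather than what the terminal can actually render. The Python sketch below shows that heuristic only; the `locale_claims_utf8` name is hypothetical, and the lookup order (LC_ALL, then LC_CTYPE, then LANG) follows the usual POSIX precedence.

```python
import os


def locale_claims_utf8(env=None):
    """Return True if the locale settings name a UTF-8 codeset.

    This is a heuristic: it inspects configuration, not the terminal itself.
    """
    env = os.environ if env is None else env
    # The first variable that is set and non-empty decides the character type.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            # Locale names look like "en_US.UTF-8" or "de_DE.utf8@euro".
            codeset = value.partition(".")[2].partition("@")[0]
            return codeset.lower().replace("-", "") == "utf8"
    return False


if __name__ == "__main__":
    print(locale_claims_utf8())
```

A terminal can be started with a UTF-8 locale and still lack the fonts or the decoding support, so a result of True here is a hint at best.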
--
Thomas E. Dickey <***@invisible-island.net>
http://invisible-island.net
ftp://invisible-island.net
Paul Gilmartin
2014-05-18 23:29:03 UTC
Post by Tim Chase
I noticed that lynx doesn't seem to make certain semi-obvious
transformations from fancy Unicode characters down to standard ascii
characters. For an example, check out my tweet on this page in both
https://twitter.com/gumnos/status/468150283592167424
FWIW, Linux (and to a certain extent Windows) is quite UTF-8-savvy.
The Ubuntu Terminal app does a great job of displaying native UTF8:
Latin, Cyrillic, Greek, Hebrew, Arabic, ... No need to search for
USASCII approximations. Even more astonishing, I can edit mixed
Latin and Cyrillic with vi, and even the case-flip ('~') command
works properly. Works on Cygwin, also, IIRC.

-- gil