Discussion:
UTF-8 MS Windows
Leon van Dommelen
2014-05-19 20:16:32 UTC
Hi,

I am having difficulty getting utf-8 characters to show up on the
screen in DOS cmd.exe windows. I use the standard parenthesized batch
construct to do it:

(
chcp 65001
call lynx.bat
chcp 437
)

and the TT Lucida Console fonts.

But the UTF-8 bytes show up as individual boxes even with UNICODE
(UTF-8) selected in the options menu. I am wondering if it is because the
bytes are sent to standard output one by one. I find, using g77, that
if I print each byte separately, they do not get recognized as part of
a utf-8 character. If however I print them as a single character
string, they do show up correctly, most of them at least.

This was using Windows XP and lynx 2.8.7. I am trying to use it to
show help files for some software of mine in case people do not have
access to a GUI.

Thanks,
Leon van Dommelen
--
NON-PRIVACY NOTICE: I regularly quote E-mail I receive to third
parties. E-mailing me indicates consent to those terms. Full details
are at http://www.eng.fsu.edu/~dommelen/notices/index.html

NON-REPRESENTATION NOTICE: My E-mails express my personal opinions,
not those of my employers. For official university opinions,
contact the appropriate administrative unit as found on one of:
http://www.eng.fsu.edu http://www.famu.edu http://www.fsu.edu

Leon van Dommelen E-mail: ***@eng.fsu.edu
FAMU-FSU College of Engineering Web: http://www.eng.fsu.edu/~dommelen
2525 Pottsdamer St, Room A229 Phone: (850) 410 6324 (FAX: 6337)
Tallahassee, FL 32310-6046 Mechan. Engng. Dept.: (850) 410-6331
--------------------> Please see warnings above <---------------------
Thomas Dickey
2014-05-20 23:54:26 UTC
Post by Leon van Dommelen
Hi,
I am having difficulty getting utf-8 characters to show up on the
screen in DOS cmd.exe windows. I use the standard parenthesized batch
coding to do it,
(
chcp 65001
call lynx.bat
chcp 437
)
and the TT Lucida Console fonts.
That's part of it. But Lynx also has to be built with a wide-character PDCurses
(or wide-character ncurses) to display Unicode in a console window.
Post by Leon van Dommelen
But the UTF-8 bytes show up as individual boxes even with UNICODE
(UTF-8) in the options menu. I am wondering if it is because the
bytes are sent to standard output one by one. I find, using g77, that
if I print each byte separately, they do not get recognized as part of
a utf-8 character. If however I print them as a single character
string, they do show up correctly, most of them at least.
This was using Windows XP and lynx 2.8.7. I am trying to use it to
show help files for some software of mine in case people do not have
access to a GUI.
For most platforms, I'd ask what
lynx -version
says, but on Windows that is buried in the DLL information (the wide-character
pdcurses.dll is 118784 bytes, while the non-wide one is a few hundred bytes
shorter). The installers that I built last year use the wide-character PDCurses.
--
Thomas E. Dickey <***@invisible-island.net>
http://invisible-island.net
ftp://invisible-island.net
Gisle Vanem
2014-05-21 10:22:39 UTC
Post by Thomas Dickey
say, but for windows, it's buried in the dll information (the wide-character
pdcurses.dll is 118784 bytes, while the non-wide one is a few hundred bytes
shorter). The installers that I built last year use the wide-character pdcurses.
I have not tried this option. But using SLang 2.2.4 (built with
-D_UNICODE and -DUNICODE) and the same flags in Lynx, all hell breaks
loose. E.g.:
src/LYUtils.c(7782) : warning C4133: 'function' : incompatible types - from 'const char *' to 'LPCWSTR'
src/LYUtils.c(7822) : warning C4133: 'return' : incompatible types - from 'LPTSTR' to 'char *'

The MS convention (as we all know?) is that LPTSTR etc. maps to 'char *'
or 'wchar_t *' depending on whether ASCII (the default) or UNICODE is
selected. But the Lynx sources can never be built with '-DUNICODE' because
of errors such as those shown above.

So using TCHAR/LPTSTR etc. in the sources has no point as-is. The good
news is that it won't be so difficult to fix AFAICS. I have identified these
files with ASCII/UNICODE errors/warnings:

# err/warn file:
-------------------------
7 src/LYExtern.c
1 src/LYMain.c
1 src/LYMainLoop.c
13 src/LYUtils.c
2 lib/dirent.c
4 WWW/Library/Implementation/HTFile.c
4 WWW/Library/Implementation/HTTCP.c
2 WWW/Library/Implementation/HTDOS.c

Btw. line 7782 of LYUtils.c is:
lstrcpy((LPTSTR) pLogData, szBuffer);

So I'm not sure what the "official" way to support wide-chars in Lynx/Win32
is.

--gv
Thomas Dickey
2014-05-21 12:03:08 UTC
Post by Gisle Vanem
Post by Thomas Dickey
say, but for windows, it's buried in the dll information (the wide-character
pdcurses.dll is 118784 bytes, while the non-wide one is a few hundred bytes
shorter). The installers that I built last year use the wide-character pdcurses.
I have not tried this option. But using SLang 2.2.4 (built with
-D_UNICODE + -DUNICODE) and the same in lynx all hell breaks loose.
That's expected behavior: no one's provided any patches for slang for several years.
(I made my position clear enough about ten years ago: I'll accept and maintain changes
which use slang, but I will not develop new code using slang).
Post by Gisle Vanem
src/LYUtils.c(7782) : warning C4133: 'function' : incompatible types - from 'const char *' to 'LPCWSTR'
src/LYUtils.c(7822) : warning C4133: 'return' : incompatible types - from 'LPTSTR' to 'char *'
The MS-convention (as we all know?) is that LPTSTR etc. maps to 'char *'
or 'wchar_t *' depending on ASCII (default) or UNICODE. But the
lynx-sources can never accept building with '-DUNICODE' because of
such errors shown above.
However, lynx doesn't use that - it relies on the pdcurses or ncurses library
to do this.

(see makefile.msc, which defines WIDE_CURSES, for a starting point)
Post by Gisle Vanem
So using TCHAR/LPTSTR etc. in the sources has no point as-is. The good
news is that it won't be so difficult to fix AFAICS. I have identified these
files with ASCII/UNICODE errors/warnings:
# err/warn file:
-------------------------
7 src/LYExtern.c
1 src/LYMain.c
1 src/LYMainLoop.c
13 src/LYUtils.c
2 lib/dirent.c
4 WWW/Library/Implementation/HTFile.c
4 WWW/Library/Implementation/HTTCP.c
2 WWW/Library/Implementation/HTDOS.c
lstrcpy((LPTSTR) pLogData, szBuffer);
So I'm not sure what the "official" way to support wide-chars in Lynx/Win32
is.
--gv
--
Thomas E. Dickey <***@invisible-island.net>
http://invisible-island.net
ftp://invisible-island.net
Thorsten Glaser
2014-05-21 12:27:39 UTC
Post by Gisle Vanem
The MS-convention (as we all know?) is that LPTSTR etc. maps to 'char *'
or 'wchar_t *' depending on ASCII (default) or UNICODE. But the lynx-sources
can never accept building with '-DUNICODE' because of such errors shown above.
Lynx doesn’t use wchar_t (UCS) but rather multibyte strings in
some given encoding – here, UTF-8. In my experience, it is
virtually impossible to use Windows® console applications
with UTF-8, since, to do this right, one has to use the
wide-character interface.

As Tom said: the curses library has to do the translation here.
This amounts to a complete rewrite of the curses library. It is
important that the curses library should also *not* use -DUNICODE
but rather call the various FooW() library functions directly,
because it would be called from code that does not use -DUNICODE
and has to retain API compatibility there.

bye,
//mirabilos
--
ah, that reminds me, thanks for the stellar entertainment that you and certain
other people provide on the Debian mailing lists │ sole reason I subscribed to
them (I'm not using Debian anywhere) is the entertainment factor │ Debian does
not strike me as a place for good humour, much less German admin-style humour
Gisle Vanem
2014-05-21 14:57:43 UTC
Post by Thorsten Glaser
As Tom said: the curses library has to do the translation here.
This amounts to a complete rewrite of the curses library.
OK, thanks for clearing that up.

Tom, I've always preferred SLang over PDCurses/ncurses etc. (mostly
because it makes nicer and faster console programs than PDCurses;
'most' vs. 'less' built with PDCurses is an excellent example, AFAIK).
But I'm not so religious about the issue. I might have a go at building
Lynx/Win32/MSVC with some curses lib.

--gv
