Problems with iso-8859-2 charset and nroff source manpage X-lation


Mirsad Todorovac

6 Dec 2001

4:40 p.m.

Hi, everybody,

Can anybody help me?

I'm puzzled by this: I don't know how to display non-ASCII letters in a man page, and this seriously affects my efforts to translate the docs into Croatian (it concerns everybody in non-US locales, I guess).

So, the original looks like:

-----------------------------------------------------------------
.SH DESCRIPTION
This man page does not contain much information about GNU Pascal because the unstructured man page format is not well suited for documenting a large program such as GNU Pascal. You can find the relevant information in the GNU Pascal Manual which is available in ...
-----------------------------------------------------------------

My translation is:

-----------------------------------------------------------------
.SH OPIS
Ova man stranica ne sadrži previše informacija o GNU Pascalu jer nestrukturiran format man stranice nije prikladan za dokumentiranje tako velikog programa kao što je GNU Pascal. Relevantne informacije možete pronaći u GNU Pascal priručniku koji je raspoloživ u Texinfo formatu i formatima proivedenima iz njega, ...
------------------------------------------------------------------

But it gets printed like this (messy):

------------------------------------------------------------------
OPIS
Ova man stranica ne sadrM->i previM-9e informacija o GNU Pascalu jer nestrukturiran format man stranice nije prikladan za dokumentiranje tako velikog programa kao M-9to je GNU Pascal. Relevantne informacije moM->ete pronaM-fi u GNU Pascal priruM-hniku koji je raspoloM->iv u Texinfo formatu i formatima ...
------------------------------------------------------------------

In other words, 'ž' ('z' with a caron) becomes M->, 'š' ('s' with a caron) becomes M-9, 'ć' ('c' with an acute accent) becomes M-f, and so on.
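The M-x notation follows a simple rule. A byte of 0xA0 or above is shown as ``M-'' followed by the ASCII character whose code is 0x80 lower. So 'ž', which is byte 0xBE in ISO-8859-2, appears as M-> (0x3E is '>'). Likewise 'č' (0xE8) appears as M-h, which is where ``priruM-hniku'' comes from.

The sketch below reproduces the effect with ``cat -v'', which uses the same convention. The octal escapes stand for the ISO-8859-2 bytes. They are this sketch's own choice, made so the script does not depend on the terminal font:

```shell
#!/bin/sh
# Octal escapes for ISO-8859-2 bytes:
#   \276 = 0xBE (z with caron)   \271 = 0xB9 (s with caron)
#   \346 = 0xE6 (c with acute)   \350 = 0xE8 (c with caron)
# cat -v prints a byte >= 0xA0 as "M-" followed by the same byte with its
# high bit cleared, i.e. the notation the 8-bit-unclean more produced.
printf 'sadr\276i previ\271e prona\346i priru\350niku\n' | cat -v
# prints: sadrM->i previM-9e pronaM-fi priruM-hniku
```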

It doesn't look pretty. Note that I have an ISO-8859-2 (Latin-2) font installed in my terminal, and Pine displays the so-called diacritics correctly. The problem is in ``more'', and it could be in ``info'' too, but I haven't translated all the .texi files yet, so I don't know.

It's easy for me to write ``sed -e 's/ž/z/g' -e 's/š/s/g' -e 's/ć/c/g' ...'' (on your terminal it may not look that way, but it actually strips each character of its acute and caron marks), but I hope that somebody reading this has a better idea.
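Here is a variant of the same fallback, sketched with ``tr'' instead of ``sed''. This is a swapped-in technique, not the command from the message. Writing the ISO-8859-2 bytes as octal escapes keeps the script readable whatever font the terminal has loaded.

The function name strip_hr is hypothetical. Because tr maps one character to one character, 'đ' becomes a single 'd' here rather than the also-common 'dj':

```shell
#!/bin/sh
# strip_hr (hypothetical helper): map the Croatian letters of ISO-8859-2
# to plain ASCII, byte for byte.
#   lower case: \276 z-caron \271 s-caron \346 c-acute \350 c-caron \360 d-stroke
#   upper case: \256 Z-caron \251 S-caron \306 C-acute \310 C-caron \320 D-stroke
# LC_ALL=C makes tr treat its input as single bytes.
strip_hr() {
    LC_ALL=C tr '\276\271\346\350\360\256\251\306\310\320' 'zsccdZSCCD'
}

printf 'Ova man stranica ne sadr\276i previ\271e informacija\n' | strip_hr
# prints: Ova man stranica ne sadrzi previse informacija
```

Usage would be along the lines of ``strip_hr < gpc.hr.1 > gpc.ascii.1''. The result is readable, but it is not what a Croatian reader would expect to see.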

Thanks for listening, Mirsad

-- This message has been made up using recycled ideas and language constructs. No plant or animal has been injured in the process of making this message.


Maurice Lombardi

6 Dec

6:30 p.m.

Mirsad Todorovac wrote:

...

In other words, 'ž' ('z' with a caron) becomes M->, 'š' ('s' with a caron) becomes M-9, 'ć' ('c' with an acute accent) becomes M-f ...

What system are you using? I see such silly things when I try to use ispell on a French text with ISO-8859-1 accented characters without giving it the -Tnroff option (it is an ispell that comes from emtexgi on Windows).

Maurice

-- Maurice Lombardi Laboratoire de Spectrometrie Physique, Universite Joseph Fourier de Grenoble, BP87 38402 Saint Martin d'Heres Cedex FRANCE Tel: 33 (0)4 76 51 47 51 Fax: 33 (0)4 76 63 54 95 mailto:Maurice.Lombardi@ujf-grenoble.fr

Mirsad Todorovac

6:53 p.m.

Thank you for your message, Maurice,

On Thu, 6 Dec 2001, Maurice Lombardi wrote:

...

...
In other words, 'ž' ('z' with a caron) becomes M->, 'š' ('s' with a caron) becomes M-9, 'ć' ('c' with an acute accent) becomes M-f ...

What system are you using?

I'm using dec-osf-4.0b, that is, Digital Unix 4.0b (patched) on an Alpha processor. (The translated documentation is from the gpc-20011123 distribution, but I suppose that's irrelevant here.)

...

I see such silly things when I try to use ispell on a French text with ISO-8859-1 accented characters without giving it the -Tnroff option (it is an ispell that comes from emtexgi on Windows).

I guess this happens because /usr/bin/more on my system is not 8-bit clean: it displays non-ASCII characters as M-<something>.

Unfortunately, with GNU less (version 358) I don't get perfect results either, although it is less confusing, since it shows the hex value of each character :)

-------- nroff -man gpc.hr.1 | less ------------------------------------
...

OPIS
Ova man stranica ne sadr<BE>i previ<B9>e informacija o GNU Pascalu jer nestrukturiran format man stranice nije prikladan za dokumentiranje tako velikog programa kao <B9>to je GNU Pascal. Relevantne informacije mo<BE>ete prona<E6>i u GNU Pascal priru<E8>niku koji je raspolo<BE>iv u Texinfo formatu i formatima proivedenima iz njega, kao npr. HTML, Info i DVI. Za gledanje Info dokumen-
...
-------------------------------------------------------------------------
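One way to confirm that the bytes leave nroff intact, and that only the pager mishandles them, is to dump the formatted output in hex. The sketch below does this with ``od''. The helper name hexbytes is hypothetical:

```shell
#!/bin/sh
# hexbytes (hypothetical helper): print each input byte as two hex
# digits, without the offset column (-An).
hexbytes() {
    od -An -tx1
}

printf 'sadr\276i\n' | hexbytes
# shows the bytes 73 61 64 72 be 69 0a (column spacing depends on od)
```

Running ``nroff -man gpc.hr.1 | hexbytes | head'' should show be, b9, e6 and e8 where less printed <BE>, <B9>, <E6> and <E8>. If it does, nroff passed the Latin-2 bytes through unchanged, and the display problem lies in the pager.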

So, I guess the task would be to convince ``more'', or at least ``less'', to pass Latin-2 (ISO-8859-2) characters through in an 8-bit-clean manner.

Maybe this can be done with some environment variable?

Thanks for help, Mirsad


Mirsad Todorovac

7:11 p.m.

New subject: Problems with iso-8859-2 charset and nroff source manpage

On Thu, 6 Dec 2001, Maurice Lombardi wrote:

...

Mirsad Todorovac wrote:

...
In other words, 'ž' ('z' with a caron) becomes M->, 'š' ('s' with a caron) becomes M-9, 'ć' ('c' with an acute accent) becomes M-f ...

What system are you using?

On sun-sparc-solaris2.7, where I also tested it,

% nroff -man gpc.1

produces a core dump (!?!) (but with the English gpc.1 it doesn't). This can be called a problem, since it's not practical for the man command to dump core when you ask for a man page, so I guess I'll have to find a workaround (or do Latin-2 characters simply make Sun's nroff go berserk?).

With some tweaking, I managed to produce an acceptable result on Sun with

% groff -Tlatin1 -man gpc.1 | more

It did just what I wanted: it passed the 8-bit characters through to the terminal, and it's my responsibility to load the proper font in the terminal window.

This was possible only because Sun's ``more'' is 8-bit-clean enough for Latin 2 to work. (``less'' on Sun works the same way as on Digital.)

I checked this by running ``nroff -man gpc.1 > gpc.txt'' on DU 4.0b and FTP-ing the file to Sun: the same file displayed with Sun's and Digital's ``more'' gives different results.

So, with ``groff -Tlatin1 -man gpc.1'' I can just about do it, but now the problem is how to make ``more'' or ``less'' do *less* messing-up of 8-bit chars.

Any ideas?

Mirsad


Russell Whitaker

8:56 p.m.


On Thu, 6 Dec 2001, Mirsad Todorovac wrote:

...

So, with ``groff -Tlatin1 -man gpc.1'' I can just about do it, but now the problem is how to make ``more'' or ``less'' do *less* messing-up of 8-bit chars.

Any ideas?

On the man page for "less" the author is listed as Mark Nudelman marknu@flash.net. Try emailing the problem to him. It might do some good.

Russ

Mirsad Todorovac

7 Dec

10:14 a.m.


On Thu, 6 Dec 2001, Russell Whitaker wrote:

...

...
So, with ``groff -Tlatin1 -man gpc.1'' I can just about do it, but now the problem is how to make ``more'' or ``less'' do *less* messing-up of 8-bit chars.

Any ideas?

On the man page for "less" the author is listed as Mark Nudelman marknu@flash.net. Try emailing the problem to him. It might do some good.

Thanks for the thought, Russ. Last night I got some even more confusing information: when I came home to my Linux (:-) box, I was surprised to see that the same version (358) of ``less'' displayed 8-bit characters correctly.

Now, at work on dec-alpha-osf4.0b and sun-sparc-solaris2.7 I used a different packager's build than at home (the latter is Mandrake). So it is probably either the wrong locale or the wrong compile-time options (it's ironic that the packager is also from an area where 8-bit chars are used ;-)

If that fails, I'll follow your advice. Thanks, Mirsad


Frank Heckenbach

8 Dec

3:01 a.m.


Mirsad Todorovac wrote:

...

Thanks for the thought, Russ. Last night I got some even more confusing information: when I came home to my Linux (:-) box, I was surprised to see that the same version (358) of ``less'' displayed 8-bit characters correctly.

Did you check the section `NATIONAL CHARACTER SETS' in the less manpage?

Otherwise, my suggestion (like the GNU project's) is not to spend much effort on manpages (the current GPC manpage is rather short and basically just a pointer to the Texinfo documentation), and possibly to avoid 8-bit characters if they may cause problems for anyone.

Frank

-- Frank Heckenbach, frank@g-n-u.de, http://fjf.gnu.de/ GPC To-Do list, latest features, fixed bugs: http://agnes.dida.physik.uni-essen.de/~gnu-pascal/todo.html

Mirsad Todorovac

2:32 p.m.


On Sat, 8 Dec 2001, Frank Heckenbach wrote:

...

Mirsad Todorovac wrote:

...
Thanks for the thought, Russ. Last night I got some even more confusing information: when I came home to my Linux (:-) box, I was surprised to see that the same version (358) of ``less'' displayed 8-bit characters correctly.

Did you check the section `NATIONAL CHARACTER SETS' in the less manpage?

No, frankly, I didn't get that far (that's the advantage of ``info''-format documentation: you get a menu at the beginning, so you get a basic idea of what's inside).

However, setting LESSCHARSET to ``iso8859'' *did* fix the problem.
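To make this setting permanent rather than typing it each time, the variable can go into a login script. Below is a minimal fragment for Bourne-style shells, assuming ``~/.profile'' is read at login. csh users would write ``setenv LESSCHARSET iso8859'' instead.

According to the less manual, the iso8859 character set is ASCII plus bytes 160 to 255 treated as normal characters. All the Croatian letters in ISO-8859-2 fall in that range.

```shell
# Fragment for ~/.profile (Bourne-style shells): have less display bytes
# 160-255 as ordinary characters instead of <XX> hex escapes.
LESSCHARSET=iso8859
export LESSCHARSET
```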

...

Otherwise, my suggestion (like the GNU project's) is not to spend much effort on manpages (the current GPC manpage is rather short and basically just a pointer to the Texinfo documentation), and possibly to avoid 8-bit characters if they may cause problems for anyone.

Thanks to Paul Eggert (eggert@twinsun.com) from comp.sys.sun.admin, a solution was found: ``nroff'' stops dumping core on national characters once ``setenv LC_ALL hr_HR'' is done! Paul also said that the nroff core dump problem is fixed in Solaris 8.
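In a Bourne-style shell, the equivalent of ``setenv LC_ALL hr_HR'' is an assignment plus ``export''. The sketch below does this only when the locale is actually installed, assuming ``locale -a'' is available. The helper name in_locale_list is hypothetical.

The exact locale name is system-dependent: ``hr_HR'' on Solaris, possibly something like ``hr_HR.iso88592'' elsewhere.

```shell
#!/bin/sh
# in_locale_list (hypothetical helper): read locale names, one per line,
# as printed by "locale -a", and succeed if $1 appears as a whole line.
in_locale_list() {
    grep -qxF -e "$1"
}

# Bourne-shell equivalent of the csh command "setenv LC_ALL hr_HR",
# applied only when this system actually lists that locale.
if locale -a 2>/dev/null | in_locale_list hr_HR; then
    LC_ALL=hr_HR
    export LC_ALL
else
    echo "hr_HR locale not listed; nroff may still fail on 8-bit input" >&2
fi
```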

(However, I'm in the stupid position that my network management decided not to support the transition to Solaris 8, while Sun has stopped providing most support for Solaris 7, but this is only a remark.)

FRANKLY, I don't know how to avoid 8-bit characters in a translation (did you watch ``Only Fools and Horses'', the episode where they hired a singer who couldn't pronounce 'r' and so sang only songs without it? ;-))))))))

I could replace a character that looks like

[ASCII-art drawing of a letter with a caron]

with plain

[ASCII-art drawing of the same letter without the caron]

but it doesn't look professional, and there is no standard like the one the wise Germans introduced for writing 'u with umlaut' as 'ue', etc.

That means it wouldn't be widely accepted; some would even ridicule it, I'm afraid, and this is contrary to what we want and to the GNU philosophy, I guess.

What we *could* do is say that it's a Sun/nroff problem and advise users either to use ``groff -Tlatin1 -man gpc.1 | LESSCHARSET=iso8859 less'' or an equivalent, which seems *more GNU*, or to point them to ``LC_ALL=hr_HR man gpc'', which would call nroff with the correct locale ?!?!?

I think Mr. Richard Stallman would agree with me
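One detail is worth checking in commands like these. In a Bourne-style shell, a ``VAR=value'' prefix applies only to the single command it is written in front of, not to the whole pipeline. For less to see the setting, the assignment belongs in front of less, or the variable must be exported beforehand. For example: ``groff -Tlatin1 -man gpc.1 | LESSCHARSET=iso8859 less''.

Here is a small demonstration, with ``sh -c'' standing in for the pager:

```shell
#!/bin/sh
# A VAR=value prefix is scoped to the one command it precedes.
unset LESSCHARSET

# Prefix on the producer: the consumer on the right does not see it.
LESSCHARSET=iso8859 true | sh -c 'echo "${LESSCHARSET:-unset}"'
# prints: unset

# Prefix on the consumer (the pager's position): it does.
true | LESSCHARSET=iso8859 sh -c 'echo "${LESSCHARSET:-unset}"'
# prints: iso8859
```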


Frank Heckenbach

9 Dec

5:09 a.m.


Mirsad Todorovac wrote:

...

FRANKLY, I don't know how to avoid 8-bit characters in a translation (did you watch ``Only Fools and Horses'', the episode where they hired a singer who couldn't pronounce 'r' and so sang only songs without it? ;-))))))))

I didn't, but I can imagine. ;-)

...

but it doesn't look professional, and there is no standard like the one the wise Germans introduced for writing 'u with umlaut' as 'ue', etc.

I had hoped there was such a convention.

...

That means it wouldn't be widely accepted; some would even ridicule it, I'm afraid, and this is contrary to what we want and to the GNU philosophy, I guess.

What we *could* do is say that it's a Sun/nroff problem and advise users either to use ``groff -Tlatin1 -man gpc.1 | LESSCHARSET=iso8859 less'' or an equivalent, which seems *more GNU*, or to point them to ``LC_ALL=hr_HR man gpc'', which would call nroff with the correct locale ?!?!?

I think Mr. Richard Stallman would agree with me

Probably. BTW, do you have any other Croatian manpages accessible? If so, what do they do?

However, from your notes it's probably ok to use 8859-2 then.

Frank


Mirsad Todorovac

10 Dec

5:26 p.m.


On Sun, 9 Dec 2001, Frank Heckenbach wrote:

...

Probably. BTW, do you have any other Croatian manpages accessible? If so, what do they do?

Actually, on Solaris I haven't got any examples that I know of.

...

However, from your notes it's probably ok to use 8859-2 then.

OK, then. In the meantime, I'll keep in mind to ask whether anybody translating into non-ASCII text has faced this problem. But one thing comes to mind: if somebody is reading hr_HR manpages, he'd probably want to have the hr_HR locale enabled via LC_ALL, wouldn't he?


Frank Heckenbach

11 Dec

1:15 a.m.


Mirsad Todorovac wrote:

...

OK, then. In the meantime, I'll keep in mind to ask whether anybody translating into non-ASCII text has faced this problem. But one thing comes to mind: if somebody is reading hr_HR manpages, he'd probably want to have the hr_HR locale enabled via LC_ALL, wouldn't he?

I hope so, but I'm not sure. At least, not every German user seems to have it set to de_DE. I've noticed some problems with programs that relied on this (e.g., for "ctype"s). But then I just told them to set their locale instead of changing my programs. ;-)

Frank


gpc@gnu.de

10 comments, 4 participants: Frank Heckenbach, Maurice Lombardi, Mirsad Todorovac, Russell Whitaker