bug#23302: mention what are nonprinting characters
積丹尼 Dan Jacobson
2016-04-16 19:50:53 UTC
In (info "(coreutils) Concept index") there are several items that talk
about nonprinting characters.

Each of those entries should have a cross-reference (a blue link::) to a
passage stating exactly which characters are nonprinting, lest the user
think, e.g., that SPC (' ') is nonprinting.
f0rhum
2016-04-17 08:16:21 UTC
As per https://en.wikipedia.org/wiki/ASCII#ASCII_control_characters
Assaf Gordon
2018-10-27 22:14:59 UTC
close 23302
stop

(triaging old bugs)
With no further comments in 2 years, I'm closing this bug.

-assaf
積丹尼 Dan Jacobson
2018-10-31 18:34:06 UTC
Yes, but every program has a slightly different set of non-printing
characters, so each one needs to list them exactly.
Assaf Gordon
2018-11-01 02:51:22 UTC
To my understanding, printable characters in the C/POSIX locale
are strictly defined here:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03_01_01
That page says that "print" is by definition "alnum", "punct", and the
<space> character, and it also defines alnum, punct, and space.

Building on that, C programs typically use isprint(3) to determine
whether an octet (value 0 to 255) is printable or not:
http://man7.org/linux/man-pages/man3/isprint.3p.html

And all coreutils programs use that logic
(all bets are off in a non-C locale, of course).
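
The "print = alnum + punct + <space>" definition can be checked from the
shell. What follows is only a minimal sketch, assuming GNU coreutils
(printf with \x escapes, seq, tr, wc) and the C locale; the function names
all_octets and count_in_set are made up for this illustration and are not
part of coreutils.

```shell
# Assumption: GNU coreutils, C locale. Helper names are illustrative only.
export LC_ALL=C

# Write all 256 octets (0x00..0xff) to stdout.
all_octets() {
  env printf "$(env printf '\\x%02x' $(seq 0 255))"
}

# Count how many of the 256 octets belong to the given tr set.
count_in_set() {
  all_octets | tr -cd "$1" | wc -c
}

# The "print" class as tr sees it.
count_in_set '[:print:]'

# The same class built from its POSIX definition: alnum, punct, and <space>.
count_in_set '[:alnum:][:punct:] '
```

In the C locale both counts should come out as 95 (octets 0x20 through
0x7e), which is what the POSIX definition predicts.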

For example, let's generate a file containing all 256 octets:

env printf "$(env printf '\\x%02x' $(seq 0 255))" > 1

od's "z" suffix appends the printable characters to each line, showing
everything else as '.':

$ od -An -tx1z 1
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f >................<
10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f >................<
20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f > !"#$%&'()*+,-./<
30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f >0123456789:;<=>?<
40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f >@ABCDEFGHIJKLMNO<
50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f >PQRSTUVWXYZ[\]^_<
60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f >`abcdefghijklmno<
70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f >pqrstuvwxyz{|}~.<
80 81 82 83 84 85 86 87 88 89 8a 8b 8c 8d 8e 8f >................<
90 91 92 93 94 95 96 97 98 99 9a 9b 9c 9d 9e 9f >................<
a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af >................<
b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf >................<
c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf >................<
d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df >................<
e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef >................<
f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff >................<

od's "c" type shows non-printable characters as octal values or escape
sequences:

$ od -An -tc 1
  \0 001 002 003 004 005 006  \a  \b  \t  \n  \v  \f  \r 016 017
 020 021 022 023 024 025 026 027 030 031 032 033 034 035 036 037
       !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~ 177
 200 201 202 203 204 205 206 207 210 211 212 213 214 215 216 217
 220 221 222 223 224 225 226 227 230 231 232 233 234 235 236 237
 240 241 242 243 244 245 246 247 250 251 252 253 254 255 256 257
 260 261 262 263 264 265 266 267 270 271 272 273 274 275 276 277
 300 301 302 303 304 305 306 307 310 311 312 313 314 315 316 317
 320 321 322 323 324 325 326 327 330 331 332 333 334 335 336 337
 340 341 342 343 344 345 346 347 350 351 352 353 354 355 356 357
 360 361 362 363 364 365 366 367 370 371 372 373 374 375 376 377

tr can delete non-printables using a character class:

$ tr -cd '[:print:]' < 1 ; echo

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
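
Going the other way, the same tr class can produce an exact list of the
octets that count as non-printing, which is the kind of list the bug
report asks for. This is a rough sketch, assuming GNU coreutils and the C
locale; nonprinting_hex is a hypothetical helper name, not an existing
command.

```shell
# Assumption: GNU coreutils, C locale. nonprinting_hex is an illustrative name.
export LC_ALL=C

# Print the hex value of every octet that is NOT in [:print:], one per line.
nonprinting_hex() {
  env printf "$(env printf '\\x%02x' $(seq 0 255))" |
    tr -d '[:print:]' |        # drop the printable octets, keep the rest
    od -An -v -tx1 |           # dump the remaining octets as hex bytes
    tr -s ' \n' '\n' |         # put one hex value on each line
    sed '/^$/d'                # remove the empty leading line
}

nonprinting_hex
```

In the C locale the list should contain 00 through 1f, 7f, and 80 through
ff, and should not contain 20 (SPC).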



and printf's "%q" directive also escapes all non-printables, as octal
values or escape sequences (bash drops the NUL byte inside the command
substitution, hence the warning):

$ env printf "%q\n" "$(cat 1)"
-bash: warning: command substitution: ignored null byte in input

'\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037''
!"#$%&'\''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~'$'\177\200\201\202\203\204\205\206\207\210\211\212\213\214\215\216\217\220\221\222\223\224\225\226\227\230\231\232\233\234\235\236\237\240\241\242\243\244\245\246\247\250\251\252\253\254\255\256\257\260\261\262\263\264\265\266\267\270\271\272\273\274\275\276\277\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\327\330\331\332\333\334\335\336\337\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\367\370\371\372\373\374\375\376\377'

So it seems all these programs agree on what is a printable (and
non-printable) character, based on an external definition.


Is there another instance you are aware of that behaves differently?

-assaf
積丹尼 Dan Jacobson
2018-11-01 03:11:49 UTC
Good! You then need to tie all the documentation you found into the
coreutils documentation, as the official declaration of what you mean,
just like "man perlrecharclass - Perl Regular Expression Character
Classes" does. One cannot just hope the user will "Google", land on
"Wikipedia", and find that what is there is 100% the same.