bug#29802: "uniq -c" doesn't like counting lines with nulls
PD
2017-12-21 08:40:34 UTC
Uniq *sometimes* fails to combine lines containing a null character:

# uniq --version
uniq (GNU coreutils) 8.4

##### Count duplicate text lines:
# printf "\n\x00\n\x00\n" | cat -e | uniq -c
1 $
2 ^@$

##### Count duplicate binary lines:
# printf "\x00\n\x00\n\n" | uniq -c | cat -e
2 ^@$
1 $

##### Whoops, fails to count duplicate binary lines:
# printf "\n\x00\n\x00\n" | uniq -c | cat -e
1 $
1 ^@$
1 ^@$

This was the smallest test case; the original file had hundreds of lines
with nulls (\x00) and Ctrl-A (\x01) characters, and it was quite a
surprise when the output of 'sort testfile | uniq -c' had many pages of
'1 ^@$' followed by '496 ^A$': it was counting the Ctrl-A lines correctly,
but failing on the null-character lines.
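
A file of the kind described above can be approximated with a short sketch like the following. The helper name make_testfile is made up for illustration, and the count of 300 NUL lines is an arbitrary placeholder (only the 496 Ctrl-A lines come from the report). The octal escapes \000 and \001 stand in for \x00 and \x01 because not every printf understands hex escapes.

```shell
#!/bin/sh
# make_testfile N M (hypothetical helper): print N lines holding a single
# NUL byte, then M lines holding a single Ctrl-A byte.
make_testfile() {
  i=0
  while [ "$i" -lt "$1" ]; do
    printf '\000\n'
    i=$((i + 1))
  done
  i=0
  while [ "$i" -lt "$2" ]; do
    printf '\001\n'
    i=$((i + 1))
  done
}

# 300 is an assumed count; 496 matches the Ctrl-A count quoted in the report.
make_testfile 300 496 | sort | uniq -c | cat -e
```

On a uniq without the bug, one would expect this to print a single counted line for each of the two byte values rather than pages of '1 ^@$'.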

For automated testing with 'delta' or 'git bisect', this works:
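---
#!/bin/bash
# $1 is the file to test.
# "a" makes NULs visible before counting, "b" counts the raw lines first;
# the checksums should match unless uniq miscounts the raw NUL lines.
a=$(sort "$1" | cat -e | uniq -c | md5sum -)
b=$(sort "$1" | uniq -c | cat -e | md5sum -)
if [[ "$a" != "$b" ]]; then
    echo "PASS (bug present)"; exit 0
else
    echo "FAIL (bug absent)"; exit 1
fi
---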

I regret not having the time to test this with coreutils 8.28, but I
couldn't see anything in the git log to suggest this has been fixed:
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=history;f=src/uniq.c;h=d1dac93c010d7333ced4b54fccbd965cbd5729c2;hb=HEAD

Cheers,
PD
Pádraig Brady
2017-12-21 16:33:52 UTC
Post by PD
> # printf "\n\x00\n\x00\n" | uniq -c | cat -e
> 1 $

Not reproducible on recent versions.
Might this have been specific to the i18n patch?
I.E. can you reproduce with LC_ALL=C set in the env?
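
One way to run that check is sketched below, assuming a POSIX shell. The function name nul_repro is hypothetical, and \000 is used in place of \x00 only for portability across printf implementations.

```shell
#!/bin/sh
# nul_repro (hypothetical name): feed the smallest failing input from the
# report to uniq -c, with the C locale forced for uniq only.
nul_repro() {
  printf '\n\000\n\000\n' | LC_ALL=C uniq -c
}

# Make the NUL bytes visible, as in the original report.
nul_repro | cat -e
```

If the two NUL lines are combined under LC_ALL=C but not in the default locale, that would point towards locale handling rather than the core comparison code.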

thanks,
Pádraig
Assaf Gordon
2018-10-30 02:20:38 UTC
tags 29802 moreinfo
close 29802
stop

(triaging old bugs)
Post by Pádraig Brady
> Post by PD
>> # printf "\n\x00\n\x00\n" | uniq -c | cat -e
>> 1 $
> Not reproducible on recent versions.
> Might this have been specific to the i18n patch?
> I.E. can you reproduce with LC_ALL=C set in the env?

With no further comments in almost a year, I'm closing this bug.
Discussion can continue by replying to this thread.

-assaf
