bug#18168: Bug in "sort -V" ?
Schleusener, Jens
2014-08-01 09:38:39 UTC
Hi,

I am not sure whether it's a bug, but for my use cases the "sort"
command with the very helpful option "-V" (natural sort of (version)
numbers within text) does not always deliver the output I expect.

Example input file (with four test cases):

1.0.5_src.tar.gz
1.0_src.tar.gz
2.0.5src.tar.gz
2.0src.tar.gz
3.0.5/
3.0/
4.0.5beta/
4.0beta/

Sorted ("sort -V") output file (with errors?):

1.0.5_src.tar.gz
1.0_src.tar.gz
2.0src.tar.gz
2.0.5src.tar.gz
3.0.5/
3.0/
4.0beta/
4.0.5beta/

Expected output file:

1.0_src.tar.gz
1.0.5_src.tar.gz
2.0src.tar.gz
2.0.5src.tar.gz
3.0/
3.0.5/
4.0beta/
4.0.5beta/

As you can see, the sort works as I expect if the [0-9\.]* part is
followed by an alphabetic character, but not if it is followed by a
non-alphabetic character such as a slash or an underscore.
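The four cases can also be checked one pair at a time. This is a minimal sketch, assuming a `sort` that supports `-V` (as GNU coreutils does); the helper name `vsort_pair` is made up for illustration:

```shell
# Hypothetical helper: print two strings in the order `sort -V` puts them.
# LC_ALL=C keeps the comparison independent of the current locale.
vsort_pair() {
    printf '%s\n' "$1" "$2" | LC_ALL=C sort -V
}

vsort_pair 1.0.5_src.tar.gz 1.0_src.tar.gz   # '_' follows the digits
vsort_pair 2.0.5src.tar.gz  2.0src.tar.gz    # a letter follows the digits
vsort_pair 3.0.5/           3.0/             # '/' follows the digits
vsort_pair 4.0.5beta/       4.0beta/         # a letter follows the digits
```

Each call prints the pair in sorted order, so the first line of each result shows which string `sort -V` considers smaller.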

Regards

Jens
Assaf Gordon
2018-11-06 18:48:07 UTC
tags 18168 notabug
close 18168
stop

(triaging old bugs)

Hello,

It seems your message was lost and not replied to in 4 years.
Sorry about that.
Post by Schleusener, Jens
I am not sure whether it's a bug, but for my use cases the "sort"
command with the very helpful option "-V" (natural sort of (version)
numbers within text) does not always deliver the output I expect.
Note that "-V/--version-sort" is specifically sorting by Debian's *version*
sorting rules. It might seem like it's the same as "natural sort", but
it is not.

The exact rules are here:
https://www.debian.org/doc/debian-policy/ch-controlfields.html#version
https://readme.phys.ethz.ch/documentation/debian_version_numbers/
Post by Schleusener, Jens
Example input file (with four test cases):

1.0.5_src.tar.gz
1.0_src.tar.gz
2.0.5src.tar.gz
2.0src.tar.gz
3.0.5/
3.0/
4.0.5beta/
4.0beta/

Sorted ("sort -V") output file (with errors?):

1.0.5_src.tar.gz
1.0_src.tar.gz
2.0src.tar.gz
2.0.5src.tar.gz
3.0.5/
3.0/
4.0beta/
4.0.5beta/

Expected output file:

1.0_src.tar.gz
1.0.5_src.tar.gz
2.0src.tar.gz
2.0.5src.tar.gz
3.0/
3.0.5/
4.0beta/
4.0.5beta/

The disagreement is about "1.0_src.tar.gz" vs "1.0.5_src.tar.gz"
and "3.0/" vs "3.0.5/" .

Note that these characters ('_' and '/') are not strictly valid characters
in Debian version strings.
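
Which characters count as valid can be checked without dpkg. This is a rough sketch, assuming the character set Debian policy gives for the upstream part of a version (alphanumerics plus '.', '+', '-' and '~', with ':' reserved for the epoch). The function name `has_only_debian_chars` is hypothetical, and the check ignores the policy's other requirements, such as starting with a digit:

```shell
# Hypothetical check: succeed only if every character of "$1" is in the
# set Debian policy allows in a version string.
has_only_debian_chars() {
    case "$1" in
        '')                   return 1 ;;  # an empty string is not a version
        *[!A-Za-z0-9.+~:-]*)  return 1 ;;  # found a character outside the set
        *)                    return 0 ;;
    esac
}

for v in 3.0.5 4.0.5beta 1.0_src.tar.gz 3.0/; do
    if has_only_debian_chars "$v"; then
        echo "$v: only allowed characters"
    else
        echo "$v: contains a character outside the allowed set"
    fi
done
```

Strings that fail this check can still be compared, but the result then depends on how the algorithm treats characters it was never specified for.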

Let's try to compare them using Debian's own tools:

First, define a tiny shell function to help compare strings:

compver() {
    # Print the two versions in ascending order according to dpkg.
    if dpkg --compare-versions "$1" lt "$2"; then
        printf "%s\n" "$1" "$2"
    else
        printf "%s\n" "$2" "$1"
    fi
}

Then, compare the values:

$ compver 1.0.5_src.tar.gz 1.0_src.tar.gz
dpkg: warning: version '1.0.5_src.tar.gz' has bad syntax: invalid character in version number
dpkg: warning: version '1.0_src.tar.gz' has bad syntax: invalid character in version number
1.0.5_src.tar.gz
1.0_src.tar.gz

$ compver 3.0/ 3.0.5/
dpkg: warning: version '3.0/' has bad syntax: invalid character in version number
dpkg: warning: version '3.0.5/' has bad syntax: invalid character in version number
3.0.5/
3.0/

So sort's order agrees with Debian's ordering rules.
It might not be what a "natural sort" algorithm would do, but version-sort
is not exactly natural-sort.
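
The results above follow from how the Debian algorithm ranks single characters when it compares the non-digit parts of two versions: per Debian policy, a tilde sorts before everything (even the end of the string), then come letters, then all other characters. Here is a minimal sketch of that ranking, modelled on my understanding of dpkg's comparison. The function name `char_weight` and the exact numeric weights are assumptions for illustration; only their relative order matters:

```shell
# Weight of a single character in the non-digit part of a version string.
# An empty argument stands for "end of string". Digits are compared
# numerically in a separate step and should not be passed here.
char_weight() {
    case "$1" in
        '')        echo 0 ;;                                   # end of string
        '~')       echo -1 ;;                                  # sorts before the end
        [A-Za-z])  printf '%d\n' "'$1" ;;                      # letters: ASCII code
        *)         echo $(( $(printf '%d' "'$1") + 256 )) ;;   # everything else
    esac
}

# After a common prefix such as "3.0", these are the characters compared:
for c in '' b . / _; do
    printf '%-3s %s\n' "'$c'" "$(char_weight "$c")"
done
```

With these weights, '.' ranks below '/' and '_' but above any letter, which is why "3.0.5/" sorts before "3.0/" while "4.0beta/" sorts before "4.0.5beta/".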

Another detailed example of a version-sort is here:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22275



As such, I'm closing this bug.
Discussion can continue by replying to this thread.

-assaf
L A Walsh
2018-11-20 23:33:51 UTC
Post by Assaf Gordon
The disagreement is about "1.0_src.tar.gz" vs "1.0.5_src.tar.gz"
and "3.0/" vs "3.0.5/" .
Note that these characters are not strictly valid characters in debian
version strings.
---
I too would disagree with the above ordering.

This bug had me go and look at two places where I compare version
strings (two different algorithms), using the above as input but removing
the '/', which really shouldn't be part of the version string, as it looks
like output from ls (though I probably should add that case to my torture
testing).
My second algorithm looks as if I based it on sources from rpm, probably
derived from some Debian ordering.

My first algorithm, which I could justify as right or wrong, gives the
original poster's expected order, but the second (likely deb) gives the
deb order, almost.
The addition of the '/' chars changes the sort order. Even that points
to the assertion that "it shouldn't".

I.e., for 3.0 v. 3.0.5, the latter comes out 'greater' under the deb
rules (and mine)

**BUT**

3.0/ v. 3.0.5/ and
3.0_ v. 3.0.5_ don't sort as might be expected, though these:

3.0- v. 3.0.5-
3.0() v. 3.0.5()
3.0a v. 3.0.5a

show the second expression as greater. Such inconsistencies seem a bit
odd in a version sort, especially for a deterministic tool.
Post by Assaf Gordon
compver() {
dpkg --compare-versions "$1" lt "$2" \
&& printf "%s\n" "$1" "$2" \
|| printf "%s\n" "$2" "$1"
}
$ compver 1.0.5_src.tar.gz 1.0_src.tar.gz
dpkg: warning: version '1.0.5_src.tar.gz' has bad syntax: invalid
character in version number
dpkg: warning: version '1.0_src.tar.gz' has bad syntax: invalid
character in version number
1.0.5_src.tar.gz
1.0_src.tar.gz
$ compver 3.0/ 3.0.5/
dpkg: warning: version '3.0/' has bad syntax: invalid character in
version number
dpkg: warning: version '3.0.5/' has bad syntax: invalid character in
version number
3.0.5/
3.0/
So sort's order agrees with Debian's ordering rules.
---
One might take an error to mean "indeterminate".

In particular, the sort tool should document how its sorts work within
its manpages. Hyperlinks to outside sources don't usually cut it for this
type of program (console based; _most_ consoles don't support hyperlinks).
Paul Eggert
2018-11-21 18:12:18 UTC
I can't see us disagreeing with Debian. Perhaps you can file a bug report with
Debian and get them to switch to the algorithm you prefer.
