Discussion:
bug#31332: touch unnecessarily calls dup2
(too old to reply)
John Steele Scott
2018-05-01 10:38:50 UTC
Permalink
From https://stackoverflow.com/questions/40446555/why-does-touch-call-the-dup2-syscall

***@citra:/tmp$ touch --version | head -1
touch (GNU coreutils) 8.25
***@citra:/tmp$ strace -ttt touch foo 2>&1 | tail -9
1525170579.952032 open("foo", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3
1525170579.952080 dup2(3, 0)            = 0
1525170579.952119 close(3)              = 0
1525170579.952156 utimensat(0, NULL, NULL, 0) = 0
1525170579.952209 close(0)              = 0
1525170579.952257 close(1)              = 0
1525170579.952294 close(2)              = 0
1525170579.952333 exit_group(0)         = ?
1525170579.952450 +++ exited with 0 +++

My analysis from that discussion:

    It's a historical artifact.

    The open()+dup2() pattern comes from the fd_reopen() function, which is used
    by several of the programs in the coreutils code base.

    Prior to coreutils commit e373bb1, fd_reopen() did not do open()+dup2(), but
    closed the desired file descriptor before opening a new one. This was the
    case when touch started using this function at coreutils commit 478bd89. Per
    that commit message, the intent was to reduce the number of file descriptors
    touch would have open.

In the scheme of things the extra call to dup2() is not a big deal. We've lived
with it for 10 years. On my laptop it would take 25,000 invocations of touch
before those unnecessary dup2() calls add up to a second. But touch is called a
lot, on systems all over the world. So maybe you guys care about this?

Cheers,

John
Assaf Gordon
2018-10-30 03:17:07 UTC
Permalink
tags 31332 notabug
close 31332
stop
Post by John Steele Scott
From https://stackoverflow.com/questions/40446555/why-does-touch-call-the-dup2-syscall
touch (GNU coreutils) 8.25
1525170579.952032 open("foo", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3
1525170579.952080 dup2(3, 0)            = 0
1525170579.952119 close(3)              = 0
1525170579.952156 utimensat(0, NULL, NULL, 0) = 0
1525170579.952209 close(0)              = 0
1525170579.952257 close(1)              = 0
1525170579.952294 close(2)              = 0
1525170579.952333 exit_group(0)         = ?
1525170579.952450 +++ exited with 0 +++
    It's a historical artifact.
    The open()+dup2() pattern comes from the fd_reopen() function, which is used
    by several of the programs in the coreutils code base.
Not exactly an "artifact" - but an indented operation.

The goal of "fd_reopen" is not just to open a file,
but to ensure the returned file descriptor (an "int") has a specific
value.

For example, standard input (STDIN) is typically file descriptor value
0. calling fd_reopen first opens the file (the kernel returns file
descriptor 3), then it ensures file descriptor 0 is associated with the
same file (and then, calling "close(3)" gets rid of the other file
descriptor).

Checking how fd_reopen is used in coreutils, it is almost always used
to ensure STDIN/STDOUT point to the desired file(s):
====
$ git grep fd_reopen
src/csplit.c: if (! STREQ (name, "-") && fd_reopen (STDIN_FILENO, name,
O_RDONLY, 0) < 0)
src/dd.c:/* Restart on EINTR from fd_reopen(). */
src/dd.c:ifd_reopen (int desired_fd, char const *file, int flag, mode_t
mode)
src/dd.c: ret = fd_reopen (desired_fd, file, flag, mode);
src/dd.c: if (ifd_reopen (STDIN_FILENO, input_file, O_RDONLY |
input_flags, 0) < 0)
src/dd.c: || ifd_reopen (STDOUT_FILENO, output_file, O_RDWR |
opts, perms) < 0)
src/dd.c: && (ifd_reopen (STDOUT_FILENO, output_file, O_WRONLY
| opts, perms)
src/nohup.c: if (fd_reopen (STDIN_FILENO, "/dev/null", O_WRONLY, 0)
< 0)
src/nohup.c: ? fd_reopen (STDOUT_FILENO, file, flags, mode)
src/nohup.c: ? fd_reopen (STDOUT_FILENO, in_home,
flags, mode)
src/split.c: && fd_reopen (STDIN_FILENO, infile, O_RDONLY, 0) < 0)
src/stty.c: if (fd_reopen (STDIN_FILENO, device_name, O_RDONLY |
O_NONBLOCK, 0) < 0)
src/touch.c: fd = fd_reopen (STDIN_FILENO, file,
===

This means the rest of the program can just operate on STDIN (or STDOUT).


As such, I'm closing this item.
Discussion can continue by replying to this thread.

-assaf
John Steele Scott
2018-10-30 04:07:23 UTC
Permalink
Hi Assif,

Thanks for your reply.
Post by Assaf Gordon
tags 31332 notabug
close 31332
stop
 From https://stackoverflow.com/questions/40446555/why-does-touch-call-the-dup2-syscall
touch (GNU coreutils) 8.25
1525170579.952032 open("foo", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3
1525170579.952080 dup2(3, 0)            = 0
1525170579.952119 close(3)              = 0
1525170579.952156 utimensat(0, NULL, NULL, 0) = 0
1525170579.952209 close(0)              = 0
1525170579.952257 close(1)              = 0
1525170579.952294 close(2)              = 0
1525170579.952333 exit_group(0)         = ?
1525170579.952450 +++ exited with 0 +++
     It's a historical artifact.
     The open()+dup2() pattern comes from the fd_reopen() function, which is used
     by several of the programs in the coreutils code base.
Not exactly an "artifact" - but an indented operation.
I called it a historical artefact because the call to dup2() did not occur when touch started calling fd_reopen(). Prior to http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=e373bb19 fd_reopen() initially did "close(desired_fd); fd = open(...)" which would always do the right thing if stdin was the desired fd. No call to dup2().
Post by Assaf Gordon
The goal of "fd_reopen" is not just to open a file,
but to ensure the returned file descriptor (an "int") has a specific
value.
For example, standard input (STDIN) is typically file descriptor value 0. calling fd_reopen first opens the file (the kernel returns file descriptor 3), then it ensures file descriptor 0 is associated with the same file (and then, calling "close(3)" gets rid of the other file descriptor).
Checking how fd_reopen is used in coreutils, it is almost always used
====
$ git grep fd_reopen
src/csplit.c:  if (! STREQ (name, "-") && fd_reopen (STDIN_FILENO, name, O_RDONLY, 0) < 0)
src/dd.c:/* Restart on EINTR from fd_reopen().  */
src/dd.c:ifd_reopen (int desired_fd, char const *file, int flag, mode_t mode)
src/dd.c:      ret = fd_reopen (desired_fd, file, flag, mode);
src/dd.c:      if (ifd_reopen (STDIN_FILENO, input_file, O_RDONLY | input_flags, 0) < 0)
src/dd.c:           || ifd_reopen (STDOUT_FILENO, output_file, O_RDWR | opts, perms) < 0)
src/dd.c:          && (ifd_reopen (STDOUT_FILENO, output_file, O_WRONLY | opts, perms)
src/nohup.c:      if (fd_reopen (STDIN_FILENO, "/dev/null", O_WRONLY, 0) < 0)
src/nohup.c:                ? fd_reopen (STDOUT_FILENO, file, flags, mode)
src/nohup.c:                        ? fd_reopen (STDOUT_FILENO, in_home, flags, mode)
src/split.c:      && fd_reopen (STDIN_FILENO, infile, O_RDONLY, 0) < 0)
src/stty.c:      if (fd_reopen (STDIN_FILENO, device_name, O_RDONLY | O_NONBLOCK, 0) < 0)
src/touch.c:      fd = fd_reopen (STDIN_FILENO, file,
===
This means the rest of the program can just operate on STDIN (or STDOUT).
I haven't looked at all those cases, but my point was that the call to dup2() adds an extra 40us to each invocation of touch. In the case of touch, it really only cares about the fd being STDOUT_FILENO or !STDOUT_FILENO; it doesn't need to be STDIN_FILENO exactly. With a small amount of work touch could use open() instead of fd_reopen(). The code would be somewhat simpler and it would save 40us of execution time.

Or maybe it wouldn't, since there's an extra fd open when the program terminates and we need to wait for the kernel to clean that up anyway? I don't know.

Cheers,

John
Paul Eggert
2018-10-30 05:38:36 UTC
Permalink
Prior tohttp://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=e373bb19 fd_reopen() initially did "close(desired_fd); fd = open(...)" which would always do the right thing
No, as the old code did the wrong thing for "touch /dev/stdin", whereas the
current code works.
John Steele Scott
2018-10-30 06:14:45 UTC
Permalink
Prior tohttp://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=e373bb19  fd_reopen() initially did "close(desired_fd); fd = open(...)" which would always do the right thing
No, as the old code did the wrong thing for "touch /dev/stdin", whereas the current code works.
I stand corrected. Okay, I see that we don't want to close /dev/stdin at the start of touch. Thanks for making me think of this case.

I'd much much obliged if one of you could enlighten me as to why touch needs to treat /dev/stdin as a special case after the file has been opened though. Wouldn't the following work just as well, with one less system call?

--- a/src/touch.c
+++ b/src/touch.c
@@ -132,8 +132,8 @@ touch (const char *file)
   else if (! (no_create || no_dereference))
     {
       /* Try to open FILE, creating it if necessary.  */
-      fd = fd_reopen (STDIN_FILENO, file,
-                      O_WRONLY | O_CREAT | O_NONBLOCK | O_NOCTTY, MODE_RW_UGO);
+      fd = open (file,
+                 O_WRONLY | O_CREAT | O_NONBLOCK | O_NOCTTY, MODE_RW_UGO);
 
       /* Don't save a copy of errno if it's EISDIR, since that would lead
          touch to give a bogus diagnostic for e.g., 'touch /' (assuming
@@ -166,9 +166,9 @@ touch (const char *file)
                      (no_dereference && fd == -1) ? AT_SYMLINK_NOFOLLOW : 0)
         == 0);
 
-  if (fd == STDIN_FILENO)
+  if (fd != STDOUT_FILENO)
     {
-      if (close (STDIN_FILENO) != 0)
+      if (close (fd) != 0)
         {
           error (0, errno, _("failed to close %s"), quoteaf (file));
           return false;

Cheers,

John
Paul Eggert
2018-11-12 22:23:04 UTC
Permalink
Post by John Steele Scott
I'd much much obliged if one of you could enlighten me as to why touch needs to treat /dev/stdin as a special case after the file has been opened though. Wouldn't the following work just as well, with one less system call?
--- a/src/touch.c
+++ b/src/touch.c
@@ -132,8 +132,8 @@ touch (const char *file)
   else if (! (no_create || no_dereference))
     {
       /* Try to open FILE, creating it if necessary.  */
-      fd = fd_reopen (STDIN_FILENO, file,
-                      O_WRONLY | O_CREAT | O_NONBLOCK | O_NOCTTY, MODE_RW_UGO);
+      fd = open (file,
+                 O_WRONLY | O_CREAT | O_NONBLOCK | O_NOCTTY, MODE_RW_UGO);
       /* Don't save a copy of errno if it's EISDIR, since that would lead
          touch to give a bogus diagnostic for e.g., 'touch /' (assuming
@@ -166,9 +166,9 @@ touch (const char *file)
                      (no_dereference && fd == -1) ? AT_SYMLINK_NOFOLLOW : 0)
         == 0);
-  if (fd == STDIN_FILENO)
+  if (fd != STDOUT_FILENO)
     {
-      if (close (STDIN_FILENO) != 0)
+      if (close (fd) != 0)
That patch has trouble if 'open' returns 1. The resulting fd is never closed and
for some filesystems (NFS, for example) one needs to close the fd to find out
whether the fdutimensat call actually worked.
John Steele Scott
2018-11-12 22:41:36 UTC
Permalink
Post by John Steele Scott
I'd much much obliged if one of you could enlighten me as to why touch needs to treat /dev/stdin as a special case after the file has been opened though. Wouldn't the following work just as well, with one less system call?
--- a/src/touch.c
+++ b/src/touch.c
@@ -132,8 +132,8 @@ touch (const char *file)
    else if (! (no_create || no_dereference))
      {
        /* Try to open FILE, creating it if necessary.  */
-      fd = fd_reopen (STDIN_FILENO, file,
-                      O_WRONLY | O_CREAT | O_NONBLOCK | O_NOCTTY, MODE_RW_UGO);
+      fd = open (file,
+                 O_WRONLY | O_CREAT | O_NONBLOCK | O_NOCTTY, MODE_RW_UGO);
          /* Don't save a copy of errno if it's EISDIR, since that would lead
           touch to give a bogus diagnostic for e.g., 'touch /' (assuming
@@ -166,9 +166,9 @@ touch (const char *file)
                       (no_dereference && fd == -1) ? AT_SYMLINK_NOFOLLOW : 0)
          == 0);
  -  if (fd == STDIN_FILENO)
+  if (fd != STDOUT_FILENO)
      {
-      if (close (STDIN_FILENO) != 0)
+      if (close (fd) != 0)
That patch has trouble if 'open' returns 1. The resulting fd is never closed and for some filesystems (NFS, for example) one needs to close the fd to find out whether the fdutimensat call actually worked.
Thanks Paul,

So if someone did want to wipe out that dup2 call from GNU touch, they'd need to track separately whether they opened the file or whether they are using a passed-in file descriptor, rather than relying on the value of fd to indicate that. Got it.

I'll stop flogging this dead horse now. Thanks both Paul and Assaf for your replies.

Cheers,

John

Loading...