Centralizes docs on Ino, specifically zero (#1560)

Signed-off-by: Adrian Cole <adrian@tetrate.io>
This commit is contained in:
Crypt Keeper
2023-07-07 08:33:50 +08:00
committed by GitHub
parent 0ec3c852d6
commit 6a9088b46b
16 changed files with 109 additions and 27 deletions

View File

@@ -931,10 +931,8 @@ Firstly, wasi-testsuite does not require the inode of dot-dot, possibly because
the wasip2 adapter doesn't populate it (but we don't really know why). the wasip2 adapter doesn't populate it (but we don't really know why).
The only other reason to populate it would be to avoid wasi-libc's stat fanout The only other reason to populate it would be to avoid wasi-libc's stat fanout
when it is missing. However, the inode for dot-dot is not cached, and is also when it is missing. However, wasi-libc explicitly doesn't fan-out to lstat on
likely not possible to get on a pseudo file. Even if we could get it, we would the ".." entry on a zero ino.
have to use stat. In other words, pre-populating this would have the same cost
as waiting for something to call stat instead.
Fetching dot-dot's inode despite the above not only doesn't help wasi-libc, but Fetching dot-dot's inode despite the above not only doesn't help wasi-libc, but
it also hurts languages that don't use it, such as Go. These languages would it also hurts languages that don't use it, such as Go. These languages would
@@ -945,10 +943,70 @@ In summary, there are no significant upsides in attempting to pre-fetch
dot-dot's inode, and there are downsides to doing it anyway. dot-dot's inode, and there are downsides to doing it anyway.
See See
* https://github.com/WebAssembly/wasi-libc/pull/345 * https://github.com/WebAssembly/wasi-libc/blob/bd950eb128bff337153de217b11270f948d04bb4/libc-bottom-half/cloudlibc/src/libc/dirent/readdir.c#L87-L94
* https://github.com/WebAssembly/wasi-testsuite/blob/main/tests/rust/src/bin/fd_readdir.rs#L108 * https://github.com/WebAssembly/wasi-testsuite/blob/main/tests/rust/src/bin/fd_readdir.rs#L108
* https://github.com/bytecodealliance/preview2-prototyping/blob/e4c04bcfbd11c42c27c28984948d501a3e168121/crates/wasi-preview1-component-adapter/src/lib.rs#L1037 * https://github.com/bytecodealliance/preview2-prototyping/blob/e4c04bcfbd11c42c27c28984948d501a3e168121/crates/wasi-preview1-component-adapter/src/lib.rs#L1037
### Why don't we require inodes to be non-zero?
We don't require a non-zero value for `Dirent.Ino` because doing so can prevent
a real one from resolving later via `Stat_t.Ino`.
We define `Ino` like `d_ino` in POSIX which doesn't special-case zero. It can
be zero for a few reasons:
* The file is not a regular file or directory.
* The underlying filesystem does not support inodes. e.g. embed:fs
* A directory doesn't include inodes, but a later stat can. e.g. Windows
* The backend is based on wasi-filesystem (a.k.a wasip2), which has
`directory_entry.inode` optional, and might remove it entirely.
There are other downsides to returning a zero inode in widely used compilers:
* File equivalence utilities, like `os.SameFile` will not work.
* wasi-libc's `wasip1` mode will call `lstat` and attempt to retrieve a
non-zero value (unless the entry is named "..").
A new compiler may accidentally skip a `Dirent` with a zero `Ino` if emulating
a non-POSIX function and re-using `Dirent.Ino` for `d_fileno`.
* Linux `getdents` doesn't define `d_fileno` must be non-zero
* BSD `getdirentries` is implementation specific. For example, OpenBSD will
return dirents with a zero `d_fileno`, but Darwin will skip them.
The above shouldn't be a problem, even in the case of BSD, because `wasip1` is
defined more in terms of `getdents` than `getdirentries`. The bottom half of
either should treat `wasip1` (or any similar ABI such as wasix or wasip2) as a
different operating system and either use different logic that doesn't skip, or
synthesize a fake non-zero `d_fileno` when `d_ino` is zero.
However, this has been a problem. Go's `syscall.ParseDirent` utility is shared
for all `GOOS=unix`. For simplicity, this abstracts `direntIno` with data from
`d_fileno` or `d_ino`, and drops if either are zero, even if `d_fileno` is the
only field with zero explicitly defined. This led to a change to special case
`GOOS=wasip1` as otherwise virtual files would be unconditionally skipped.
In practice, this problem is rather unique due to so many compilers relying on
wasi-libc, which tolerates a zero inode. For example, while issues were
reported about the performance regression when wasi-libc began doing a fan-out
on zero `Dirent.Ino`, no issues were reported about dirents being dropped as a
result.
In summary, rather than complicating implementation and forcing non-zero inodes
for a rare case, we permit zero. We instead document this topic thoroughly, so
that emerging compilers can re-use the research and reference it on conflict.
We also document that `Ino` should be non-zero, so that users implementing that
field will attempt to get it.
See
* https://github.com/WebAssembly/wasi-filesystem/pull/81
* https://github.com/WebAssembly/wasi-libc/blob/bd950eb128bff337153de217b11270f948d04bb4/libc-bottom-half/cloudlibc/src/libc/dirent/readdir.c#L87-L94
* https://linux.die.net/man/3/getdents
* https://www.unix.com/man-page/osx/2/getdirentries/
* https://man.openbsd.org/OpenBSD-5.4/getdirentries.2
* https://github.com/golang/go/blob/go1.20/src/syscall/dirent.go#L60-L102
* https://go-review.googlesource.com/c/go/+/507915
## sys.Walltime and Nanotime ## sys.Walltime and Nanotime
The `sys` package has two function types, `Walltime` and `Nanotime` for real The `sys` package has two function types, `Walltime` and `Nanotime` for real

View File

@@ -1018,7 +1018,7 @@ func writeDirents(buf []byte, dirents []fsapi.Dirent, d_next uint64, direntCount
} }
// writeDirent writes DirentSize bytes // writeDirent writes DirentSize bytes
func writeDirent(buf []byte, dNext uint64, ino uint64, dNamlen uint32, dType fs.FileMode) { func writeDirent(buf []byte, dNext uint64, ino fsapi.Ino, dNamlen uint32, dType fs.FileMode) {
le.PutUint64(buf, dNext) // d_next le.PutUint64(buf, dNext) // d_next
le.PutUint64(buf[8:], ino) // d_ino le.PutUint64(buf[8:], ino) // d_ino
le.PutUint32(buf[16:], dNamlen) // d_namlen le.PutUint32(buf[16:], dNamlen) // d_namlen

View File

@@ -7,16 +7,38 @@ import (
"time" "time"
) )
// Dirent is an entry read from a directory. // Ino is the file serial number, or zero if unknown.
//
// The inode is used for a file equivalence, like os.SameFile, so any constant
// value will interfere that.
//
// When zero is returned by File.Readdir, certain callers will fan-out to
// File.Stat to retrieve a non-zero value. Callers using this for darwin's
// definition of `getdirentries` conflate zero `d_fileno` with a deleted file
// and skip the entry. See /RATIONALE.md for more on this.
type Ino = uint64
// FileType is fs.FileMode masked on fs.ModeType. For example, zero is a
// regular file, fs.ModeDir is a directory and fs.ModeIrregular is unknown.
//
// Note: This is defined by Linux, not POSIX.
type FileType = fs.FileMode
// Dirent is an entry read from a directory via File.Readdir.
// //
// # Notes // # Notes
// //
// - This extends `dirent` defined in POSIX with some fields defined by // - This extends `dirent` defined in POSIX with some fields defined by
// Linux. See https://man7.org/linux/man-pages/man3/readdir.3.html and // Linux. See https://man7.org/linux/man-pages/man3/readdir.3.html and
// https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/dirent.h.html // https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/dirent.h.html
// - This has a subset of fields defined in Stat_t. Notably, there is no
// field corresponding to Stat_t.Dev because that value will be constant
// for all files in a directory. To get the Dev value, call File.Stat on
// the directory File.Readdir was called on.
type Dirent struct { type Dirent struct {
// Ino is the file serial number, or zero if not available. // Ino is the file serial number, or zero if not available. See Ino for
Ino uint64 // more details including impact returning a zero value.
Ino Ino
// Name is the base name of the directory entry. Empty is invalid. // Name is the base name of the directory entry. Empty is invalid.
Name string Name string

View File

@@ -55,7 +55,7 @@ type File interface {
// //
// - Implementations should cache this result. // - Implementations should cache this result.
// - This combined with Dev can implement os.SameFile. // - This combined with Dev can implement os.SameFile.
Ino() (uint64, syscall.Errno) Ino() (Ino, syscall.Errno)
// IsDir returns true if this file is a directory or an error there was an // IsDir returns true if this file is a directory or an error there was an
// error retrieving this information. // error retrieving this information.

View File

@@ -13,8 +13,9 @@ type Stat_t struct {
// Dev is the device ID of device containing the file. // Dev is the device ID of device containing the file.
Dev uint64 Dev uint64
// Ino is the file serial number. // Ino is the file serial number, or zero if not available. See Ino for
Ino uint64 // more details including impact returning a zero value.
Ino Ino
// Uid is the user ID that owns the file, or zero if unsupported. // Uid is the user ID that owns the file, or zero if unsupported.
// For example, this is unsupported on some virtual filesystems or windows. // For example, this is unsupported on some virtual filesystems or windows.

View File

@@ -97,7 +97,7 @@ func (UnimplementedFile) Dev() (uint64, syscall.Errno) {
} }
// Ino implements File.Ino // Ino implements File.Ino
func (UnimplementedFile) Ino() (uint64, syscall.Errno) { func (UnimplementedFile) Ino() (Ino, syscall.Errno) {
return 0, 0 return 0, 0
} }

View File

@@ -139,7 +139,8 @@ func synthesizeDotEntries(f *FileEntry) ([]fsapi.Dirent, syscall.Errno) {
} }
result := [2]fsapi.Dirent{} result := [2]fsapi.Dirent{}
result[0] = fsapi.Dirent{Name: ".", Ino: dotIno, Type: fs.ModeDir} result[0] = fsapi.Dirent{Name: ".", Ino: dotIno, Type: fs.ModeDir}
// See /RATIONALE.md for why we don't attempt to get an inode for ".." // See /RATIONALE.md for why we don't attempt to get an inode for ".." and
// why in wasi-libc this won't fan-out either.
result[1] = fsapi.Dirent{Name: "..", Ino: 0, Type: fs.ModeDir} result[1] = fsapi.Dirent{Name: "..", Ino: 0, Type: fs.ModeDir}
return result[:], 0 return result[:], 0
} }

View File

@@ -27,7 +27,7 @@ func (r *lazyDir) Dev() (uint64, syscall.Errno) {
} }
// Ino implements the same method as documented on fsapi.File // Ino implements the same method as documented on fsapi.File
func (r *lazyDir) Ino() (uint64, syscall.Errno) { func (r *lazyDir) Ino() (fsapi.Ino, syscall.Errno) {
if f, ok := r.file(); !ok { if f, ok := r.file(); !ok {
return 0, syscall.EBADF return 0, syscall.EBADF
} else { } else {

View File

@@ -124,7 +124,7 @@ type cachedStat struct {
dev uint64 dev uint64
// dev is the same as fsapi.Stat_t Ino. // dev is the same as fsapi.Stat_t Ino.
ino uint64 ino fsapi.Ino
// isDir is fsapi.Stat_t Mode masked with fs.ModeDir // isDir is fsapi.Stat_t Mode masked with fs.ModeDir
isDir bool isDir bool
@@ -132,7 +132,7 @@ type cachedStat struct {
// cachedStat returns the cacheable parts of fsapi.Stat_t or an error if they // cachedStat returns the cacheable parts of fsapi.Stat_t or an error if they
// couldn't be retrieved. // couldn't be retrieved.
func (f *fsFile) cachedStat() (dev, ino uint64, isDir bool, errno syscall.Errno) { func (f *fsFile) cachedStat() (dev uint64, ino fsapi.Ino, isDir bool, errno syscall.Errno) {
if f.cachedSt == nil { if f.cachedSt == nil {
if _, errno = f.Stat(); errno != 0 { if _, errno = f.Stat(); errno != 0 {
return return
@@ -148,7 +148,7 @@ func (f *fsFile) Dev() (uint64, syscall.Errno) {
} }
// Ino implements the same method as documented on fsapi.File // Ino implements the same method as documented on fsapi.File
func (f *fsFile) Ino() (uint64, syscall.Errno) { func (f *fsFile) Ino() (fsapi.Ino, syscall.Errno) {
_, ino, _, errno := f.cachedStat() _, ino, _, errno := f.cachedStat()
return ino, errno return ino, errno
} }
@@ -435,7 +435,7 @@ func readdir(f readdirFile, path string, n int) (dirents []fsapi.Dirent, errno s
dirents = make([]fsapi.Dirent, 0, len(fis)) dirents = make([]fsapi.Dirent, 0, len(fis))
// linux/darwin won't have to fan out to lstat, but windows will. // linux/darwin won't have to fan out to lstat, but windows will.
var ino uint64 var ino fsapi.Ino
for fi := range fis { for fi := range fis {
t := fis[fi] t := fis[fi]
if ino, errno = inoFromFileInfo(path, t); errno != 0 { if ino, errno = inoFromFileInfo(path, t); errno != 0 {

View File

@@ -151,7 +151,7 @@ func TestFileIno(t *testing.T) {
tests := []struct { tests := []struct {
name string name string
fs fs.FS fs fs.FS
expectedIno uint64 expectedIno fsapi.Ino
}{ }{
{name: "os.DirFS", fs: dirFS, expectedIno: st.Ino}, {name: "os.DirFS", fs: dirFS, expectedIno: st.Ino},
{name: "embed.api.FS", fs: embedFS}, {name: "embed.api.FS", fs: embedFS},

View File

@@ -44,7 +44,7 @@ type osFile struct {
// cachedStat returns the cacheable parts of fsapi.Stat_t or an error if they // cachedStat returns the cacheable parts of fsapi.Stat_t or an error if they
// couldn't be retrieved. // couldn't be retrieved.
func (f *osFile) cachedStat() (dev, ino uint64, isDir bool, errno syscall.Errno) { func (f *osFile) cachedStat() (dev uint64, ino fsapi.Ino, isDir bool, errno syscall.Errno) {
if f.cachedSt == nil { if f.cachedSt == nil {
if _, errno = f.Stat(); errno != 0 { if _, errno = f.Stat(); errno != 0 {
return return
@@ -60,7 +60,7 @@ func (f *osFile) Dev() (uint64, syscall.Errno) {
} }
// Ino implements the same method as documented on fsapi.File // Ino implements the same method as documented on fsapi.File
func (f *osFile) Ino() (uint64, syscall.Errno) { func (f *osFile) Ino() (fsapi.Ino, syscall.Errno) {
_, ino, _, errno := f.cachedStat() _, ino, _, errno := f.cachedStat()
return ino, errno return ino, errno
} }

View File

@@ -76,7 +76,7 @@ func (r *readFile) Dev() (uint64, syscall.Errno) {
} }
// Ino implements the same method as documented on fsapi.File. // Ino implements the same method as documented on fsapi.File.
func (r *readFile) Ino() (uint64, syscall.Errno) { func (r *readFile) Ino() (fsapi.Ino, syscall.Errno) {
return r.f.Ino() return r.f.Ino()
} }

View File

@@ -31,7 +31,7 @@ func statFile(f fs.File) (fsapi.Stat_t, syscall.Errno) {
return defaultStatFile(f) return defaultStatFile(f)
} }
func inoFromFileInfo(_ string, t fs.FileInfo) (ino uint64, err syscall.Errno) { func inoFromFileInfo(_ string, t fs.FileInfo) (ino fsapi.Ino, err syscall.Errno) {
if d, ok := t.Sys().(*syscall.Stat_t); ok { if d, ok := t.Sys().(*syscall.Stat_t); ok {
ino = d.Ino ino = d.Ino
} }

View File

@@ -34,7 +34,7 @@ func statFile(f fs.File) (fsapi.Stat_t, syscall.Errno) {
return defaultStatFile(f) return defaultStatFile(f)
} }
func inoFromFileInfo(_ string, t fs.FileInfo) (ino uint64, err syscall.Errno) { func inoFromFileInfo(_ string, t fs.FileInfo) (ino fsapi.Ino, err syscall.Errno) {
if d, ok := t.Sys().(*syscall.Stat_t); ok { if d, ok := t.Sys().(*syscall.Stat_t); ok {
ino = d.Ino ino = d.Ino
} }

View File

@@ -33,7 +33,7 @@ func statFile(f fs.File) (fsapi.Stat_t, syscall.Errno) {
return defaultStatFile(f) return defaultStatFile(f)
} }
func inoFromFileInfo(_ string, t fs.FileInfo) (ino uint64, err syscall.Errno) { func inoFromFileInfo(_ string, t fs.FileInfo) (ino fsapi.Ino, err syscall.Errno) {
return return
} }

View File

@@ -72,7 +72,7 @@ func statFile(f fs.File) (fsapi.Stat_t, syscall.Errno) {
} }
// inoFromFileInfo uses stat to get the inode information of the file. // inoFromFileInfo uses stat to get the inode information of the file.
func inoFromFileInfo(filePath string, t fs.FileInfo) (ino uint64, errno syscall.Errno) { func inoFromFileInfo(filePath string, t fs.FileInfo) (ino fsapi.Ino, errno syscall.Errno) {
if filePath == "" { if filePath == "" {
// This is a fs.File backed implementation which doesn't have access to // This is a fs.File backed implementation which doesn't have access to
// the original file path. // the original file path.