Debugging a Golang Bug with Non-Blocking Reads
One of the ways we stream data from databases to the end-user is with a named pipe. Specifically, we do this with DuckDB to stream JSON-formatted results directly to the user. In the process of building this, we discovered a bug in how Go handles reading data from pipes.
The Original Code
The original code for reading data looks like this. First we create a named pipe:
$ mkfifo p.pipe
And then we generate data and pass it to the pipe. Instead of DuckDB, I'll use a simpler example to write data:
import "os"
go func() {
pipe, _ := os.OpenFile("p.pipe", os.O_WRONLY|os.O_APPEND, os.ModeNamedPipe)
for range 5 {
pipe.WriteString("Hello")
time.Sleep(1000 * time.Millisecond)
}
err := pipe.Close()
}()
pipe, err := os.OpenFile("p.pipe", os.O_RDONLY|syscall.O_NONBLOCK, os.ModeNamedPipe)
buf := make([]byte, 65536)
for {
n, err := pipe.Read(buf)
}
You'll notice a few things:
time.Sleep()
This lets us simulate a slow writer - perhaps a slow query or network connection.O_NONBLOCK
It's possible that our writer never successfully opens the file for writing - perhaps due to an error elsewhere. If this happens, our reader will block indefinitely waiting for data. Non-blocking IO lets us coordinate with the writer.
The Bug
The expected behavior is that the for {}
loop runs infinitely. However, on my M1 Mac, this
is not the case. The code blocks on pipe.Read()
. If you're on a Mac, you can try the full program
in this gist.
What's strange is that this code does work:
- On Linux
- If we remove
time.Sleep()
from the writer!
After testing and coming up with the simplest reproducible example, I filed a bug report. Much to my surprise, someone identified the issue and wrote a patch the same day. There a lot of information in the ticket. The root causes are:
- Go's implementation on darwin uses kqueue() to do non-blocking reads, however, named pipes were excluded from part of that logic.
- There was a race condition where, if the writer does not close before the reader reads all data, then we skip the logic to poll.
Hopefully a fix will come out in go 1.23.
The Workaround
Go's os.File interface is a wrapper around underlying syscalls. What if we made
those syscalls directly? We can do exactly that with the syscall
package. Our revised
reader looks like this:
import "syscall"
pipe, err := syscall.Open("p.pipe", os.O_RDONLY|syscall.O_NONBLOCK, 0666)
// pipe, err := os.OpenFile("p.pipe", os.O_RDONLY|syscall.O_NONBLOCK, os.ModeNamedPipe)
buf := make([]byte, 65536)
for {
n, err := syscall.Read(pipe, buf)
// n, err := pipe.Read(buf)
}
This works! Our for-loop no longer blocks due to this bug. We have to do a little more bookkeeping ourselves but the logic is identical to before.
You can see the full solution we implemented here. Feedback welcome!
Conclusion
We're building software to let builders use analytical databases without the plumbing. We're doing a lot of fun systems-y kinds of things in Go. If this is interesting to you then let's talk!