Endre Simo recently ported the Pigo face detection library from the Go runtime to web browsers using WebAssembly. The port illustrates WebAssembly's performance potential for running heavyweight desktop applications in a browser context.
Simo is a senior software developer and open-source contributor to several popular image-processing projects. Triangle, for instance, aimed at the more artistically inclined developers, takes an image and converts it to computer-generated art using Delaunay triangulation. Caire resizes images in a way that respects the main content of the image. An example is shown below.
InfoQ interviewed Simo, and asked about the benefits of the port and the technical challenges encountered. Answers have been edited for clarity.
InfoQ: You have authored or contributed to a number of open-source projects, mostly tackling image processing and image generation problems. What brought you to machine learning and live face detection?
Endre Simo: I have a long-standing interest in face detection and optical flow in general, which, in turn, awoke the researcher and data analyst in me. Because I have been involved in image processing, computer vision, and all these sorts of things over the last couple of years, and since I am also an active contributor in the Go community, I thought that it was the proper time to undertake a project bringing about something that Go programmers were really missing: a very lightweight, platform-agnostic, pure Go face detection library that does not require any third-party dependency.
At the time when I started to think about the idea of developing a face detection library in Go, the only existing library for face detection and optical flow targeting the Go language was GoCV, a Go (C++) binding for OpenCV, but many can acknowledge that working with OpenCV is sometimes daunting, since it requires a lot of dependencies and there are major differences between versions which could break existing code.
InfoQ: BSD-licensed OpenCV also provides facial recognition capabilities and wrappers for a number of other languages, including JavaScript. What drove you to write Pigo, what features does it provide, and what differentiates it from OpenCV?
Simo: First of all, I do not really like wrappers or bindings around an existing library, even though it might help in some circumstances to interoperate with some low-level code (like C, for example) without the need to reimplement the code base in the targeted language. Let me explain why:
- First, it forces you to dig deeper into the library's own architecture in order to transpose it to the desired language.
- Second, and more importantly, it costs you slower build times, since it needs to transpose C code to the targeted language. Not to mention that deployment becomes more complicated, and you can forget about a single static binary file, as is the case with Go binaries.
So the major takeaway in my decision to start working on a simple computer vision library suitable specifically for face detection was the huge time needed by GoCV at the first compilation. The Pigo face detection library (which by the way is based on the Object Detection with Pixel Intensity Comparisons Organized in Decision Trees paper) is very lightweight, it has zero dependencies, exposes a very simple and elegant API, and more importantly is very fast, since there is no need for image preprocessing prior to detection. One of the most important features of Go is the generation of cross-build executables. Being a library 100% developed in Go thus means that it is very easy to upload the binary file to small platforms like Raspberry Pi, where space constraints are important. This is not the case with OpenCV (GoCV) which requires a lot of resources and produces slower build times.
In terms of features, it might not cover all the functionality of OpenCV, since the latter is a huge library with a large number of functions for numerical analysis and geometrical transformations, but Pigo does very well what it was designed to do, i.e. detecting faces. The first version of the library could only do face detection, but during development new features have been added, like pupil/eye detection and facial landmark point detection. My desire is to develop it even further and have it do gesture recognition. This will be a major takeaway and also a heavy task, since it implies working with pre-trained data adapted to the binary data structure required by the library or, to put it otherwise, training a data set that is adaptable to the data structure of a binary cascade classification.
InfoQ: Why port Pigo to WebAssembly?
Simo: The idea of porting Pigo to WebAssembly originated from the simple fact that the Go ecosystem was missing a well-founded and generally available library for accessing the webcam. The only library I found targeted the Linux environment, which obviously was not an option. So, in order to prove the library's real-time face detection capabilities, I opted to create the demos in Python and communicate with the Go code (the detection part has been written in Go) through shared object (.so) libraries. I did not obtain the desired results, as the frame rates were pretty bad, so I thought that I would try porting it to WebAssembly.
InfoQ: Can you tell us about the process and technical challenges of porting Pigo to WebAssembly? How easy is it to port a Go program to Wasm?
Simo: Porting Pigo to WebAssembly was a delightful experience. The implementation went smoothly without any major drawbacks. This is probably due to the well-written syscall/js Go API. Possibly the only thing you need to be aware of when working with the syscall/js API is that the JavaScript callback functions should always be invoked inside a goroutine, otherwise you will encounter a deadlock. However, if you have enough experience with Go's concurrency mechanism, that shouldn't pose any problems. Another aspect is related to how you should invoke the JavaScript functions as Go functions, since the syscall/js package has been developed, I think, with the JavaScript coder in mind. In the end, this is only a matter of experience.
Another important aspect that a Wasm integrator should keep in mind is that, as WebAssembly runs in the browser, it is no longer possible to access a file from persistent storage. This means that the only option for accessing the files required by an application is through HTTP calls supported by the JavaScript language, like the fetch method. This can be considered a drawback since it imposes some limitations. First, you need an internet connection to access external assets. Second, it can introduce latency between the request and the response. It is much faster to access a file located on the running system than to access a file through a web connection. This can pose noticeable problems (memory consumption in particular) when you have to deal with a lot of external assets: either you load all the assets prior to running the application, or you fetch new assets on the fly, which can occasionally suspend the application.
InfoQ: What performance improvements did you notice, if any?
Simo: The Wasm integration has proved that the library is capable of real-time face detection. The registered frame rates were well above 50 FPS, which was not the case with the Python integration. I noticed some small drops in FPS when I enabled the facial landmark point detection functions, but this is to be expected, since it needs to run the same detection algorithm over the 15 facial points in total.
[Example of facial landmarks detection as performed by Pigo]
InfoQ: You now have face detection running in the browser. How do you see that being used in connection with other web applications?
Simo: Running a non-JavaScript face detection library in the web browser gives you a lot of satisfaction, not just because it is running in the browser, since there are many other face detection libraries targeting the JavaScript language, but because you know that it was specifically designed for the Go community. That means someone familiar with the Go language can pick up the implementation details and understand the API more easily.
The Wasm port of the library is a proof of concept that Go libraries could be easily transposed to WebAssembly. I see a tremendous potential in this port because it opens the door to a lot of creative development. Furthermore, I’ve presented a few possible use cases as Python demos (I might transpose them to Wasm at some time), for example a Snapchat-like face masquerade, face blurring, blink and talk detection, face triangulation etc. I have also integrated it into Caire, where it has been used to avoid face deformation on images with dense content. With the face detection activated, the algorithm tries to avoid cropping the pixels inside the detected faces, retaining the face zone unaltered.
InfoQ: How long did it take you to have a working wasm port? Did you enjoy the experience? Do you encourage developers to target WebAssembly today, or do you assess that it is wiser to wait for the technology to mature (in bundle size, features, tooling, ecosystem, etc.)?
Simo: Since I worked on the Wasm implementation part-time, I haven't really counted how many hours it took to have a working solution, but it went pretty smoothly. I really encourage developers to target WebAssembly because it has great potential, and it's beginning to see wide adoption among programmers. Many languages already offer support for WebAssembly, so I think it will have a bright future in the coming years, considering that WASI (WebAssembly System Interface), which is a subgroup of Wasm, is also attracting the interest of systems programmers.