Researchers from the Human Sensing Laboratory at Carnegie Mellon University (CMU) have published a paper on DensePose from WiFi, an AI model that can detect the pose of multiple humans in a room using only the signals from WiFi transmitters. In experiments on real-world data, the algorithm achieves an average precision (AP) of 87.2 at an IoU threshold of 50%.
Because WiFi signals are one-dimensional, most previous methods for detecting people using WiFi can only locate a person's center of mass, and usually can detect only one person. The CMU technique incorporates amplitude and phase data from WiFi signals sent from three transmitter antennas to three receiver antennas. This produces a 3x3 feature map which can be passed to a neural network that produces UV maps of human body surfaces, allowing the system to localize multiple people as well as determine their poses. According to the researchers,
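As a rough illustration of this amplitude/phase decomposition, the sketch below (using NumPy, with hypothetical shapes; not the researchers' code) shows how the two modalities can be separated from complex-valued channel measurements:

```python
import numpy as np

# Hypothetical complex-valued CSI measurements for one sample:
# 3 transmitter antennas x 3 receiver antennas x 30 subcarrier frequencies.
csi = np.random.randn(3, 3, 30) + 1j * np.random.randn(3, 3, 30)

# Decompose each complex measurement into the two modalities
# the CMU model consumes: amplitude and phase.
amplitude = np.abs(csi)    # signal strength per antenna pair and frequency
phase = np.angle(csi)      # phase shift in radians, in [-pi, pi]

print(amplitude.shape, phase.shape)  # (3, 3, 30) (3, 3, 30)
```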
The performance of our work is still limited by the public training data in the field of WiFi-based perception, especially under different layouts. In future work, we also plan to collect multi-layout data and extend our work to predict 3D human body shapes from WiFi signals. We believe that the advanced capability of dense perception could empower the WiFi device as a privacy-friendly, illumination-invariant, and cheap human sensor compared to RGB cameras and Lidars.
The process begins by collecting five channel state information (CSI) samples, which are "the ratio between the transmitted signal wave and the received signal wave." Each sample contains 30 frequencies and is taken from signals sent from each of three transmitters to each of three receivers; the result is two raw-data tensors of shape 150 x 3 x 3, one for phase and one for amplitude. These are converted by a "modality translation network" into a 1280 x 720 image-like tensor, which is then processed as if it were an image captured by a camera, using the state-of-the-art pose-estimation network DensePose.
Translating WiFi Signals to a 2D Image. Source: https://arxiv.org/abs/2301.00250
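Since the researchers have not released their code, the PyTorch sketch below is a hypothetical, simplified version of the modality translation step that only illustrates the tensor shapes described above; the actual architecture differs in its details:

```python
import torch
import torch.nn as nn

class ModalityTranslation(nn.Module):
    """Simplified sketch: map CSI amplitude and phase tensors of shape
    (150, 3, 3) to an image-like tensor that a vision network can consume."""
    def __init__(self):
        super().__init__()
        # Encode flattened amplitude and phase inputs (150 * 3 * 3 = 1350 each)
        self.amp_encoder = nn.Sequential(nn.Linear(1350, 576), nn.ReLU())
        self.pha_encoder = nn.Sequential(nn.Linear(1350, 576), nn.ReLU())
        # Fuse both modalities into a small spatial map, then upsample
        self.fuse = nn.Linear(1152, 24 * 24)
        self.to_image = nn.Sequential(
            nn.Conv2d(1, 3, kernel_size=3, padding=1),
            nn.Upsample(size=(720, 1280), mode="bilinear", align_corners=False),
        )

    def forward(self, amplitude, phase):
        a = self.amp_encoder(amplitude.flatten(1))
        p = self.pha_encoder(phase.flatten(1))
        fused = self.fuse(torch.cat([a, p], dim=1)).view(-1, 1, 24, 24)
        return self.to_image(fused)  # (batch, 3, 720, 1280), image-like

amp = torch.randn(1, 150, 3, 3)   # five CSI samples x 30 frequencies = 150
pha = torch.randn(1, 150, 3, 3)
print(ModalityTranslation()(amp, pha).shape)  # torch.Size([1, 3, 720, 1280])
```

The resulting image-like tensor is what allows an off-the-shelf pose network to be reused downstream without modification.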
The model was evaluated on a dataset of WiFi signals paired with video recordings of scenes containing from one to five people, recorded in offices and classrooms. Because the videos have no human annotations to provide ground truth, the researchers applied pretrained image-based DensePose models to the videos to create pseudo ground truth. Overall, the model could "effectively detect the approximate locations of human bounding boxes" and the pose of torsos, but struggled with detecting limbs.
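For context on the AP@50 metric used in the evaluation, the generic sketch below (illustrative values only, not the researchers' code) shows how a predicted bounding box is matched against a pseudo-ground-truth box at the 50% IoU threshold:

```python
def box_iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as a true positive at the 50% threshold if IoU >= 0.5
predicted = (100, 100, 300, 400)   # hypothetical WiFi-based detection
pseudo_gt = (110, 90, 310, 390)    # hypothetical pseudo label from image DensePose
print(box_iou(predicted, pseudo_gt) >= 0.5)  # True
```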
In a Hacker News discussion about the work, one user pointed out that in 2020 the IEEE announced the 802.11bf project for WLAN sensing, which is targeted for release in 2024. Another user said,
If [WiFi sensing] can detect breathing of humans reliably, then it solves a huge problem in having home automation with automatic lighting - particularly in bathrooms. I've never had a decent bathroom occupancy sensor (they all end up wanting to detect fairly large motions) - the obvious solution is AI with a camera but for obvious reasons no, but if a couple of base stations can localize person positions to rooms in the house (and provide other services) then that kind of solves the whole issue!
Although the CMU researchers have not released their code or model, the Papers with Code website links to GitHub repositories for three similar projects.