TY - JOUR
T1 - I Am an Earphone and I Can Hear My User's Face
T2 - Facial Landmark Tracking Using Smart Earphones
AU - Zhang, Shijia
AU - Lu, Taiting
AU - Zhou, Hao
AU - Liu, Yilin
AU - Liu, Runze
AU - Gowda, Mahanth
N1 - Publisher Copyright:
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2023/12/16
Y1 - 2023/12/16
N2 - This article presents EARFace, a system that demonstrates the feasibility of tracking facial landmarks for 3D facial reconstruction using in-ear acoustic sensors embedded within smart earphones. This enables a number of applications in facial expression tracking, user interfaces, AR/VR, affective computing, and accessibility, among others. Whereas conventional vision-based solutions break down under poor lighting and occlusions and raise privacy concerns, earphone platforms are robust to ambient conditions while preserving privacy. In contrast to prior work on earable platforms that performs outer-ear sensing for facial motion tracking, EARFace shows the feasibility of completely in-ear sensing with a natural earphone form factor, thus improving wearing comfort. The core intuition exploited by EARFace is that the shape of the ear canal changes with the movement of facial muscles during facial motion. EARFace tracks these changes in ear canal shape by measuring the ultrasonic channel frequency response of the inner ear, ultimately enabling tracking of facial motion. A transformer-based machine learning model is designed to exploit spectral and temporal relationships in the ultrasonic channel frequency response data to predict the user's facial landmarks with an accuracy of 1.83 mm. Using these predicted landmarks, a 3D graphical model of the face that replicates the user's precise facial motion is then reconstructed. Domain adaptation is further performed by adapting layer weights with group-wise, differential learning rates, which decreases the training overhead of EARFace. The transformer-based model runs on smartphones with a processing latency of 13 ms and a low overall power consumption profile. Finally, usability studies indicate higher levels of wearing comfort for EARFace's earphone platform in comparison with alternative form factors.
UR - http://www.scopus.com/inward/record.url?scp=85183324188&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85183324188&partnerID=8YFLogxK
DO - 10.1145/3614438
M3 - Article
AN - SCOPUS:85183324188
SN - 2577-6207
VL - 5
JO - ACM Transactions on Internet of Things
JF - ACM Transactions on Internet of Things
IS - 1
M1 - 1
ER -