Abstract:
Conventional membrane microphones and radio links can fail in military or rescue scenarios because of wind, extreme ambient noise, structural obstructions or electromagnetic jamming. To address these constraints, we built and evaluated a non-contact laser-microphone system that classifies spoken words of the NATO phonetic alphabet (“Alpha”–“Zulu”) in near real time. A low-cost laser is directed onto an acrylic panel; the reflected beam, modulated by speech-induced surface vibrations, is sensed by a photodiode, amplified with automatic gain control and recorded on a laptop. The raw waveform is converted into spectrograms that feed a 2-D convolutional neural network. A corpus of 2,600 utterances was collected from five male speakers (20 repetitions × 26 letters) under controlled indoor conditions. After training, the model achieved 80.2% validation accuracy on unseen repetitions, with most errors confined to acoustically similar pairs (e.g. Hotel/Echo). Unlike airborne microphones, the optical path remains effective through glass, acrylic or sealed enclosures, enabling reliable voice acquisition at standoff distances and in contaminated or hostile environments. The results demonstrate that combining laser vibrometry with lightweight deep learning yields a viable speech interface where traditional audio sensors or wireless links are unusable, offering a foundation for robust, field-deployable voice communication tools in mission-critical operations.
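The abstract does not report the exact spectrogram parameters or network architecture. The following is a minimal, hypothetical sketch of the described pipeline (log-mel spectrogram extraction feeding a small 2-D CNN over 26 classes); the 16 kHz sample rate, 64 mel bands, and layer sizes are illustrative assumptions, not the values used in the study.

```python
# Hypothetical sketch of the spectrogram + 2-D CNN pipeline described above;
# sample rate, mel-band count, and layer sizes are assumptions, not the paper's values.
import torch
import torch.nn as nn
import torchaudio

# Waveform -> log-mel spectrogram (treated as a single-channel "image")
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=400, hop_length=160, n_mels=64)
to_db = torchaudio.transforms.AmplitudeToDB()

class LetterCNN(nn.Module):
    """Small 2-D CNN classifying the 26 NATO-alphabet words from spectrograms."""
    def __init__(self, n_classes: int = 26):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) mono audio from the photodiode channel
        spec = to_db(mel(waveform)).unsqueeze(1)   # (batch, 1, mels, frames)
        return self.classifier(self.features(spec).flatten(1))

if __name__ == "__main__":
    model = LetterCNN()
    dummy = torch.randn(2, 16000)                  # two 1-second clips at 16 kHz
    print(model(dummy).shape)                      # torch.Size([2, 26])
```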