1. Introduction
Silicon photonic devices are highly sensitive to wavelength and temperature changes, especially wavelength-selective devices such as filters and modulators based on asymmetric Mach-Zehnder interferometers (AMZIs) or micro-rings. A slight wavelength or temperature shift can cause serious performance degradation or even functional failure [1]. Operating conditions can be determined and maintained through measures such as careful pre-calibration, a pre-built temperature look-up table, and accurate temperature monitoring. It nevertheless remains challenging to use these devices in the field or integrated side by side with electronic modules, where varying electronic workloads cause unexpected local temperature fluctuations that a pre-built temperature table cannot capture. Furthermore, silicon photonic AMZIs and ring resonators are very sensitive to fabrication errors, so the pre-calibrated conditions for one device cannot be applied to another, and device-by-device calibration costs considerable time and money. Wavelength reallocation is also hindered by the need to re-align these wavelength filters. A fast, real-time method, fundamentally different from the pre-calibration approach above, is therefore needed to automatically align and recover the working wavelength when the input wavelength or temperature changes. In this work, we report a reinforcement-learning method based on multi-agent adaptive-action Q-learning for this purpose. We experimentally demonstrate continuous, long-duration (~10 hours) automatic blind alignment of the working wavelength over the full C-band for a silicon photonic device containing two Vernier ring filters and two MZI switches [2]. This structure is more difficult to control than a single ring, and such alignment has not been realized in previous works [3]–[6].
Compared to previous wavelength locking schemes such as neural networks [3], feedback control [4], [5], and camera-based scattering monitoring [6], this Q-learning method has the following merits: (1) General applicability to various devices. (2) Model-free blind control that requires neither device structure information nor a pre-built device model. (3) Simultaneous locking of multiple devices without building a one-to-many feedback model and circuits. (4) No need for pre-calibration, a look-up table, or temperature measurement. Building a temperature look-up table is feasible for a single, independent device, but it is formidable for multiple coupled devices with different temperature-wavelength properties. (5) No limit, in principle, on the tolerable range of wavelength and temperature changes. (6) A simple and lightweight algorithm, implementable in the microcontroller units (MCUs) that already exist in transceivers. To our knowledge, this work is the first to introduce reinforcement Q-learning into the control of photonic devices and to report automatic working-wavelength alignment over wide input wavelength shifts and temperature changes for Vernier ring filters.
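To make the idea of a light, model-free controller concrete, the following is a minimal sketch of textbook tabular Q-learning applied to a toy alignment problem. It is not the multi-agent adaptive-action method reported in this work: it uses a single agent with a fixed action set, and every name and number in it (ACTIONS, monitor_power, the Lorentzian response, the peak position, the learning parameters) is a hypothetical assumption chosen for illustration rather than device data.

```python
import random

# Hypothetical action set: step a single heater level down, hold, or step up.
ACTIONS = (-1, 0, +1)


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning update:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice of an action index for the given state."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    row = q[state]
    return row.index(max(row))


def monitor_power(level, peak=13, width=3.0):
    """Toy Lorentzian stand-in for the monitored output power (assumed, not measured)."""
    return 1.0 / (1.0 + ((level - peak) / width) ** 2)


def train(n_levels=21, episodes=300, steps=40, seed=0):
    """Learn a Q-table using only the monitored power; no device model is used."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(n_levels)]
    for _ in range(episodes):
        state = rng.randrange(n_levels)  # random starting heater level
        for _ in range(steps):
            a = choose_action(q, state, epsilon=0.2, rng=rng)
            # Clamp the heater level to the allowed range.
            next_state = min(max(state + ACTIONS[a], 0), n_levels - 1)
            # Reward is the change in monitored power caused by the action.
            reward = monitor_power(next_state) - monitor_power(state)
            q_update(q, state, a, reward, next_state)
            state = next_state
    return q
```

In this sketch the agent only observes a scalar power reading and never sees the response curve itself, which is the sense in which the control is blind. The Q-table here holds 21 x 3 floating-point values, which suggests why a tabular scheme of this kind could plausibly fit in an MCU. Extending it to several coupled tuners and to adaptive step sizes is beyond what this toy example attempts.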