18.337 Project:
Parallel Stochastic Optimization of Ultrafast Optical Filters
Midterm Update
Project Overview
I’m interested in the design of systems of filters for use in femtosecond laser cavities to control intra-cavity dispersion. The optical bandwidths occurring in few-cycle lasers are so wide that propagation of the pulse through dispersive materials in the laser threatens to destroy the pulse. Mirrors whose dispersion is the opposite of that normally encountered in materials must be used to counteract the pulse spreading that would otherwise occur. To produce pulses that are only a few femtoseconds in length, it is necessary to build mirrors whose spectral group delay is engineered to within hundreds of attoseconds over an octave of bandwidth.
Unfortunately, dielectric filters are notoriously difficult to design when all you care about is reflectivity, let alone when the phase properties must also be optimized. Thin-film mirrors are IIR filters, and thus there is no efficient algorithm to produce a globally optimal design. As such, nonlinear optimization techniques must be used to find locally optimal solutions. This means minimizing a merit function of a few hundred parameters (the layer thicknesses): a weighted sum of group-delay and reflectance errors evaluated over a set of a few hundred wavelengths.
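Just to make the merit function concrete, a rough C sketch is below. This is not the actual analysis code: compute_reflectance() and compute_group_delay() are placeholder names for the thin-film routines, and the squared-error form of the weighting is only one reasonable choice.

    /* Merit function: weighted sum of squared group-delay and reflectance
     * errors over the design wavelengths.  The two analysis routines are
     * assumed to be provided elsewhere. */
    extern double compute_reflectance(const double *d, int n_layers, double lambda);
    extern double compute_group_delay(const double *d, int n_layers, double lambda);

    double merit(const double *d, int n_layers,
                 const double *lambda, const double *r_target, const double *gd_target,
                 const double *w_r, const double *w_gd, int n_lambda)
    {
        double err = 0.0;
        for (int k = 0; k < n_lambda; k++) {
            double dr  = compute_reflectance(d, n_layers, lambda[k]) - r_target[k];
            double dgd = compute_group_delay(d, n_layers, lambda[k]) - gd_target[k];
            err += w_r[k] * dr * dr + w_gd[k] * dgd * dgd;
        }
        return err;
    }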
Parallel Algorithm
To try to improve upon the local minimization typically done, I’d like to try parallel stochastic optimization techniques. One obvious approximate global optimization technique is simulated annealing. SA is a serial algorithm, but since it is stochastic there is an advantage to running many such optimizations in parallel and taking the best result of the group. Work has already been done (Dr. Lee at the University of Utah) showing that the advantage gained is significant and proposing an adaptive method for annealing scheduling.
Another way to go is to keep the SA sequential and focus on parallelizing the computation of the merit function. My guess is that this is not a good idea: merit-function evaluation is quick enough that communication overhead will almost certainly dominate, making the parallel algorithm slower than a single-processor implementation.
Given that parallel SA has already been studied, the focus of my project will be how to best implement it for the optimization of the optical filters discussed above. First, there is the question of how to best parallelize the SA. How often do you stop and compare results between the processors? Clearly, at some point the best current design should be shared among all processors, but doing so too often will be inefficient and would negate the whole point of the parallelism. Second, should each processor start with the same design or with initial perturbations in different directions?
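One simple way this could be structured (a sketch only, not a committed design; sa_run_chunk() and N_LAYERS are placeholders) is to let each MPI rank anneal independently for a fixed number of iterations and then exchange the current best design, with the exchange interval as the tuning knob mentioned above:

    /* Independent SA runs on each rank, with a periodic exchange of the best
     * design found so far.  MPI_Init/MPI_Finalize are assumed to be handled
     * by the caller. */
    #include <mpi.h>

    #define N_LAYERS       160     /* placeholder design size */
    #define EXCHANGE_EVERY 1000    /* iterations between exchanges (tunable) */

    extern double sa_run_chunk(double *design, int n, int iters, double *temp);

    void parallel_sa(double *design, int total_iters)
    {
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        double temp = 1.0;                          /* annealing temperature */
        for (int done = 0; done < total_iters; done += EXCHANGE_EVERY) {
            double local = sa_run_chunk(design, N_LAYERS, EXCHANGE_EVERY, &temp);
            /* find the rank currently holding the lowest merit */
            struct { double val; int rank; } in = { local, rank }, best;
            MPI_Allreduce(&in, &best, 1, MPI_DOUBLE_INT, MPI_MINLOC, MPI_COMM_WORLD);
            /* every rank restarts its annealing from the current best design */
            MPI_Bcast(design, N_LAYERS, MPI_DOUBLE, best.rank, MPI_COMM_WORLD);
        }
    }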
Another candidate is some sort of genetic algorithm, run in parallel but with occasional crossover between the best candidates of each processor.
Implementation Issues
One question I’m really unsure of at this point is which high-level language to use. Ideally, this would be done in C using MPI, since the computation of thin-film reflectivity is not very efficient when implemented in an interpreted language like MATLAB or Mathematica. (Very little time is spent in library functions, and there are nested loops.) Empirically, I’ve found that C code runs almost an order of magnitude faster than MATLAB code, even with the new MATLAB JIT compiler. Thus, I’m worried that if I do this in MATLAB*P or Grid Mathematica, I’ll end up with a parallel algorithm that is hardly any faster than a serial one written in C. This is especially the case for MATLAB*P, since it runs Octave on the back end, which is a rather slow implementation of the MATLAB interpreter.
In light of the above, Grid Mathematica seems the best choice of high-level language, since it at least has some JIT compilation of its interpreted code.
Midterm Update
It turns out that my original idea was a rather bad one, for two reasons. First, according to the literature, parallel SA is best done by essentially just running a series of independent trials. There are issues of parameter optimization and annealing schedule, but those details have nothing to do with the parallel implementation, so it looks like I’d learn and demonstrate very little parallel computing with this project.
The second problem was that the more I read about SA as applied to optical thin-film design, the more it became clear that SA would have a difficult time with the complex designs used in ultrafast lasers. The search space grows exponentially with dimension, and for a 160-dimensional problem it’s not clear that a reasonable annealing schedule is possible, given that, in theory, one needs to give the algorithm enough time to have a reasonable chance of reaching a significant portion of the space. In other words, I was overly optimistic about the ability of SA to achieve some sort of miracle in optimizing a highly non-convex system with hundreds of continuous dimensions. In the end it looks like it comes down to the fact that 100 years divided by 32 is still too long to wait.
New Idea
Given the nature of wave interference, the merit function of any thin-film filter tends to be incredibly non-convex. However, the same physics of interference causes the merit-function surface to be quasi-periodic, with local minima occurring at layer-thickness spacings roughly corresponding to an optical period at the dominant design wavelength. This suggests the following somewhat novel¹ idea: treat the refinement of a given system as a combinatorial optimization problem. For a given state (layer profile), we’ll define neighboring states as those in which one layer has been perturbed to the next periodic minimum (the period of a given layer can be approximated and, if needed, quickly refined with a 1D Newton method). In almost all instances, these perturbations will increase the merit function we’re trying to minimize. However, the hope is that running a full optimization with the new state as a starting point might result in a better design than before. A few experiments with a simple thin-film design suggest that this is true: a 20-layer antireflection coating was improved by moving two layers to new minima and re-optimizing. That this method is reasonable is also suggested by the observation (see figure) that the local minima found along one dimension tend to correspond to local minima in others.
So, the general idea is to search the space defined by the tree of neighboring states as new inputs to the optimization, iterating up to some depth and in some order to be determined.
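A rough C sketch of the neighboring-state perturbation is below; lambda0 (the dominant design wavelength) and the refractive-index array stand in for the real design data, and the 1D Newton refinement of the period is omitted.

    /* Perturb one layer to (approximately) the next periodic minimum by
     * adding or subtracting roughly one period of the merit function along
     * that dimension.  The half-wave estimate lambda0/(2n) is only an
     * approximation; in practice the spacing would be refined numerically. */
    void perturb_layer(const double *thickness, int n_layers,
                       const double *n_index, double lambda0,
                       int layer, int direction,      /* +1 or -1 */
                       double *out)
    {
        for (int i = 0; i < n_layers; i++)
            out[i] = thickness[i];
        double period = lambda0 / (2.0 * n_index[layer]);
        out[layer] += direction * period;
        if (out[layer] < 0.0)
            out[layer] = 0.0;       /* keep the thickness physical */
    }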
Implementation Details
The biggest detail to flesh out is how the tree will be searched. There is also the question of whether repeating the process from each optimized node would have any benefit. It would be nice if the answer to the latter question were no, since a yes would drastically increase the computational cost and would suggest that a single pass misses a great deal of nearby local minima. To limit the scope of this project to something reasonable, I will only consider the case of one round of combinatorial search and optimization.
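Restricted further to one-layer perturbations for the sake of illustration, that single round looks roughly like the sketch below; local_optimize() is a placeholder for the existing refinement routine, perturb_layer() is the sketch given earlier, and the real search would also descend to multi-layer perturbations.

    #include <stdlib.h>
    #include <string.h>

    extern double local_optimize(double *thickness, int n_layers);   /* returns merit */
    void perturb_layer(const double *thickness, int n_layers, const double *n_index,
                       double lambda0, int layer, int direction, double *out);

    /* One round: optimize the starting design, try each one-layer perturbation
     * as a new starting point, re-optimize, and keep the best result. */
    double one_round_search(double *design, int n_layers,
                            const double *n_index, double lambda0)
    {
        double best_merit = local_optimize(design, n_layers);
        double *start = malloc(n_layers * sizeof(double));
        double *trial = malloc(n_layers * sizeof(double));
        memcpy(start, design, n_layers * sizeof(double));

        for (int layer = 0; layer < n_layers; layer++) {
            for (int dir = -1; dir <= 1; dir += 2) {
                perturb_layer(start, n_layers, n_index, lambda0, layer, dir, trial);
                double m = local_optimize(trial, n_layers);
                if (m < best_merit) {
                    best_merit = m;
                    memcpy(design, trial, n_layers * sizeof(double));
                }
            }
        }
        free(start);
        free(trial);
        return best_merit;
    }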
Even with that simplification, there are still many possible ways to implement the program. Right now I’m running simulations (in serial MATLAB) to get some sort of statistical feel for the space. How quickly down the tree does improvement become unlikely? Do perturbations in which at least two layers move tend to work better than those where only one does? If a node turns out to have a bad value, can we safely ignore its children?
Parallelism
Like any combinatorial search where the cost is dominated by node evaluation, this is eminently parallelizable. However, it is not trivially so, as evaluating each state node will have an essentially random computational cost, dictated by the tree termination criteria and the variance in optimization times. Thus, scheduling among the processors will have to be dynamic and non-blocking. There will also be a granularity trade-off between the efficiency of letting each processor run on its own for a significant time without communication, and the advantages to be gained by comparing the best results across processors and perhaps focusing all of them on subtrees of the best current result.
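A minimal illustration of that kind of dynamic scheduling is the master/worker sketch below, using blocking calls for simplicity; evaluate_node(), the tags, and the flat node numbering are placeholders rather than the eventual implementation.

    #include <mpi.h>

    #define TAG_WORK 1
    #define TAG_STOP 2

    extern double evaluate_node(int node_id);   /* perturb + re-optimize (placeholder) */

    /* Rank 0 hands out node indices as workers finish, so one slow
     * optimization never stalls the other processors. */
    void master(int n_nodes, int n_workers)
    {
        int next = 0, active = 0;
        for (int w = 1; w <= n_workers; w++) {          /* prime every worker */
            if (next < n_nodes) {
                MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
                next++; active++;
            } else {
                MPI_Send(&next, 1, MPI_INT, w, TAG_STOP, MPI_COMM_WORLD);
            }
        }
        while (active > 0) {
            double result;
            MPI_Status st;
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            active--;                        /* bookkeeping of results omitted */
            if (next < n_nodes) {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
                next++; active++;
            } else {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
            }
        }
    }

    void worker(void)
    {
        for (;;) {
            int node_id;
            MPI_Status st;
            MPI_Recv(&node_id, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            double m = evaluate_node(node_id);
            MPI_Send(&m, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }
    }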
Current Progress
Unfortunately, no actual parallel code has been written yet, due to the redirection of the project. However, the MATLAB simulations done so far suggest the algorithm will be at least moderately successful, and the majority of the thin-film analysis code has been written and tested in C. Since the bulk of the work is split between the thin-film analysis code and the MPI communication code, significant progress has been made even though no parallel runs have happened yet.
¹ I did find a paper (from a Russian, of course) where something similar was done, but only on single layers, and with full refinement done after each single perturbation. Nobody appears to have tried searching deeper into the “tree” of possibilities, where multiple layers can be perturbed before re-optimization. I may find out why...