THE IDEA:
This post investigates whether it is possible to perform optimization using Taylor-series estimates of first derivatives. Most optimization problems involve converting the cost function into a parametric form and evaluating its first derivative. Equating the derivative to zero, you then solve for the parameters that describe the solution. For now, I will ignore the second-order conditions.
So the optimization problem becomes: find the optimal set of parameters a* that minimizes F(a). That is, find a* so that
(eq1) $ \left. \frac{d F}{d a} \right|_{a^*} = 0$
Most times, it's not easy to solve this analytically, so people resort to numerical techniques. One of the most common is gradient descent, where you start from an initial guess and propagate the solution from iteration n to n+1 as,
(eq2) $ a^{n+1} = a^{n} - \mu \left. \frac{d F}{d a} \right|_{a^n} $
However, it may not always be possible to calculate the derivative of the cost function analytically. So I was wondering: how well does approximating the derivative with a numerical estimate work?
(eq3) $ \left. \bar{ \frac{d F}{d a}} \right|_{a^n} = \frac{F(a^{n} + \delta a^n ) - F(a^{n} )}{\delta a^n} $
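In code, eq3 is a one-sided (forward) difference applied component-wise. A minimal Python sketch (the function name and the quadratic test case are my own choices, not from the post):

```python
import numpy as np

def forward_diff_grad(F, a, rel_step=0.01):
    """Estimate dF/da at a (per eq3) with a forward difference.

    The perturbation is a fraction of the current solution, as the
    post does (1% of a^n), with a fallback for zero components.
    """
    a = np.asarray(a, dtype=float)
    grad = np.empty_like(a)
    base = F(a)
    for i in range(a.size):
        da = rel_step * a[i] if a[i] != 0 else rel_step  # avoid a zero step
        ap = a.copy()
        ap[i] += da
        grad[i] = (F(ap) - base) / da
    return grad

# Quick check on a quadratic, where the true gradient is 2*a:
g = forward_diff_grad(lambda a: np.sum(a**2), np.array([1.0, 2.0]))
```

For a quadratic the forward difference is off by exactly the step size (2a + δa), so the estimate lands near (2.01, 4.02) rather than (2, 4).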
THE EXPERIMENT:
To test this, I chose to fit an exponential model to data generated from an exponential function. The data to be fit were generated as,
(eq4) $Y = a_0 e^{a_1 x} + a_2 + \mathcal{N}(0,\sigma)$, where $\mathcal{N}(0,\sigma)$ is zero-mean Gaussian noise with standard deviation $\sigma$.
I chose the generating parameters as,
(eq5) $a_0 = 5$, $a_1 = -3$, $a_2 = 3$, $\sigma = 0.5$.
To estimate the derivative from eq3, I chose the perturbation as 1% of the current solution and a learning rate of 0.001. I started with an initial guess of (1, 1, 1) and ran the solution update of eq2 4000 times. I chose these numbers on purpose: my goal was to give no advantage to the optimization routine, so I chose the optimization parameters to be as lame as possible.* It is possible to get better initial guesses; knowing the structure of eq4, you can estimate initial guesses very close to the real solution. The learning rate need not be decided a priori; I could have introduced a subloop that keeps reducing the learning rate until it produces a small change in the solution parameters. I could also have made the perturbation size in eq3 adaptive, so that the errors in estimating the derivative stay small. However, I chose to do none of these.
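Putting eq2 and eq3 together, the experiment might look like the sketch below. The sample grid x, the number of data points, and the mean-squared-error form of the cost are my assumptions; the post does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data per eq4/eq5 (the x grid and 50 samples are assumptions).
x = np.linspace(0.0, 2.0, 50)
y = 5.0 * np.exp(-3.0 * x) + 3.0 + rng.normal(0.0, 0.5, x.size)

def cost(a):
    # Mean-squared misfit between model and data (an assumed form of F).
    return np.mean((a[0] * np.exp(a[1] * x) + a[2] - y) ** 2)

def grad_estimate(F, a, rel_step=0.01):
    # eq3: forward difference with a perturbation of 1% of the current solution.
    base, g = F(a), np.empty_like(a)
    for i in range(a.size):
        da = rel_step * a[i] if a[i] != 0 else rel_step
        ap = a.copy()
        ap[i] += da
        g[i] = (F(ap) - base) / da
    return g

# eq2 with the post's settings: initial guess (1, 1, 1), mu = 0.001, 4000 updates.
a = np.array([1.0, 1.0, 1.0])
c0 = cost(a)
for _ in range(4000):
    a = a - 0.001 * grad_estimate(cost, a)
```

With these settings the iterates stay finite and the cost drops from its initial value, though slow convergence is expected at such a small learning rate.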
I decided to do a Monte Carlo simulation, running this method 1000 times and checking how many times the optimization routine crashes or gives unstable results.
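A scaled-down version of that Monte Carlo check might look like this sketch (100 trials of 500 iterations to keep the runtime short; the data grid and mean-squared cost are my assumptions, as before):

```python
import numpy as np

def run_trial(seed, n_iter=500, lr=1e-3, rel_step=0.01):
    """One optimization run on a fresh noisy dataset; returns final parameters."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 2.0, 50)                     # assumed sample grid
    y = 5.0 * np.exp(-3.0 * x) + 3.0 + rng.normal(0.0, 0.5, x.size)
    cost = lambda a: np.mean((a[0] * np.exp(a[1] * x) + a[2] - y) ** 2)
    a = np.array([1.0, 1.0, 1.0])
    for _ in range(n_iter):
        g = np.empty_like(a)
        base = cost(a)
        for i in range(a.size):
            da = rel_step * a[i] if a[i] != 0 else rel_step
            ap = a.copy()
            ap[i] += da
            g[i] = (cost(ap) - base) / da
        a = a - lr * g
    return a

# Count trials whose final parameters are non-finite (i.e. the run blew up).
results = [run_trial(s) for s in range(100)]
failures = sum(not np.all(np.isfinite(a)) for a in results)
```

Each trial draws its own noise realization, so the count of non-finite results is a direct stability tally.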
RESULTS:
I was expecting (and hoping for) a catastrophic failure. Surprisingly, the optimization gave a stable result every time I tried it. It's 5:30 am now; I have run the simulations about 104 times with no crashes, so I stopped.
Figure 1. Representative results from optimization.
LEARNING:
Optimization using estimates of the first derivative from a Taylor series expansion provided stable results, and may be a viable option given the simple implementation and the stable convergence observed here. However, choosing a large perturbation size can result in unbounded errors: when I tried a 20% perturbation size, I got large errors (I know, 20% is large). So the next post will be about how to choose an appropriate perturbation size, either analytically or numerically. Also, second-order estimates of derivatives (central differencing) are more accurate than first-order estimates (forward or backward differencing), so I am assuming they will result in improved stability and a larger usable range of perturbation sizes. These will be topics of upcoming posts.
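The accuracy gap between forward and central differencing is easy to demonstrate on a function with a known derivative (sin, with exact derivative cos, is my example, not from the post):

```python
import numpy as np

def fwd(f, x, h):
    # Forward difference: first-order accurate, error ~ O(h).
    return (f(x + h) - f(x)) / h

def ctr(f, x, h):
    # Central difference: second-order accurate, error ~ O(h^2).
    return (f(x + h) - f(x - h)) / (2 * h)

# Compare both estimates against the exact derivative cos(1) at x = 1.
h = 0.1
err_fwd = abs(fwd(np.sin, 1.0, h) - np.cos(1.0))
err_ctr = abs(ctr(np.sin, 1.0, h) - np.cos(1.0))
```

At h = 0.1 the forward-difference error is roughly 0.04 while the central-difference error is under 0.001, consistent with the O(h) versus O(h²) truncation terms.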
CONCLUSION.
It's 6 am, off to sleep. :)
_____
*I could write a whole post on how to choose parameters for optimization and how each one affects the final solution. However, it's trivial, and if you read this far, you can figure it out on your own.
