Optimization for Training Deep Models 2