SECTION I

Hybrid vehicles have become increasingly popular in the automotive marketplace in the past decade. The most common type is the electric hybrid, which consists of an internal combustion engine (ICE), a battery, and at least one electric machine (EM). Hybrids are built in several configurations including series, parallel, and the series-parallel configuration considered here. Hybrid vehicles are characterized by multiple energy sources; the strategy to control the energy flow among these multiple sources is termed “energy management” and is crucial for good fuel economy. An excellent overview of this area is available in [4].

The energy management problem has been studied extensively in academic circles. Various control design methods are used, including rule-based [5], [6], [7], [8], neural networks [9], game theory [10], and fuzzy logic [11]. There are many proposed methods available for both the non-causal (cycle known in advance) and causal (cycle unknown in advance) cases [12], [13], [14], as well as those with partial future information [15], [16]. The most commonly used optimization strategies are the equivalent consumption minimization strategy (ECMS) [17], [18], [19], [20], [21], [22] and stochastic or deterministic dynamic programming [23], [24], [25], [26]. The majority of existing work focuses on controllers that seek to minimize fuel consumption, while ignoring other attributes that affect the smoothness and responsiveness of vehicle acceleration, which are commonly referred to as “drivability.” In practice, fuel-optimal controllers can lead to excessive gear shifting and engine starting/stopping [27], [28], [29], [30], and hence poor drivability. Previous research has addressed drivability in a suboptimal manner by incorporating penalties on engine starts in an ECMS formulation [18]. The reference [31] addressed engine starts indirectly by including a hysteresis term “to avoid a too frequent switch on—switch off of the [internal combustion] engine, which would cause an additional energy use and wearout.”

In this paper, drivability restrictions are directly incorporated in a causal, optimal controller design method for the energy management system of a hybrid electric vehicle (HEV). The focus is on drivability with respect to engine start-stop and gear shifts; a host of other drivability issues, such as low-frequency longitudinal vibration and other attributes which are typically mitigated by hardware design or low-level control actions, are not considered. The main optimization tool is shortest path stochastic dynamic programming (SP-SDP), which, as explained in [26], [32], [33], [34], is a specific formulation of stochastic dynamic programming (SDP) that allows infinite horizon optimization problems to be addressed without the use of discounting (a discount factor in the cost function assures convergence by weighting future costs exponentially less than current costs). In the energy management problem, the power requested by the driver as a function of time is modeled as a stationary, finite-state Markov chain [23]. The state space of the Markov chain is constructed to include a terminal state corresponding to key-off [26]. The terminal state is designed to be absorbing (that is, it is reached in finite time with probability one, and there is zero probability of transitioning out of it). If zero cost is incurred in the absorbing terminal state, then the expected value of the cost function is finite, even without the discount factor often used in hybrid vehicle applications of SDP [33], [34].

The controllers generated through SP-SDP are causal state feedbacks and hence are directly implementable in a real-time control architecture. The controllers are provably optimal if the driving behavior matches the assumed Markov chain model. In this paper, the Markov chains representing driver behavior are modeled on standard government test cycles, as in [23] and [26]. It is also possible to build the Markov chains on the basis of real-world driving data, as reported in [35].

In addition to generating a class of optimal controllers, the SP-SDP method allows direct study of the tradeoffs between different performance goals, here, drivability and fuel economy. The ability to easily generate Pareto tradeoff curves is perhaps just as interesting as a specific fuel economy benefit. The designer can generate both the maximum attainable performance curve and causal controllers that generate the computed performance. Drivability is emphasized in this paper, but one could also study the fuel economy tradeoff with other attributes such as emissions, battery wear, or engine noise characteristics.

One place where SP-SDP can have a major impact is in controller design for new vehicles. Significant effort is required to develop a controller for a new drivetrain, especially with a completely new architecture. The SP-SDP method can automatically generate a provably optimal controller for a given vehicle architecture and component sizing much faster than a person could do it manually.

The work reported here is a collaborative effort between the University of Michigan and Ford Motor Company. The vehicle studied is a modified Volvo S-80 prototype and does not match any vehicle currently on the market. As a benchmark, Ford provided a controller developed for this prototype vehicle. This industrial controller was described in [2] and is termed hereafter the “baseline” controller. In addition, a high-fidelity vehicle simulation model calibrated for the prototype vehicle was provided; this is the same simulation model used by Ford to develop HEV control algorithms and to evaluate fuel economy and drivability for production vehicles [36].

The remainder of this paper is organized as follows. Section II presents the vehicle architecture and two dynamic models; one is a simplified vehicle model for controller design and the second is the high-fidelity model mentioned above. The drivability metrics used in the optimization problem are presented in Section III. The particular form of infinite-horizon stochastic optimal control used here, SP-SDP, is presented in Section IV, and a key result that greatly enhances offline computational speed is presented in Section V. The procedure for sweeping out the Pareto tradeoff surface is presented in Section VI; this involves computing a large family of controllers based on the simplified control-oriented model and evaluating each controller's performance with the high-fidelity model, which will more closely approximate the actual performance on the prototype vehicle. The main results of the work are presented in Sections VII and VIII. Concluding remarks are given in Section IX. The Appendix provides additional information on enhancing off-line computational speed for SP-SDP and points out a relation between SP-SDP and ECMS.

SECTION II

The vehicle studied in this paper is a prototype series-parallel electric hybrid and is shown schematically in Fig. 1. A 2.4 L diesel engine is coupled to the front axle through a dual clutch 6-speed transmission. An electric machine, $EM1$, is directly coupled to the engine crankshaft and can generate power regardless of clutch state. A second electric machine, $EM2$, is directly coupled to the rear axle through a fixed gear ratio without a clutch and always rotates at a speed proportional to vehicle speed. Energy is stored in a 6 Ah (1.9 kWh) battery pack with a recommended State of Charge (SOC) range of 0.35–0.65. The system parameters are listed in Table I.

The work presented in this paper uses two separate dynamic models to represent the same prototype hybrid vehicle. The first model is quite simple. It has a sample time of 1 s, uses lookup tables, and has very few states. It is used for controller design via dynamic programming, and is called the “control-oriented” model. The second model, provided by Ford Motor Company, is a complex, MATLAB/Simulink-based model with a large number of parameters and states [36]. Each subsystem in the vehicle is represented by an appropriate block with its own dynamics and low-level controllers. This model, which accurately represents the transient response of the engine, transmission and driveline, is referred to as the “high-fidelity” model in the remainder of this paper.

This combination of models allows the controller to be designed on the basis of a simple model for computational tractability, while providing performance assessment on the basis of a model that much more closely reflects the complicated dynamics of the prototype vehicle.

When using SP-SDP, the off-line computation cost is very sensitive to the number of system states. For this reason, the model used to develop the controller must be as simple as possible. The vehicle model used here contains the minimum functionality required to model the vehicle behavior of interest on a second-by-second basis. Dynamics much faster than the sample time of 1 s are ignored. Long-term transients that only weakly affect performance are also ignored; coolant temperature is one example.

The vehicle hardware allows three main operating conditions:

- **Parallel Mode:** The engine is on and the clutch is engaged.
- **Series Mode:** The engine is on and the clutch is disengaged. The only torque to the wheels is through $EM2$.
- **Electric Mode:** The engine is off and the clutch is disengaged; again the only torque to the wheels is through $EM2$.

The model does not restrict the direction of power flow. The electric machines can be either motors or generators in all modes.

The dynamics of the internal combustion engine are ignored; it is assumed that the engine torque exactly matches valid commands and the fuel consumption is a function only of speed, $\omega_{\rm ICE}$, and torque, $T_{\rm ICE}$. The fuel consumption $\dot{m}_{f}$ is derived from a lookup table based on dynamometer testing: $$\dot{m}_{f}=F(\omega_{\rm ICE},T_{\rm ICE}).$$

The dual clutch transmission has discrete gears and no torque converter. The transmission is modeled with a constant mechanical efficiency $\eta_{\rm trans}$. Gear shifts are allowed every time step and transmission dynamics are assumed negligible. While the physical configuration of the transmission allows arbitrary shifting, the low-level transmission controller enforces sequential up/down shifting and the model respects this assumption. This technique is advantageous in hardware because shifts execute by smoothly transitioning between the two clutches and continually transmitting torque. One transmission shaft holds the even gears and the other the odd gears. An arbitrary gear may be selected when the clutch is disengaged. When the clutch is engaged, the vehicle is in parallel mode and the engine speed is assumed directly proportional to wheel speed based on the current gear ratio $R_{g}$: $$\omega_{\rm ICE}=R_{g}\omega_{\rm wheel}.$$ The electric machine $EM1$ is directly coupled to the crankshaft, and thus rotates at the engine speed $\omega_{\rm ICE}$: $$\omega_{EM1}=\omega_{\rm ICE}.$$

In parallel mode, when providing power to the wheels, the torques $T_{\rm ICE}$ and $T_{EM1}$ are proportional to wheel torque based on the current gear ratio $R_{g}$ and $\eta_{\rm trans}$; during regenerative braking, for example, when absorbing power from the wheels,^{1} the torques are proportional based on $R_{g}$ and $1/\eta_{\rm trans}$. Similarly, the rear electric machine torque $T_{EM2}$ is proportional to the machine's gear ratio $R_{EM2}$ and rear differential efficiency $\eta_{\rm diff}$ when providing power, and is proportional to $R_{EM2}$ and $1/\eta_{\rm diff}$ when absorbing power. The total wheel torque $T_{\rm wheel}$ from both axles is thus the sum of the ICE and $EM1$ torques to the wheel and the rear electric machine $EM2$ torque to the wheel, namely
$$R_{g}\tau_{\rm trans}(T_{\rm ICE}+T_{EM1})+R_{EM2}\tau_{\rm diff}(T_{EM2})=T_{\rm wheel}\eqno{\hbox{(1)}}$$ where
$$\tau_{\rm trans}(T)=\begin{cases}\eta_{\rm trans}T,&\text{if }T\geq 0,\\ T/\eta_{\rm trans},&\text{otherwise}\end{cases}$$ and similarly for $\tau_{\rm diff}$.
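The torque balance (1) can be sketched in a few lines of Python. This is only an illustration of the formula: the function names and all numerical values are hypothetical and are not taken from Table I.

```python
def tau(torque: float, eta: float) -> float:
    """Efficiency map used in (1): scale by eta when delivering power
    (torque >= 0) and by 1/eta when absorbing power (torque < 0)."""
    return eta * torque if torque >= 0 else torque / eta


def wheel_torque(t_ice, t_em1, t_em2, r_g, r_em2, eta_trans, eta_diff):
    """Total wheel torque from both axles, following (1)."""
    front = r_g * tau(t_ice + t_em1, eta_trans)   # ICE + EM1 through the gearbox
    rear = r_em2 * tau(t_em2, eta_diff)           # EM2 through the rear differential
    return front + rear


# Hypothetical operating points, for illustration only.
params = dict(r_g=3.0, r_em2=2.0, eta_trans=0.9, eta_diff=0.95)
driving = wheel_torque(100.0, 20.0, 50.0, **params)
braking = wheel_torque(0.0, -20.0, -50.0, **params)
```

With these placeholder numbers the driving case delivers less torque at the wheel than the lossless product of torque and gear ratio, while the braking case requires more wheel torque than the machines absorb, which is the asymmetry that $\tau$ encodes.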

The clutch can be disengaged at any time, and power is delivered to the road through the rear electric machine $EM2$. This condition is treated as the neutral gear 0, which combines with the six standard gears for a total of seven gear states. If the engine is on with the clutch disengaged, the vehicle is in series mode. The engine-$EM1$ combination acts as a generator and can operate at an arbitrary torque and speed. If the engine is off while the clutch is disengaged, the vehicle is in electric mode. The clutch is never engaged with the engine off, so this mode is undefined and not used.

The battery system is similarly reduced to table lookup form. The electrical dynamics due to the motor, battery, and power electronics are assumed sufficiently fast to be ignored. The energy losses and efficiencies in these components can be grouped together such that the change in battery SOC is a function $\bar{\kappa}$ of electric machine speeds $\omega_{EM1}$ and $\omega_{EM2}$, torques $T_{EM1}$ and $T_{EM2}$, and battery SOC at the current time step: $${\rm SOC}_{k+1}=\bar{\kappa}\left({\rm SOC}_{k},\omega_{EM1_{k}},\omega_{EM2_{k}},T_{EM1_{k}},T_{EM2_{k}}\right).\eqno{\hbox{(2)}}$$ In this simplest configuration, assuming a known vehicle speed, the only state variable required for the vehicle model is the battery SOC. Changes in battery performance due to temperature, age, and wear are ignored. Additional states are required to represent the stochastic drive cycle and to track drivability metrics.

During operation, the desired wheel torque is defined by the driver. If we assume the vehicle must meet the torque demand perfectly, then the sum of the ICE and EM contributions to wheel torque (1) must equal the demanded torque $T_{\rm demand}$
$$T_{\rm wheel}=T_{\rm demand}.$$ This adds a constraint to the control optimization, reducing the four control inputs to a 3 degree-of-freedom problem. In parallel mode, the control inputs are *Engine Torque*, $EM1$ *Torque*, and *Gear*. In series mode, the electric machine command becomes $EM1$ *Speed*.

Optimization using the control-oriented model assumes a perfect driver during the design process; specifically, the desired road power is calculated as the exact power required to drive the cycle at that time. A proportional-integral-derivative (PID) controller based on velocity feedback is used to represent a causal driver during simulation of the high-fidelity model. Now, given vehicle speed, demanded road power and this choice of control inputs, the dynamics become an explicit function $\kappa_{k}$ of the state *Battery SOC* and the three control choices shown in Fig. 2
$${\rm SOC}_{k+1}=\kappa_{k}\left({\rm SOC}_{k},T_{{\rm ICE}_{k}},T_{EM1_{k}},{\rm Gear}_{k}\right).\eqno{\hbox{(3)}}$$ In series mode, $T_{EM1}$ is replaced with $\omega_{EM1}$. The engine fuel consumption can be calculated from the control inputs.

This control-oriented model uses several assumptions about the allowed vehicle behavior.

- Regenerative braking is used as much as possible up to the actuator limits; friction brakes provide any remaining torque.
- The clutch in the transmission allows the diesel engine to be decoupled from the wheels, permitting all-electric or series operation.
- There is no ability to slip the clutch for vehicle launch.
- There are no traction control restrictions on the amount of torque that can be applied to the wheels.

The high-fidelity model contains the baseline controller algorithm. To generate simulation results using this controller, an automated driver follows the target cycle using the baseline controller. To use the high-fidelity model with the control algorithm developed here, the SP-SDP controller is implemented in Simulink by interfacing appropriate feedback and command signals: Battery State of Charge, Vehicle Speed, Engine State, Gear Command, etc. The high-fidelity model can then be driven by the SP-SDP controller along a given drive cycle using a causal driver model.

The baseline prototype energy management controller studied here is quite complex. Its key features are contained in three modules, as depicted in Fig. 3. Driver power demand is determined from pedal position. One module determines the battery power flow and adds it to the driver demand to determine the *Total Power*. A second module determines the engine state based on the *Total Power* using a state machine with hysteresis. A third rule-based module then determines individual actuator commands (e.g., power from the engine and the two electric machines) based on the *Total Power* and the desired engine state. The gear is selected independently by the transmission controller.

The primary tuning parameters are five scalar functions, two in the Battery Power module and three functions of vehicle speed in the Engine State Machine module. One advantage of the baseline architecture is that engine behavior and battery charge maintenance features are largely confined to their respective blocks, simplifying the tuning process considerably.

SECTION III

Drivability is a term that covers many aspects of vehicle performance including acceleration, engine noise, braking, automated shifting activity, and shift quality [37], [38]. Meeting a customer's expectations of drivability often involves a tradeoff with fuel economy. As an example, optimal fuel economy for gasoline engines typically dictates upshifting at the lowest possible speed, but this leaves the driver with little acceleration ability after the upshift. Consequently, upshifts are scheduled to occur at a speed that is higher than the value that is best for fuel economy.

Industry experts were consulted to assist in quantifying aspects of drivability that are strongly coupled to the energy-management controller. It was recommended that attention be focused on the frequency and timing of gear-shift events and engine-start/stop events. The mean time between events and the number of short-duration events were recommended as metrics, where a short-duration event means that dwell time in a particular state is less than a specified acceptable value. For the transmission, a particularly annoying short-duration event is “hunting,” that is, rapid shifting between the same two gears. Fig. 4 shows seven possible metrics based on mean and short-duration drivability metrics for the engine and transmission. For later use, these metrics are referred to as the “complex” drivability metrics.

In order to incorporate these complex drivability metrics into the model, and then into the optimization problem, states would have to be added to keep track of the duration between shifts and between engine starts and stops, as well as the mean number of these events over a given time interval. While this is theoretically possible, the well-known “curse of dimensionality” would render the associated stochastic optimization problem computationally intractable. Even if the optimization problem could be solved, the designer would be faced with the difficult job of assigning relative weights to each of the metrics when performing a tradeoff analysis.

We chose therefore to simplify these complex metrics into two measures of drivability that can be more easily used. The first drivability metric is termed gear events, and is defined to be the total number of shift events on a given trip. The second drivability metric is termed engine events, and is defined to be the total number of engine start and stop events on a trip. By definition, engine starts and stops are each counted as an event. Each shift with the clutch engaged is counted as a gear event, whereas engaging or disengaging the clutch is not counted as a gear event, regardless of the gear before or after the event.
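The two simple metrics can be computed from a recorded trajectory as sketched below. The encoding is an assumption made for illustration: gear 0 denotes a disengaged clutch, gears 1 to 6 denote an engaged clutch, and the function name is hypothetical.

```python
def count_events(gears, engine_on):
    """Count gear events and engine events over one trip.

    gears:     sequence of ints, 0 = clutch disengaged, 1..6 = engaged gear.
    engine_on: sequence of bools of the same length as gears.
    """
    gear_events = 0
    engine_events = 0
    for k in range(1, len(gears)):
        prev_g, cur_g = gears[k - 1], gears[k]
        # Only a shift with the clutch engaged before and after counts;
        # engaging or disengaging the clutch is not a gear event.
        if prev_g != 0 and cur_g != 0 and prev_g != cur_g:
            gear_events += 1
        # Each engine start and each engine stop counts as one event.
        if engine_on[k] != engine_on[k - 1]:
            engine_events += 1
    return gear_events, engine_events


# Hypothetical trajectory: launch in electric mode, drive, stop, restart.
gears = [0, 1, 2, 2, 0, 3, 2, 0]
engine_on = [False, True, True, True, False, True, True, False]
gear_events, engine_events = count_events(gears, engine_on)
```

In this example the transitions 2 to 0 and 0 to 3 are clutch operations and are therefore not counted, even though the gear differs before and after.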

Fig. 5 shows that the complex and simple metrics are strongly correlated; specifically, the figure shows that reducing the total number of engine on-off events over a drive cycle reduces the occurrence of events where the engine is on for less than 3, 5, 10, or 30 s. The data are shown along with a straight-line least-squares fit. The other complex metrics listed in Fig. 4 show similar correlations, being approximately monotone functions of the simple metrics. The data in Fig. 5 were obtained by simulating the SP-SDP controllers of the ensuing sections on the high-fidelity model. The results are presented here in order to motivate the use of the simplified metrics in the rest of this paper.

The first step in the design of a controller with acceptable drivability properties is to pose a cost function that permits a compromise between fuel economy and drivability. This is achieved by the use of penalties. Specifically, the cost function over a particular drive cycle (suppressing the summing index) is
$$J=\sum_{0}^{T}\dot{m}_{f}+\alpha\sum_{0}^{T}{\bf I}_{GE}(x,u)+\beta\sum_{0}^{T}{\bf I}_{EE}(x,u)$$ where ${\bf I}(x,u)$ are indicator functions and thus equal one when a state and control combination produces a gear event (GE) or engine event (EE) as defined in Section III-B, and are zero otherwise. $T$ is the total trip time, from key-on to key-off. The drivability behavior is not incorporated as a direct constraint, so the search for the weighting factors $\alpha$ and $\beta$ involves some trial and error because the mapping from penalty to outcome is not known *a priori*. Note that setting $\alpha$ and $\beta$ to zero corresponds to solving for optimal fuel economy without regard to drivability.

Controllers based only on fuel economy and drivability completely drain the battery as they seek to minimize fuel. An additional cost is added to ensure that the vehicle is charge sustaining over the cycle. This SOC-based cost only occurs at the terminal state, $x_{T}$ (that is, at the end of the trip at key-off), and is represented as a function $\phi_{{\rm SOC}}(x_{T})$. The performance index for a particular drive cycle is then $$J=\sum_{0}^{T}\dot{m}_{f}+\alpha\sum_{0}^{T}{\bf I}_{GE}(x,u)+\beta\sum_{0}^{T}{\bf I}_{EE}(x,u)+\phi_{{\rm SOC}}(x_{T}).\eqno{\hbox{(4)}}$$
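As an illustration of how (4) is evaluated for one recorded trip, the sketch below sums per-step fuel flow and event indicators and adds the terminal SOC cost. The function name, argument layout, and the numbers are hypothetical; the time step `dt` defaults to the 1 s sample time of the control-oriented model.

```python
def trip_cost(fuel_rate, gear_event, engine_event, alpha, beta,
              terminal_soc_cost, dt=1.0):
    """Evaluate performance index (4) for one trip.

    fuel_rate:         per-step fuel mass flow.
    gear_event:        per-step 0/1 indicator of a gear event.
    engine_event:      per-step 0/1 indicator of an engine event.
    terminal_soc_cost: value of phi_SOC at key-off.
    """
    fuel = sum(m * dt for m in fuel_rate)
    return (fuel
            + alpha * sum(gear_event)
            + beta * sum(engine_event)
            + terminal_soc_cost)


# Hypothetical three-step trip.
J = trip_cost(fuel_rate=[1.0, 2.0, 0.5],
              gear_event=[0, 1, 0],
              engine_event=[1, 0, 1],
              alpha=0.2, beta=0.5, terminal_soc_cost=0.3)
J_fuel_only = trip_cost([1.0, 2.0, 0.5], [0, 1, 0], [1, 0, 1],
                        alpha=0.0, beta=0.0, terminal_soc_cost=0.0)
```

Setting both weights to zero recovers the pure fuel total, which matches the remark that $\alpha=\beta=0$ corresponds to optimizing fuel economy alone.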

SECTION IV

As the cycle is not known exactly in advance, this optimization is conducted in the stochastic sense by minimizing the expected sum of a running cost function $c(x_{k},u_{k},w_{k}),$ where $x_{k}$ is the state, $u_{k}$ is a particular control choice in the set of allowable controls $U(x_{k})$, and $w_{k}$ is a random variable arising from the unknown drive cycle. The expectation over the random process $w$ is denoted $E_{w}$. The optimization problem is $$\min\ E_{w}\sum_{k=0}^{\infty}c(x_{k},u_{k},w_{k})\eqno{\hbox{(5)}}$$ subject to the system dynamics $$x_{k+1}=f(x_{k},u_{k},w_{k})\eqno{\hbox{(6)}}$$ with $u_{k}\in U(x_{k})$, where $$U(x_{k})=\left\{u_{k}\vert g_{1}(x_{k},u_{k})\leq 0,g_{2}(x_{k},u_{k})=0\right\}.$$ Actuator limits, torque delivery requirements, and other system requirements are incorporated in the constraints $g_{1}$ and $g_{2}$, which are enforced at each time step, in contrast to drivability goals, which involve performance over the whole cycle.

To implement the optimization goal (5), the running cost function is prescribed to represent (4): $$c(x,u,w)=\dot{m}_{f}(x,u)+\alpha{\bf I}_{GE}(x,u)+\beta{\bf I}_{EE}(x,u)+\phi_{{\rm SOC}}(x,w).\eqno{\hbox{(7)}}$$ The SOC-based cost $\phi_{{\rm SOC}}(x,w)$ applies only at the end of the trip, when the key-off event occurs. As explained in Section IV-D, the transition to key-off is captured by the stochastic drive-cycle model in the random process $w$. The cost $\phi_{{\rm SOC}}(x,w)$ at the key-off event replaces the terminal-time cost $\phi_{{\rm SOC}}(x_{T})$ in (4).

To determine the optimal control strategy for this vehicle, the SP-SDP algorithm is used [25], [26], [33], [34]. This method directly generates a causal, time-invariant, state-feedback controller. Characteristics of future driving behavior are specified via a finite-state Markov chain rather than exact future knowledge. Given the system model (6), the optimal cost $V^{\ast}(x)$ over an infinite horizon is a function of the state $x$ and satisfies $$V^{\ast}(x)=\min_{u\in U(x)}E_{w}\left[c(x,u,w)+V^{\ast}\left(f(x,u,w)\right)\right]\eqno{\hbox{(8)}}$$ where $c(x,u,w)$ is the instantaneous cost as a function of state and control; (7) is a typical example. This equation represents a compromise between minimizing the current cost $c(x,u,w)$ and the expected future cost $V^{\ast}(f(x,u,w))$. The control $u$ is selected based on the expectation over $w$, rather than a deterministic cost, because the future can only be estimated based on the probability distribution of $w$. Note that the cost $V^{\ast}(x)$ is a function of the state only. This cost is finite for all $x$ if every point in the state space has a positive probability of eventually transitioning to an absorbing state that incurs zero cost from that time onward. Here, the absorbing state is key-off, the end of the drive cycle.

The optimal control $u^{\ast}$ is any control that achieves the minimum cost $V^{\ast}(x)$: $$u^{\ast}(x)=\mathop{\arg\min}_{u\in U(x)}E_{w}\left[c(x,u,w)+V^{\ast}\left(f(x,u,w)\right)\right].\eqno{\hbox{(9)}}$$
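To make (8) and (9) concrete, the sketch below applies plain undiscounted value iteration to a toy three-state problem with an absorbing, zero-cost terminal state. Value iteration is used here only as a convenient illustration; the source does not state which numerical scheme was used to solve the Bellman equation, and the transition matrices and costs are hypothetical.

```python
import numpy as np


def sp_value_iteration(P, C, terminal, tol=1e-10, max_iter=10_000):
    """Undiscounted value iteration for a finite shortest-path problem.

    P[u][i, j]: probability of moving from state i to j under control u.
    C[u][i]:    expected one-step cost of control u in state i.
    terminal:   index of the absorbing, zero-cost state (key-off).
    """
    def q_values(V):
        Q = np.array([C[u] + P[u] @ V for u in range(len(P))])
        Q[:, terminal] = 0.0  # no cost is incurred after key-off
        return Q

    V = np.zeros(P[0].shape[0])
    for _ in range(max_iter):
        V_new = q_values(V).min(axis=0)          # right-hand side of (8)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, q_values(V).argmin(axis=0)         # minimizer as in (9)


# Toy problem: states 0 and 1 are driving states, state 2 is key-off.
P = [np.array([[0.5, 0.0, 0.5], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]),
     np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])]
C = [np.array([1.0, 1.0, 0.0]), np.array([3.0, 2.5, 0.0])]
V, policy = sp_value_iteration(P, C, terminal=2)
```

For this toy problem the iteration converges to costs of approximately 2 and 2.5 for the two driving states without any discount factor, because every state reaches the terminal state with probability one.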

**Remark**: At each time step $k$, the distribution of the random variable $w_{k}$ in (8) and (9) may be conditioned on the state and control input:
$$P(w_{k}\vert x_{k},u_{k}).\eqno{\hbox{(10)}}$$

The drive cycle is modeled as a Markov chain. The drive cycle is assigned two states: current velocity $v_{k}$ and current acceleration $a_{k}$, which are included in the full system state $x_{k}$. The random variable $w_{k}$ in (8) is the acceleration at the next time step. Specifying the drive cycle is equivalent to assigning a probability distribution to $w$, that is, specifying $$P(a_{k+1}\vert v_{k},a_{k})\eqno{\hbox{(11)}}$$ for pairs $v_{k},a_{k}$. Following [25], the transition probabilities (11) are estimated from known drive cycles that represent typical behavior, referred to as “design cycles.” The variables $v_{k}$, $a_{k}$, and $a_{k+1}$ are discretized to form a grid. For each discrete state $[v_{k},a_{k}]$ there are a variety of outcomes $a_{k+1}$. The probability of each outcome $a_{k+1}$ is estimated based on its frequency of occurrence during the design cycle, and is clearly a function of the state as in (10); see [25], [26] for more detail. Specific design cycles include the standard cycles used to establish “window sticker” fuel economy such as the Federal Test Procedure (FTP). As mentioned previously, design cycles might also include measured driving behavior over “real world” vehicle use.
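The frequency-of-occurrence estimate of (11) can be sketched as a simple counting procedure. The sketch assumes the design cycle has already been discretized into integer grid indices; the function name and the short index sequences are hypothetical, and the handling of unvisited grid cells (left at zero here) is an assumption rather than the procedure of [25], [26].

```python
import numpy as np


def estimate_transitions(v_idx, a_idx, n_v, n_a):
    """Estimate P(a_{k+1} | v_k, a_k) by counting along a design cycle.

    v_idx, a_idx: integer grid indices of velocity and acceleration per step.
    Returns an array of shape (n_v, n_a, n_a); rows for (v, a) pairs that
    never occur in the cycle are left as all zeros.
    """
    counts = np.zeros((n_v, n_a, n_a))
    for k in range(len(v_idx) - 1):
        counts[v_idx[k], a_idx[k], a_idx[k + 1]] += 1
    totals = counts.sum(axis=2, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


# Hypothetical five-sample cycle on a 2 x 3 (velocity x acceleration) grid.
v_idx = [0, 0, 0, 0, 0]
a_idx = [1, 2, 1, 1, 2]
P = estimate_transitions(v_idx, a_idx, n_v=2, n_a=3)
```

In this example the pair $(v,a)=(0,1)$ is visited three times and is followed twice by acceleration index 2 and once by index 1, giving estimated probabilities of 2/3 and 1/3.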

Bringing this all together, the full system state vector $x$ contains five components: one state for the vehicle (*Battery SOC*), two states for the stochastic driver $(v_{k},a_{k})$, and two states to study drivability (*Gear* and *Engine State*). This formulation is termed the “SP-SDP-Drivability” controller. A summary of system states is shown in Table II.

The inputs to the model are engine torque, gear number, and the powers or torques of the two electric machines. Looking ahead, Section V shows how an off-line optimization step can be used to replace the two electric machines by a single input representing total electric machine power, thereby reducing the inputs for the optimization problem to engine torque, gear number, and total electric machine power. The power balance to meet driver demand given in (1) then allows the elimination of one more input. The final control input $u$ that will be used in the optimization problem consists therefore of *Engine Torque* and *Gear*.

**Remarks**: (a) The form of the Bellman equation (8) associated with any dynamic programming problem allows an analytical comparison with ECMS and is discussed in the Appendix. (b) As demands on controller functionality grow, so also must the complexity of the design model. For example, to study fuel economy using deterministic dynamic programming, the only state required is the battery state of charge; the control inputs are *Engine Torque* and *Gear*. Two more states are required to study the stochastic version, and the simplified drivability model used here requires two additional states.

As mentioned in Section IV-B, the dynamics of the system must contain an absorbing state. For this case, the absorbing state represents key-off, when the driver has finished the trip, shut down the vehicle, and removed the key. Once the key-off event occurs, there are no further costs incurred, the trip is over, and the vehicle cannot be restarted. The probability of transitioning to this state is zero unless the vehicle is completely stopped $(v_{k}=0,\ a_{k}=0)$. The probability of a trip ending once the vehicle is stopped is calculated based on the design cycles. This probability is less than one because a stopped vehicle could represent a traffic light or other typical driving event that does not correspond to the end of a trip.

For fuel economy certification, the battery final SOC must be close to the initial SOC. To include this in the SP-SDP formulation, a cost is imposed when the vehicle transitions into the key-off state and the SOC is less than the initial SOC. This penalty accrues only once, so the absorbing state has zero cost from then onwards. Here we add a quadratic penalty in SOC if the final SOC is less than the initial SOC. No penalty is assigned if the final SOC is higher than the initial SOC.
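A one-sided quadratic penalty of this kind can be written as follows. The weight is a hypothetical tuning parameter; the source does not give its value.

```python
def soc_keyoff_penalty(soc_final: float, soc_initial: float, weight: float) -> float:
    """Quadratic key-off cost, applied only when the trip ends below the
    initial SOC; ending at or above the initial SOC costs nothing."""
    shortfall = max(0.0, soc_initial - soc_final)
    return weight * shortfall ** 2


# Hypothetical weight and SOC values.
below = soc_keyoff_penalty(soc_final=0.40, soc_initial=0.50, weight=100.0)
above = soc_keyoff_penalty(soc_final=0.60, soc_initial=0.50, weight=100.0)
```

Because the penalty is zero above the initial SOC, it does not discourage the controller from ending a trip with surplus charge; only a deficit is penalized.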

The effects of this key-off penalty are clearly visible in the value function $V(x)$. For the fuel-only case, the value function depends on the current acceleration, velocity, and SOC. Fig. 6 shows $V(x)$ as a function of SOC for one particular acceleration and several velocities, with target final SOC equal to 0.5. Notice that at low velocities, the value function has a pronounced quadratic shape for SOC under 0.5, but it flattens out at higher speeds. The SOC penalty only occurs at key-off, which can only occur at zero speed. Thus the SOC key-off penalty strongly affects the value function at low speeds, when there is a higher probability of key-off in the near future. At higher speeds, there is little chance of key-off anytime soon, so the SOC penalty only weakly affects the value function. Moreover, there will be a deceleration phase before reaching zero speed and thus an opportunity to recharge the battery.

Stochastic dynamic programming is inherently computationally intensive and can quickly become intractable. The computational burden is exponential in the number of system states; thus the cost function (7) should depend on a minimal number of states.

For optimization, at each time step a penalty is assigned if either a shift or engine event occurs. The two additional states required to implement this cost function are the current gear and the engine state. Thus, including drivability in the optimization imposes roughly a factor of ten increase in computation over the fuel-only case.

In contrast, suppose the metric of interest were based on a moving window in time. The number of required grid points scales with the number of time steps used to specify the metric. For the 1 second update time studied here, penalizing engine events of 5 s duration or less (rather than the simple on/off used here) would require five grid points for the time history, increasing the size of the state-space by a corresponding factor of 5 over the on/off case.
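The scaling argument can be checked with a short calculation. The grid sizes for SOC, velocity, and acceleration below are hypothetical placeholders; the seven gear states and the binary engine state follow the model description, and the sketch assumes that every gear and engine-state combination appears on the grid.

```python
from math import prod


def grid_points(grid):
    """Total number of grid points for a dict of per-state grid sizes."""
    return prod(grid.values())


# Hypothetical grid sizes for the fuel-only problem.
fuel_only = {"soc": 50, "velocity": 30, "acceleration": 20}

# Simple drivability metrics: add current gear (7) and engine state (2).
drivability = {**fuel_only, "gear": 7, "engine_state": 2}

# Moving-window variant: track the engine state over a 5-step history,
# i.e. five grid points for each of the two engine states.
windowed = {**fuel_only, "gear": 7, "engine_timer": 2 * 5}

ratio_drivability = grid_points(drivability) / grid_points(fuel_only)
ratio_window = grid_points(windowed) / grid_points(drivability)
```

Under these assumptions the drivability states multiply the grid by 7 × 2 = 14, consistent with the order-of-magnitude increase quoted above, and the 5 s window multiplies it by a further factor of 5 regardless of the placeholder sizes chosen for the other states.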

SECTION V

Consider a Bellman equation of the form $$V^{\ast}(x)=\min_{\hat{u}\in\hat{U}(x),\bar{u}\in\bar{U}(x,\hat{u})}E_{w}\left[c(x,\hat{u},\bar{u},w)+V^{\ast}\left(f(x,\hat{u},w)\right)\right]\eqno{\hbox{(12)}}$$ and define $$\hat{c}(x,\hat{u})=\min_{\bar{u}\in\bar{U}(x,\hat{u})}E_{w}\left[c(x,\hat{u},\bar{u},w)\right].\eqno{\hbox{(13)}}$$ Then $V^{\ast}(x)$ satisfies (12) if, and only if, it satisfies $$V^{\ast}(x)=\min_{\hat{u}\in\hat{U}(x)}E_{w}\left[\hat{c}(x,\hat{u})+V^{\ast}\left(f(x,\hat{u},w)\right)\right].\eqno{\hbox{(14)}}$$

The proof and more detail are available in the Appendix. This result allows a significant reduction in computational complexity for problems that have the specific structure (12). The reduced Bellman equation (14) may be solved using only the reduced control space $\hat{U}(x)$. This structure appears quite often in energy management problems (see Appendix).
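The equivalence of (12) and (14) can be checked numerically on a small random problem. The following is a minimal sketch, not the implementation used in this work: the problem sizes, the random stage cost, and the random transition table are all assumptions, and a single Bellman backup is compared for an arbitrary value-function guess.

```python
import random

random.seed(0)
S, A, B, W = 6, 4, 10, 3      # states, |U_hat|, |U_bar|, disturbance values
p_w = [0.5, 0.3, 0.2]         # disturbance probabilities

# Random stage cost c(x, u_hat, u_bar, w); dynamics f(x, u_hat, w) ignore u_bar.
cost = [[[[random.random() for _ in range(W)] for _ in range(B)]
         for _ in range(A)] for _ in range(S)]
nxt = [[[random.randrange(S) for _ in range(W)] for _ in range(A)]
       for _ in range(S)]
V = [random.random() for _ in range(S)]   # arbitrary value-function guess

def backup_full(x):
    """Right-hand side of (12): joint minimization over (u_hat, u_bar)."""
    return min(
        sum(p_w[w] * (cost[x][a][b][w] + V[nxt[x][a][w]]) for w in range(W))
        for a in range(A) for b in range(B))

# Static pre-optimization (13), computed once and stored as a table.
c_hat = [[min(sum(p_w[w] * cost[x][a][b][w] for w in range(W))
              for b in range(B)) for a in range(A)] for x in range(S)]

def backup_reduced(x):
    """Right-hand side of (14): minimization over u_hat only."""
    return min(
        c_hat[x][a] + sum(p_w[w] * V[nxt[x][a][w]] for w in range(W))
        for a in range(A))

max_gap = max(abs(backup_full(x) - backup_reduced(x)) for x in range(S))
print(max_gap)
```

The two backups agree to rounding error, while the reduced backup searches only the four values of $\hat{u}$ per state instead of forty $(\hat{u},\bar{u})$ pairs.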

The above decomposition has been exploited in previous work without explicit theoretical justification [16], [39]. A typical example is the power-split HEV configuration which uses engine power and speed as inputs without an engine speed state [39]. The fuel-minimizing engine speed $(\bar{u})$ for each engine power $(\hat{u})$ is precomputed and stored as a table (see Appendix).

The following subsection details the physical explanation of the structure (12) for the vehicle considered in this work and how the decomposition is implemented.

In comparison to previous work in [1], the addition of a second electric machine makes the computation of an SP-SDP solution more complex by forcing the algorithm to consider an additional dimension in the control space. If the additional control variable is discretized with, say, $N=10$ points, the size of the minimization operation in (8) over pairs $(x,u)$ increases by a factor of ten. Exploiting the structure represented by (12) and using Minimization Decomposition reduces the computational cost to that of a vehicle with a single electric machine, i.e., a 90% reduction. The addition of the second electric machine is then approximately free in terms of computing an off-line solution to the SP-SDP problem.

Intuitively, Minimization Decomposition lumps the two electric machines into a single “Super Electric Machine.” This device is a black box that takes a desired wheel-torque command as an input and uses the vehicle velocity, engine torque, and gear to achieve the desired torque with minimal electric power, as shown in Fig. 7. The required minimization is static, and in an off-line setting such as SP-SDP, can be done once and reused. Once the static optimization is performed, the Super Electric Machine appears as a single power source for the SP-SDP optimization. Internally, however, the Super Electric Machine optimizes between the two (or possibly more) electric machines and issues appropriate commands.

A more technical justification follows. The torque balance (1) allows a tradeoff between the two electric machine torques. The system dynamics are only affected by the net change in SOC
$$\hat{u}:\delta {\rm SOC}\eqno{\hbox{(15)}}$$ and not by the split of the electric machine torques, which can be defined by one command
$$\bar{u}:\bar{T}_{EM2}.\eqno{\hbox{(16)}}$$ For a given $\bar{T}_{EM2}$, velocity, and gear, $\bar{T}_{EM1}$ is exactly determined by $\delta {\rm SOC}$. Since the power-split optimization is static (i.e., independent of the dynamic states of the model, including SOC), it takes the form (12) and can be computed *a priori* using (13) without loss of optimality. This reduces the dimension of the control space by one. The fundamental assumption that allows this to work is that the electric machine behaviors depend only on the current values of the EM torque commands, current gear, and velocity, and in particular, do not depend on their past values. For any control command under consideration, the knowledge of current gear, velocity, $\bar{T}_{EM2},\delta {\rm SOC}$, and the required wheel torque uniquely determines all the terms in the torque balance (1). An optimal $\bar{u}$ can then be selected as in (13). The required engine torque is determined from the torque balance, accounting for the direction-dependent efficiency losses in the transmission.

The physical control inputs to the system are engine torque, gear, EM1 torque and EM2 torque. The constraint to match driver demand torque removes one degree-of-freedom. By replacing the two electric machine commands with a single electric wheel torque command, the SP-SDP algorithm has only two control inputs.
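The "Super Electric Machine" idea can be sketched as a static table that is filled once and then reused by the dynamic program. The sketch below is illustrative only: it assumes both machines act on a common shaft, and the quadratic loss model, the coefficients `k1` and `k2`, and the torque limits are hypothetical rather than taken from the vehicle model of this paper.

```python
def electric_power(t_em1, t_em2, omega, k1=0.02, k2=0.05):
    """Hypothetical electric power (W): shaft power plus quadratic losses."""
    return omega * (t_em1 + t_em2) + k1 * t_em1**2 + k2 * t_em2**2

def super_em(t_total, omega, n_grid=201, t_max=200.0):
    """Return (minimum electric power, best EM2 torque) for a torque request."""
    best = None
    for i in range(n_grid):
        t_em2 = -t_max + 2 * t_max * i / (n_grid - 1)
        t_em1 = t_total - t_em2      # torque balance fixes EM1 once EM2 is chosen
        if abs(t_em1) > t_max:
            continue                 # EM1 torque limit violated
        p = electric_power(t_em1, t_em2, omega)
        if best is None or p < best[0]:
            best = (p, t_em2)
    return best

# Static optimization done once; the dynamic program then sees one electric input.
table = {(t, w): super_em(t, w)
         for t in range(-100, 101, 50) for w in (100.0, 200.0)}
```

With this assumed loss model the optimal split sends more torque to the machine with the smaller loss coefficient, and the dynamic program only needs to look up `table[(torque, speed)]`.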

SECTION VI

SP-SDP-based controllers are compared to a baseline industrial controller. SP-SDP controllers are designed using the control-oriented model and evaluated using the high-fidelity vehicle simulation model of Section II-D. This demonstrates some robustness by using two models of the same vehicle, differing in the level of detail in their dynamics. Strictly speaking, the optimality guarantees are no longer valid because the test model is different from the design model. For practical purposes, a strictly optimal model-based controller is unattainable in hardware because a model will always have some mismatch with a real vehicle. Demonstrating excellent performance on the (exact) design model is only marginally useful as it presents no model uncertainty. By designing the controller on a simple model and testing on a (not perfectly matched) complex model, we more closely approximate the process of designing on the basis of a model and testing on hardware.

Both SP-SDP and the baseline controllers are simulated on two government test cycles, the US Federal Test Procedure (FTP) and the New European Drive Cycle (NEDC), which are shown in Fig. 8. Procedurally, this is conducted as follows.

- A family of SP-SDP controllers is designed according to the methods of Section IV. A family is generated by fixing the model driving statistics and sweeping the two drivability penalties $\alpha$ and $\beta$ in (7).
- Each controller in the family is simulated on the high-fidelity model using a causal driver, thus accounting for all the dynamics and real vehicle characteristics neglected in the optimization.
- The fuel economy and drivability metrics are recorded. Fuel economy is computed in units of MPG (Miles driven Per Gallon of fuel consumed), and hence larger numbers mean better fuel economy.

In the end, each family contains a few hundred individual controllers which have each been simulated on the cycle in question. Each simulation yields a data point with associated fuel economy and drivability metrics. Each controller in the family has different drivability and fuel economy characteristics because of the varying drivability penalties.

Because the simulations on the high-fidelity model use a causal driver model, the final SOC is not guaranteed to exactly match the starting SOC. This could yield false fuel economy results, so all fuel economy estimates are corrected based on the final SOC of the drive cycle. This is done by estimating the additional fuel required to charge the battery to its initial SOC, or the potential fuel savings shown by a final SOC that is higher than the starting level. This correction is applied according to $$\Delta m_{f}=C_{\rm Batt}\,\Delta {\rm SOC}\,{{\rm BSFC}_{\min}\over\eta_{\max}^{\rm Regen}}\eqno{\hbox{(17)}}$$ where $\Delta m_{f}$ is the adjustment to the fuel used, $C_{\rm Batt}$ is the battery capacity, $\Delta {\rm SOC}$ is the difference between the starting and ending SOC, ${\rm BSFC}_{\min}$ is the best Brake Specific Fuel Consumption for the engine, and $\eta_{\max}^{\rm Regen}$ is the best charging efficiency of the electric system. This correction is a reasonable approximation but not exact; the exact correction depends on the controller and the particular cycle. For the FTP cycle, the mean fuel economy correction for the SOC deviations presented in Fig. 9(e) is 1.6%, with a 1.3% standard deviation. Hence, using this simple correction does not change the conclusions of the presented results in any substantial way.
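As a worked instance of the correction (17), the sketch below evaluates the fuel adjustment for one set of numbers. The battery capacity, best BSFC, and charging efficiency used here are assumed values for illustration only, and the units (kWh, g/kWh) are a choice made for this example.

```python
def soc_fuel_correction(c_batt_kwh, soc_start, soc_end,
                        bsfc_min_g_per_kwh, eta_regen_max):
    """Fuel adjustment in grams from (17).

    Positive when the battery ends below its starting SOC (fuel is added),
    negative when it ends above (fuel is credited).
    """
    delta_soc = soc_start - soc_end
    return c_batt_kwh * delta_soc * bsfc_min_g_per_kwh / eta_regen_max

# Assumed example: 1.5 kWh battery, 200 g/kWh best BSFC, 0.8 best charging efficiency.
dm_low = soc_fuel_correction(1.5, 0.50, 0.46, 200.0, 0.8)    # ends 4% SOC low
dm_high = soc_fuel_correction(1.5, 0.50, 0.53, 200.0, 0.8)   # ends 3% SOC high
print(dm_low, dm_high)
```

For these assumed numbers a 0.04 SOC deficit corresponds to about 15 g of additional fuel, which would be added to the measured fuel before computing fuel economy.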

Fuel economy numbers in this paper always include the SOC correction. The fuel economy of the baseline controller running the FTP cycle is used as the nominal value for normalization. Therefore the normalized fuel economy of the baseline controller on FTP is one.

SECTION VII

The three metrics of interest in this paper are the numbers of gear and engine events, and the total fuel consumption corrected for SOC. The family of controllers generated as described in Section VI yields the results shown in Fig. 9 for the FTP cycle and the NEDC.

Fig. 9(a) and (b) show 3-D scatter plots of fuel economy versus gear and engine events for the two cycles. Each point represents a single controller driven on the cycle in question. The total numbers of gear events and engine events are shown on the horizontal axes, while fuel economy is shown on the vertical axis as normalized MPG.^{2} Together, these points form a surface in 3-D space depicting the tradeoff among the three metrics. Fig. 9(a) shows a family of controllers designed using FTP statistics running the FTP cycle. Fuel economy data presented in this paper are normalized to the fuel economy of the baseline controller on FTP, shown as a large solid square. Hence, a fuel economy greater than one means more miles would be traveled using the same fuel as consumed by the baseline controller, or equivalently, less fuel would be consumed for the same distance traveled. A polynomial surface is fit to the raw data and used to generate isoclines of constant number of gear events, shown as solid and dashed lines.

Fig. 9(c) is a 2-D view of Fig. 9(a) looking along the gear events axis. Each line in the plot represents a constant number of gear events, while the horizontal and vertical axes show the number of engine events and normalized fuel economy respectively. This particular vehicle is relatively insensitive to the number of gear events, so most of the results concentrate on the tradeoff between engine activity and fuel economy. The final SOC for these simulations is shown in Fig. 9(e). All simulations start at 0.5 SOC.

Similarly, a family of controllers is designed and simulated on the NEDC. Fuel economy results are again shown in 3-D and 2-D in Fig. 9(b) and (d), while the final SOC is shown in Fig. 9(f). Again, fuel economy is normalized to the baseline controller performance on FTP, so the baseline controller is slightly less fuel efficient on NEDC (0.99) than FTP (1.00).

The frontiers of the 2-D and 3-D point clouds in Fig. 9 clearly demonstrate the tradeoff between fuel economy and drivability. The plot of final SOC for the FTP cycle [see Fig. 9(e)] shows a distinct downward trend for large numbers of engine events. The target final SOC is 0.5, which the controllers come very close to achieving when engine events are unrestricted (low penalties).

The final SOC penalty $\phi_{{\rm SOC}}(x,w)$ in (7) used in the control design process is only applied if the final SOC is below this target. For final SOCs above the target, the only cost is the fuel spent charging the battery. With smaller numbers of engine events, the controller has less freedom to turn the engine on and charge the battery. In effect, the controllers become more conservative and maintain higher SOCs to avoid either additional engine starts or a final SOC that is too low.

An interesting phenomenon occurs when the engine events penalty is very high. In this case, to avoid engine shut-down, the only option is to disengage the clutch and enter series mode. With this artifice, it is possible to have a cycle with no other engine events than the initial start and final stop.

The results show some unexpected trends. Fig. 9(c) and (d) show a slight decrease in fuel economy for large numbers of engine events. In some cases on FTP, decreasing the number of gear events actually increases fuel economy [see Fig. 9(c)]. There are two possible explanations. First, the optimization is with respect to expected value and not a single sample path as in the plot. Second, the plot depicts controller performance on the high-fidelity model and not the control-oriented design model. To determine which of these two explanations is the correct one, simulations of controller performance for the FTP cycle were conducted on the simplified vehicle model (i.e., the control design model) in order to eliminate the issue of model mismatch. These simulations show the same trends discussed above, implying that model mismatch is not causing these phenomena. Large numbers of cycles were then simulated [35] to check the performance in the expected sense rather than for a single cycle. This second set of simulations shows that fuel economy is monotonic in both gear and engine events, as one would expect.

The SP-SDP results show significant (11%) performance improvements over the baseline controller for the metrics considered here. Production controllers incorporate many additional attributes, such as noise, harshness, durability, safety, accessory loading, diagnostics, etc. These attributes may decrease the performance margin when fully incorporated. One obvious example is emissions, which was not considered in either the baseline or SP-SDP controllers. For previous work on NOx emissions, see [23], [40], and [41].

SECTION VIII

Several controllers are studied in greater detail on the FTP cycle, which generally yields more interesting behavior than the NEDC. The performance of the baseline controller is compared to three SP-SDP controllers in Table III, all running the FTP cycle. The SP-SDP controllers are designed using FTP statistics and are selected from those shown in Fig. 9(a), (c), and (e). SP-SDP #1 is the controller with the best corrected fuel economy without regard to drivability. The peak of the fuel economy surface [see Fig. 9(a)] is very close to the baseline controller operating point in terms of drivability. SP-SDP #2 has the closest drivability metrics to the baseline controller, and is closely related to SP-SDP #1. SP-SDP #3 is selected by finding a controller with similar fuel economy to the baseline controller and about half the number of drivability events. Essentially, we are presenting two possible design choices: improved fuel economy with similar drivability (SP-SDP#2), or similar fuel economy with reduced drivetrain activity (SP-SDP#3). The designer may also select some compromise between the two.

Time histories of the baseline and SP-SDP #2 controllers are presented for the first 500 s of the FTP cycle in Fig. 10. The engine torque/speed operating points for these two controllers on the full FTP cycle are shown in Fig. 11.

Summary metrics are shown for the baseline and SP-SDP controllers in Table IV. The forward wheel energy is the integral of all motoring (output) wheel power, the engine output energy is the total energy delivered at the engine output shaft, engine brake specific fuel consumption (BSFC) (g/kWh) is the total fuel consumed divided by the total engine output energy, and friction braking energy is the energy dissipated by the friction brakes.

For the electrical propulsion system, the electro-mechanical charge energy is the total mechanical energy absorbed by the electric machines, and the electro-mechanical discharge energy is the forward mechanical energy provided by the electric machines. The electro-mechanical losses are the difference between the two minus the change in battery energy (due to final SOC) and represent all losses in the electrical system including accessory loads. The round-trip electrical efficiency is the discharge energy plus any net change in SOC divided by the charge energy.
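The two electrical summary metrics defined above reduce to simple arithmetic. The sketch below uses made-up energies, and it assumes the sign convention that the battery energy change is positive when the final SOC is above the initial SOC.

```python
def electrical_summary(e_charge_kwh, e_discharge_kwh, delta_batt_kwh):
    """Return (losses, round-trip efficiency) for the electrical system.

    e_charge_kwh:    mechanical energy absorbed by the electric machines
    e_discharge_kwh: forward mechanical energy provided by the machines
    delta_batt_kwh:  net battery energy change (assumed positive if SOC rose)
    """
    losses = e_charge_kwh - e_discharge_kwh - delta_batt_kwh
    efficiency = (e_discharge_kwh + delta_batt_kwh) / e_charge_kwh
    return losses, efficiency

# Illustrative numbers only, not values from Table IV.
losses, eta = electrical_summary(2.0, 1.3, 0.1)
print(losses, eta)
```

The two quantities are consistent with each other: the losses equal the charge energy multiplied by one minus the round-trip efficiency.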

Figs. 10 and 11 and Table IV lend some insight into the performance differences between the SP-SDP and baseline controllers. Table IV shows the SP-SDP controller is more efficient in its use of the diesel engine. The engine primarily operates near a high efficiency point or completely off, and yields a lower average BSFC as shown in Fig. 11. The high-torque operating points are also visible in Fig. 10. The electric machines are used more extensively by the SP-SDP controller, which allows more efficient ICE utilization and more efficient overall electrical propulsion. Friction braking is also minimized by the SP-SDP controller.

SECTION IX

An energy management controller for a prototype parallel-series hybrid electric vehicle has been developed using SP-SDP to optimally perform the inherent tradeoff between fuel efficiency and drivability. The SP-SDP-based controllers minimize the expected value of a cost function, using a statistical description of expected driving behavior. Here, the cost was a weighted sum of consumed fuel and drivability penalties based on shift events and engine on-off events. By varying the weights, the Pareto tradeoff surface of fuel economy versus drivability for the SP-SDP-based controllers was evaluated on a high-fidelity vehicle simulation model.

The performance of the SP-SDP-based controllers was compared against an industrial baseline controller. For the same level of drivability, the SP-SDP-based controllers were 11% more fuel efficient than the baseline controller on the FTP cycle and the NEDC. The SP-SDP-based controllers were designed for the driving statistics of each of the two cycles.

In general, dynamic programming is well known to suffer from the “curse of dimensionality,” referring to the exponential explosion of problem size with the number of state and control variables. The system model addressed here had three power sources, namely, an internal combustion engine and two electric machines. A two-step off-line optimization strategy was presented that preserved optimality while presenting the SP-SDP algorithm with an equivalent system model that contained a single “super” electric machine. Ultimately, this allowed the SP-SDP algorithm to be run on a desktop PC. A similar two-step optimization strategy is applicable to other vehicle configurations that have multiple actuator degrees of freedom.

While the excellent fuel economy of the SP-SDP controllers is very interesting, we feel a more important observation is that the SP-SDP design method produces causal controllers that respect constraints and perform well on (and off [35]) standard test cycles. SP-SDP-based controllers can be directly implemented in a realistic control environment with little manual tuning, as demonstrated on an industrial vehicle model which includes detailed subsystem models, dynamics, delays, and limits.

Equation (12) may be written as $$V^{\ast}(x)=\min_{\hat{u}\in\hat{U}(x)}\min_{\bar{u}\in\bar{U}(x,\hat{u})}E_{w}\left[c(x,\hat{u},\bar{u},w)+V^{\ast}\left(f(x,\hat{u},w)\right)\right]\eqno{\hbox{(18)}}$$ and by the linearity of expectation $$\eqalignno{V^{\ast}(x)&=\min_{\hat{u}\in\hat{U}(x)}\min_{\bar{u}\in\bar{U}(x,\hat{u})}\left(E_{w}\left[c(x,\hat{u},\bar{u},w)\right]\right.\cr&\quad\left.+E_{w}\left[V^{\ast}\left(f(x,\hat{u},w)\right)\right]\right).&\hbox{(19)}}$$ The expectation of the value function is independent of $\bar{u}$, yielding $$\displaylines{V^{\ast}(x)=\min_{\hat{u}\in\hat{U}(x)}\left(\min_{\bar{u}\in\bar{U}(x,\hat{u})}E_{w}\left[c(x,\hat{u},\bar{u},w)\right]\right.\hfill\cr\hfill\left.+E_{w}\left[V^{\ast}\left(f(x,\hat{u},w)\right)\right]\right).\quad\hbox{(20)}}$$ Using the definition (13), (20) becomes (14).$\hfill\blacksquare$

To implement the controller developed using Minimization Decomposition, $\bar{u}$ must still be determined. It may be precomputed and stored when calculating (13) $$\bar{u}^{\ast}(x,\hat{u})=\mathop{\arg\min}_{\bar{u}\in\bar{U}(x,\hat{u})}c(x,\hat{u},\bar{u})\eqno{\hbox{(21)}}$$ and $$\hat{c}(x,\hat{u})=c\left(x,\hat{u},\bar{u}^{\ast}(x,\hat{u})\right)=\min_{\bar{u}\in\bar{U}(x,\hat{u})}c(x,\hat{u},\bar{u}).\eqno{\hbox{(22)}}$$

This process removes $\bar{U}$ from the space of control actions searched by the dynamic program. The computation scales linearly with the number of possible control actions, so it can be significantly reduced depending on the problem structure and the size of $\bar{U}$.

Minimization decomposition may also be used when solving for non-stationary value functions by appropriately replacing $V(x)$ with a time-dependent $V_{k}(x)$, for either deterministic or stochastic cases [16].

**Remark: (Functional Form to use Minimization Decomposition)** Suppose a system has dynamics $f(x,\hat{u},\bar{u},w)$ that are independent of some control component $\bar{u}$ and can be reformulated into a function $\hat{f}$, such that
$$\hat{f}(x,\hat{u},w)=f(x,\hat{u},\bar{u},w)\eqno{\hbox{(23)}}$$ with probability 1 (w.p. 1). Then the Bellman equation satisfies (12) and the minimization decomposition may be used.

While the property (23) seems quite restrictive, it occurs surprisingly often in the energy management problem. It is likely to occur if the number of control inputs $M$ exceeds the dimension of the state space $N$, leaving a null control direction as used in [38].

**Remark: (State Decomposition)** In this energy management problem (as in most formulations) the dynamics may clearly be broken down into two parts
$$f(x,u,w)={f_{u}(x,u)\brack f_{w}(x,w)}\eqno{\hbox{(24)}}$$ where the deterministic states are the known vehicle dynamics and the stochastic driver dynamics are independent of the control input.

This allows the control inputs to be studied without the effect of $w$, simplifying the verification of condition (23). Whenever the number of actuators exceeds the dimension of $f_{u}$, (23) is likely to hold.

The main point is this: if the number of control inputs exceeds the number of states, the required computation can often be drastically reduced. Even with discrete states (e.g., gear number) the same techniques may often be used.

Consider for example the “Power-Split” architecture of the Toyota Prius and Ford Escape, with a cost function that penalizes fuel use and SOC deviations from nominal to attain charge sustenance. If one assumes that the dynamics of engine speed changes are negligible at the timescales for energy management, the only vehicle state is SOC, as velocity and acceleration are assigned by the driver (stochastically when using SP-SDP). Assuming the vehicle matches driver demand torque, the system is defined by two inputs. By using specific definitions of the system variables, the optimization reduces to two one-degree-of-freedom problems. A common method is to treat the two control inputs as engine speed and engine power. Suppose instead we choose engine speed $\omega_{\rm ICE}$ and electrical power $P_{\rm elec}$, a slightly different definition. This allows a major decoupling of the system dynamics. The evolution of SOC is now only dependent on $P_{\rm elec}=\hat{u}$ and completely independent of $\omega_{\rm ICE}=\bar{u}$. The engine speed that results in minimum fuel use for a given $P_{\rm elec}$ can be calculated off-line because it is independent of SOC. This results in engine fuel consumption as a 1-D function of power $\hat{c}(x,\hat{u})=\hat{c}(x,P_{\rm elec})$, rather than the standard 2-D function of power and speed $c(x,\hat{u},\bar{u})=c(x,P_{\rm elec},\omega_{\rm ICE})$.

One of the most well known optimization methods for energy management in HEVs is the “equivalent consumption minimization strategy” (ECMS) [19], [42]. This method optimizes for fuel economy only, which is equivalent to taking the running cost in (8) as fuel flow rate, that is, $c=\dot{m}_{f}$. ECMS is popular among academics because it requires little computation and seems easy to implement. At each time step, the controller minimizes a function that trades off battery usage versus fuel $$u_{k}^{\ast}(x)=\mathop{\arg\min}_{u\in U}\left[\dot{m}_{f}(x,u)+\lambda_{k}\Delta {\rm SOC}(x,u)\right].\eqno{\hbox{(25)}}$$ The design parameter is the weighting factor $\lambda_{k}$, which represents the relative value of battery charge in terms of fuel. In actual practice, a real difficulty arises because, unless $\lambda_{k}$ is carefully chosen, the vehicle will not be charge sustaining. The required values for $\lambda_{k}$ are highly cycle dependent and typically require on-line estimation.
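The static minimization (25) can be sketched in a few lines. The fuel-rate and SOC models below are toy assumptions chosen only to make the example self-contained; the sign convention assumed here is that $\Delta {\rm SOC}$ is positive when the battery charges, so a negative $\lambda_{k}$ assigns value to stored charge.

```python
def ecms_control(x, controls, fuel_rate, delta_soc, lam):
    """Return the control minimizing fuel_rate + lam * delta_soc, as in (25)."""
    return min(controls, key=lambda u: fuel_rate(x, u) + lam * delta_soc(x, u))

# Toy model (assumed): u is engine power in kW, x is demanded power in kW.
def fuel_rate(x, u):
    """Hypothetical fuel flow in g/s; zero when the engine is off."""
    return 0.0 if u == 0 else 0.3 + 0.04 * u + 0.001 * u**2

def delta_soc(x, u):
    """Hypothetical SOC change per step; engine surplus charges the battery."""
    return 1e-4 * (u - x)

controls = [0, 10, 20, 30, 40]
demand = 20

u_fuel_only = ecms_control(demand, controls, fuel_rate, delta_soc, lam=0.0)
u_charge = ecms_control(demand, controls, fuel_rate, delta_soc, lam=-1000.0)
print(u_fuel_only, u_charge)
```

With no value placed on charge the toy controller keeps the engine off and drains the battery, while a strongly negative weight makes it run the engine above the demand to recharge. This illustrates why charge sustenance depends so heavily on the choice of $\lambda_{k}$.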

It is now shown for the fuel only case that (8) and (9) of the SP-SDP algorithm yield a form very similar to (25) for the computation of the optimal control, with the added benefit that the SP-SDP algorithm automatically adjusts the weighting function. First, note that the state can be taken as $x=[{\rm SOC},\bar{x}]^{\prime}$, where $\bar{x}$ consists of vehicle velocity and acceleration. $\bar{x}$ is independent of the control input $u$ because vehicle acceleration is defined by the stochastic driver. The model (6) can thus be expressed in the form $${{\rm SOC}_{k+1}\brack\hfill\bar{x}_{k+1}}={{\rm SOC}_{k}+\Delta {\rm SOC}({\rm SOC}_{k},\bar{x}_{k},u_{k})\brack\bar{f}(\bar{x}_{k},w_{k})\hfill}.\eqno{\hbox{(26)}}$$

Next, let $V^{\ast}$ be the optimal cost to go function in the Bellman equation (8), and define $$Q(\sigma,\bar{x})=E_{w}\left[V^{\ast}\left({\sigma\brack\bar{f}(\bar{x},w)}\right)\right]$$ for an arbitrary SOC $\sigma$. Substituting the SOC dynamics (26) for $\sigma$, the Bellman equation (8) becomes $$\displaylines{V^{\ast}({\rm SOC},\bar{x})=\min_{u\in U}\left[\dot{m}_{f}({\rm SOC},\bar{x},u)\right.\hfill\cr\hfill\left.+Q\left({\rm SOC}+\Delta {\rm SOC}({\rm SOC},\bar{x},u),\bar{x}\right)\right].\quad\hbox{(27)}}$$ The running cost $c=\dot{m}_{f}$ in (8) is not a function of the random variable $w$ and can be removed from the expectation. From this, the expression for the optimal control becomes $$\displaylines{u^{\ast}({\rm SOC},\bar{x})=\mathop{\arg\min}_{u\in U}\left[\dot{m}_{f}({\rm SOC},\bar{x},u)\right.\hfill\cr\hfill\left.+Q\left({\rm SOC}+\Delta {\rm SOC}({\rm SOC},\bar{x},u),\bar{x}\right)\right].\quad\hbox{(28)}}$$ Doing a first-order Taylor expansion of $Q$ then yields $$\displaylines{u^{\ast}({\rm SOC},\bar{x})\approx\mathop{\arg\min}_{u\in U}\left[\dot{m}_{f}({\rm SOC},\bar{x},u)+Q({\rm SOC},\bar{x})\right.\hfill\cr\hfill\left.+{\partial Q({\rm SOC},\bar{x})\over\partial {\rm SOC}}\Delta {\rm SOC}({\rm SOC},\bar{x},u)\right].\quad\hbox{(29)}}$$ Recognizing that $Q({\rm SOC},\bar{x})$, being independent of $u$, does not affect the minimization, and substituting $x=[{\rm SOC},\bar{x}]^{\prime}$ into (29), then yields $$u^{\ast}(x)\approx\mathop{\arg\min}_{u\in U}\left[\dot{m}_{f}(x,u)+{\partial Q(x)\over\partial {\rm SOC}}\Delta {\rm SOC}(x,u)\right].\eqno{\hbox{(30)}}$$

It follows that $\partial Q(x)/\partial {\rm SOC}$ is equivalent to the weighting factor $\lambda$ in (25). The SP-SDP algorithm has the same structure as the ECMS method, but the weighting factor is a function of the state variables, and is automatically updated on-line. There is a variant of the ECMS method called adaptive ECMS (A-ECMS) in which the weighting factor is also allowed to change over time based on the current driving conditions [19]. A-ECMS is even more similar to the SP-SDP algorithm in that both methods have a weighting factor that is updated on-line as a function of the state.

This relationship is illustrated by again studying the value function $V(x)$ as a function of SOC for fixed acceleration and velocity shown in Fig. 6. The local slope of $V(x)$ in the figure is closely related to the weighting factor in (30), which, once again, is analogous to $\lambda$ in (25). Fundamentally, all fuel-minimizing control algorithms must estimate the value of battery charge in terms of fuel and thus have some equivalent to the weighting factor. It may appear linearly and explicitly as in ECMS, or nonlinearly and implicitly as in SP-SDP. All known information is incorporated in the weighting factor: current state, plant dynamics, and expected future driver demands. Once this weighting factor is determined, the control problem is a simple static optimization.
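The link between the slope of the cost-to-go and the ECMS weighting factor can be sketched with a finite difference. The function `q_table` below is a hypothetical stand-in for $Q$ at one fixed velocity and acceleration, shaped loosely like the low-speed curves described for Fig. 6 (a quadratic penalty below the 0.5 target); it is not data from this work.

```python
def q_table(soc):
    """Hypothetical expected cost-to-go versus SOC at a fixed velocity/acceleration."""
    return 1000.0 * max(0.0, 0.5 - soc) ** 2 - 20.0 * soc

def equivalent_weight(q, soc, h=1e-3):
    """Central-difference estimate of dQ/dSOC, the state-dependent analogue of lambda."""
    return (q(soc + h) - q(soc - h)) / (2.0 * h)

lam_low = equivalent_weight(q_table, 0.30)    # well below the SOC target
lam_high = equivalent_weight(q_table, 0.60)   # above the SOC target
print(lam_low, lam_high)
```

In this toy curve the weight is far more negative at low SOC than above the target, so charge is valued more highly when the battery is depleted. A fixed-$\lambda$ ECMS cannot reproduce this state dependence without additional adaptation.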

A basic difference of the algorithms lies in how they estimate the value of battery charge in terms of fuel: ECMS uses a value assigned by the designer; A-ECMS estimates a value based on battery charge and recent history; deterministic dynamic programming uses exact future knowledge; and SP-SDP uses estimates of cycle statistics. A benefit of dynamic programming methods, such as SP-SDP, is that they can optimally accommodate more complicated objectives, such as the fuel and drivability metrics studied here.

The authors would like to thank the reviewers for their helpful suggestions to clarify the explanation of the minimization decomposition.

This work was supported under a National Science Foundation Graduate Research Fellowship. The work of D. F. Opila was supported by NDSEG and NSF-GRFP fellowships. The work of D. F. Opila, J. A. Cook, and J. W. Grizzle was supported by a grant from Ford Motor Company. Portions of this work have appeared in [1], [2], [3].

D. F. Opila and R. B. Gillespie are with the Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: dopila@umich.edu; brentg@umich.edu).

X. Wang and R. McGee are with Ford Motor Company, Dearborn, MI 48126 USA.

J. A. Cook and J. W. Grizzle are with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: jeffcook@umich.edu; grizzle@umich.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

^{1}Note that the model accounts for power loss in both directions.

^{2}Recall that more miles per gallon means better fuel economy, while the inverse would hold if units of liters per 100 kilometers were used.
