Making Real-Time decisions in dynamic systems – Target seeking methods inspired by Gazing Heuristics

In most decision-making situations, a firm is regularly confronted with the need to balance multiple objectives. Negotiation is often considered as a valuable approach for achieving this balance. The object of these negotiations would be to ascertain the relative importance of the various concerns and design relative weights which can be used while determining the optimal actions. This approach is effective, popular and applicable when there is sufficient time between decisions. However it is infeasible to implement in high frequency decision making situations.

Let’s take, for instance, the ride sharing setting. One of the strategic decisions could be in determining the optimal number of drivers in the system. This decision hinges on the number of drivers being proportional to the expected passenger demand and the desired service levels. However, more drivers also imply reduced earnings for each. We can imagine the passenger services and the driver onboarding managers negotiating the relative importance of these concerns with the core goal of ensuring the sustainability of the platform.  

These decisions might not be one-off. In fact, managers may frequently have to revisit the tradeoff by observing the performance over time, adjusting their positions appropriately. Firstly, this is a natural way to deal with varying concerns and, secondly, since these decisions are tactical in nature, it is feasible for the managers to learn from the performance feedback and gently finetune their positions. Such an approach is frequently used in other domains too for instance, in healthcare, a clinic/hospital might wish to determine appropriate number of doctors or nurses to balance their staffing costs whilst ensuring appropriate care and service levels for their patients.  

However, in an operational setting where decisions must be made rapidly, negotiations might be inefficient and often infeasible. For instance, imagine the decision process of matching passengers to drivers in the ride-share setting—a high-frequency decision process (wherein matching decisions are made multiple times a second) in a dynamic setting where passenger demand and driver’s locations are constantly in flux. 

Ride sharing platforms, such as Uber, Grab, DiDi etc., must aim to ensure high quality matches of passengers who request a ride with drivers who wish to ferry the passenger for a fee. In its simple form the three parties have varying primary interests: passenger want minimal waiting times before pickup (lower waiting time implying better passenger experience), drivers might prefer long passenger trips (longer trip implying higher fares and better earnings), and the platform might want to maximize number of successful matchings (for higher revenues).  

As an example of how the different stakeholder’s objectives conflict, observe the example in figure 2. If the platform tries to assign nearest drivers to passengers (Case A) by minimizing pickup distance, the drivers will earn less than when it tries to maximize revenue (Case B). While maximising driver earnings may improve driver satisfaction, this would come at the cost of longer waiting time for passengers. This approach, however, does not guarantee similar performance in the long run since each matching decision is made myopically, focusing on short-term gains rather than considering the broader implications for all stakeholders involved.  

Figure 2: Optimal matching solutions with different objectives

Motivated by this class of problems, Lyu et al. explore these trade-offs in detail in their paper “Multi-Objective Stochastic Optimization: A Case of Real-Time Matching in Ride-Sourcing Markets”. The authors firstly observe that complex tradeoffs in operational setting can be handled efficiently by setting targets for each of the objectives. Targets are commonly used in operational settings wherein the expected achievement is not instantaneous results, but the performance over a period of time—for instance, daily earnings of drivers, average wait times over numerous trips etc.  

However, even with a known target, seeking it in a dynamic environment is a non-trivial task with complications arising when the target is stretched and not achievable on all fronts. To handle this, the authors explore the concept of target-based optimal solution as illustrated in Figure 3. In this framework, we assume that there are two objectives to be maximized, with the shaded area characterizing the attainable region. In scenario (a), if the pre-determined multi-objective target is deemed unattainable, the target-based optimal solution achieves the point that is closest to the unattainable target. In scenario (b), if the pre-determined target is deemed attainable, the target-based optimal solution achieves the target itself.  

Figure 3 – Target based optimal solution

To solve this problem in an online setting, the authors propose an algorithm that adapts to the dynamics of the environment by adjusting the model parameters in an adaptive fashion. The algorithm is inspired by the physical phenomenon called the ‘gazing heuristic’. This theory is based on observation of how humans and other animals behave in nature.

McLeod & Dienes’ via wikipedia

As a simple example, say a catcher wants to catch a ball. To catch the ball, the person fixes their gaze on the ball (this is the target) and runs to catch the ball in such a way that the angle of gaze remains constant. This can be achieved by adjusting their stride length, speed, direction etc., or equivalently by tracking the distance from the catcher’s real-time position to the ball. Gaze Heuristic is a common logic used by both human beings and animals to adaptively interact with the environment. This approach has also been used to handle more complicated tasks such as directing combat pilots in an airborne dogfight.

Figure 4: Gazing Heuristic Analogy  

The Adaptive Matching (AM) policy thus derived is shown to achieve the target-based optimal solution, with numerous experiments validating the effectiveness of its approach in practice. Since the publication of this paper, there has been considerable interest in this work. Notablly, researchers from Meituan, a pioneer and leader in the gig economy space in China, have implemented this methodology into their own real-time dispatching algorithms.  

This work was conferred the 2024 INFORMS TSL Outstanding Paper under the SIG Urban Transportation as an outstanding paper in the field of Transportation Science and Logistics.

Meituan’s Real-Time Intelligent Dispatching Algorithms Build the World’s Largest Minute-Level Delivery Network 

Effectively balancing multiple objectives in dynamic decision-making environments, such as ride-sharing platforms, requires adaptive strategies. By leveraging insights from the gazing heuristic, the AM policy enhances operational efficiency and demonstrates practical relevance. This research significantly advances the understanding of multi-objective optimization in real-time systems.