Dynamic scheduling with cancellations: an application to chemotherapy appointment booking

We study a dynamic scheduling problem featuring due dates and time windows. This problem arises in chemotherapy scheduling, where patients of different types have specific target dates along with time windows for their appointments. We also consider cancellation of appointments. The problem is modeled as a Markov Decision Process (MDP) and approximately solved using a direct-search based approximate dynamic programming (ADP) technique. We compare the performance of the ADP technique against the myopic policy under diverse scenarios. Our computational results reveal that the ADP technique outperforms the myopic policy on the majority of the problem sets we generated.


Introduction
Dynamic scheduling problems arise in diverse fields such as manufacturing, transportation, project management, and healthcare. In these problems, unlike static scheduling problems, jobs arrive randomly at the system. Depending on the field, jobs may be orders, patients, or tasks. Arriving jobs are allocated to resources such as machines, operating rooms, and surgery rooms. Dynamic scheduling problems have received considerable attention in the operations research field in the last decade.
One recent variation of dynamic scheduling problems considers jobs with target dates and time windows. In these problems, jobs arriving randomly at a facility are ideally scheduled to specific target dates; if this is not possible, they are scheduled to days within a time window. Scheduling jobs outside their time windows results in a penalty. These problems typically arise in chemotherapy appointment booking, where the jobs are patients. Patients of each type have specific target dates and tolerance limits. In such settings, inefficient patient scheduling causes excessive wait listing, late patient appointment notifications, pharmacy congestion, unbalanced workloads among nurses, and considerable clerical work [1]. Similar problems also arise in manufacturing, where jobs of different types are scheduled for production with target dates in mind. In such settings, early scheduling causes inventory cost, whereas late scheduling results in penalty cost.

Chemotherapy appointment booking takes into account treatment protocols, which are designed to maximize the efficacy of a chemotherapy treatment. Treatment protocols specify details such as the drugs to be administered, the dosage, the appointment duration, the number of days between treatments, and tolerance limits [1]. In chemotherapy settings, appointments may be cancelled because patients' protocols may change. Cancellations have generally been ignored in earlier work on chemotherapy appointment booking because including them makes the respective mathematical model significantly more complex. However, cancellations are an important feature of realistic models. Toward that end, we consider the chemotherapy appointment booking problem with cancellations.
In line with the literature, we model the problem as a discounted infinite-horizon Markov Decision Process (MDP). Owing to the intractability of the state and action spaces, we resort to direct-search based approximate dynamic programming (ADP) to approximately solve the problem. In this paper, we make the following contributions:
• We consider the chemotherapy patient appointment booking problem with cancellations of treatments.
• We formulate the problem as an MDP.
• We employ a direct-search based ADP technique to approximately solve the underlying MDP model.
• We compare the performance of the direct-search based ADP technique against a myopic policy.
• Our computational results reveal that the direct-search based ADP improves on the myopic policy for the majority of the problem sets we generated.
The paper is structured as follows. Section 2 reviews the relevant literature. In Section 3, the chemotherapy appointment booking problem with cancellations is described and its MDP formulation is provided. Section 4 describes the direct-search based ADP. Numerical results are provided in Section 5. Section 6 offers concluding remarks.

Literature review
Patient scheduling has been widely studied in the literature [2][3][4]. Green et al. [2] studied the problem of designing an outpatient appointment schedule and establishing dynamic priority rules for admitting patients into service. They formulated the problem as a finite-horizon dynamic program and identified properties of the optimal policy. Cardoen et al. [3] reviewed operations research papers on operating room planning and scheduling. Geng et al. [12] studied dynamic outpatient scheduling for a diagnostic facility with two waiting-time targets. The authors developed a finite-horizon MDP model for this problem and characterized the optimal scheduling policy by proving monotonicity and concavity properties of the components of the MDP model.
Tsai and Teng [13] proposed a stochastic appointment system with a dynamic patient call-in sequence for outpatient clinics with multiple resources. In their model, the schedule for a single service period consists of a fixed number of blocks of equal length. Their results indicate that their stochastic model schedules patients more efficiently than traditional appointment systems.
There are several studies on chemotherapy patient scheduling in the literature; these are summarized below.

Problem description and Markov decision process model
We consider the following dynamic patient scheduling problem (see [1] for a similar description).
• We consider an infinite time horizon and a finite rolling booking horizon of N days.
• Patients are classified on the basis of their appointment tolerances.
• Each day, patients of each type with specific target dates arrive at the facility randomly. Arrival distributions are assumed to be stationary, and arrivals across patient types and target dates are assumed to be independent.
• In line with the literature [1], we assume that each appointment requires one appointment slot.
• At the end of each day, arriving patients are either scheduled to future days over the booking horizon or diverted (i.e., sent to another hospital or served through overtime). As stated in [1], diversion can be thought of as overtime in chemotherapy settings or outsourcing in manufacturing settings. Diversion/overtime capacity is assumed to be significantly higher than the maximum number of patients that need to be diverted on any day.
• No cost is incurred for scheduling patients to days within their tolerance limits, whereas scheduling a patient to a day outside the tolerance limit incurs a type-dependent scheduling cost per day.
• Diverting patients (or serving them through overtime) incurs a diversion cost, which is the same for all patient types.
• We consider cancellation of appointments. In line with the related literature [18], we assume that once an appointment is cancelled, it is not rescheduled to a future date; rescheduling cancelled appointments to a future day would further complicate the problem.
• The objective is to schedule arriving patients to available days (or divert them) within the booking horizon so as to minimize the total discounted expected cost.
In line with the literature [1,2,19], we assume that scheduling costs for patients of each type are linear in the number of days outside their time window. These costs penalize the system for violating time windows. As stated in [1], higher scheduling costs are incurred for patients with lower tolerances. Diversion costs penalize the system for not scheduling a patient [1].
It is worth noting that in our problem, diversion is not necessarily overtime: utilizing overtime means the appointment is handled through overtime on the day the decision is made. Hence, in line with the literature [1], we do not consider scheduling an appointment to a future day and at the same time performing it through overtime.

States

The state of the system is $s = (x, y)$, where $I$ is the number of patient types; $x_{ikn}$, for $i = 1, \ldots, I$, $k = 1, \ldots, N$, $n = 1, \ldots, N$, is the number of type-i patients with target date k scheduled to day n; and $y_{ik}$, for $i = 1, \ldots, I$, $k = 1, \ldots, N$, is the number of type-i patients with target date k waiting to be scheduled.
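To make the bookkeeping concrete, the following is a minimal sketch of the state in Python; the array layout is our own choice, not prescribed by the model.

```python
import numpy as np

# A minimal sketch of the state s = (x, y), assuming I = 3 patient types
# and a booking horizon of N = 10 days; variable names follow the text,
# but the container choice is our own.
I, N = 3, 10

# x[i, k, n]: number of type-i patients with target date k booked on day n
x = np.zeros((I, N, N), dtype=int)

# y[i, k]: number of type-i patients with target date k awaiting scheduling
y = np.zeros((I, N), dtype=int)

state = (x, y)
```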

Action sets
The decision made at the end of each day determines which patients to book on specific days and which patients with specific target dates to divert. $A_{x,y}$ denotes the set of available actions in state $s = (x, y)$. Any action in $A_{x,y}$ is represented by $a = (a_{ikn}, z_{ik})$, where $a_{ikn}$ is the number of type-i patients with target date k to book on day n, and $z_{ik}$ is the number of diverted type-i patients with target date k.
The set of feasible actions must satisfy the following constraints:
$$\sum_{i=1}^{I} \sum_{k=1}^{N} (x_{ikn} + a_{ikn}) \le C_1, \quad n = 1, \ldots, N, \tag{1}$$
$$\sum_{i=1}^{I} \sum_{k=1}^{N} z_{ik} \le C_2, \tag{2}$$
$$\sum_{n=1}^{N} a_{ikn} + z_{ik} = y_{ik}, \quad i = 1, \ldots, I, \; k = 1, \ldots, N, \tag{3}$$
where $C_1$ is the daily resource capacity and $C_2$ is the maximum number of patients diverted or served through overtime each day. Constraint (1) ensures that the total number of patients scheduled on each day is limited by the daily capacity. Constraint (2) dictates that the total number of patients diverted (or served through overtime) is limited by the diversion capacity. Constraint (3) ensures that every arriving patient is either scheduled or diverted.
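A short sketch of how constraints (1)-(3) might be checked for a candidate action, under the array layout assumed earlier:

```python
import numpy as np

def is_feasible(a, z, x, y, C1, C2):
    """Check constraints (1)-(3) for a candidate action (a, z) in state (x, y).

    a[i, k, n]: type-i patients with target date k newly booked on day n
    z[i, k]:    type-i patients with target date k diverted
    """
    # (1) Existing plus new bookings respect the daily capacity C1 on every day.
    if np.any((x + a).sum(axis=(0, 1)) > C1):
        return False
    # (2) Total diversions respect the diversion/overtime capacity C2.
    if z.sum() > C2:
        return False
    # (3) Every waiting patient is either scheduled or diverted.
    if not np.array_equal(a.sum(axis=2) + z, y):
        return False
    return True
```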

Transition probabilities
After all scheduling actions are taken, the number of new requests for each patient type and the number of treatments of each type that are cancelled determine the transition to the next state of the system.
Let $d_{ikn}$ be the number of type-i patients with target date k booked on day n whose treatments are cancelled, and let $\Pr(q_{ik})$ be the probability that $q_{ik}$ new requests for type-i patients with target date k arrive. The probability that $d_{ikn}$ treatments of type-i patients with target date k scheduled to day n are cancelled is denoted $Q_{ikn}(d_{ikn})$. Since each booked treatment is cancelled independently, this probability is binomial:
$$Q_{ikn}(d_{ikn}) = \binom{x_{ikn} + a_{ikn}}{d_{ikn}} \, p_{ikn}^{\,d_{ikn}} \, (1 - p_{ikn})^{x_{ikn} + a_{ikn} - d_{ikn}},$$
where $p_{ikn}$ is the probability that a treatment of type i with target date k booked on day n is cancelled. On selecting decision a in state s, the system moves to the next state $s' = (x', y')$ with probability given by the product of the arrival probabilities $\Pr(q_{ik})$ and the cancellation probabilities $Q_{ikn}(d_{ikn})$, provided that $s'$ satisfies Eqs. (4), (5), and (6). Equations (4) and (5) give the new number of type-i patients with target date k booked on day n: the booking horizon rolls forward by one day, newly scheduled appointments are added, and cancelled treatments are removed. Equation (6) defines the new number of type-i patients with target date k waiting to be scheduled, namely the newly arrived requests.
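The following is a minimal simulation sketch of one transition, assuming binomial cancellations and Poisson arrivals as described above; the rolling-horizon index shift of Eqs. (4)-(6) is deliberately omitted to keep the sketch short.

```python
import numpy as np

def sample_transition(x, a, p, arrival_mean, rng):
    """Sample one transition: each booked treatment is cancelled
    independently with probability p[i, k, n], and new requests arrive
    Poisson; the index shift of Eqs. (4)-(6) is omitted here."""
    booked = x + a
    d = rng.binomial(booked, p)                           # cancellations per (i, k, n)
    x_next = booked - d                                   # surviving bookings
    y_next = rng.poisson(arrival_mean, size=x.shape[:2])  # new requests q[i, k]
    return x_next, y_next

rng = np.random.default_rng(0)
```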

Costs
As stated in [1], the cost of scheduling a type-i patient with target date k to day n is denoted $b(i, k, n)$ and is given by
$$b(i, k, n) = \begin{cases} c_i^1 \,(k - L_i - n) & \text{if } n < k - L_i, \\ 0 & \text{if } k - L_i \le n \le k + U_i, \\ c_i^2 \,(n - k - U_i) & \text{if } n > k + U_i, \end{cases} \tag{7}$$
where $L_i$ and $U_i$, for $i = 1, \ldots, I$, are the lower and upper tolerance limits for type-i patients, and $c_i^1$ and $c_i^2$, for $i = 1, \ldots, I$, are the unit early and unit late scheduling costs for type-i patients, respectively. The immediate cost is expressed as
$$c(a, z) = \sum_{i=1}^{I} \sum_{k=1}^{N} \sum_{n=1}^{N} b(i, k, n)\, a_{ikn} + \sum_{i=1}^{I} \sum_{k=1}^{N} d(i)\, z_{ik}, \tag{8}$$
where $d(i)$, for $i = 1, \ldots, I$, is the per-unit penalty cost for diverting a type-i patient.
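A short sketch of the two cost functions above, assuming the array conventions from the earlier sketches:

```python
import numpy as np

def scheduling_cost(i, k, n, L, U, c1, c2):
    """Per-patient scheduling cost b(i, k, n) as in Eq. (7): zero inside
    the tolerance window [k - L[i], k + U[i]], linear outside it."""
    if n < k - L[i]:
        return c1[i] * (k - L[i] - n)   # early booking penalty
    if n > k + U[i]:
        return c2[i] * (n - k - U[i])   # late booking penalty
    return 0

def immediate_cost(a, z, b, div_cost):
    """Immediate cost c(a, z) of Eq. (8): scheduling costs b[i, k, n]
    for booked patients plus per-type diversion penalties div_cost[i]."""
    return (a * b).sum() + (z * div_cost[:, None]).sum()
```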

Bellman's equations
Discounting with discount factor λ is assumed in our model. Bellman's equations for finding a policy that minimizes the expected infinite-horizon discounted cost are expressed as
$$V(s) = \min_{(a,z) \in A_{x,y}} \Big\{ c(a, z) + \lambda \sum_{s'} \Pr\big(s' \mid s, (a, z)\big)\, V(s') \Big\}, \tag{9}$$
where the transition probabilities are as defined above and the expectation runs over D, the set of all possible demand vectors. However, our MDP model is intractable because the state and action spaces grow exponentially with the number of patient types and the length of the booking horizon. Therefore we resort to approximate dynamic programming (ADP) to approximately solve the model.
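To make the recursion concrete before turning to ADP, here is a minimal sketch of one exact value-iteration sweep; the callables `states`, `actions`, `trans_prob`, and `cost` are our own placeholders, and enumerating them at the real problem scale is precisely what is intractable.

```python
def bellman_backup(V, states, actions, trans_prob, cost, lam=0.99):
    """One exact value-iteration sweep over an enumerated state space.
    trans_prob(s, a) yields (next_state, probability) pairs; cost(s, a)
    is the immediate cost c(a, z) of Eq. (8)."""
    return {
        s: min(
            cost(s, a) + lam * sum(p * V[s2] for s2, p in trans_prob(s, a))
            for a in actions(s)
        )
        for s in states
    }
```

The ADP technique we utilized in this research is described next.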

Direct-search based approximate dynamic programming
ADP has been extensively used for solving intractable MDPs in diverse fields such as manufacturing, healthcare, and revenue management [20]. ADP techniques are mainly classified into linear programming (LP)-based ADP and simulation-based ADP. In the LP-based approach, the underlying MDP model is transformed into the equivalent LP version of Bellman's equations, and an approximate value function is then used to make the LP model tractable [21][22][23]. Simulation-based ADP techniques, in contrast, find an approximate solution to Bellman's equations by simulating the evolution of the system from a number of initial states in order to tune the parameters [24,25]. They employ methods such as statistical sampling and reinforcement learning to estimate the value functions [26].
In ADP, the value function is approximated through a combination of basis functions, which represent important features of the state of the system. There are several ways of doing so; one of them is the linear approximation, which takes the following form:
$$\tilde{V}(s) = \sum_{k=1}^{K} r_k\, \Phi_k(s),$$
where $r_k$, for $k = 1, \ldots, K$, are tuning parameters and $\Phi_k(s)$, for $k = 1, \ldots, K$, are basis functions.
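A one-line sketch of this linear form; `basis_funcs` and `r` are placeholders for the paper's specific choices.

```python
def approx_value(state, r, basis_funcs):
    """Linear value function approximation: V~(s) = sum_k r_k * Phi_k(s).
    `basis_funcs` is a list of feature callables Phi_k and `r` the
    corresponding vector of tuning parameters."""
    return sum(r_k * phi(state) for r_k, phi in zip(r, basis_funcs))
```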
After the value function is approximated, an ADP policy is obtained by tuning the approximation parameters iteratively. In particular, the goal of ADP techniques is to find the parameter vector that minimizes a certain performance metric, such as the sum of squared differences between the approximate cost-to-go function and the estimated cost-to-go function over sampled states [27]. The resulting optimization problem is generally solved using regression-based techniques [20,24]. In contrast, we tune the ADP parameters by directly optimizing the performance of the policy they induce, and we then obtain the ADP policy from the resulting approximate value functions.

Retrieving the ADP Policy from the Approximate Value Function
Once the parameter-tuning phase has made the approximate value of any given state available, the ADP policy is retrieved by computing a decision vector for any desired state of the system. As stated in [27], this decision vector is myopic with respect to the value function approximation of our MDP. The decision retrieval problem for a particular state s of our MDP model is given by
$$\min_{(a,z) \in A_{x,y}} \Big\{ c(a, z) + \lambda\, \mathbb{E}\big[\tilde{V}(x', y')\big] \Big\}, \tag{10}$$
where $\tilde{V}(x', y')$ is the approximate value of state $(x', y')$.
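A sketch of this one-step lookahead retrieval, reusing `approx_value` from the earlier sketch; `feasible_actions` and `expected_transition` (returning the immediate cost and an estimate of the next state) are hypothetical stand-ins for the paper's integer-programming formulation.

```python
def adp_decision(state, feasible_actions, expected_transition, r, basis_funcs,
                 lam=0.99):
    """One-step lookahead retrieval of the ADP decision (Eq. (10)): pick
    the feasible action minimizing immediate cost plus discounted
    approximate cost-to-go."""
    best_action, best_value = None, float("inf")
    for action in feasible_actions(state):
        cost, next_state = expected_transition(state, action)
        value = cost + lam * approx_value(next_state, r, basis_funcs)
        if value < best_value:
            best_action, best_value = action, value
    return best_action
```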
Details about basis functions for the ADP implementation are discussed next.

Basis Functions
The basis functions we chose for the ADP technique are functions of the available capacity on each day of the booking horizon, computed from the patients already booked and newly scheduled on that day. They utilize the available capacity for each day in retrieving the ADP policy and dictate that patients with lower tolerance limits have higher priority when scheduling is performed. The first term corresponds to the available capacity for tolerance (0, 0), the next three terms correspond to the available capacity for tolerance (1, 1), and the next five terms correspond to the available capacity for tolerance (2, 2).
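The following sketch illustrates the "available capacity per day" idea only; these are not the paper's exact basis functions, just a hypothetical instance of capacity-based features.

```python
import numpy as np

def capacity_features(x, a, C1):
    """Illustrative capacity-based features: one feature per day of the
    booking horizon, namely the remaining capacity after existing and
    newly scheduled bookings, normalized by the daily capacity C1."""
    remaining = C1 - (x + a).sum(axis=(0, 1))   # slack on each day
    return remaining.astype(float) / C1          # normalized to [0, 1]
```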

Direct Search
We tune the parameters using direct search (see [27] and [28] for implementations of direct search). Unlike regression-based techniques, direct search targets the ultimate goal of finding good policies. In particular, direct search solves an optimization problem whose variables are the feasible parameter vectors r and whose objective value is the expected cost of the policy induced by the corresponding parameter vector [27]. The resulting optimization problem is expressed as
$$\min_{r}\; \mathbb{E}\Big[ \sum_{t=0}^{T} \lambda^{t}\, c\big(s_t, \pi_r(s_t)\big) \Big], \tag{11}$$
where T is a random variable denoting the final step of the search, $s_t$ is the state at step t of the search, $\pi_r$ is the policy obtained from the parameter vector r, $\pi_r(s_t)$ is the action dictated by the policy $\pi_r$ in the state at step t, and $c(s_t, \pi_r(s_t))$ is the immediate cost incurred at step t as a result of choosing $\pi_r(s_t)$. Here, $\pi_r$ is obtained by solving the aforementioned decision retrieval problem, using the parameter vector r to approximate the value function at each state visited during the search. The objective function in Eq. (11) is the expected cost of the policy $\pi_r$ [27].
During the implementation of the direct-search based ADP, we let the parameters r range from 0 to 10000 in increments of 500. We choose the value that yields the best policy performance when decisions are made by solving Eq. (10).
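A sketch of such a grid-style direct search; `policy_cost(r)` is assumed to estimate the expected discounted cost of the induced ADP policy by simulation, and exhaustive enumeration is only practical for small K (a coordinate-wise or pattern search would be used otherwise).

```python
import itertools
import numpy as np

def direct_search(policy_cost, K, lo=0, hi=10_000, step=500):
    """Direct search over the parameter grid of Eq. (11): simulate the
    policy induced by each candidate r and keep the cheapest one."""
    grid = np.arange(lo, hi + step, step)
    best_r, best_cost = None, float("inf")
    for candidate in itertools.product(grid, repeat=K):
        r = np.array(candidate)
        c = policy_cost(r)
        if c < best_cost:
            best_r, best_cost = r, c
    return best_r, best_cost
```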

Numerical results
This section compares the performance of the direct-search based ADP policy with that of the myopic policy. The myopic policy is obtained by solving $\min_{(a,z) \in A_{x,y}} c(a, z)$ for any state $s \in S$. We used AMPL with CPLEX 12 to solve all integer programs.

Experimental design
We consider a chemotherapy center where the length of the booking horizon is 10 (for larger problems, the length of the booking horizon is set to 20). In line with the literature [1], we set the number of patient types to 3 and the tolerance limits to (0, 0), (1, 1), and (2, 2). We assume that the arrival distribution is independent Poisson with mean 1 for each type for each day over the booking horizon. The probability that a scheduled treatment is cancelled has three levels: 0.01, 0.05, and 0.1. We assume capacity levels of 25 and 30; the former corresponds to the low-capacity case and the latter to the medium-capacity case, since the total mean daily demand nearly equals 30 (for larger problems, these levels are set to 55 and 60).

In line with the literature [1], we set $c_i^1$, the unit early scheduling costs per day, to 100, 75, and 50, and $c_i^2$, the unit late scheduling costs per day, to 125, 100, and 75. We set the diversion cost per patient to 50 and 250, the same for each patient type. The case where the diversion cost equals 50 is referred to in [1] as the "rigid tolerance case," whereas the other choices of diversion cost are referred to as "relaxed tolerance cases." We also consider cases where the early (late) scheduling costs are much higher than the late (early) scheduling costs for the relaxed tolerance case. In line with the literature [1], we set the unit early scheduling cost per day for each patient type to 300, 225, and 150, respectively, for the high early scheduling cost case, and the unit late scheduling cost per day for each patient type to 250, 200, and 150, respectively, for the high late scheduling cost case. The discount factor is set to 0.99. By varying the length of the booking horizon, the cancellation probability, the capacity level, the diversion cost, and the scheduling costs, we generated 32 problem sets. Simulation run lengths are set to 50. For each problem set, we ran 50 independent simulations.

Results
We present our results in Tables 1-8. Each row of the tables represents a different problem set. For each problem, we used a paired t-test to determine whether the ADP policy or the myopic policy performs statistically better. The total costs obtained by the ADP policy and the myopic policy are listed in the second and third columns of each table, respectively. The percentage improvement obtained by the ADP policy over the myopic policy is given in the fourth column. The final column indicates whether there is a statistically significant improvement over the myopic policy.

We begin with the results for the rigid tolerance cases, in which the diversion cost per patient is 50. The results indicate that in reasonable-sized problems (i.e., N = 10), the ADP policy statistically outperforms the myopic policy except when the capacity level is 30 and the cancellation probability is 0.05 (see Table 1). Additionally, the ADP policy generally performs better in low-capacity settings than in medium-capacity settings. It is also worth noting that the percentage difference between the two policies is notably high when the capacity level is medium and the cancellation probability is high. Finally, the superiority of the ADP policy over the myopic policy in the rigid tolerance case diminishes in large problems (i.e., N = 20) (see Table 3).

In the relaxed tolerance case, the ADP policy performs significantly better than the myopic policy for reasonable-sized problems (see Table 2). The corresponding percentage improvement over the myopic policy is higher than in the rigid tolerance case. Further, unlike in the rigid tolerance case, the performance of ADP improves when the capacity level is increased from low to medium. For large problems, however, ADP yields only a very small improvement when the cancellation probability is small; when this probability is high, ADP and the myopic policy perform identically (see Table 4).

The results for the high early scheduling cost case indicate that the ADP policy significantly outperforms the myopic policy for reasonable-sized problems (see Table 5). The corresponding percentage improvement over the myopic policy is higher than that for the rigid tolerance case. However, the advantage of the ADP policy shrinks as the problem size increases (see Table 6). We make similar observations for the high late scheduling cost case (see Tables 7 and 8).
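A minimal sketch of the paired t-test used to compare the two policies on matched per-replication total costs; the arrays below are synthetic placeholders, not the paper's data.

```python
import numpy as np
from scipy import stats

# Illustrative paired comparison over 50 matched simulation replications.
rng = np.random.default_rng(1)
adp_costs = rng.normal(1_000, 50, size=50)                # placeholder data
myopic_costs = adp_costs + rng.normal(20, 30, size=50)    # placeholder data

t_stat, p_value = stats.ttest_rel(adp_costs, myopic_costs)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # significant if p < 0.05
```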

Table 1 .
Results for the rigid tolerance case with N = 10.

Table 2 .
Results for the relaxed tolerance case with N = 10.

Table 3 .
Results for the rigid tolerance case with N = 20.

Table 4 .
Results for the relaxed tolerance case with N = 20.

Table 5 .
Results for the relaxed tolerance high early scheduling case with N = 10.

Table 6 .
Results for the relaxed tolerance high early scheduling case with N = 20.

Table 7 .
Results for the relaxed tolerance high late scheduling case with N = 10.

Table 8 .
Results for the relaxed tolerance high late scheduling case with N = 20.

Conclusion

We studied a chemotherapy appointment booking problem in which patients have specific target dates, are classified based on their tolerance limits, and are scheduled days in advance. Unlike the relevant literature, we considered cancellations of treatments. We provided an MDP formulation of this problem and, because of the huge state and action spaces, approximately solved it using direct-search based ADP. Direct-search based ADP is a relatively new technique among ADP methods, and in this research we demonstrated that it can be a viable method for solving chemotherapy appointment booking problems. In particular, our work revealed that the performance of the myopic policy can be significantly improved through the implementation of direct-search based ADP. Further improvements may be achievable by trying various basis functions within the direct-search based ADP. Cancellations of treatments in the chemotherapy appointment booking problem with target dates and tolerance limits had not previously been considered in the literature because of the resulting computational challenges; by including cancellations in our model, we fill this gap. As future research, overbooking and no-shows can be incorporated into our model, and direct-search based ADP can be applied to other dynamic scheduling problems.