Publications


Maximizing the Probability of Arriving on Time: A Practical Q-Learning Method

Zhiguang Cao, Hongliang Guo, Jie Zhang, Frans Oliehoek, and Ulrich Fastenrath. Maximizing the Probability of Arriving on Time: A Practical Q-Learning Method. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI), pp. 4481–4487, February 2017.

Abstract

The stochastic shortest path problem is of crucial importance for the development of sustainable transportation systems. Existing methods based on the probability tail model seek for the path that maximizes the probability of arriving at the destination before a deadline. However, they suffer from low accuracy and/or high computational cost. We design a novel Q-learning method where the converged Q-values have the practical meaning as the actual probabilities of arriving on time so as to improve accuracy. By further adopting dynamic neural networks to learn the value function, our method can scale well to large road networks with arbitrary deadlines. Experimental results on real road networks demonstrate the significant advantages of our method over other counterparts.
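The central idea of the abstract, that converged Q-values can be read directly as on-time arrival probabilities, can be illustrated with a minimal tabular sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: a hypothetical three-node network (`EDGES`) with discrete travel-time distributions, states made of (node, remaining time budget), a reward of 1 only when the destination is reached with a non-negative budget, no discounting, and 1/n step sizes. Under those choices each Q-value is the expected value of an on-time indicator, i.e. a probability. The paper uses dynamic neural networks for the value function; this sketch swaps in a lookup table, and the helper `exact_value` is a hypothetical backward-recursion check added only for comparison.

```python
import random
from collections import defaultdict
from functools import lru_cache

# Hypothetical toy network (not from the paper): EDGES[(u, v)] maps an
# integer travel time on link u -> v to its probability.
EDGES = {
    (0, 2): {2: 0.5, 6: 0.5},  # direct link: fast, or badly congested
    (0, 1): {2: 1.0},          # reliable first leg of a detour
    (1, 2): {2: 0.9, 3: 0.1},  # mostly reliable second leg
}
ORIGIN, DEST = 0, 2


def successors(node):
    """Nodes reachable from `node` over a single link."""
    return [v for (u, v) in EDGES if u == node]


def sample_time(edge, rng):
    """Draw one travel time for `edge` from its discrete distribution."""
    times, probs = zip(*EDGES[edge].items())
    return rng.choices(times, weights=probs)[0]


@lru_cache(maxsize=None)
def exact_value(node, budget):
    """Optimal on-time probability by backward recursion (reference check only)."""
    if budget < 0:
        return 0.0  # deadline already missed
    if node == DEST:
        return 1.0  # arrived with budget to spare
    return max(
        (sum(p * exact_value(v, budget - d) for d, p in EDGES[(node, v)].items())
         for v in successors(node)),
        default=0.0,  # dead end: the destination is unreachable
    )


def q_learning(episodes=20000, max_deadline=7, epsilon=0.2, seed=0):
    """Tabular Q-learning over states (node, remaining budget).

    Reward is 1 only when the destination is reached with a non-negative
    budget, and there is no discounting, so each Q-value estimates the
    probability of arriving on time after taking that link.
    """
    rng = random.Random(seed)
    q = defaultdict(float)     # q[(node, budget, next_node)]
    visits = defaultdict(int)  # visit counts for 1/n step sizes
    for _ in range(episodes):
        # Sampling the deadline lets one table cover several deadlines.
        node, budget = ORIGIN, rng.randint(1, max_deadline)
        while True:
            actions = successors(node)
            if not actions:
                break  # dead end: the episode ends without reward
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda v: q[(node, budget, v)])
            new_budget = budget - sample_time((node, action), rng)
            if new_budget < 0:
                target, done = 0.0, True  # missed the deadline
            elif action == DEST:
                target, done = 1.0, True  # arrived on time
            else:
                target = max((q[(action, new_budget, v)] for v in successors(action)),
                             default=0.0)
                done = False
            key = (node, budget, action)
            visits[key] += 1
            q[key] += (target - q[key]) / visits[key]
            if done:
                break
            node, budget = action, new_budget
    return q


if __name__ == "__main__":
    q = q_learning()
    for deadline in range(1, 8):
        learned = max(q[(ORIGIN, deadline, v)] for v in successors(ORIGIN))
        print(deadline, round(learned, 3), exact_value(ORIGIN, deadline))
```

With these hypothetical numbers, the direct link reaches node 2 within a deadline of 4 with probability 0.5, while the detour through node 1 does so with probability 0.9, so the learned greedy policy should prefer the detour at that deadline and the direct link at deadlines 2 and 3. Keeping the remaining budget in the state is what lets a single learned table answer queries for different deadlines.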

BibTeX Entry

@inproceedings{Cao17AAAI,
    title =     {Maximizing the Probability of Arriving on Time: A Practical Q-Learning Method},
    author =    {Zhiguang Cao and Hongliang Guo and Jie Zhang and Frans Oliehoek and Ulrich Fastenrath},
    booktitle = AAAI17,
    year =      {2017},
    month =     feb,
    pages =     {4481--4487},
    url =       {https://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14308},
    abstract = {
The stochastic shortest path problem is of crucial importance
for the development of sustainable transportation systems.
Existing methods based on the probability tail model seek
for the path that maximizes the probability of arriving at the
destination before a deadline. However, they suffer from low
accuracy and/or high computational cost. We design a novel
Q-learning method where the converged Q-values have the
practical meaning as the actual probabilities of arriving on
time so as to improve accuracy. By further adopting dynamic
neural networks to learn the value function, our method can
scale well to large road networks with arbitrary deadlines.
Experimental results on real road networks demonstrate the
significant advantages of our method over other counterparts.
    }
}

Generated by bib2html.pl (written by Patrick Riley) on Wed Jun 17, 2026 13:52:26 UTC