Artificial intelligence is rapidly becoming a critical component of modern digital systems. While much attention has focused on model accuracy and large-scale training in centralized data centers, an equally important factor is often overlooked: latency. In many real-world applications, the value of AI depends not only on how accurate the model is, but also on how quickly it can produce a decision. This talk explores how low-latency AI inference is emerging as a new workload class for Internet infrastructure.

We begin with a simple observation: in the age of AI, outcomes increasingly depend on both accuracy and response time. This principle can be seen across domains. In the public sector, modern unmanned systems combine remote human control with elements of autonomy and AI-assisted targeting, where decision speed can be critical. In the private sector, applications such as visual product search in mobile eCommerce rely on machine-vision models that must return results quickly to maintain user engagement.

These requirements raise important questions for network operators. If AI inference must operate with very low latency, where should it run? While model training will continue to reside in centralized data centers, inference workloads may increasingly move closer to users, at CDN points of presence or ISP edge facilities.

From an operator's perspective, this shift introduces several new considerations. Edge environments that historically relied on CPU-based workloads may begin to incorporate GPUs as new infrastructure components. Power consumption, while manageable for inference workloads, becomes an operational factor. Network architecture, such as IPv6 deployment and latency optimization, directly influences AI performance. In addition, edge software stacks designed around CPUs must evolve to coordinate scarce GPU resources efficiently.

Finally, as AI becomes embedded in operational and decision-making systems, traditional security principles remain essential. AI models are only as reliable as their training data, and infrastructure security continues to play a critical role in ensuring trustworthy outcomes. This talk discusses the emerging Age of AI Inference at the Edge and its implications for network operators, CDN platforms, and Internet infrastructure.
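The "accuracy plus response time" framing above can be sketched as a simple latency budget: end-to-end response time is roughly one network round trip plus model inference time, so a nearby edge PoP can win even with a slower model. The function and all numbers below are illustrative assumptions for this sketch, not measurements from the talk.

```python
# Minimal latency-budget sketch for AI inference placement.
# All figures are assumed example values, not real measurements.

def response_time_ms(network_rtt_ms: float, inference_ms: float,
                     queue_ms: float = 0.0) -> float:
    """End-to-end response time: one network round trip plus
    model inference time plus any queuing delay."""
    return network_rtt_ms + inference_ms + queue_ms

# Centralized data center far from the user: low inference time
# on large GPUs, but a long round trip.
central = response_time_ms(network_rtt_ms=120.0, inference_ms=30.0)

# CDN/ISP edge PoP near the user: shorter round trip, and a
# somewhat slower model on a smaller edge GPU.
edge = response_time_ms(network_rtt_ms=8.0, inference_ms=45.0)

print(f"central: {central} ms, edge: {edge} ms")
# → central: 150.0 ms, edge: 53.0 ms
```

Under these assumed numbers the edge placement roughly triples responsiveness despite slower inference, which is the trade-off the abstract describes.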

Akamai
Alex Leung is a Senior Enterprise Architect at Akamai, where he serves as a trusted advisor to leading broadcasters, helping them transform their services to adapt to the OTT delivery trend and navigate the complex array of technologies involved. During his years at Akamai, he has led media consultancy projects that helped regional broadcasters optimize their media streaming operations in preparation for large events, including the 2018 FIFA World Cup and the 2019 Indian Premier League. Prior to joining Akamai, Alex led a number of challenging projects over his 20-year career, ranging from a video-on-demand e-learning platform for the Hong Kong Police Force to an image search engine based on Apache Solr. He holds a Master's degree in Applied Physics from Stanford University and a Bachelor's degree in Engineering Physics from Cornell University.
Moderator / 王彥傑
Commissioner, Department of Information Technology / 趙式隆, 詹婷怡, 余若凡, 王彥傑, 郭奕豪
Jack Kwok, Achie
Steve Crocker
Edgemoor Research Institute
Tony Smith
APNIC
梁增偉
Akamai
Bastien Claeys
Nokia
Stanley Chen
Tomoki Yoshikawa
Home NOC Operators Group
Philip Paeps
Alternative Enterprises
Masataka Mawatari
JPIX
Yoshinobu Matsuzaki
IIJ
Tashi Phuntsho
FLEXOPTIX
岑育霖
RETN
Taisuke Sato
Seiko Solutions
Scott Fisher
Team Cymru
Pavel Odintsov
FastNetMon LTD