Archives AI News

Uber and Momenta will test fully driverless cars in Germany


Uber and one of its many robotaxi partners, Momenta, will test fully driverless cars in Germany next year. The news comes as Europe continues to lag behind the US and China in the number of commercially operational robotaxi services. The companies say they will test Level 4 autonomous vehicles in Munich starting in 2026. (Level […]

Driving for the Horizon: New Android Automotive solution on cloud offers faster builds


The automotive industry is in the midst of a profound transformation, accelerating toward an era of software-defined vehicles (SDVs). This shift, however, presents significant challenges for manufacturers and suppliers alike. Their priority is making great vehicles, not great software, though the latter now contributes to, and is increasingly a necessity for, the former. These OEMs must find ways to bring greater efficiency and quality to their software delivery and establish new collaboration models, among other hurdles on the way to their visions for SDVs.

To help meet this moment, we've created Horizon, a new open-source software factory for platform development with Android Automotive OS, and beyond. With Horizon, we aim to support the software transformation of the automotive industry and tackle its most pressing challenges by providing a standardized development toolchain, so OEMs can generate value by focusing on building products and experiences. In early deployments at a half-dozen automotive partners, we've already seen 10x to 50x faster feedback for developers, leading to high-frequency releases and higher build quality. In this post, we outline how Horizon helps overcome the key impediments to automotive software transformation.

The Roadblocks to Innovation in Automotive Software Development

Today, traditional automotive manufacturers (OEMs) often approach software development from a hardware-centric perspective that lacks agility and struggles to scale. This approach makes software lifecycle support burdensome and is often accompanied by inconsistent, unreliable tools that slow down development. OEMs face exploding development costs, quality issues, and slow innovation, making it difficult to keep pace with new market entrants and the increasing demand for advanced features.
Furthermore, most customers expect frequent, high-quality over-the-air (OTA) software updates similar to those they receive on other devices, such as their smartphones, forcing most OEMs to mirror the consumer electronics experience. But a car is not a television or a refrigerator, or even a "rolling computer," as many now describe them. Vehicles are made up of many separate, highly complex systems, which typically require the integration of numerous components from multiple suppliers, who often provide "closed box" solutions. Even as vehicles have become more connected, and dependent on these connected systems for everything from basic to advanced operations, the vehicle platform has actually become harder, not easier, to integrate and innovate on. We knew there had to be a better way to keep up with the pace necessary to provide a great customer experience.

Introducing Horizon: A Collaborative Path Forward

To tackle these pressing industry challenges, Google and Accenture have initiated Horizon, an open-source reference development platform designed to transform the automotive industry into a software-driven innovation market. Our vision for Horizon is to enable automakers and OEMs to greatly accelerate their time to market and increase the agility of their teams while significantly reducing development costs. Horizon provides a holistic platform for the future of automotive software, enabling OEMs to invest more in innovation rather than just integration.

Key Capabilities Driving Software Excellence

Horizon offers a comprehensive suite of capabilities, establishing a developer-centric, cloud-powered, and easy-to-adopt open industry standard for embedded software.

1. Software-First Development with AAOS

Horizon champions a virtual-first approach to product design, deeply integrating with Android Automotive OS (AAOS) to empower software-led development cycles.
This involves effective use of the vehicle hardware abstraction layer (VHAL), virtio, and high-fidelity cloud-based virtual devices such as Cuttlefish, which can scale to thousands of instances on demand. This approach enables scalable automated software regression tests and elastic direct developer testing strategies, and can be seen as the first step toward a complete digital twin of the vehicle.

2. Streamlined Code-Build-Test Pipeline

Horizon aims to introduce a standard for the entire software development lifecycle:

Code: Horizon supports flexible, configurable code management using Gerrit, with the option to use the GerritForge managed service via the Google Cloud Marketplace for production deployments. With Gemini Code Assist integrated in Cloud Workstations, developers can accelerate their work with code completion, bug identification, and test generation, as well as explanations of Android APIs.

Build: The platform features a scaled build process that leverages intelligent cloud usage and dynamic scaling. Key to this is caching for AAOS platform builds based on warmed-up environments, and the integration of the optimized Android Build File System (ABFS), which can reduce build times by more than 95% and allow full builds from scratch in one to two minutes with up to 100% cache hits. Horizon supports a wide variety of build targets, including Android 14 and 15, Cuttlefish, AVD, Raspberry Pi devices, and the Google Pixel Tablet. Build environments are containerized, ensuring reproducibility.

Test: Horizon enables scalable testing in Google Cloud with Android's Compatibility Test Suite (CTS), utilizing Cuttlefish for virtualized runtime environments. Remote access to multiple physical build farms is facilitated by MTK Connect, which allows secure, low-latency interaction with hardware via a web browser, eliminating the need to ship hardware to developers.

3. Cloud-Powered Infrastructure

Built on Google Cloud, Horizon ensures scalability and reliability. Deployment is simplified through tools such as Terraform, GitOps, and Helm charts, offering a plug-and-play toolchain and making it easy to track the deployment of tools and applications to Kubernetes.

Unlocking Value for Auto OEMs and the Broader Industry

The Horizon reference platform delivers significant benefits for auto OEMs:

Reduced costs: Horizon reduces hardware-related development costs and curbs rising development expenses overall.

Faster time to market: By accelerating development and enabling faster innovation cycles, Horizon helps OEMs shorten their time to market and feature cycle time.

Increased quality and productivity: The platform enables stable quality and boosts team productivity by providing standardized toolsets and fostering more effective team collaboration.

Enhanced customer experience: By enabling faster, more frequent, and higher-quality builds, OEMs can change the way they develop vehicle software, offering enhanced customer experiences and unlocking new revenue streams through software-driven services.

Strategic focus: Horizon underpins the belief that efficient software development platforms should not be a point of differentiation for OEMs; their innovation should be focused on the product itself. This lets OEMs devote more time and resources to software development with greater quality, efficiency, and flexibility.

Robust ecosystem: To ensure scalable, secure, and future-ready deployments across diverse vehicle platforms, Horizon aims to foster collaboration between Google, integration partners, and platform adopters. While advancing the reference platform's capabilities, Horizon also allows for tailored integration and compatibility with vehicle hardware, legacy systems, and compliance standards.
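The dramatic build-time reductions described above rest on a familiar principle: content-addressed caching, where build outputs are keyed by a hash of their inputs, so unchanged modules are cache hits and never rebuilt. The sketch below is a minimal illustration of that general principle only; it is not ABFS or any Horizon component, and the `BuildCache` class and its methods are hypothetical.

```python
import hashlib

class BuildCache:
    """Toy content-addressed build cache: outputs are stored under a
    hash of their inputs, so a repeated build with unchanged sources
    is served from the cache instead of recompiling."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def key(self, sources):
        h = hashlib.sha256()
        for s in sources:
            h.update(s.encode())
        return h.hexdigest()

    def build(self, sources, compile_fn):
        k = self.key(sources)
        if k in self.store:          # warmed cache: skip the compile
            self.hits += 1
            return self.store[k]
        self.misses += 1
        out = compile_fn(sources)
        self.store[k] = out
        return out

# A stand-in "compiler" that just concatenates source names.
cache = BuildCache()
artifact1 = cache.build(["a.c", "b.c"], lambda s: "|".join(s))
artifact2 = cache.build(["a.c", "b.c"], lambda s: "|".join(s))  # cache hit
assert cache.hits == 1 and cache.misses == 1
```

Real systems key on file contents and dependency graphs rather than names, but the mechanism, hash the inputs and reuse matching outputs, is the same one that makes warmed-up environments so effective.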
The Horizon ecosystem

It has been said that the best software is the software you don't notice, so seamless and flawless is its functioning. This is especially true for the software-defined vehicle, where the focus should be on the road and the joy of the trip. This is why we believe the platforms enabling efficient software development shouldn't be differentiating for automakers; their vehicles should be. Like a solid set of tires or a good sound system, software is now essential, but it's not the product itself. The product is the full package put together by the combination of design, engineering, development, and production. Because software development is now such an integral part of that process, we believe it should be an enabler, not a hindrance, for automakers.

To that end, the Google Cloud, Android, and Accenture teams have continuously aimed to simplify access to, and use of, the relevant toolchain components. The integration of OpenBSW and the Android Build File System (ABFS) are just the latest waypoints in a journey that began with GerritForge providing a managed Gerrit offering, and that will continue with additional partners in upcoming releases.

Please join us on this journey. We invite you to become part of the community to receive early insights, provide feedback, and actively participate in shaping the future direction of Horizon. You can also explore our open-source releases on GitHub to evaluate and customize the Horizon platform by deploying it in your Google Cloud environment and running reference workloads. Horizon is a new dawn for the future of automotive software, though we can only get there together, through open collaboration and cloud-powered innovation.
A special thanks to the village of Googlers and Accenture colleagues who delivered this: Mike Annau, Ulrich Gersch, Steve Basra, Taylor Santiago, Haamed Gheibi, James Brook, Ta'id Holmes, Sebastian Kunze, Philip Chen, Alistair Delva, Sam Lin, Femi Akinde, Casey Flynn, Milan Wiezorek, Marcel Gotza, Ram Krishnamoorthy, Achim Ramesohl, Olive Power, Christoph Horn, Liam Friel, Stefan Beer, Colm Murphy, Robert Colbert, Sarah Kern, Wojciech Kowalski, Wojciech Kobryn, Dave M. Smith, Konstantin Weber, Claudine Laukant, and Lisa Unterhauser. Opening image created using Imagen 4 with the prompt: "Generate a blog post header image for the following blog post, illustrating the concept of a software-defined vehicle <insert the first six paragraphs>."

Any-Step Density Ratio Estimation via Interval-Annealed Secant Alignment

arXiv:2509.04852v1 Announce Type: new Abstract: Estimating density ratios is a fundamental problem in machine learning, but existing methods often trade off accuracy for efficiency. We propose Interval-annealed Secant Alignment Density Ratio Estimation (ISA-DRE), a framework that enables accurate, any-step estimation without numerical integration. Instead of modeling infinitesimal tangents as in prior methods, ISA-DRE learns a global secant function, defined as the expectation of all tangents over an interval, with provably lower variance, making it more suitable for neural approximation. This is made possible by the Secant Alignment Identity, a self-consistency condition that formally connects the secant with its underlying tangent representations. To mitigate instability during early training, we introduce Contraction Interval Annealing, a curriculum strategy that gradually expands the alignment interval during training. This process induces a contraction mapping, which improves convergence and training stability. Empirically, ISA-DRE achieves competitive accuracy with significantly fewer function evaluations than prior methods, resulting in much faster inference and making it well suited for real-time and interactive applications.
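For background, the most common baseline for density ratio estimation is the classification trick: a balanced classifier trained to distinguish samples of p from samples of q has a Bayes-optimal logit equal to log p(x) - log q(x), so the density ratio can be read off as the exponentiated logit. The sketch below verifies that identity analytically for two unit-variance Gaussians; it illustrates only this classical setup, not the ISA-DRE method from the abstract.

```python
import numpy as np

def log_density_normal(x, mu, sigma):
    """Log-density of N(mu, sigma^2), evaluated elementwise."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

# Bayes-optimal classifier logit for balanced classes p vs. q equals
# log p(x) - log q(x); exponentiating recovers the ratio r(x) = p(x)/q(x).
x = np.linspace(-3.0, 3.0, 7)
log_r = log_density_normal(x, 0.0, 1.0) - log_density_normal(x, 1.0, 1.0)
r = np.exp(log_r)

# For p = N(0, 1) and q = N(1, 1), the exact ratio simplifies to
# exp(((x - 1)^2 - x^2) / 2) = exp(0.5 - x).
assert np.allclose(r, np.exp(0.5 - x))
```

Methods like the one above need one classifier evaluation per query, whereas tangent-based (time-score) estimators require integrating many evaluations; reducing that evaluation count is precisely the efficiency axis the abstract targets.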

ATHAR: A High-Quality and Diverse Dataset for Classical Arabic to English Translation

arXiv:2407.19835v2 Announce Type: replace-cross Abstract: Classical Arabic represents a significant era that encompasses the golden age of Arab culture, philosophy, and scientific literature. With a broad consensus on the importance of translating this literature to enrich knowledge dissemination across communities, the advent of large language models (LLMs) and translation systems offers promising tools to facilitate this goal. However, we have identified a scarcity of translation datasets for Classical Arabic, which are often limited in scope and topics, hindering the development of high-quality translation systems. In response, we present the ATHAR dataset, which comprises 66,000 high-quality Classical Arabic to English translation samples covering a wide array of topics including science, culture, and philosophy. Furthermore, we assess the performance of current state-of-the-art LLMs under various settings, concluding that current systems need such datasets. Our findings highlight how models can benefit from fine-tuning on this dataset or incorporating it into their pretraining pipelines. The dataset is publicly available on the HuggingFace Data Hub: https://huggingface.co/datasets/mohamed-khalil/ATHAR.

SparkUI-Parser: Enhancing GUI Perception with Robust Grounding and Parsing

arXiv:2509.04908v1 Announce Type: new Abstract: Existing Multimodal Large Language Models (MLLMs) for GUI perception have made great progress. However, the following challenges remain in prior methods: 1) They model discrete coordinates with a text autoregressive mechanism, which results in lower grounding accuracy and slower inference. 2) They can only locate predefined sets of elements and cannot parse the entire interface, which hampers broad application and support for downstream tasks. To address these issues, we propose SparkUI-Parser, a novel end-to-end framework that simultaneously achieves higher localization precision and fine-grained parsing of the entire interface. Specifically, instead of probability-based discrete modeling, we perform continuous modeling of coordinates based on a pre-trained Multimodal Large Language Model (MLLM) with an additional token router and coordinate decoder. This effectively mitigates the limitations inherent in the discrete output characteristics and the token-by-token generation process of MLLMs, boosting both accuracy and inference speed. To further enhance robustness, a rejection mechanism based on a modified Hungarian matching algorithm is introduced, which empowers the model to identify and reject non-existent elements, thereby reducing false positives. Moreover, we present ScreenParse, a rigorously constructed benchmark to systematically assess the structural perception capabilities of GUI models across diverse scenarios. Extensive experiments demonstrate that our approach consistently outperforms SOTA methods on the ScreenSpot, ScreenSpot-v2, CAGUI-Grounding, and ScreenParse benchmarks. The resources are available at https://github.com/antgroup/SparkUI-Parser.
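The rejection mechanism can be illustrated with the standard Hungarian algorithm from SciPy: match predicted element locations to reference locations by minimum total cost, then discard predictions that remain unmatched or whose match cost exceeds a threshold. This is a toy sketch of the general mechanism only, not the paper's modified algorithm; `match_with_rejection` and its distance threshold are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_with_rejection(pred, gt, max_dist=0.1):
    """Match predicted element centers to ground-truth centers via the
    Hungarian algorithm; predictions left unmatched, or matched at a
    cost above max_dist, are rejected as likely false positives."""
    cost = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
    matched_rows = {r for r, _ in matches}
    rejected = [i for i in range(len(pred)) if i not in matched_rows]
    return matches, rejected

# Two predictions but only one real element: the far-off prediction
# is rejected as non-existent.
pred = np.array([[0.10, 0.20], [0.90, 0.90]])
gt = np.array([[0.12, 0.21]])
matches, rejected = match_with_rejection(pred, gt)
assert matches == [(0, 0)] and rejected == [1]
```

In practice the cost would combine localization distance with appearance or label similarity, but the one-to-one assignment plus a rejection threshold is the essential shape of the idea.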

Empowering Bridge Digital Twins by Bridging the Data Gap with a Unified Synthesis Framework

arXiv:2507.05814v3 Announce Type: replace-cross Abstract: As critical transportation infrastructure, bridges face escalating challenges from aging and deterioration, while traditional manual inspection methods suffer from low efficiency. Although 3D point cloud technology provides a new data-driven paradigm, its application potential is often constrained by the incompleteness of real-world data, which results from missing labels and scanning occlusions. To overcome the bottleneck of insufficient generalization in existing synthetic data methods, this paper proposes a systematic framework for generating 3D bridge data. This framework can automatically generate complete point clouds featuring component-level instance annotations, high-fidelity color, and precise normal vectors. It can be further extended to simulate the creation of diverse and physically realistic incomplete point clouds, designed to support the training of segmentation and completion networks, respectively. Experiments demonstrate that a PointNet++ model trained with our synthetic data achieves a mean Intersection over Union (mIoU) of 84.2% in real-world bridge semantic segmentation. Concurrently, a fine-tuned KT-Net exhibits superior performance on the component completion task. This research offers an innovative methodology and a foundational dataset for the 3D visual analysis of bridge structures, holding significant implications for advancing the automated management and maintenance of infrastructure.
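For readers unfamiliar with the metric, mean Intersection over Union (mIoU), reported above for the bridge segmentation result, averages over classes the overlap between predicted and ground-truth label sets. A minimal sketch of the standard computation (the `mean_iou` helper is illustrative, not from the paper):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean Intersection over Union: per-class overlap / union of the
    predicted and ground-truth masks, averaged over classes present."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:                 # skip classes absent from both
            ious.append(inter / union)
    return float(np.mean(ious))

# Six points, three classes: class 0 is perfect (IoU 1), class 1 has
# IoU 1/2, class 2 has IoU 2/3.
gt = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 0, 1, 2, 2, 2])
assert abs(mean_iou(pred, gt, 3) - (1.0 + 0.5 + 2 / 3) / 3) < 1e-9
```

For point clouds the same formula is applied per point, so an 84.2% mIoU means the per-class point-level overlaps average to 0.842.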

Towards Ontology-Based Descriptions of Conversations with Qualitatively-Defined Concepts

arXiv:2509.04926v1 Announce Type: new Abstract: The controllability of Large Language Models (LLMs) when used as conversational agents is a key challenge, particularly to ensure predictable and user-personalized responses. This work proposes an ontology-based approach to formally define conversational features that are typically qualitative in nature. By leveraging a set of linguistic descriptors, we derive quantitative definitions for qualitatively-defined concepts, enabling their integration into an ontology for reasoning and consistency checking. We apply this framework to the task of proficiency-level control in conversations, using CEFR language proficiency levels as a case study. These definitions are then formalized in description logic and incorporated into an ontology, which guides controlled text generation of an LLM through fine-tuning. Experimental results demonstrate that our approach provides consistent and explainable proficiency-level definitions, improving transparency in conversational AI.

Unveiling the Response of Large Vision-Language Models to Visually Absent Tokens

arXiv:2509.03025v2 Announce Type: replace-cross Abstract: Large Vision-Language Models (LVLMs) generate contextually relevant responses by jointly interpreting visual and textual inputs. However, our findings reveal that they often mistakenly perceive text inputs lacking visual evidence as being part of the image, leading to erroneous responses. In light of this finding, we probe whether LVLMs possess an internal capability to determine if textual concepts are grounded in the image, and discover a specific subset of Feed-Forward Network (FFN) neurons, termed Visual Absence-aware (VA) neurons, that consistently signal visual absence through a distinctive activation pattern. Leveraging these patterns, we develop a detection module that systematically classifies whether an input token is visually grounded. Guided by its predictions, we propose a method to refine the outputs by reinterpreting question prompts or replacing the detected absent tokens during generation. Extensive experiments show that our method effectively mitigates the models' tendency to falsely presume the visual presence of text inputs, and demonstrate its generality across various LVLMs.
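The probing recipe described, finding neurons whose activations separate visually grounded tokens from absent ones, can be sketched on synthetic data: score each neuron by its mean activation gap between the two token groups, keep the top scorers, and threshold their average activation. This is a toy reconstruction on invented data, not the paper's actual detection module; the neuron indices and thresholds below are fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic FFN activations: 200 tokens x 50 neurons. By construction,
# neurons 3 and 7 fire more strongly for "visually absent" tokens
# (label 1), mimicking a distinctive VA activation pattern.
labels = rng.integers(0, 2, size=200)
acts = rng.normal(size=(200, 50))
acts[labels == 1, 3] += 3.0
acts[labels == 1, 7] += 3.0

# Score each neuron by the mean activation gap between absent and
# grounded tokens, then keep the top-scoring candidates.
gap = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)
va_neurons = np.argsort(gap)[-2:]
assert set(va_neurons.tolist()) == {3, 7}

# A simple detector: flag a token as visually absent when the mean
# activation of the candidate VA neurons exceeds a threshold.
score = acts[:, va_neurons].mean(axis=1)
pred = (score > score.mean()).astype(int)
accuracy = (pred == labels).mean()
assert accuracy > 0.9
```

The real work operates on activations recorded from an LVLM rather than Gaussians, but the pipeline, contrast activations across conditions, select discriminative neurons, and build a lightweight classifier on them, follows this shape.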