Archives AI News

Hell Is Us is a cryptic and ambitious meditation on the horrors of war

HiU GameplayReveal Screenshot 07

The opening hours of Hell Is Us are brilliantly confusing. The game tasks you with getting up to speed on a complicated civil war between the Palomists and Sabinians. A deluge of proper nouns is unleashed: Lymbic weaponry, Guardian Detectors, and more. But the clearest way the game communicates that you should feel utterly dumbfounded […]

StepFun AI Releases Step-Audio 2 Mini: An Open-Source 8B Speech-to-Speech AI Model that Surpasses GPT-4o-Audio

Screenshot 2025 08 31 at 11.17.51 PM 1 1024x560 1

The StepFun AI team has released Step-Audio 2 Mini, an 8B parameter speech-to-speech large audio language model (LALM) that delivers expressive, grounded, and real-time audio interaction. Released under the Apache 2.0 license, this open-source model achieves state-of-the-art performance across speech recognition, audio understanding, and speech conversation benchmarks—surpassing commercial systems such as GPT-4o-Audio. Key Features 1. […] The post StepFun AI Releases Step-Audio 2 Mini: An Open-Source 8B Speech-to-Speech AI Model that Surpasses GPT-4o-Audio appeared first on MarkTechPost.

Automated Clinical Problem Detection from SOAP Notes using a Collaborative Multi-Agent LLM Architecture

arXiv:2508.21803v1 Announce Type: new Abstract: Accurate interpretation of clinical narratives is critical for patient care, but the complexity of these notes makes automation challenging. While Large Language Models (LLMs) show promise, single-model approaches can lack the robustness required for high-stakes clinical tasks. We introduce a collaborative multi-agent system (MAS) that models a clinical consultation team to address this gap. The system is tasked with identifying clinical problems by analyzing only the Subjective (S) and Objective (O) sections of SOAP notes, simulating the diagnostic reasoning process of synthesizing raw data into an assessment. A Manager agent orchestrates a dynamically assigned team of specialist agents who engage in a hierarchical, iterative debate to reach a consensus. We evaluated our MAS against a single-agent baseline on a curated dataset of 420 MIMIC-III notes. The dynamic multi-agent configuration demonstrated consistently improved performance in identifying congestive heart failure, acute kidney injury, and sepsis. Qualitative analysis of the agent debates reveals that this structure effectively surfaces and weighs conflicting evidence, though it can occasionally be susceptible to groupthink. By modeling a clinical team's reasoning process, our system offers a promising path toward more accurate, robust, and interpretable clinical decision support tools.

NSPDI-SNN: An efficient lightweight SNN based on nonlinear synaptic pruning and dendritic integration

arXiv:2508.21566v1 Announce Type: cross Abstract: Spiking neural networks (SNNs) are artificial neural networks based on simulated biological neurons and have attracted much attention in recent artificial intelligence technology studies. The dendrites in biological neurons have efficient information processing ability and computational power; however, the neurons of SNNs rarely match the complex structure of the dendrites. Inspired by the nonlinear structure and highly sparse properties of neuronal dendrites, in this study, we propose an efficient, lightweight SNN method with nonlinear pruning and dendritic integration (NSPDI-SNN). In this method, we introduce nonlinear dendritic integration (NDI) to improve the representation of the spatiotemporal information of neurons. We implement heterogeneous state transition ratios of dendritic spines and construct a new and flexible nonlinear synaptic pruning (NSP) method to achieve the high sparsity of SNN. We conducted systematic experiments on three benchmark datasets (DVS128 Gesture, CIFAR10-DVS, and CIFAR10) and extended the evaluation to two complex tasks (speech recognition and reinforcement learning-based maze navigation task). Across all tasks, NSPDI-SNN consistently achieved high sparsity with minimal performance degradation. In particular, our method achieved the best experimental results on all three event stream datasets. Further analysis showed that NSPDI significantly improved the efficiency of synaptic information transfer as sparsity increased. In conclusion, our results indicate that the complex structure and nonlinear computation of neuronal dendrites provide a promising approach for developing efficient SNN methods.

QuadKAN: KAN-Enhanced Quadruped Motion Control via End-to-End Reinforcement Learning

arXiv:2508.19153v1 Announce Type: cross Abstract: We address vision-guided quadruped motion control with reinforcement learning (RL) and highlight the necessity of combining proprioception with vision for robust control. We propose QuadKAN, a spline-parameterized cross-modal policy instantiated with Kolmogorov-Arnold Networks (KANs). The framework incorporates a spline encoder for proprioception and a spline fusion head for proprioception-vision inputs. This structured function class aligns the state-to-action mapping with the piecewise-smooth nature of gait, improving sample efficiency, reducing action jitter and energy consumption, and providing interpretable posture-action sensitivities. We adopt Multi-Modal Delay Randomization (MMDR) and perform end-to-end training with Proximal Policy Optimization (PPO). Evaluations across diverse terrains, including both even and uneven surfaces and scenarios with static or dynamic obstacles, demonstrate that QuadKAN achieves consistently higher returns, greater distances, and fewer collisions than state-of-the-art (SOTA) baselines. These results show that spline-parameterized policies offer a simple, effective, and interpretable alternative for robust vision-guided locomotion. A repository will be made available upon acceptance.

QZhou-Embedding Technical Report

arXiv:2508.21632v1 Announce Type: cross Abstract: We present QZhou-Embedding, a general-purpose contextual text embedding model with exceptional text representation capabilities. Built upon the Qwen2.5-7B-Instruct foundation model, we designed a unified multi-task framework comprising specialized data transformation and training strategies. The data transformation scheme enables the incorporation of more diverse textual training datasets, while the task-specific training strategies enhance model learning efficiency. We developed a data synthesis pipeline leveraging LLM API, incorporating techniques such as paraphrasing, augmentation, and hard negative example generation to improve the semantic richness and sample difficulty of the training set. Additionally, we employ a two-stage training strategy, comprising initial retrieval-focused pretraining followed by full-task fine-tuning, enabling the embedding model to extend its capabilities based on robust retrieval performance. Our model achieves state-of-the-art results on the MTEB and CMTEB benchmarks, ranking first on both leaderboards (August 27 2025), and simultaneously achieves state-of-the-art performance on tasks including reranking, clustering, etc. Our findings demonstrate that higher-quality, more diverse data is crucial for advancing retrieval model performance, and that leveraging LLMs generative capabilities can further optimize data quality for embedding model breakthroughs. Our model weights are released on HuggingFace under Apache 2.0 license. For reproducibility, we provide evaluation code and instructions on GitHub.

Pep2Prob Benchmark: Predicting Fragment Ion Probability for MS$^2$-based Proteomics

arXiv:2508.21076v1 Announce Type: cross Abstract: Proteins perform nearly all cellular functions and constitute most drug targets, making their analysis fundamental to understanding human biology in health and disease. Tandem mass spectrometry (MS$^2$) is the major analytical technique in proteomics that identifies peptides by ionizing them, fragmenting them, and using the resulting mass spectra to identify and quantify proteins in biological samples. In MS$^2$ analysis, peptide fragment ion probability prediction plays a critical role, enhancing the accuracy of peptide identification from mass spectra as a complement to the intensity information. Current approaches rely on global statistics of fragmentation, which assumes that a fragment's probability is uniform across all peptides. Nevertheless, this assumption is oversimplified from a biochemical principle point of view and limits accurate prediction. To address this gap, we present Pep2Prob, the first comprehensive dataset and benchmark designed for peptide-specific fragment ion probability prediction. The proposed dataset contains fragment ion probability statistics for 608,780 unique precursors (each precursor is a pair of peptide sequence and charge state), summarized from more than 183 million high-quality, high-resolution, HCD MS$^2$ spectra with validated peptide assignments and fragmentation annotations. We establish baseline performance using simple statistical rules and learning-based methods, and find that models leveraging peptide-specific information significantly outperform previous methods using only global fragmentation statistics. Furthermore, performance across benchmark models with increasing capacities suggests that the peptide-fragmentation relationship exhibits complex nonlinearities requiring sophisticated machine learning approaches.

OptMark: Robust Multi-bit Diffusion Watermarking via Inference Time Optimization

arXiv:2508.21727v1 Announce Type: cross Abstract: Watermarking diffusion-generated images is crucial for copyright protection and user tracking. However, current diffusion watermarking methods face significant limitations: zero-bit watermarking systems lack the capacity for large-scale user tracking, while multi-bit methods are highly sensitive to certain image transformations or generative attacks, resulting in a lack of comprehensive robustness. In this paper, we propose OptMark, an optimization-based approach that embeds a robust multi-bit watermark into the intermediate latents of the diffusion denoising process. OptMark strategically inserts a structural watermark early to resist generative attacks and a detail watermark late to withstand image transformations, with tailored regularization terms to preserve image quality and ensure imperceptibility. To address the challenge of memory consumption growing linearly with the number of denoising steps during optimization, OptMark incorporates adjoint gradient methods, reducing memory usage from O(N) to O(1). Experimental results demonstrate that OptMark achieves invisible multi-bit watermarking while ensuring robust resilience against valuemetric transformations, geometric transformations, editing, and regeneration attacks.

Mixed membership estimation for categorical data with weighted responses

arXiv:2310.10989v2 Announce Type: replace-cross Abstract: The Grade of Membership (GoM) model, which allows subjects to belong to multiple latent classes, is a powerful tool for inferring latent classes in categorical data. However, its application is limited to categorical data with nonnegative integer responses, as it assumes that the response matrix is generated from Bernoulli or Binomial distributions, making it inappropriate for datasets with continuous or negative weighted responses. To address this, this paper proposes a novel model named the Weighted Grade of Membership (WGoM) model. Our WGoM is more general than GoM because it relaxes GoM's distribution constraint by allowing the response matrix to be generated from distributions like Bernoulli, Binomial, Normal, and Uniform as long as the expected response matrix has a block structure related to subjects' mixed memberships under the distribution. We show that WGoM can describe any response matrix with finite distinct elements. We then propose an algorithm to estimate the latent mixed memberships and other WGoM parameters. We derive the error bounds of the estimated parameters and show that the algorithm is statistically consistent. We also propose an efficient method for determining the number of latent classes $K$ for categorical data with weighted responses by maximizing fuzzy weighted modularity. The performance of our methods is validated through both synthetic and real-world datasets. The results demonstrate the accuracy and efficiency of our algorithm for estimating latent mixed memberships, as well as the high accuracy of our method for estimating $K$, indicating their high potential for practical applications.