In this paper, we provide a comprehensive examination of the evolution of graphics Application Programming Interfaces (APIs). We begin by surveying traditional graphics APIs, elucidating their distinct features and inherent challenges. This sets the stage for a detailed exploration of modern graphics APIs, organized around four critical design principles, which are further analyzed through specific case studies and categorical examinations. The paper then introduces MoerEngine, a bespoke rendering engine, as a practical case study demonstrating how these modern principles are applied in software engineering. In conclusion, the study offers insights into the potential future trajectory of graphics APIs, spotlighting emerging design patterns and technological innovations, and predicts the development trends and capabilities of next-generation graphics APIs.
The detection of steel surface anomalies remains an industrial challenge because of variations in production equipment, processes, and steel characteristics. To address this problem, this paper proposes a detection and localization method that combines 3D depth and 2D RGB features. The framework comprises three stages: defect classification, defect localization, and warpage judgment. The first stage uses a data-efficient image Transformer model, the second stage uses reverse knowledge distillation, and the third stage fuses 3D depth and 2D RGB features. Experimental results show that the proposed algorithm achieves high accuracy, is practically feasible, and can be used effectively in industrial scenarios.
While neural radiance field (NeRF) methods have shown promising results in generating talking faces, existing studies primarily focus on the correlation between avatars and driving sources. However, these studies often overlook emotion modeling, resulting in the generation of emotionless or unnatural facial animations. In response, this paper introduces an audio-driven and emotion-editing dynamic NeRF (AED-NeRF) approach, designed for the real-time generation of expressive talking face avatars driven by audio inputs. Specifically, we integrate audio features into a grid-based NeRF to compensate for the lack of a deformation channel, successfully capturing lip dynamics and enabling end-to-end generation from audio-driven sources to talking face avatars. Emotion labels, comprising emotion categories and intensity levels, guide the proposed NeRF framework to implicitly model visual emotions, allowing for explicit control and editing of facial expressions. Extensive qualitative and quantitative experiments validate the effectiveness and advantages of our proposed method, demonstrating its ability to achieve real-time, photo-realistic talking face avatar generation across different audio and emotion scenarios.
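The conditioning described above (audio features injected into a grid-based NeRF, plus emotion labels made of a category and an intensity level) can be sketched as the assembly of the per-sample input vector fed to the radiance network. The snippet below is a minimal illustration, not the AED-NeRF architecture: the emotion label set, the feature dimensions, and the encoding (one-hot category scaled by intensity) are all hypothetical.

```python
import numpy as np

EMOTIONS = ["neutral", "happy", "angry", "sad"]  # hypothetical label set

def emotion_code(category, intensity):
    """One-hot emotion category scaled by an intensity level in [0, 1]."""
    code = np.zeros(len(EMOTIONS))
    code[EMOTIONS.index(category)] = intensity
    return code

def conditioned_field_input(grid_feat, audio_feat, category, intensity):
    """Concatenate the per-point grid feature, the per-frame audio feature,
    and the emotion code into one vector for the downstream radiance MLP."""
    return np.concatenate([grid_feat, audio_feat,
                           emotion_code(category, intensity)])

# Assumed dims: 32-d grid feature, 16-d audio feature, 4-d emotion code.
x = conditioned_field_input(np.zeros(32), np.zeros(16), "happy", 0.8)
print(x.shape)         # (52,)
print(x[32 + 16 + 1])  # 0.8 -- the scaled slot for "happy"
```

Editing the expression at inference time then amounts to changing `category` or `intensity` while the audio stream continues to drive lip motion.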
Security and access control for data storage in 5G industrial Internet collaborative systems face significant challenges. The characteristics of 5G networks, such as low latency and high speed, facilitate data transmission in the industrial Internet but also increase vulnerability to attacks such as theft and tampering. Moreover, in 5G industrial Internet collaborative system environments, data flows across multiple entities and links, which necessitates a flexible access control model to meet specific data access requirements. Traditional role-based and attribute-based access control mechanisms are difficult to apply in such dynamic application scenarios. To address these challenges, we propose a novel data storage solution for 5G industrial Internet collaborative systems. Like existing approaches, it provides integrity and confidentiality protection for transmitted data. In terms of security, only authenticated data owners and users can obtain file decryption keys, preventing malicious attackers from forging data. Regarding access control, decryption is permitted only to authorized data users, safeguarding against unauthorized file access. Furthermore, by introducing an attribute-based encryption mechanism, only data users with specific attributes can decrypt files. In terms of efficiency, our approach uses bilinear pairing and modular exponentiation operations only during the authentication process; the bulk of the data is handled with lightweight cryptographic algorithms. Consequently, our solution achieves higher efficiency than other known methods. Experimental results demonstrate the feasibility of our approach in real-world applications.
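The overall shape of such a scheme (expensive operations confined to key release, lightweight symmetric encryption for the bulk data, and an attribute gate deciding who receives the file key) can be sketched with standard-library primitives. This is a toy illustration, not the paper's construction: a SHA-256 counter-mode keystream stands in for the unspecified lightweight cipher, and a simple attribute-superset check stands in for real attribute-based encryption; none of it is production cryptography.

```python
import hashlib
import hmac
import os

def keystream(key, n):
    """Toy stand-in for a lightweight cipher: SHA-256 in counter mode."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def encrypt(key, data):
    """XOR the data with the keystream; the same call decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

decrypt = encrypt  # an XOR stream cipher is its own inverse

def release_key(user_attrs, policy_attrs, master_key):
    """Attribute gate: release the file key only if the user's attributes
    satisfy the policy (superset check standing in for ABE decryption)."""
    if not policy_attrs.issubset(user_attrs):
        raise PermissionError("attributes do not satisfy access policy")
    return hmac.new(master_key, b"file-key", hashlib.sha256).digest()

master = os.urandom(32)
policy = {"plant:shanghai", "role:engineer"}  # hypothetical attribute names
key = release_key({"plant:shanghai", "role:engineer", "shift:night"},
                  policy, master)
ct = encrypt(key, b"sensor batch 42")
print(decrypt(key, ct))  # b'sensor batch 42'
```

A user lacking an attribute in the policy never obtains `key`, so the ciphertext stays opaque even if it is intercepted in transit.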
This paper explores the key techniques and challenges in dynamic scene reconstruction with neural radiance fields (NeRF). As an emerging computer vision method, NeRF has wide application potential and excels in particular at 3D reconstruction. We first introduce the basic principles and working mechanisms of NeRFs, followed by an in-depth discussion of the technical challenges faced by 3D reconstruction in dynamic scenes, including viewpoint and illumination changes of moving objects, recognition and modeling of dynamic objects, real-time requirements, data acquisition and calibration, motion estimation, and evaluation mechanisms. We then summarize current state-of-the-art approaches to these challenges, as well as future research trends. The goal is to give researchers an in-depth understanding of the application of NeRFs to dynamic scene reconstruction, together with insight into the key open issues and future directions.
Vision-based measurement technology benefits high-quality manufacturing through improved dimensional precision, enhanced geometric tolerance, and increased product yield. The monocular 3D structured light visual sensing method is popular for online part inspection since it can reach micrometer-level depth accuracy. However, the line-of-sight requirement of a single-viewpoint vision system often fails when occlusion occurs due to the object's surface structure, such as edges, slopes, and holes. To address this issue, a multi-view 3D structured light vision system is proposed in this paper to achieve high accuracy, i.e., Z-direction repeatability, and to reduce the probability of occlusion during mechanical dimension measurement. The main contributions of this paper include the use of industrial cameras with high resolution and high frame rates to achieve high-precision 3D reconstruction. Moreover, a multi-wavelength (heterodyne) phase unwrapping method is employed for high-precision phase calculation. By leveraging multiple industrial cameras, the system overcomes field-of-view occlusions, thereby broadening the 3D reconstruction field of view. Finally, the system achieves a Z-axis repeatability of 0.48 μm.
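The multi-wavelength (heterodyne) step mentioned above has a standard two-wavelength form: fringe patterns of wavelengths λ1 < λ2 yield wrapped phases whose difference is unambiguous over the synthetic wavelength λ1λ2/(λ2 − λ1), and that beat phase recovers the fringe order of the fine phase. The NumPy sketch below shows this textbook formulation on noise-free synthetic data; the wavelength values and ramp are arbitrary choices, not the paper's parameters.

```python
import numpy as np

def heterodyne_unwrap(phi1, phi2, lam1, lam2):
    """Two-wavelength (heterodyne) phase unwrapping.
    phi1, phi2: wrapped phases in [0, 2*pi) of fringes with wavelengths
    lam1 < lam2. The beat phase is unambiguous over the synthetic
    wavelength lam1*lam2/(lam2-lam1) and fixes the fringe order of phi1."""
    beat = np.mod(phi1 - phi2, 2 * np.pi)            # synthetic-wavelength phase
    scale = lam2 / (lam2 - lam1)                     # = synthetic_wl / lam1
    coarse = beat * scale                            # coarse absolute phase
    order = np.round((coarse - phi1) / (2 * np.pi))  # integer fringe order
    return phi1 + 2 * np.pi * order

# Synthetic test: an absolute fine-phase ramp spanning 10 fringes,
# within the unambiguous range of 11 fringes for these wavelengths.
lam1, lam2 = 20.0, 22.0
true_phase = np.linspace(0.0, 2 * np.pi * 10, 500)
phi1 = np.mod(true_phase, 2 * np.pi)
phi2 = np.mod(true_phase * lam1 / lam2, 2 * np.pi)
recovered = heterodyne_unwrap(phi1, phi2, lam1, lam2)
print(np.max(np.abs(recovered - true_phase)))  # ~0 (floating-point noise only)
```

In practice the beat phase is noisy, so the rounding step is what makes the method robust: small errors in `coarse` are absorbed as long as they stay below half a fringe of λ1.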
Three-dimensional reconstruction technology plays an important role in indoor scenes by converting objects and structures in indoor environments into accurate 3D models using multi-view RGB images. It offers a wide range of applications in fields such as virtual reality, augmented reality, indoor navigation, and game development. Existing methods based on multi-view RGB images have made significant progress in 3D reconstruction. These image-based reconstruction methods not only possess good expressive power and generalization performance but also handle complex geometric shapes and textures effectively. Although they face challenges such as lighting variations, occlusion, and texture loss in indoor scenes, these challenges can be effectively addressed through deep neural networks, neural implicit surface representations, and other techniques. Indoor 3D reconstruction based on multi-view RGB images therefore has a promising future: it not only provides immersive and interactive virtual experiences but also brings convenience and innovation to indoor navigation, interior design, and virtual tours. As the technology evolves, these image-based reconstruction methods will be further improved, providing higher-quality and more accurate solutions for indoor scene reconstruction.
With the rapid popularization of mobile devices and the wide application of various sensors, scene perception methods for mobile devices play an important role in location-based services such as navigation and augmented reality (AR). The development of deep learning has greatly improved machines' ability to perceive scenes visually. This paper introduces the basic framework of visual scene perception, the related technologies, and the specific process as applied to AR navigation, and outlines future technical developments. An application (APP) is designed to improve the effectiveness of AR navigation. The APP comprises three modules: navigation map generation, a cloud navigation algorithm, and the client design. The navigation map generation tool works offline; the cloud stores the navigation map and provides navigation algorithms to the terminal; and the terminal performs local real-time positioning and AR path rendering.