Author Login Chief Editor Login Reviewer Login Editor Login Remote Office

Computer Engineering

   

Semantic-driven Redundant Code Elimination for Ark Runtime

  

  • Published:2026-06-03

面向方舟运行时的语义驱动冗余代码消除

Abstract: In the process of compiler optimization for dynamically typed languages, a large number of runtime check nodes must be inserted due to the uncertainty of runtime types. Existing Redundant Code Elimination (RCE) algorithms based on reachability analysis generally treat all control flow nodes as potentially having side effects. As a result, computations and control flow structures associated with these check nodes are preserved during analysis, making it difficult to safely remove semantically redundant computations and control flows. To address this issue, a semantics-driven RCE method for the Ark runtime is proposed based on a systematic analysis of its compilation workflow and Intermediate Representation (IR) structure. The method begins with observable program semantics, where program behavior is abstracted as a sequence of observable events, including input/output operations, exception throwing, system calls, and termination with specific return values. Based on this abstraction, the RCE problem is formulated as the removal of IR subgraphs under the constraint that observable program semantics remain unchanged. On this basis, a criterion for eliminating runtime check nodes is introduced: a check node and its dependent computations can be safely removed if they produce no side effects and their results are not used by any node that affects observable program behavior. This criterion overcomes the interference introduced by runtime checks and allows the removal of computations that are preserved by traditional approaches despite being semantically redundant. Around this criterion, a semantics-constrained live node propagation mechanism is designed. The process starts with initializing an live node set containing side-effect nodes, followed by expanding this set along data dependency relations. Only nodes that may influence observable program behavior are retained in the set, enabling the identification and elimination of redundant computations and their associated check nodes. Furthermore, to address redundant control flows that cannot be handled by existing methods, control flow graph construction, dominance analysis, and loop structure identification algorithms are incorporated. A method for detecting and eliminating redundant loops and redundant branches is proposed, enabling the overall removal of structures such as empty loops and branches. The proposed method has been integrated into the Ark runtime compilation framework, achieving optimization of redundant computations and control flows at the IR level. Experimental results demonstrate that, in terms of executed instructions, the average reduction across all test cases reaches 3.4%, while representative cases containing redundant control flows achieve an average reduction of 27.4%, with a maximum of 98.26%. In terms of execution time, an average reduction of 3.4% is observed across all test cases, with representative cases achieving an average reduction of 26.4%, and up to 99.99% reduction in loop-intensive programs. Regarding compilation overhead, the execution time of the proposed algorithm accounts for only 2.28% of the total compilation time on average, indicating low additional cost. In overall performance evaluation, the total time of compilation and execution decreases in most representative cases, with a maximum reduction of 94.55%. In addition, validation on 913 runtime unit test cases and 19749 test262 standard test cases shows that no semantic deviation is introduced. Compared with source-level redundancy elimination approaches, further performance gains are achieved at the fine-grained computation and control flow levels, demonstrating the unique advantages of IR-level optimization. The proposed method effectively overcomes the limitations imposed by runtime checks on RCE in dynamically typed languages, significantly improves execution efficiency while preserving semantic equivalence, and maintains low compilation overhead, providing a new approach for compiler optimization in dynamic language runtimes.

摘要: 在动态类型语言的编译优化过程中,受运行时类型不确定性的影响,编译器需要插入大量合法性检查节点。现有的基于可达性分析的冗余代码消除(Redundant Code Elimination, RCE)算法通常将所有控制流节点视为可能产生副作用的节点,从而在分析过程中保留了与这些检查节点相关联的计算与控制流结构,导致部分语义上无效的计算与控制流难以被安全删除。针对这一问题,以方舟运行时为研究平台,系统分析其编译优化流程与中间表示(Intermediate Representation, IR)结构,提出了一种语义驱动的RCE方法。该方法首先从程序的可观测语义出发,构建形式化语义模型,将程序行为抽象为由输入输出、抛出异常、系统调用及以特定返回值退出构成的可观测事件序列,并据此将RCE问题转化为以不改变程序的可观测语义为前提的IR子图删除问题。在此基础上,提出合法性检查删除判定准则:当检查节点及其依赖的计算不产生副作用,且其结果不被任何影响程序可观测行为的节点使用时,该检查及相关计算可被安全删除。该准则突破了合法性检查对RCE的干扰,将部分被传统方法保留但语义上无效的计算及其检查视为可删除节点。围绕该准则,设计了基于语义约束的有效节点传播机制:首先初始化包含副作用节点的有效节点集合,然后沿数据依赖关系扩展该集合,在集合中保存可能影响程序可观测行为的节点,从而识别并删除冗余计算及其附属检查。进一步地,针对传统方法无法处理的冗余控制流问题,结合控制流图构建、支配关系分析与循环结构识别算法,提出冗余循环与冗余分支的检测与删除方法,实现对空循环、空分支等结构的整体消除。该方法已在方舟运行时编译框架中完成集成,在IR层实现了对冗余计算与控制流的优化。相关实验表明,在指令数量方面,优化后所有测试程序的平均执行指令减少比例为3.4%,在存在冗余控制流的典型用例中,平均减少27.4%,最高达到98.26%;在执行时间方面,各测试用例平均降低3.4%,典型用例平均降低26.4%,少数循环密集型程序的执行时间降低达99.99%;在编译开销方面,算法的运行时间平均仅占总编译时间的2.28%,其额外开销较低;在整体性能评估中,多数典型用例的编译与执行总时间均呈下降趋势,最高降低94.55%。此外,经过913个运行时单元测试用例及19749个test262标准测试验证,算法未引入语义变化。对比源码级RCE工具,该算法在更细粒度的计算与控制流层面仍能进一步获得性能收益,体现出了IR层优化的独特优势。该方法在保证程序语义等价的前提下,有效突破了动态类型语言中合法性检查对RCE的限制,显著提升了编译产物的执行效率,同时保持较低的编译开销,为动态类型语言运行时的编译优化提供了一种新的思路。