Soundness

 

Bugs

The fundamental mathematical theory underlying KT’s static analyzer generator, called CodeHawk™, is abstract interpretation, developed by Patrick and Radhia Cousot in the 1970’s. Abstract interpretation performs a symbolic simulation of the program. It is a fully automated process that operates directly on the source code; it discovers properties of the program that hold in all possible execution contexts.

Abstract interpretation allows sound analysis. This means that, given a certain property, for example, absence of buffer overflow, the analysis will indicate all possible buffer overflows. Typical bug-finding tools are not sound. The figure above illustrates the difference between the bugs reported by a typical bug-finding tool and CodeHawk. In CodeHawk each buffer access will be marked either Green, Orange, or Red (or Black, if it is unreachable). A green mark indicates that the buffer access is safe for all possible execution contexts; an orange mark indicates that it may violate the bounds; and a red mark indicates a definite bug. Sound analysis guarantees that a potential bug is never marked green. With CodeHawk there are no false negatives. Soundness is an absolute requirement for high assurance systems.

Having satisfied soundness, precision and performance are our next priorities. Looking at the figure, accuracy is increased by reducing the Orange set to just include the actual bugs, and increasing the Red set to cover all actual bugs (in effect eliminating the Orange Set). Achieving 100% accuracy in general is theoretically impossible due to well-known undecidability results. In practice, precision and performance are a trade-off: the higher the precision, the slower the analysis.

Our strategy to achieve high precision without sacrificing performance is to specialize the analyzer for the system architecture and property to be analyzed. CodeHawk is an analyzer generator: given a property and (optionally) a system structure, a customized analyzer is generated automatically. This analyzer can then be applied to the source code.

The type of properties that can be verified are Programming Language-based properties and Application-based properties. Programming language-based properties include

  • arithmetic properties,
  • array-bound compliance,
  • buffer overflow,
  • correct string termination,
  • null pointer dereferences,
  • uninitialized variables,
  • structural invariants, etc.

Application- based properties include

  • generic properties such as variable ranges at specific locations in the program,
  • interface compliance,
  • resource management,
  • data-flow related properties,
  • security properties such as confidentiality, integrity, availability, access control, authorized flow, and authorized actions.

These are just examples; in principle CodeHawk can generate analyzers for a wide variety of properties on program behavior.