| |
Introduction
The MFP (minimum fixed point) algorithm is used in dataflow analysis
to determine properties of a program at compile time without actually executing it.
The program is represented by its control flow graph. In this abstraction
of the program, the nodes correspond to the statements of the program
and the edges represent the possible control transitions.
This graph has a unique start node s and an end node e.
Every edge is associated with a transfer function
which specifies how information is propagated along this edge.
The control flow graph and the transfer functions form a recursive equation system,
which is solved iteratively by the MFP algorithm.
The MOP (merge-over-all-paths) solution computes for every node n
of the CFG (control flow graph) the incoming informations from all paths that lead
from the start node s to n.
These informations are combined and the result is used as input
by the transfer functions associated with the outgoing edges of n.
In this way, information is gathered for all nodes in the CFG and stored there.
Unfortunately the MOP solution is only computable in special cases.
To get a good approximation of the MOP solution, the MFP algorithm is used,
which works whenever the transfer functions are monotonic.
The difference between these two approaches is that the MFP algorithm only
combines the informations from the predecessors of a node without checking the
whole path from s to that node.
» The PAG implementation of the algorithm.
» An illustrative example of the algorithm.
Theoretical Background
The theory behind the MFP algorithm will be explained
via the following definitions:
- Control Flow Graph (CFG)
The control flow graph (CFG) is a directed graph G = (N,E,s,e)
with a finite set N of nodes,
a set E of edges which is a subset of NxN,
a start node s and an end node e,
where s and e are elements of N.
If there exists an edge (n,m),
then m is called successor of n and
n is called predecessor of m.
The start node s is required to have no predecessor and e no successor.
In addition, the graph has to be connected:
Every node must be reachable from s and e is reachable from any node.
- Path
A path p from node n0 to node nk
in a CFG (N,E,s,e) is a sequence of edges (elements of E) of the form
p = (n0,n1),
(n1,n2),
..., (nk-1,nk).
A path may contain the same edge many times.
Thus, there are infinitely many paths if the CFG contains circles (loops).
-
Complete Lattice
A partially ordered set
( A,  )
is called a complete lattice, iff every subset of A has a greatest
lower bound (  )
and a least upper bound
(  ).
The elements 
= A,

= A
are called bottom and top elements of A.
The application of the greatest lower bound to two elements
 { a, b}
is written as
a b.
Analogously,
 { a, b}
is written as
a b.
- Chain
An (ascending) chain (xi)i
is a sequence x0, x1, ...,
such that xj
xj+1 holds for all j.
A chain (xi)i is called strongly ascending,
iff for all j,
xj and xj+1 are different.
A chain eventually stabilizes iff there exists an index j such that
for all indexes n > j,
xj = xn holds.
Similar definitions apply to descending chains.
- Pointwise order, Monotone and Distributive Functions
Functions from a set A to a partially ordered set B
are usually ordered by the pointwise ordering:
f g iff
f(x) g(x) for all x in A.
Let A and B be complete lattices. A function f:
A B
is called monotone, iff for all elements x, y of A,
x y
implies f(x)
f(y) in B.
The function f is called distributive,
iff for all elements x, y of A the equality
f(x y)
= f(x) f(y)
holds. All distributive functions are monotone.
- Data Flow Problems
A data flow problem is a CFG together with a complete lattice D
of abstract values, a family of transfer functions,
and a start value i in D.
The transfer functions express the local (abstract) semantics of the CFG.
They assign a meaning function to every edge of the CFG:
tf:
E D D,
i.e.
tf(e):
D D
for all edges e in E.
The path semantics along some path is the composition
of the transfer functions tf along the path
(and the identity for the empty path):
- Merge Over All Path (MOP) Solution
The MOP solution of a data flow problem is the lub of the path semantics,
applied to the start value i:
Thus, all possible executions of a program which reach the node n are combined
with .
This combination ensures that the information is valid for every program execution
and therefore is independent of the input of the program.
Since the MOP solution is in general not computable
if there is an infinite number of paths and
the lattice D is infinite, the MFP solution is used instead.
- Minimum Fixed Point (MFP) Solution
The MFP solution is the least function (in the pointwise ordering)
MFP: N D
satisfying the following system of equations:
| MFP(s) | = i, |
| MFP(n) | =
{ tf(e)(MFP(m))
| e = (m,n) in E } |
for n different from s. |
For the computation of the MFP solution, see below.
- Example: The Difference between MOP and MFP
Consider the CFG with node set
N = {s, a1, a2, b, e}
and edge set
E = {(s,a1), (s,a2),
(a1,b), (a2,b),
(b,e)}.
Let D be any complete lattice, and i, u, and v
any three elements of D.
We take i as the start value, and the following transfer functions:
tf(s,a1) = cu,
the constant function with value u,
tf(s,a2) = cv,
the constant function with value v,
tf(a1,b) =
tf(a2,b) = id,
tf(b,e) = some function f.
In this case, the MOP solution is easily computable
since there are at most 2 paths from s to any node n.
The result is
MOP(s) = i,
MOP(a1) = u,
MOP(a2) = v,
MOP(b) =
u v,
MOP(e) =
f(u) f(v).
The value MOP(e) is the lub of the semantics of the two paths
(s,a1),(a1,b),(b,e)
and
(s,a2),(a2,b),(b,e)
from s to e.
The MFP solution is the least solution of the following system of equations:
MFP(s) = i,
MFP(a1) = cu(MFP(s)),
MFP(a2) = cv(MFP(s)),
MFP(b) =
MFP(a1) MFP(a2),
MFP(e) = f(MFP(b)).
Because of the absence of loops in the CFG,
this system has exactly one solution, namely
MFP(s) = i,
MFP(a1) = u,
MFP(a2) = v,
MFP(b) =
u v,
MFP(e) =
f(u v),
which is in general different from
MOP(e) =
f(u) f(v).
- The General Relationship between MOP and MFP
According to the coincidence theorem (Kam and Ullman 1997),
MOP MFP
holds if the transfer functions are monotone.
This means that MFP is a correct approximation of the MOP solution.
If the transfer functions are even distributive,
then MOP = MFP follows.
Hence in this special situation, the MOP solution is computable
by the MFP algorithm presented below.
- Computation of the MFP Solution (Fixed Point Iteration)
The MFP solution is computable by the following fixed point iteration
if all transfer functions are monotone
and every ascending chain of the lattice D eventually stabilizes.
The algorithm consists of the inductive computation of a sequence
(Fk)k of functions
Fk:
N D.
The function F0 maps
the start node s to the start value i,
and all other nodes n to
,
the least element of the complete lattice D.
The function Fk+1 is computed from
Fk as follows using the equation system for MFP:
| Fk+1(s) | =
i, |
| Fk+1(n) | =
{ tf(e)(Fk(m))
| e = (m,n) in E } |
for n different from s. |
The algorithm stops once Fk+1 = Fk.
In this case, Fk is the MFP solution.
Monotonicity of the transfer functions implies
Fk Fk+1
for all k.
Hence, (Fk(n))k
is an ascending chain for every node n.
If ascending chains in D eventually stabilize,
the chain (Fk)k of functions
eventually stabilizes as well, i.e. the algorithm terminates.
The fixed point algorithm can be modified in several ways.
In the computation of Fk+1(n),
the value of Fk+1(m) may be used
if it is already available.
This will change the individual functions Fk,
but not the final result.
In this case, it is useful to traverse the nodes in an order
which follows the edges of the CFG as far as possible.
Moreover, it is useful to recompute the value at a node
only if the value at one of his predecessors has changed.
All these ideas are incorporated in
the PAG/WWW implementation of the fixed point algorithm.
Search
| |