Range Analysis is a forward DFA. The range
of a variable
We restrict ourselves to integers to avoid issues with the boundaries of ranges, specifically intervals with open end(s) and division. Integer division results are very well defined and are neither implementation nor hardware dependent.
Am I handling integer overflow and wrap around?
- Inseparability
- The data flow values of different variables are dependent on each other.
\begin{align} In_n &= \begin{cases} \lbrace \langle\, y \mapsto \texttt{undef}\, \rangle\, |\: y ∈ \mathbb{V}\text{ar} \rbrace & \quad \text{if}\, n \,\text{is start} \[0.5em] \displaystyle \bigsqcapp \:∈ Pred(n) In_p & \quad \text{otherwise} \end{cases} \label{eq:in} \[1em] Out_n &= f_n(In_n) \label{eq:out} \end{align}
We define a Flow Function range
of
expression
\begin{align}
f_n(X) &= \begin{cases}
X \big[ u \mapsto (-∞, ∞) \big] & \quad u ∈ \mathbb{V}\text{ar}, u \;\text{is read from file}
X \big[ u \mapsto [k, k] \big] & \quad u ∈ \mathbb{V}\text{ar}, u = k, k\ \text{is constant} \
X \big[ u \mapsto X[v] \big] & \quad u,v ∈ \mathbb{V}\text{ar}, u = v \
X \big[ u \mapsto \text{eval}(e, X) \big] & \quad u ∈ \mathbb{V}\text{ar}, u = e, e \;\text{is an expression}
\end{cases} \label{eq:flow}\
\text{eval}(e, X) &= \begin{cases}
X[a] ⊕ X[b] & \quad a,b ∈ \text{Opd}(e) ∩ \mathbb{V}\text{ar}, \text{Op} = + \
X[a] \ominus X[b] & \quad a,b ∈ \text{Opd}(e) ∩ \mathbb{V}\text{ar}, \text{Op} = - \
X[a] ⊗ X[b] & \quad a,b ∈ \text{Opd}(e) ∩ \mathbb{V}\text{ar}, \text{Op} = × \
X[a] ø X[b] & \quad a,b ∈ \text{Opd}(e) ∩ \mathbb{V}\text{ar}, \text{Op} = ÷ \
\end{cases} \label{eq:eval}
\end{align}
Interval arithmetic is surprisingly tricky, wrt. division and multiplication (9 cases!). citenum:popova98 discusses ways to implement a fast interval multiplier in modern pipelined superscalar FPUs. Here is a handy web interval arithmetic calculator to verify (my) implementations.
\begin{align}
[a,b] ⊕ [x,y] &= [a+x, b+y] \label{eq:oplus}
[a,b] \ominus [x,y] &= [a-y, b-x] \label{eq:ominus} \
[a,b] ⊗ [x,y] &= [\text{min}(ax, ay, bx, by),\ \text{max}(ax, ay, bx, by)] \label{eq:otimes} \
[a,b] ø [x,y] &= \begin{cases}
[a, b] ⊗ [1/y, 1/x] & \quad 0 ∉ [x, y] \
[-∞, ∞] & \quad 0 ∈ [a, b] ∧ 0 ∈ [x, y] \
[b/x, ∞] & \quad b < 0 ∧ y = 0 \
[-∞, b/y] ∪ [b/x, ∞] & \quad b < 0 ∧ x < 0 < y \
[-∞, b/y] & \quad b < 0 ∧ x = 0 \
[-∞, a/x] & \quad a > 0 ∧ y = 0 \
[-∞, a/x] ∪ [a/y, ∞] & \quad a > 0 ∧ x < 0 < y \
[a/y, ∞] & \quad a > 0 ∧ x = 0 \
∅ & \quad 0 ∉ [a, b] ∧ c = d = 0
\end{cases} \label{eq:oslash}
\end{align}
All interval arithmetic operations and algorithms are compiled in citeauthor:hickey_inter_arith_base. Additional operators can be defined such as exponentiation and logarithm.
A range is a tuple,
We will be merging ranges in ref:eq:in, so we need to define a new closed
&= \big( \text{min}(l_1, l_2),\; \text{max}(r_1, r_2) \big)
\end{split}
\end{equation}
This definition results in imprecision upon application of
$\sqcap$ in ref:eq:in, but makes the result computable. If we were to use the natural and precise$∪$ (which gives a union of disjoint sets), the result of ref:eq:in can become arbitrarily large.
Some boolean operators are also defined, \begin{equation} a_1 \subseteq a_2 = (l_1 ≥ l_2\ ∧\ r_1 ≤ r_2), \quad a_1,a_2 ∈ \mathcal{R} \label{eq:comp-lattice-subset} \end{equation}
undef
in our analysis.
\begin{equation} \hat{L} = (\mathcal{A}, \supseteq) \label{eq:comp-lattice} \end{equation}
Thus,
\begin{equation} a\ \hat{\sqcap}\ b = \hat{∪} \quad a,b ∈ \mathcal{R} \label{eq:comp-lattice-meet} \end{equation}
I have to decide if I’m gonna stay in Z or move to R. This is especially important with my lattice definition, as
$A$ may rather be the set of all ranges in =[-MAX_INT, +MAX_INT]= in case of Z.
The best resources to soot
are packaged with it in the /tutorial
directory. Run make
in it and
enjoy the material.
Most (simple) analyses compute 1 bit
information about some code entity (variable or expression,
etc). Constant Propagation computes bit
information per variable and also has an unbounded
flow function like Range Analysis.
For such analyses, a HashMap
implementation suits best to represent FlowSets
because we
frequently lookup the computed information of other entities – mere use
or def
doesn’t serve us
much. Since Java is weakly typed, I found it hard to implement a generic HashMapFlowSet
and just
went ahead with a much simpler RangeFlowSet
.
clone()
clear()
isEmpty()
copy (FlowSet dest)
union (FlowSet other, FlowSet dest)
intersection (FlowSet other, FlowSet dest)
difference (FlowSet other, FlowSet dest)
We define a Range
object. An instance is created at the following locations in code, as defined in
ref:eq:flow:
- Assignments (constant, expression, other local)
- Read from
stdin
bibliography:/share/documents/bibliography/references.bib
- Installed IDE, chose the JDK (1.7) and added a library (soot-soot-2.5.0). This gave me code completion,.
- Compilation?
Can’t use it as a standalone thing on CLI. Some problem with classpaths.