forked from riscv-non-isa/riscv-trace-spec
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathfutures.tex
executable file
·68 lines (54 loc) · 3.52 KB
/
futures.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
\chapter{Future directions} \label{Future}
The current focus is the compressed branch trace, however there a
number of other types of processor trace that would be useful
(detailed below in no particular order). These
should be considered as possible features that maybe added in future,
once the current scope has been completed.
\section{Data trace}
The trace encoder will outputs packets to communicate information
about loads and stores to an off-chip decoder. To reduce the amount
of bandwidth required, reporting data values will be optional, and
both address and data will be able to be encoded differentially when
it is beneficial to do so. This entails outputting the difference
between the new value and the previous value of the same transfer
size, irrespective of transfer direction.
Unencoded values will be used for synchronisation and at other times.
\section{Fast profiling}
In this mode the encoder will provide a non-intrusive alternative to
the traditional method of profiling, which requires the processor to
be halted periodically so that the program counter can be sampled.
The encoder will issue packets when an exception, call or return is
detected, to report the next instruction executed (i.e. the
destination instruction). Optionally, the encoder will also be able to
report the current instruction (i.e. the source instruction).
\section{Inter-instruction cycle counts}
In this mode the encoder will trace where the CPU is stalling by
reporting the number of cycles between successive instruction
retirements.
\section{Using a jump target cache to further improve efficiency}
The encoder could include a small cache of uninferable jump targets, managed using a
least-recently-used (LRU) algorithm. When an uninferable PC discontinuity occurs, if
the target address is present in the cache, report the index number of the cache
entry (typically just a few bits) rather than the target address itself. The decoder
would need to model the cache in order to know the target address associated with
each cache entry.
\textbf{DISCUSSION POINT}:
This mode needs more analysis before we commit. The primary concern is whether it will
result in an overall gain in efficiency. Packet formats 0 and 2 are used to report
differential addresses with and without branch history information respectively. Format
0 is currently reserved. This could be redefined as being equivalent to either format 0, or
format 2, but reporting a jump target cache index instead of a differential address.
However, ideally we would want the ability to output the equivalent of both format 1 and
format 2 with a jump target cache index. In order to do that we will need to add an extra bit
somewhere. These are the options:
\begin{itemize}
\item Define format 0 as jump target cache index without branch information, and add a bit to
format 1 to indicate whether it contains an address or a jump target cache index;
\item Define format 0 as jump target cache index with branch information, and add a bit to
format 2 to indicate whether it contains an address or a jump target cache index;
\item Leave format 0 reserved, and add a bit to both formats 0 and 2 to indicate whether
they contain an address or a jump target cache index.
\end{itemize}
For this mode to be useful, the overall efficiency gain achieved as a result of jump target cache indexes
requiring fewer bits to encode than differential addresses needs to be enough to overcome the efficiency
loss of adding a bit to every packet.