ABSTRACT
Shared last-level cache (LLC) management is a critical design issue for heterogeneous multi-cores. In this paper, we observe two major challenges: the contribution of LLC latency to overall performance varies among applications/cores and also across time; overlooking the off-chip latency factor often leads to adverse effects on overall performance. Hence, we propose a Latency Sensitivity-based Cache Partitioning (LSP) framework, including a lightweight runtime mechanism to quantify the latency-sensitivity and a new cost function to guide the LLC partitioning. Results show that LSP improves the overall throughput by 8% on average (27% at most), compared with the state-of-the-art partitioning mechanism, TAP.
Keywords
Heterogeneous System Architecture; Cache Partitioning; GPU.
This paper addresses the LLC partitioning problem on heterogeneous multi-core processors. The prior partitioning scheme, TAP, focuses only on improving the cache hit rate of cache-sensitive applications (in practice, allocating as much LLC capacity as possible to the cache-sensitive applications). This ignores the off-chip latency of the cache-insensitive applications, whose off-chip memory accesses may then become the system bottleneck. The authors therefore argue that blindly giving cache-sensitive applications more cache while disregarding the off-chip latency of cache-insensitive applications is not the best partitioning strategy; that off-chip latency should be factored into the partitioning decision as well. This matches the standard cache-performance analysis from coursework: to evaluate how the cache hit rate affects the system, one must account for both the hit time and the miss penalty, which gives a more complete view of overall performance.
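As a reminder of that textbook analysis, here is a minimal sketch of average memory access time (AMAT); it is not from the paper, and the numbers are made up for illustration. It shows how an application whose miss rate barely improves with more LLC is dominated by the off-chip miss penalty term, which is exactly the latency the partitioning policy should not ignore.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit cost plus the expected off-chip miss cost."""
    return hit_time + miss_rate * miss_penalty

# Illustrative (made-up) numbers: LLC hit in 30 cycles, off-chip access adds 200 cycles.
# A cache-insensitive application keeps roughly the same miss rate regardless of how
# much LLC it gets, so its AMAT is dominated by the off-chip latency term.
print(amat(hit_time=30, miss_rate=0.05, miss_penalty=200))  # 40.0 cycles
print(amat(hit_time=30, miss_rate=0.60, miss_penalty=200))  # 150.0 cycles
```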
The experiments were run on a full-system simulator. The performance metric is the IPC speedup relative to a baseline LLC ("We report the geometric mean of IPC speedup over a baseline LLC of each application as the performance evaluation."). (A small question: why use the geometric mean?)
The geometric mean is used so that no single value has an outsized influence on the overall average, as the example below illustrates.
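A quick illustration with hypothetical speedup numbers (not from the paper) of why the geometric mean is commonly used for normalized ratios such as IPC speedups: a single outlier inflates the arithmetic mean far more than the geometric mean.

```python
import math

def geo_mean(xs):
    """Geometric mean: the n-th root of the product, i.e. exp of the mean of logs."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical per-application IPC speedups over the baseline LLC.
speedups = [1.02, 0.98, 1.05, 3.00]  # one application benefits disproportionately

print(sum(speedups) / len(speedups))  # arithmetic mean ~1.51, dominated by the outlier
print(geo_mean(speedups))             # geometric mean ~1.33, less sensitive to it
```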