<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Un garçon pas comme les autres (Bayes)</title>
<link>https://dansblog.netlify.app/</link>
<atom:link href="https://dansblog.netlify.app/index.xml" rel="self" type="application/rss+xml"/>
<description>A blog about statistics, I guess.</description>
<image>
<url>https://dansblog.netlify.app/better.JPG</url>
<title>Un garçon pas comme les autres (Bayes)</title>
<link>https://dansblog.netlify.app/</link>
</image>
<generator>quarto-1.4.553</generator>
<lastBuildDate>Wed, 04 Sep 2024 14:00:00 GMT</lastBuildDate>
<item>
  <title>Random C++ Part 2: Sparse partial inverses in Eigen</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2024-09-05-partial-inverse/partial-inverse.html</link>
  <description><![CDATA[ 





<div class="callout callout-style-simple callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Acknowledgements
</div>
</div>
<div class="callout-body-container callout-body">
<p>The code in this post is indebted (and in some cases wholly ripped off from) work by the glorious <a href="https://www.maths.ed.ac.uk/~flindgre/">Finn Lindgren</a>, who emailed me some code to do this probably a decade ago. Yes I am behind on my emails.</p>
<p>Finn’s code can be found <a href="https://github.com/inlabru-org/fmesher/blob/devel/src/qtool.h">here</a> as part of the glorious INLAbru project.</p>
</div>
</div>
<div class="callout callout-style-simple callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Code availability
</div>
</div>
<div class="callout-body-container callout-body">
<p>The code from this post can be found in my <a href="https://github.com/dpsimpson/blog/tree/master/posts/2024-09-05-partial-inverse">github repo</a>.</p>
</div>
</div>
<p>The time has come once more to make a blog post truly untethered from context. This time, I’m going to show you how to compute entries of the inverse of a sparse symmetric positive definite matrix that correspond to the non-zero elements of the original matrix. And I am going to once again pull out that rusty spoon that is my C++ skill to do it.</p>
<section id="a-little-bit-of-motivation" class="level2">
<h2 class="anchored" data-anchor-id="a-little-bit-of-motivation">A little bit of motivation</h2>
<p>Computing certain elements of the inverse of a matrix isn’t necessarily the most useless thing possible. It actually comes up quite a lot in statistical applications. For instance, if you are computing the score function while doing maximum likelihood estimation for a multivariate Gaussian, you’re gonna need those values. Or, less specifically, if you happen to have a multivariate Gaussian <img src="https://latex.codecogs.com/png.latex?N(0,%20Q%5E%7B-1%7D)"> parameterized by its precision (inverse covariance) matrix <img src="https://latex.codecogs.com/png.latex?Q">, and you are interested in the variance of each coordinate, you need the diagonal of <img src="https://latex.codecogs.com/png.latex?Q%5E%7B-1%7D">.</p>
<p>A very real problem with computing <img src="https://latex.codecogs.com/png.latex?Q%5E%7B-1%7D"> is that it is, infamously, quite expensive. The only really practical way to do it is to solve <img src="https://latex.codecogs.com/png.latex?n"> linear systems, where <img src="https://latex.codecogs.com/png.latex?n"> is the number of rows/columns in <img src="https://latex.codecogs.com/png.latex?Q">. When <img src="https://latex.codecogs.com/png.latex?n"> is big, this is going to be a bit of a computational disaster!</p>
<p>Thankfully, there is a convenient set of recursions due to Takahashi, Fagan, and Chen<sup>1</sup> that allow us to compute these elements directly and cheaply from the Cholesky factorization of <img src="https://latex.codecogs.com/png.latex?Q">.</p>
<p>In fact, I have <a href="https://dansblog.netlify.app/posts/2022-05-20-to-catch-a-derivative-first-youve-got-to-think-like-a-derivative/to-catch-a-derivative-first-youve-got-to-think-like-a-derivative#primitive-three-the-dreaded-log-determinant">blogged about this before</a>.</p>
<p>Essentially, we need to implement the following pseudocode.</p>
<pre><code>for i = n-1, ..., 0
  for j = n-1, ..., i
    if (L[j,i] not known to be 0)
      Sigma[j,i] = Sigma[i,j] = (I(i==j)/L[i,i]
        - sum_{k=i+1}^{n-1} L[k,i] Sigma[k,j]) / L[i,i]</code></pre>
<p>This is not going to be terribly complicated, but it does require a bit of C++ plumbing and dealing with the internal Eigen representation of the Cholesky factor. It’s always so fun to read documentation!</p>
</section>
<section id="making-this-work-in-c" class="level2">
<h2 class="anchored" data-anchor-id="making-this-work-in-c">Making this work in C++</h2>
<p>One of the things about working with a library like Eigen is that we really want to use the official API for its functions as much as possible. Even when we itch to use the undocumented internal structure, we should desist: the API is, usually, pretty stable and it is considerably less likely that an Eigen update will materially break our code if we hold them to the promises they actually make rather than the ones we wish they made.</p>
<p>It might look like you need three iterators to build our algorithm, but we actually need four. Because the matrix is stored in column-major order, we are going to need a new iterator for every distinct column index. In this case, that is:</p>
<ol type="1">
<li>A reverse iterator going up column <code>i</code> of <code>Sigma</code></li>
<li>A reverse iterator going up column <code>i</code> of <code>L</code></li>
<li>A reverse iterator going up column <code>j</code> of <code>Sigma</code></li>
<li>A reverse iterator going up column <code>i</code> in sync with iterator 3.</li>
</ol>
<p>The C++ code is pretty straightforward after that: you just need to keep your iterators in sync.</p>
<p>One wrinkle that I forgot about the first time I coded this is that there are a few things that I need to be true: firstly, I need the output to be the lower triangle of a symmetric matrix, and secondly I need that matrix to have the same sparsity pattern as <img src="https://latex.codecogs.com/png.latex?Q">. To do this, I wrote an RAII helper class, mainly because if I’m going to manipulate raw pointers I’m gonna want some safety.</p>
<p>This helper class is a <em>functor</em>, meaning that its objects are callable with similar syntax to functions. Is this strictly necessary? Of course not. But mummy I love him.</p>
<div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode numberSource cpp number-lines code-with-copy"><code class="sourceCode cpp"><span id="cb2-1"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;Eigen/SparseCore&gt;</span></span>
<span id="cb2-2"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;Eigen/SparseCholesky&gt;</span></span>
<span id="cb2-3"></span>
<span id="cb2-4"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typedef</span> Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>SparseMatrix<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">double</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;::</span>StorageIndex StorageIndex<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-5"></span>
<span id="cb2-6"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">template</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">class</span> MatchPattern <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb2-7">    <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">using</span> T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">base_type</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;::</span>type<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-8">    StorageIndex<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_outer</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-9">    StorageIndex<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_inner</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-10">    T<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_val</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-11">    StorageIndex <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_cols</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-12">    StorageIndex <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_nnz</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-13"></span>
<span id="cb2-14">    <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">public</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span></span>
<span id="cb2-15"></span>
<span id="cb2-16">    MatchPattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">const</span> SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> A<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">const</span> SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> pattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb2-17">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/**</span></span>
<span id="cb2-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  MatchPattern(const SpMat&amp; A, const SpMat&amp; pattern)</span></span>
<span id="cb2-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  Constructs functor class designed to construct a sparse matrix with</span></span>
<span id="cb2-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  the same non-zero pattern as `pattern` and the same non-zero values </span></span>
<span id="cb2-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  as `A`.</span></span>
<span id="cb2-22"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  </span></span>
<span id="cb2-23"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  This function assumes that the sparsity pattern of `pattern` is a SUBSET</span></span>
<span id="cb2-24"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  of the sparsity pattern of `A`. Weird things will happen if this does not</span></span>
<span id="cb2-25"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  hold.</span></span>
<span id="cb2-26"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     * </span></span>
<span id="cb2-27"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  Usage:</span></span>
<span id="cb2-28"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  ```</span></span>
<span id="cb2-29"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  typedef Eigen::SparseMatrix</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;double&gt;</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"> SpMatrixd;</span></span>
<span id="cb2-30"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  SpMatrixd A_pattern = MatchPattern</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;SpMatrixd&gt;</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(A, pattern)();</span></span>
<span id="cb2-31"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  ```</span></span>
<span id="cb2-32"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    * */</span></span>
<span id="cb2-33">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_cols</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>cols<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb2-34">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_nnz</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>nonZeros<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb2-35"></span>
<span id="cb2-36">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_outer</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">new</span> StorageIndex<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_cols</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">];</span></span>
<span id="cb2-37">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>copy<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>pattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>outerIndexPtr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(),</span> pattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>outerIndexPtr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_cols</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_outer</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span> </span>
<span id="cb2-38">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_inner</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">new</span> StorageIndex<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_nnz</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">];</span></span>
<span id="cb2-39">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>copy<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>pattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>innerIndexPtr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(),</span> pattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>innerIndexPtr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_nnz</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_inner</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb2-40">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_val</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">new</span> T<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_nnz</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">];</span></span>
<span id="cb2-41"></span>
<span id="cb2-42"></span>
<span id="cb2-43">        T<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> valptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_val</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-44">        <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_cols</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span>j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb2-45">            <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>InnerIterator Acol<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>A<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb2-46">            <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>InnerIterator pattern_col<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>pattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb2-47">                pattern_col<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span>pattern_col<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb2-48">                    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">while</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>Acol <span class="op" style="color: #5E5E5E;
background-color: null;
&amp;</span>">
font-style: inherit;">&amp;&amp;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>Acol<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>row<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> pattern_col<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>row<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">())){</span></span>
<span id="cb2-49">                        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span>Acol<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-50">                    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb2-51">                    valptr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Acol<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>value<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb2-52">                    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span>Acol<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-53">                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb2-54">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb2-55">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb2-56"></span>
<span id="cb2-57">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Specialization for rank-1 matrices A = bc^T</span></span>
<span id="cb2-58">    MatchPattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span></span>
<span id="cb2-59">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">const</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>Matrix<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>T<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span>Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>Dynamic<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;&amp;</span> b<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> </span>
<span id="cb2-60">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">const</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>Matrix<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>T<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span>Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>Dynamic<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> c<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> </span>
<span id="cb2-61">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">const</span> SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> pattern</span>
<span id="cb2-62">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb2-63">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/**</span></span>
<span id="cb2-64"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  MatchPattern(typename Eigen::Vector</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;T&gt;</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp; b, typename Eigen::Vector</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;T&gt;</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp; c, const SpMat&amp; pattern)</span></span>
<span id="cb2-65"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  A specialization of the MatchPattern class where the matrix to be matched </span></span>
<span id="cb2-66"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  is a rank one matrix of the form $A = bc^T$.</span></span>
<span id="cb2-67"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     * </span></span>
<span id="cb2-68"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  Usage:</span></span>
<span id="cb2-69"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  ```</span></span>
<span id="cb2-70"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  typedef Eigen::SparseMatrix</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;double&gt;</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"> SpMatrixd;</span></span>
<span id="cb2-71"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  SpMatrixd A_pattern = MatchPattern</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;SpMatrixd&gt;</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(b, c, pattern)();</span></span>
<span id="cb2-72"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     *  ```</span></span>
<span id="cb2-73"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     * */</span></span>
<span id="cb2-74">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_cols</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>cols<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb2-75">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_nnz</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>nonZeros<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb2-76">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_outer</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">new</span> StorageIndex<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_cols</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">];</span></span>
<span id="cb2-77">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>copy<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>pattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>outerIndexPtr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(),</span> pattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>outerIndexPtr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_cols</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_outer</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span> </span>
<span id="cb2-78">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_inner</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">new</span> StorageIndex<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_nnz</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">];</span></span>
<span id="cb2-79">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>copy<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>pattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>innerIndexPtr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(),</span> pattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>innerIndexPtr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_nnz</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_inner</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb2-80">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_val</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">new</span> T<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_nnz</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">];</span></span>
<span id="cb2-81"></span>
<span id="cb2-82">        T<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> valptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_val</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-83">        <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_cols</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span>j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb2-84">            <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>InnerIterator pattern_col<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>pattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb2-85">                pattern_col<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span>pattern_col<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb2-86">                    *valptr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> b<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>coeff<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>pattern_col<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>row<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">())</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> c<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>coeff<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb2-87">                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb2-88">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb2-89">        </span>
<span id="cb2-90">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb2-91"></span>
<span id="cb2-92">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>MatchPattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb2-93">        <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">delete</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[]</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_inner</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-94">        <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">delete</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[]</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_outer</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-95">        <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">delete</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[]</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_val</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-96">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb2-97"></span>
<span id="cb2-98">    SpMat <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">operator</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb2-99">        <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>Map<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span></span>
<span id="cb2-100">            <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_cols</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb2-101">            <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_cols</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb2-102">            <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_nnz</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb2-103">            <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_outer</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb2-104">            <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_inner</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb2-105">            <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_val</span></span>
<span id="cb2-106">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb2-107">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb2-108"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">};</span></span></code></pre></div>
<p>The first thing to note here is that I am, fundamentally, quite lazy. As such I have made the convenient assumption that the target sparsity pattern is always a subset of the sparsity pattern of interest. This is true for the application that I have in mind, but you should probably be careful if you’re adapting this code to anything else.</p>
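<p>That subset assumption is cheap to sanity-check before handing a pattern to the class. The sketch below is my own illustration (not part of the class above): it works directly on compressed sparse column (CSC) outer/inner index arrays, the same arrays Eigen exposes via <code>outerIndexPtr()</code> and <code>innerIndexPtr()</code> on a compressed <code>SparseMatrix</code>.</p>

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not in the post): check that sparsity pattern A is a
// subset of sparsity pattern B. Both patterns are given in CSC form:
// outer[j]..outer[j+1] delimits column j's entries in the inner (row index)
// array. Row indices within a compressed column are sorted, so a binary
// search over B's column j suffices for each entry of A's column j.
bool is_pattern_subset(const std::vector<int>& outerA,
                       const std::vector<int>& innerA,
                       const std::vector<int>& outerB,
                       const std::vector<int>& innerB) {
    const int ncols = static_cast<int>(outerA.size()) - 1;
    for (int j = 0; j < ncols; ++j) {
        for (int k = outerA[j]; k < outerA[j + 1]; ++k) {
            if (!std::binary_search(innerB.begin() + outerB[j],
                                    innerB.begin() + outerB[j + 1],
                                    innerA[k])) {
                return false;  // A has an entry where B has a structural zero
            }
        }
    }
    return true;
}
```

<p>A guard like this at the top of the constructor would turn a silent wrong answer into a loud failure, at a cost that is linear-ish in the number of non-zeros.</p>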
<p>The second thing you may have noticed is that there is a second constructor that is not needed here at all. This is really a gift to future me that avoids me having to rewrite this code at some point in the future. Nothing to see here.</p>
<p>With all of this in hand, we can jump over to the code indebted to<sup>2</sup> Finn.</p>
<div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode numberSource cpp number-lines code-with-copy"><code class="sourceCode cpp"><span id="cb3-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">template</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> SpChol<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span></span>
<span id="cb3-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> SpChol<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>MatrixType partial_inverse<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span></span>
<span id="cb3-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">const</span> SpChol<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> llt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb3-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">const</span> SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> pattern</span>
<span id="cb3-5"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb3-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/**</span></span>
<span id="cb3-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"> *  Input:</span></span>
<span id="cb3-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"> *  - `llt`: a Sparse Cholesky factorization of a matrix `Q`.</span></span>
<span id="cb3-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"> *  - `pattern`: a sparse matrix with the target sparsity</span></span>
<span id="cb3-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"> *  Assumptions:</span></span>
<span id="cb3-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"> *  - `pattern` has the same sparsity pattern as `Q` or is a subset of that pattern</span></span>
<span id="cb3-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"> *  Output: </span></span>
<span id="cb3-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"> *  - A sparse matrix with the same sparsity pattern as `pattern` whose non-zero</span></span>
<span id="cb3-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"> *    elements correspond to the non-zero elements of $Q^{-1}$.</span></span>
<span id="cb3-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"> **/</span></span>
<span id="cb3-16">    <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typedef</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>ReverseInnerIterator reverse_it<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-17">    StorageIndex ncols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> llt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>cols<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb3-18">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">const</span> SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> llt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>matrixL<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb3-19">    SpMat Qinv <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">template</span> selfadjointView<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>Lower<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;();</span></span>
<span id="cb3-20"></span>
<span id="cb3-21">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> i <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ncols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> i <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span>i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb3-22">        reverse_it QinvcolI<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>Qinv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb3-23">        <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>reverse_it LcolI_slow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>L<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span> LcolI_slow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span>LcolI_slow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb3-24">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// inner sum iterators</span></span>
<span id="cb3-25">            reverse_it LcolI<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>L<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb3-26">            reverse_it QinvcolJ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>Qinv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> LcolI_slow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>row<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">());</span></span>
<span id="cb3-27">            </span>
<span id="cb3-28">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Initialize Qinv[j,i]</span></span>
<span id="cb3-29">            QinvcolI<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>valueRef<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-30"></span>
<span id="cb3-31">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Inner-most sum</span></span>
<span id="cb3-32">            <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">while</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>LcolI<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>row<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb3-33">                <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// First up, sync the iterators</span></span>
<span id="cb3-34">                <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">while</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span> QinvcolJ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;&amp;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>LcolI<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>row<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> QinvcolJ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>row<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">())){</span></span>
<span id="cb3-35">                    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span>QinvcolJ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-36">                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb3-37">                <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>QinvcolJ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;&amp;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>QinvcolJ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>row<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> LcolI<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>row<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()))</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb3-38">                    QinvcolI<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>valueRef<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-=</span> LcolI<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>value<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> QinvcolJ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>value<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb3-39">                    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span>QinvcolJ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-40">                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb3-41">                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span>LcolI<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-42">            <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb3-43">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// At this point LcolI is the diagonal value</span></span>
<span id="cb3-44">            <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>i <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> LcolI_slow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>row<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">())</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb3-45">                QinvcolI<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>valueRef<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span>  <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> LcolI<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>value<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb3-46">                QinvcolI<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>valueRef<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/=</span>  LcolI<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>value<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb3-47">            <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb3-48">                QinvcolI<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>valueRef<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/=</span>  LcolI<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>value<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb3-49">                <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Set Qinv[i,j] = Qinv[j,i]</span></span>
<span id="cb3-50">                <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">while</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>QinvcolJ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>row<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb3-51">                    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span>QinvcolJ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-52">                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb3-53">                QinvcolJ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>valueRef<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> QinvcolI<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>value<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb3-54">            <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb3-55">            <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span>QinvcolI<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-56">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb3-57">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb3-58"></span>
<span id="cb3-59">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Undo the permutation</span></span>
<span id="cb3-60">    Qinv <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Qinv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>twistedBy<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>llt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>permutationP<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">().</span>inverse<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">());</span></span>
<span id="cb3-61"></span>
<span id="cb3-62">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Return the non-zero elements of Qinv corresponding to the non-zero</span></span>
<span id="cb3-63">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// elements of Q</span></span>
<span id="cb3-64">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> MatchPattern<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>Qinv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> Q<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)();</span></span>
<span id="cb3-65"></span>
<span id="cb3-66"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div>
<p>You’ll probably notice that there are far fewer template shenanigans here than in the block matrix code from yesterday. That is because this only needs to work with scalar types and doesn’t need to be part of the <code>math</code> API. If needed, I guess we could always work out what the derivative of the partial inverse is and implement its reverse-mode specialization in Stan, but frankly why<sup>3</sup> bother.</p>
<p>The other thing you may notice is the line</p>
<div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode numberSource cpp number-lines code-with-copy"><code class="sourceCode cpp"><span id="cb4-1">Qinv <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Qinv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>twistedBy<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>llt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>permutationP<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">().</span>inverse<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">());</span></span></code></pre></div>
<p>This exists because the Cholesky factorization is not actually performed on <img src="https://latex.codecogs.com/png.latex?Q"> but rather on a permuted matrix <img src="https://latex.codecogs.com/png.latex?PQP%5ET"> for some fill-reducing permutation matrix <img src="https://latex.codecogs.com/png.latex?P">. This line undoes the permutation and puts everything back in its right place.</p>
</section>
<section id="a-quick-test" class="level2">
<h2 class="anchored" data-anchor-id="a-quick-test">A quick test</h2>
<p>Finally, we need to make sure this works. The easiest way to do that is with a simple example: a 25x25 sparse matrix. Everything is hard-coded, because why not.</p>
<div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode numberSource cpp number-lines code-with-copy"><code class="sourceCode cpp"><span id="cb5-1"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;iostream&gt;</span></span>
<span id="cb5-2"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">"partial_inverse.hpp"</span></span>
<span id="cb5-3"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">"Eigen/SparseCore"</span></span>
<span id="cb5-4"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">"Eigen/Dense"</span></span>
<span id="cb5-5"></span>
<span id="cb5-6"></span>
<span id="cb5-7"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> main<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb5-8">    <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">using</span> SparseMatrix <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>SparseMatrix<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">double</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>ColMajor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;;</span></span>
<span id="cb5-9"></span>
<span id="cb5-10">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> Q_inner<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[]</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb5-11">                    <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb5-12">                    <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">17</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">18</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">19</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">17</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb5-13">                    <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">17</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">18</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">22</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">17</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">18</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">19</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">18</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">19</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">24</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb5-14">                    <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">22</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">17</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">22</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">18</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">22</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">24</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">19</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">24</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">};</span></span>
<span id="cb5-15">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> Q_outer<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[]</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">18</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">22</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">27</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">37</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">41</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">45</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">55</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">60</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">64</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">68</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">73</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">78</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">83</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb5-16">                    <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">87</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">90</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">94</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">98</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">102</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">105</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">};</span> </span>
<span id="cb5-17">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">double</span> Q_val<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[]</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb5-18">                    <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb5-19">                    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb5-20">                    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb5-21">                    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb5-22">                    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb5-23">                    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb5-24">                    <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb5-25">                    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">};</span></span>
<span id="cb5-26">    Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>VectorXd Qinv_true<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">105</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb5-27">    Qinv_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.220593295593296</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.051483238983239</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.051483238983239</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.051483238983239</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.233306970806971</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0547785547785548</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0602730602730603</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0547785547785548</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.234139471639472</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0547785547785548</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0611402486402486</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0547785547785548</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.233306970806971</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.051483238983239</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0602730602730603</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.051483238983239</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.220593295593296</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.051483238983239</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.051483238983239</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.233306970806971</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0602730602730603</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0547785547785548</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0602730602730603</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0602730602730603</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.25021645021645</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0652680652680653</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0652680652680653</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0611402486402486</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0652680652680653</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.251621989121989</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0652680652680653</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0664335664335664</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0602730602730603</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0652680652680653</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.25021645021645</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0602730602730603</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0652680652680653</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.051483238983239</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0602730602730603</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.233306970806971</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0547785547785548</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0547785547785548</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.234139471639472</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0611402486402486</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0547785547785548</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0652680652680653</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0611402486402486</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.251621989121989</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0664335664335664</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0652680652680653</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0664335664335664</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0664335664335664</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.253146853146853</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0664335664335664</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0664335664335664</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0652680652680653</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0664335664335664</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.251621989121989</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0611402486402486</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0652680652680653</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0547785547785548</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0611402486402486</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.234139471639472</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0547785547785548</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0547785547785548</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.233306970806971</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0602730602730603</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.051483238983239</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0652680652680653</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0602730602730603</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.25021645021645</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0652680652680653</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0602730602730603</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0664335664335664</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0652680652680653</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.251621989121989</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0652680652680653</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0611402486402486</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0652680652680653</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0652680652680653</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.25021645021645</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0602730602730603</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0602730602730603</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0547785547785548</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0602730602730603</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.233306970806971</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.051483238983239</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.051483238983239</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.220593295593296</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.051483238983239</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0602730602730603</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.051483238983239</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.233306970806971</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0547785547785548</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0611402486402486</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0547785547785548</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.234139471639472</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0547785547785548</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0602730602730603</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0547785547785548</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.233306970806971</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.051483238983239</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.051483238983239</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.051483238983239</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.220593295593296</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb5-28">    </span>
<span id="cb5-29">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> Q_ncol <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb5-30">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> Q_nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">105</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb5-31"></span>
<span id="cb5-32">    SparseMatrix Q <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>Map<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>SparseMatrix<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;(</span>Q_ncol<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> Q_ncol<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> Q_nnz<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> Q_outer<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> Q_inner<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> Q_val<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb5-33"></span>
<span id="cb5-34"></span>
<span id="cb5-35">    <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">auto</span> llt <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>SimplicialLLT<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>SparseMatrix<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;(</span>Q<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb5-36"></span>
<span id="cb5-37">    SparseMatrix Qinv <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> partial_inverse<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>llt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> Q<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb5-38">    Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>VectorXd Qinv_val <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>Map<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>VectorXd<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;(</span>Qinv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>valuePtr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(),</span> Q_nnz<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb5-39"></span>
<span id="cb5-40">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>cout <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"The error in the partial inverse is "</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>Qinv_val <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> Qinv_true<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">).</span>norm<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"!"</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>endl<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb5-41"></span>
<span id="cb5-42"></span>
<span id="cb5-43"></span>
<span id="cb5-44"></span>
<span id="cb5-45"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div>
<p>The output is</p>
<pre><code>The error in the partial inverse is 1.25852e-15!</code></pre>
<p>All good here.</p>
<p>That’s the end for this blog post. Hopefully I’ll be back soon-ish with a more interesting post that actually uses all of this stuff.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Takahashi, K., Fagan, J., Chen, M.S., 1973. Formation of a sparse bus impedance matrix and its application to short circuit study. In: Eighth PICA Conference Proceedings. IEEE Power Engineering Society, pp.&nbsp;63–69 (Papers Presented at the 1973 Power Industry Computer Application Conference in Minneapolis, MN).↩︎</p></li>
<li id="fn2"><p>stolen from↩︎</p></li>
<li id="fn3"><p>One reason would be to use gradient descent on the score function for a Gaussian MLE. Another is that this might be useful inside the <code>generated quantities</code> block to compute things like the marginal variances of the model, but, as the great lady said, not today Satan.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2024,
  author = {Simpson, Dan},
  title = {Random {C++} {Part} 2: {Sparse} Partial Inverses in {Eigen}},
  date = {2024-09-05},
  url = {https://dansblog.netlify.app/posts/2024-09-05-partial-inverse/partial-inverse.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2024" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2024. <span>“Random C++ Part 2: Sparse Partial Inverses in
Eigen.”</span> September 5, 2024. <a href="https://dansblog.netlify.app/posts/2024-09-05-partial-inverse/partial-inverse.html">https://dansblog.netlify.app/posts/2024-09-05-partial-inverse/partial-inverse.html</a>.
</div></div></section></div> ]]></description>
  <category>Stan</category>
  <category>Sparse matrices</category>
  <category>Autodiff</category>
  <category>Eigen</category>
  <guid>https://dansblog.netlify.app/posts/2024-09-05-partial-inverse/partial-inverse.html</guid>
  <pubDate>Wed, 04 Sep 2024 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2024-09-05-partial-inverse/reba.JPG" medium="image"/>
</item>
<item>
  <title>Random C++ Part 1: Building a block sparse matrix in Eigen</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2024-09-04-block-matrices/blocks.html</link>
  <description><![CDATA[ 





<div class="callout callout-style-simple callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Code availability
</div>
</div>
<div class="callout-body-container callout-body">
<p>The code from this post can be found in my <a href="https://github.com/dpsimpson/blog/tree/master/posts/2024-09-04-block-matrices">github repo</a>.</p>
</div>
</div>
<p>I’ll be honest with y’all. I was writing something else. It was really long and was getting annoying to edit and was probably never going to be finished. So instead of doing that, I am just going to post this. It’s about making a block matrix in a Stan-compatible way. Why?? Because I wanted to be able to do this.</p>
<p>There is no context forthcoming. There are no good jokes. Just building one sparse matrix.</p>
<p>Enjoy</p>
<section id="c-plumbing-building-a-2x2-block-sparse-matrix-from-a-sparse-11-block-and-two-dense-matrices" class="level2">
<h2 class="anchored" data-anchor-id="c-plumbing-building-a-2x2-block-sparse-matrix-from-a-sparse-11-block-and-two-dense-matrices">C++ Plumbing: Building a 2x2 block sparse matrix from a sparse (1,1) block and two dense matrices</h2>
<p>The first thing that we need to do is build a block-sparse matrix. We know that this matrix is symmetric so we only need to store the lower-triangle.</p>
<p>In general, this is not the most difficult task in the world. We have <a href="https://dansblog.netlify.app/posts/2022-03-23-getting-jax-to-love-sparse-matrices/getting-jax-to-love-sparse-matrices#so-how-do-we-store-a-sparse-matrix">already talked about how we store sparse matrices</a> and, in particular, have had some fun with the Compressed Column Storage (CCS) scheme, which stores sparse matrices column-by-column. In the lingo, we call this <em>column major</em> storage.</p>
<p>When any array of numbers is stored in memory by a program, it is stored as a long vector and when you index into it (using something like <code>A[i,j]</code>) this is just some syntactic sugar for finding the correct value in that long vector.</p>
<p>Some languages, such as Fortran and Matlab, and libraries, such as <a href="https://gitlab.com/libeigen/eigen">Eigen</a>, store arrays in column major order. Others, like C/C++ and Python, use row-major storage. Stan is written in C++ but all of its linear algebra is done using Eigen, so we are going to use column-major storage.</p>
<p>It may seem catastrophically nerdy to be talking about internal storage orders for arrays in different languages, but I promise you this is <em>incredibly</em> important. If you want to write any sort of performant code, your algorithms need to be aligned with the internal storage order. That means that we need to prefer algorithms that run down columns of matrices over ones that run across rows.</p>
<p>This is because computers are clever and when you ask them for, e.g., <code>A[0,0]</code>, the CPU will actually load the first few entries of the 0th <em>column</em><sup>1</sup> of <code>A</code> in anticipation<sup>2</sup> that you will need <code>A[1,0]</code> and its friends next. If you instead next ask for <code>A[0,1]</code>, the CPU has to throw its pre-loaded stuff out, reach out to some potentially distant memory, and try again. When an array has a lot of rows, these cache misses<sup>3</sup> noticeably degrade the performance of a program.</p>
<p>All of that is to say that this is actually not too too hard to implement because we are just interleaving some contiguous chunks of a vector. While the main loop is pretty straightforward, C++ is truly a journey. So it’s gonna be like 100 lines of code.</p>
<p>The structure is</p>
<ol type="1">
<li><p>Allocate 3 arrays to store the outer index (which column?), the inner index (which row?), and the value.</p></li>
<li><p>Iterate through each column of the matrix, only storing the lower triangle.</p></li>
<li><p>Return an <code>Eigen::SparseMatrix&lt;double&gt;</code> built from those arrays.</p></li>
</ol>
<p>There are essentially two challenges in doing this. Firstly, the number of columns and the number of non-zeros are not known at compile time, so we need to allocate dynamic memory on the heap. This is always a risky proposition in C++ as it’s pretty easy to screw up and end up with a memory leak. To get around this, I’m using the RAII (resource acquisition is initialization) pattern, which basically encapsulates all the memory usage inside a functor, whose call method returns a sparse symmetric matrix.</p>
<p>The second challenge is that the Eigen API demands raw pointers. So this is going to have that good old fashioned <code>*ptr++</code> action.</p>
<p>Without further ado, here is the code. I’ll explain some key bits after.</p>
<div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode numberSource cpp number-lines code-with-copy"><code class="sourceCode cpp"><span id="cb1-1"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;stan/math/prim/meta/is_eigen_sparse_base.hpp&gt;</span></span>
<span id="cb1-2"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;stan/math/prim/meta/is_eigen.hpp&gt;</span></span>
<span id="cb1-3"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;stan/math/prim/meta/is_stan_scalar.hpp&gt;</span></span>
<span id="cb1-4"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;stan/math/prim/meta/base_type.hpp&gt;</span></span>
<span id="cb1-5"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;stan/math/prim/err/check_size_match.hpp&gt;</span></span>
<span id="cb1-6"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;stan/math/prim/fun/to_ref.hpp&gt;</span></span>
<span id="cb1-7"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;Eigen/SparseCore&gt;</span></span>
<span id="cb1-8"></span>
<span id="cb1-9"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">namespace</span> stan <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-10"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">namespace</span> math <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-11"></span>
<span id="cb1-12"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typedef</span> Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>SparseMatrix<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">double</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;::</span>StorageIndex StorageIndex<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-13"></span>
<span id="cb1-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// The require_ statements are defined in the first #include</span></span>
<span id="cb1-15"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">template</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> EigMat1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> EigMat2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> </span>
<span id="cb1-16"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">require_eigen_sparse_base_t</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;*</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">nullptr</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb1-17"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">require_all_eigen_t</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>EigMat1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> EigMat2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;*</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">nullptr</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb1-18"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">require_all_stan_scalar_t</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">base_type_t</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;,</span></span>
<span id="cb1-19">                          <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">base_type_t</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>EigMat1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;,</span></span>
<span id="cb1-20">                          <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">base_type_t</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>EigMat2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;&gt;*</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">nullptr</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span>  </span>
<span id="cb1-21"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">class</span> Block_sparse_lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-22">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/* </span></span>
<span id="cb1-23"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    A RAII functor class because Jesus hates memory leaks</span></span>
<span id="cb1-24"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Make this encapsulate the whole thing.</span></span>
<span id="cb1-25"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    You may be asking why I'm using arrays and pointers</span></span>
<span id="cb1-26"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    like I'm writing in C, and the answer is </span></span>
<span id="cb1-27"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    "that's the interface to Map". The dream of the </span></span>
<span id="cb1-28"><span class="co" style="color: #5E5E5E;
background-color: null;
    C-90 is alive">
font-style: inherit;">    C-90 is alive and well in the Eigen code base.</span></span>
<span id="cb1-29"></span>
<span id="cb1-30"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Anyway, `operator ()` returns a sparseMatrixMap</span></span>
<span id="cb1-31"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    */</span></span>
<span id="cb1-32">    <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">using</span> T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">base_type</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;::</span>type<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-33">   </span>
<span id="cb1-34">    StorageIndex<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_outer</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-35">    StorageIndex<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_inner</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-36">    T<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_val</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-37">    StorageIndex <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_cols</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-38">    StorageIndex <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_nnz</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-39"></span>
<span id="cb1-40">    <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">public</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span></span>
<span id="cb1-41"></span>
<span id="cb1-42">    Block_sparse_lower<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span></span>
<span id="cb1-43">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">const</span> SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> top_left<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> </span>
<span id="cb1-44">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">const</span> EigMat1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> bottom_left<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> </span>
<span id="cb1-45">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">const</span> EigMat2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> bottom_right</span>
<span id="cb1-46">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> </span>
<span id="cb1-47">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-48">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// only eval once</span></span>
<span id="cb1-49">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">const</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">auto</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> tl_ref <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> to_ref<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>top_left<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb1-50">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">const</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">auto</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> bl_ref <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> to_ref<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>bottom_left<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb1-51">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">const</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">auto</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> br_ref <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> to_ref<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>bottom_right<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb1-52"></span>
<span id="cb1-53">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Get sizes.</span></span>
<span id="cb1-54">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// NB tmp_nnz is an upper bound. Will only be correct if `top_left` is lower </span></span>
<span id="cb1-55">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// triangular. We will compute the real value on the fly.</span></span>
<span id="cb1-56">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">const</span> StorageIndex ncols_tl <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tl_ref<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>cols<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb1-57">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">const</span> StorageIndex ncols_br <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> br_ref<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>cols<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb1-58">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">const</span> StorageIndex tmp_nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>tl_ref<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>nonZeros<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> ncols_tl <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> ncols_br </span>
<span id="cb1-59">                                        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>ncols_br <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> ncols_br <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb1-60"></span>
<span id="cb1-61">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// check sizes</span></span>
<span id="cb1-62">        check_size_match<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Block_sparse_lower"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Columns of "</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"top_left "</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> tl_ref<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>cols<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(),</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Columns of "</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="st" style="color: #20794D;
background-color: null;
&quot;Bottom Left&quot;">
font-style: inherit;">"bottom_left"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> bl_ref<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>cols<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">());</span></span>
<span id="cb1-63">        check_size_match<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Block_sparse_lower"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Rows of "</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="st" style="color: #20794D;
background-color: null;
&quot;bottom-left &quot;">
font-style: inherit;">"bottom_left "</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> bl_ref<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rows<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(),</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Rows of "</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="st" style="color: #20794D;
background-color: null;
&quot;Bottom-right&quot;">
font-style: inherit;">"bottom_right"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> br_ref<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>rows<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">());</span></span>
<span id="cb1-64">        </span>
<span id="cb1-65">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Allocate!</span></span>
<span id="cb1-66">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_cols</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ncols_tl <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> ncols_br<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-67"></span>
<span id="cb1-68">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_outer</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">new</span> StorageIndex<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_cols</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">];</span></span>
<span id="cb1-69">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_outer</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">]</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>top_left<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>outerIndexPtr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb1-70">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_inner</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">new</span> StorageIndex<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span>tmp_nnz<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">];</span></span>
<span id="cb1-71">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_val</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">new</span> T<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span>tmp_nnz<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">];</span></span>
<span id="cb1-72">        </span>
<span id="cb1-73">        T<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> p_val <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_val</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-74">        StorageIndex<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> p_inner <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_inner</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-75">        StorageIndex out_nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-76">        </span>
<span id="cb1-77">        <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>StorageIndex j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> ncols_tl<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span>j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-78">            StorageIndex col_cnt <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-79">            <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> SpMat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>InnerIterator it<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>tl_ref<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span> it<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span>it<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-80">                <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>it<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>row<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">continue</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// lower triangle only</span></span>
<span id="cb1-81">                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>p_val<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> it<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>value<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb1-82">                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>p_inner<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> it<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>row<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb1-83">                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span>out_nnz<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-84">                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span>col_cnt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-85">            <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb1-86"></span>
<span id="cb1-87">            <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>StorageIndex i <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> i <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> ncols_br<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span>i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-88">                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>p_val<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bl_ref<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>coeff<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb1-89">                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>p_inner<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ncols_tl <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-90">                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span>out_nnz<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-91">                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span>col_cnt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-92">            <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb1-93">        </span>
<span id="cb1-94">            <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_outer</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span>j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">]</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_outer</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span>j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">]</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> col_cnt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-95">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb1-96">        </span>
<span id="cb1-97">        <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>StorageIndex j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> ncols_br<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span>j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-98">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// only need lower triangle</span></span>
<span id="cb1-99">            <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>StorageIndex i <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> i <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> ncols_br<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span>i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-100">                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>p_val<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> br_ref<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>coeff<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span>j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb1-101">                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>p_inner<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ncols_tl <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-102">                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span>out_nnz<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-103">            <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb1-104">            <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_outer</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span>ncols_tl<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span>j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">]</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_outer</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span>ncols_tl <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">]</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> ncols_br <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-105">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb1-106">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_nnz</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> out_nnz<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-107">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// constructor</span></span>
<span id="cb1-108"></span>
<span id="cb1-109">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>Block_sparse_lower<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-110">        <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">delete</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[]</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_outer</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-111">        <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">delete</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[]</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_inner</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-112">        <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">delete</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[]</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_val</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-113">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// destructor</span></span>
<span id="cb1-114"></span>
<span id="cb1-115">    Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>SparseMatrix<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>T<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">operator</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-116">        <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">typename</span> Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>SparseMatrix<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>T<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;::</span>Map<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span></span>
<span id="cb1-117">            <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_cols</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> </span>
<span id="cb1-118">            <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_cols</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb1-119">            <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_nnz</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb1-120">            <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_outer</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb1-121">            <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_inner</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb1-122">            <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">m_val</span></span>
<span id="cb1-123">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span>   </span>
<span id="cb1-124">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//operator ()</span></span>
<span id="cb1-125"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">};</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Block_sparse_lower</span></span>
<span id="cb1-126"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// namespace math</span></span>
<span id="cb1-127"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// namespace stan</span></span></code></pre></div>
<p>The first thing you probably noticed was all the templates. Templates are a beautiful<sup>4</sup> feature of C++, and pretty much all of that machinery just lets us accept any dense and sparse matrix types from Eigen, as long as they contain scalars (as opposed to autodiff variables). They also let us hack together a pre-C++20 version of <a href="https://en.wikipedia.org/wiki/Concepts_(C%2B%2B)">concepts</a>, which is what all of the <code>require_</code> statements are doing.</p>
<p>Once we are actually in the class, it has three methods. The constructor takes the three matrices: one sparse and two dense. It checks at compile time that they are all column-major and then gets to work. There’s nothing too exciting happening here: some size checking, and then a loop that stacks the relevant pieces of each matrix’s internal index and value vectors on top of one another.</p>
<p>The destructor frees the allocated memory (a core part of the RAII pattern).</p>
<p>Finally, we need to actually get access to this sparse matrix, which I implemented as a call operator. It returns a <code>Map</code> of the three pointers, which we can then wrap in a self-adjoint view (aka it will pretend to be symmetric when doing operations, even though only the lower triangle is filled). A <code>Map</code> is a nice way to tell Eigen’s internal <code>SparseMatrix</code> representation to look at the pieces of memory defined in this class when it needs inner indices, outer indices, or values. This doesn’t create a copy, so it’s memory efficient.</p>
<p>So let’s test it. I’m going to run the following code.</p>
<div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode numberSource cpp number-lines code-with-copy"><code class="sourceCode cpp"><span id="cb2-1"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;iostream&gt;</span></span>
<span id="cb2-2"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">"sp_block.hpp"</span></span>
<span id="cb2-3"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">"Eigen/SparseCore"</span></span>
<span id="cb2-4"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">"Eigen/Dense"</span></span>
<span id="cb2-5"></span>
<span id="cb2-6"></span>
<span id="cb2-7"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> main<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb2-8">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>cout <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"-----------matrix test---------"</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>endl<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-9">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">double</span> values<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[]</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">};</span></span>
<span id="cb2-10">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> inner<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[]</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">};</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// nonzero row indices</span></span>
<span id="cb2-11">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> outer<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[]</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">};</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// start index per column + 1 for last col</span></span>
<span id="cb2-12">    Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>SparseMatrix<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">double</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>SparseMatrix<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">double</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;::</span>Map<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span></span>
<span id="cb2-13">        <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/*rows*/</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/*cols*/</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/*nonzeros*/</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> outer<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> inner<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> values<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb2-14">    </span>
<span id="cb2-15">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>cout <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>MatrixXd<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>A<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>endl <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>endl<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-16"></span>
<span id="cb2-17">    Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>Matrix<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">double</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> B<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-18">    B <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-19">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>cout <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> B <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>endl <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>endl<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-20">    Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>Matrix<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">double</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> C<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-21">    C <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-22">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>cout <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> C <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>endl <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>endl<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-23">    </span>
<span id="cb2-24">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>cout <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"   -------ans-------"</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>endl<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-25">    Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>SparseMatrix<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">double</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> D <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> </span>
<span id="cb2-26">        stan<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>math<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>Block_sparse_lower<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">decltype</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>A<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">),</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">decltype</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>B<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">),</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">decltype</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>C<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)&gt;(</span></span>
<span id="cb2-27">            A<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>triangularView<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>Lower<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;(),</span> B<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> C<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>triangularView<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>Lower<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;())();</span></span>
<span id="cb2-28">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>cout <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>MatrixXd<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>D<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>endl <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>endl<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-29"></span>
<span id="cb2-30">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>cout <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"-----------to_ref test---------"</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>endl<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-31">    Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>SparseMatrix<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">double</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> E <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> A<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-32">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>cout <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>MatrixXd<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>A<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>endl <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>endl<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-33">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>cout <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"   -------ans-------"</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>endl<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-34">    Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>SparseMatrix<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">double</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> F <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> </span>
<span id="cb2-35">        stan<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>math<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>Block_sparse_lower<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">decltype</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>E<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">),</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">decltype</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>B<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">),</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">decltype</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>C<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)&gt;(</span>A<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> B<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> C<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)();</span></span>
<span id="cb2-36">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>cout <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> Eigen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>MatrixXd<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>F<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>endl <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">std::</span>endl<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-37"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div>
<p>After compiling and running, the output is</p>
<pre><code>-----------matrix test---------
0 0 0 0 0
0 0 0 4 0
0 0 3 0 0
0 2 0 0 0
1 0 0 0 0

1 2 3 4 5
1 2 3 4 5

1 1
1 1

   -------ans-------
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 3 0 0 0 0
0 2 0 0 0 0 0
1 0 0 0 0 0 0
1 2 3 4 5 1 0
1 2 3 4 5 1 1

-----------to_ref test---------
0 0 0 0 0
0 0 0 4 0
0 0 3 0 0
0 2 0 0 0
1 0 0 0 0

   -------ans-------
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 3 0 0 0 0
0 2 0 0 0 0 0
1 0 0 0 0 0 0
1 2 3 4 5 1 0
1 2 3 4 5 1 1
</code></pre>
<p>This is exactly what we expect! Hooray.</p>
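<p>As a sanity check, the same lower-triangle construction can be reproduced from Python with <code>scipy.sparse</code> (an illustration, not the C++ code from this post; the matrices are the ones printed above):</p>

```python
# Sanity check of the block construction: the lower triangle of the symmetric
# 2x2 block matrix [[A, B^T], [B, C]] is [[tril(A), 0], [B, tril(C)]].
import numpy as np
import scipy.sparse as sp

A = sp.csc_matrix(np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 4, 0],
    [0, 0, 3, 0, 0],
    [0, 2, 0, 0, 0],
    [1, 0, 0, 0, 0],
], dtype=float))
B = sp.csc_matrix(np.array([[1, 2, 3, 4, 5],
                            [1, 2, 3, 4, 5]], dtype=float))
C = sp.csc_matrix(np.array([[1, 1],
                            [1, 1]], dtype=float))

# Assemble only the lower-triangular blocks; None stands for a zero block.
F = sp.bmat([[sp.tril(A), None],
             [B,          sp.tril(C)]], format="csc")
print(F.toarray())
```

<p>The printed matrix matches the <code>-------ans-------</code> block above: the strictly upper-triangular entry of <code>A</code> (the 4) is dropped and only the lower triangle of <code>C</code> survives.</p>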
<p>And that’s it. A symmetric 2x2 block sparse matrix in C++. Who knows what I’ll do next.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Or row if it’s row major↩︎</p></li>
<li id="fn2"><p>Let’s anthropomorphize. I don’t want to write a blog about caches.↩︎</p></li>
<li id="fn3"><p>Drag name: Cache Mx↩︎</p></li>
<li id="fn4"><p>Until you’re rooting around a seventy page compiler error that really just means you forgot a typename on the final <code>return</code>.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2024,
  author = {Simpson, Dan},
  title = {Random {C++} {Part} 1: {Building} a Block Sparse Matrix in
    {Eigen}},
  date = {2024-09-04},
  url = {https://dansblog.netlify.app/posts/2024-09-04-block-matrices/blocks.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2024" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2024. <span>“Random C++ Part 1: Building a Block Sparse
Matrix in Eigen.”</span> September 4, 2024. <a href="https://dansblog.netlify.app/posts/2024-09-04-block-matrices/blocks.html">https://dansblog.netlify.app/posts/2024-09-04-block-matrices/blocks.html</a>.
</div></div></section></div> ]]></description>
  <category>Stan</category>
  <category>Sparse matrices</category>
  <category>Autodiff</category>
  <category>Eigen</category>
  <guid>https://dansblog.netlify.app/posts/2024-09-04-block-matrices/blocks.html</guid>
  <pubDate>Tue, 03 Sep 2024 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2024-09-04-block-matrices/ravens.JPEG" medium="image"/>
</item>
<item>
  <title>An unexpected detour into partially symbolic, sparsity-exploiting autodiff; or Lord won’t you buy me a Laplace approximation</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2024-05-08-laplace/laplace.html</link>
  <description><![CDATA[ 





<p>I am, once again, in a bit of a mood. And the only thing that will fix my mood is a good martini and a Laplace approximation. And I’m all out of martinis.</p>
<p>To be honest, I started writing this post in February 2023, but then got distracted by visas and jobs and all that jazz. But I felt the desire to finish it, so here we are. I wonder how much I will want to re-write.<sup>1</sup></p>
<p>The post started as a pedagogical introduction to Laplace approximations (for reasons I don’t fully remember), but it rapidly went off the rails. So strap yourself in<sup>2</sup> for a tour through the basics of sparse autodiff and through manipulating the <code>jaxpr</code> intermediate representation to make one very simple logistic regression produce autodiff code that is almost as fast as a hand-written gradient.</p>
<section id="the-laplace-approximation" class="level2">
<h2 class="anchored" data-anchor-id="the-laplace-approximation">The Laplace approximation</h2>
<p>One of the simplest approximations to a distribution is the Laplace approximation. It can be defined as the Gaussian distribution that matches the location and the curvature at the mode of the target distribution. It lives its best life when the density is of the form <img src="https://latex.codecogs.com/png.latex?%0Ap(x)%20%5Cpropto%20%5Cexp(-nf_n(x)),%0A"> where <img src="https://latex.codecogs.com/png.latex?f_n"> is a sequence of functions<sup>3</sup>. Let’s imagine that we want to approximate the normalized density <img src="https://latex.codecogs.com/png.latex?p(x)"> near the mode <img src="https://latex.codecogs.com/png.latex?x%5E*">. We can do this by taking the second-order Taylor expansion of <img src="https://latex.codecogs.com/png.latex?f_n"> around <img src="https://latex.codecogs.com/png.latex?x%20=%20x%5E*">, which (because the gradient vanishes at the mode) is <img src="https://latex.codecogs.com/png.latex?%0Af_n(x)%20=%20f_n(x%5E*)%20+%20%5Cfrac%7B1%7D%7B2%7D(x-x%5E*)%5ETH(x%5E*)(x-x%5E*)%20+%20%5Cmathcal%7BO%7D((x-x%5E*)%5E3),%0A"> where<sup>4</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5BH(x%5E*)%5D_%7Bij%7D%20=%20%5Cfrac%7B%5Cpartial%5E2%20f_n%7D%7B%5Cpartial%20x_i%20%5Cpartial%20x_j%7D%0A"> is the Hessian matrix.</p>
<p>If we replace <img src="https://latex.codecogs.com/png.latex?f_n"> by its quadratic approximation we get <img src="https://latex.codecogs.com/png.latex?%0Ap(x)%20%5Capprox%20%20C%5Cexp%5Cleft(-%5Cfrac%7Bn%7D%7B2%7D(x-x%5E*)%5ETH(x%5E*)(x-x%5E*)%5Cright),%0A"> where <img src="https://latex.codecogs.com/png.latex?C"> is a constant.</p>
<p>After normalizing the approximation to make sure that we get a proper density, we get the Laplace approximation <img src="https://latex.codecogs.com/png.latex?%0Ap(x)%20%5Capprox%20N(x%5E*,%20n%5E%7B-1%7DH(x%5E*)%5E%7B-1%7D).%0A"></p>
<p>The Laplace approximation can be justified rigorously, has well-studied error bounds, and is known to work quite well when <img src="https://latex.codecogs.com/png.latex?p(x)"> is a) unimodal<sup>5</sup> and b) isn’t tooooo non-Gaussian.</p>
<p>In practice, people have found that Laplace approximations do a reasonable<sup>6</sup> job quantifying uncertainty <a href="https://arxiv.org/abs/2106.14806">even in complex neural network models</a> and it is at the heart of any number of classical estimators in statistics.</p>
<p>From an implementation perspective, the Laplace approximation is pretty simple. It’s just a two step process:</p>
<div class="algorithm">
<ol type="1">
<li><p>Find the mode <img src="https://latex.codecogs.com/png.latex?x%5E*%20=%20%5Carg%20%5Cmax_x%20f_n(x)"> using your favorite optimizer</p></li>
<li><p>Compute the Hessian <img src="https://latex.codecogs.com/png.latex?H(x%5E*)">.</p></li>
</ol>
</div>
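<p>The two steps can be sketched in ordinary Python for a one-dimensional example with a closed-form answer (an illustration with <code>scipy</code>, taking <img src="https://latex.codecogs.com/png.latex?n%20=%201">; it is not the JAX implementation that follows). For a Gamma density <img src="https://latex.codecogs.com/png.latex?p(x)%20%5Cpropto%20x%5E%7Ba-1%7De%5E%7B-x%7D">, the Laplace approximation is known to be <img src="https://latex.codecogs.com/png.latex?N(a-1,%20a-1)">:</p>

```python
# Two-step Laplace approximation in 1D: (1) find the mode with an off-the-shelf
# optimizer, (2) compute the curvature of -log p at the mode.
import numpy as np
from scipy.optimize import minimize_scalar

a = 10.0
# Negative log-density of Gamma(shape=a, rate=1), up to an additive constant.
f = lambda x: -((a - 1.0) * np.log(x) - x)

# Step 1: find the mode. Analytically the mode is a - 1 = 9.
res = minimize_scalar(f, bounds=(1e-6, 100.0), method="bounded")
mode = res.x

# Step 2: curvature at the mode, here by a central finite difference.
# Analytically f''(x) = (a - 1) / x^2, so the variance is 1/f''(mode) = a - 1.
h = 1e-4
hess = (f(mode + h) - 2.0 * f(mode) + f(mode - h)) / h**2
variance = 1.0 / hess

print(mode, variance)
```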
<p>In a Bayesian context, we typically take <img src="https://latex.codecogs.com/png.latex?%0Af_n(x)%20=%20%5Cfrac%7B1%7D%7Bn%7D%20%5Csum_%7Bi=1%7D%5En%20%5Clog%20p(y_i%20%5Cmid%20x)%20+%20%5Cfrac%7B1%7D%7Bn%7D%20%5Clog%20p(x),%0A"> which will lead to a Gaussian approximation to the posterior distribution. But this post really isn’t about Bayes. It’s about Laplace approximations.</p>
<section id="computing-the-laplace-approximation-in-jax" class="level3">
<h3 class="anchored" data-anchor-id="computing-the-laplace-approximation-in-jax">Computing the Laplace approximation in JAX</h3>
<p>This is a two-step process and, to be honest, all of the steps are pretty standard. So (hopefully) this will not be too tricky to implement. For simplicity, I’m not going to bother with the dividing and multiplying by <img src="https://latex.codecogs.com/png.latex?n">, although for very large data it could be quite important.</p>
<div id="4cea8f03" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> jax.numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> jnp</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax.scipy.optimize <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> minimize</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax.scipy.special <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> expit</span>
<span id="cb1-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> jacfwd, grad</span>
<span id="cb1-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Array</span>
<span id="cb1-6"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> typing <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Callable, Tuple, List, Set, Dict</span>
<span id="cb1-7"></span>
<span id="cb1-8"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> laplace(f: Callable, x0: Array) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> Array:</span>
<span id="cb1-9">    nx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> x0.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb1-10">    mode, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>details <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> minimize(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>f(x), x0, method <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"BFGS"</span>)</span>
<span id="cb1-11">    H <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>  <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jacfwd(grad(f))(mode)</span>
<span id="cb1-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> mode, H</span></code></pre></div>
</div>
<p>There’s not really much to note in this code, except that <code>jax.scipy.optimize.minimize</code> finds the minimum of <img src="https://latex.codecogs.com/png.latex?f">, so I had to pass in the negative of the function. The sign flip also propagates to the Hessian, which is computed as the Jacobian of the gradient of f.</p>
<p>Depending on what needs to be done with the Laplace approximation, it might be more appropriate to output the log-density rather than just the mode and the Hessian, but for the moment we will keep this signature.</p>
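<p>For instance, a variant that returns the log-density could look something like this (a sketch in plain <code>numpy</code>/<code>scipy</code> rather than JAX; <code>laplace_logpdf</code> is a hypothetical name, not from this post):</p>

```python
# Wrap the Laplace output (mode, Hessian of -log p at the mode) as a callable
# log-density for the Gaussian approximation N(mode, H^{-1}).
import numpy as np
from scipy.stats import multivariate_normal

def laplace_logpdf(mode, H):
    """Log-density of the Laplace approximation N(mode, H^{-1})."""
    cov = np.linalg.inv(H)
    return lambda x: multivariate_normal.logpdf(x, mean=mode, cov=cov)

# For a Gaussian target the approximation is exact at every point.
mode = np.zeros(2)
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])  # Hessian of -log target at the mode
log_q = laplace_logpdf(mode, H)
print(log_q(np.array([0.3, -0.2])))
```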
<p>Let’s try it out. First of all, I’m going to generate some random data from a logistic regression model. This is going to use <a href="https://jax.readthedocs.io/en/latest/jax-101/05-random-numbers.html">Jax’s slightly odd random number system where you need to manually update the state of the pseudo-random number generator</a>. This is beautifully repeatable<sup>7</sup> unlike, say, R or standard numpy, where you’ve got to pay <em>a lot</em> of attention to the state of the random number generator to avoid oddities.</p>
<div id="938fe91c" class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> random <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> jrandom</span>
<span id="cb2-2"></span>
<span id="cb2-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> make_data(key, n: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>, p: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> Tuple[Array, Array]:</span>
<span id="cb2-4">  key, sub <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jrandom.split(key)</span>
<span id="cb2-5">  X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jrandom.normal(sub, shape <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (n,p)) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>jnp.sqrt(p)</span>
<span id="cb2-6"></span>
<span id="cb2-7">  key, sub <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jrandom.split(key)</span>
<span id="cb2-8">  beta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jrandom.normal(sub, shape <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (p,))</span>
<span id="cb2-9">  key, sub <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jrandom.split(key)</span>
<span id="cb2-10">  beta0 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jrandom.normal(sub)</span>
<span id="cb2-11"></span>
<span id="cb2-12"></span>
<span id="cb2-13">  key, sub <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jrandom.split(key)</span>
<span id="cb2-14">  y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jrandom.bernoulli(sub, expit(beta0 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> beta))</span>
<span id="cb2-15"></span>
<span id="cb2-16">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (y, X)</span></code></pre></div>
</div>
<p>An interesting side-note here is that I’ve generated the design matrix <img src="https://latex.codecogs.com/png.latex?X"> to have standard Gaussian columns. This is <em>not</em> a benign choice as <img src="https://latex.codecogs.com/png.latex?n"> gets big. With <em>very</em> high probability, the columns of <img src="https://latex.codecogs.com/png.latex?X"> will be almost<sup>8</sup> orthonormal, which means that this is the best possible case for logistic regression. Generally speaking, design matrices from real<sup>9</sup> data have a great deal of co-linearity in them and so algorithms that perform well on random design matrices may perform less well on real data.</p>
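<p>The near-orthogonality claim is easy to check numerically (a <code>numpy</code> illustration, not the JAX code from this post; the Gram matrix is rescaled so that its population value is exactly the identity):</p>

```python
# For a Gaussian design, the rescaled Gram matrix concentrates near the
# identity as n grows: columns are nearly orthogonal with nearly equal norms.
import numpy as np

rng = np.random.default_rng(30127)
n, p = 100_000, 5
X = rng.standard_normal((n, p)) / np.sqrt(p)  # same scaling as make_data

gram = (p / n) * X.T @ X  # E[gram] = I with this scaling
off_diag = gram - np.diag(np.diag(gram))
print(np.abs(np.diag(gram) - 1).max(), np.abs(off_diag).max())
```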
<p>Ok, so let’s fit the model! I’m just going to use <img src="https://latex.codecogs.com/png.latex?N(0,1)"> priors on all of the <img src="https://latex.codecogs.com/png.latex?%5Cbeta">s.</p>
<div id="58eec67f" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> functools <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> partial</span>
<span id="cb3-2">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span></span>
<span id="cb3-3">p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb3-4"></span>
<span id="cb3-5">key <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jrandom.PRNGKey(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30127</span>)</span>
<span id="cb3-6">y, X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_data(key, n, p)</span>
<span id="cb3-7"></span>
<span id="cb3-8"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> log_posterior(beta: Array, X: Array, y: Array) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> Array:</span>
<span id="cb3-9">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> beta.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> X.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb3-10"></span>
<span id="cb3-11">    prob <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> expit(beta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> beta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:])</span>
<span id="cb3-12">    </span>
<span id="cb3-13">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (</span>
<span id="cb3-14">      jnp.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jnp.log(prob) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb3-15">      (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>y) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jnp.log1p(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>prob)) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> </span>
<span id="cb3-16">      <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jnp.dot(beta, beta)</span>
<span id="cb3-17">    )</span>
<span id="cb3-18"></span>
<span id="cb3-19"></span>
<span id="cb3-20">post_mean, H <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> laplace(</span>
<span id="cb3-21">  partial(log_posterior, X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X, y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> y),</span>
<span id="cb3-22">  x0 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>jnp.zeros(X.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb3-23">)</span>
<span id="cb3-24"></span>
<span id="cb3-25">post_cov <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.linalg.inv(H)</span></code></pre></div>
</div>
<p>Let’s see how this performs relative to MCMC. To do that, I’m going to build an equivalent PyMC model.</p>
<div id="28ec7812" class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb4-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pymc <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pm</span>
<span id="cb4-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb4-4"></span>
<span id="cb4-5"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">with</span> pm.Model() <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> logistic_reg:</span>
<span id="cb4-6">  beta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pm.Normal(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'beta'</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, shape <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,))</span>
<span id="cb4-7">  linpred <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> beta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> pm.math.dot(np.array(X), beta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:])</span>
<span id="cb4-8">  </span>
<span id="cb4-9">  pm.Bernoulli(</span>
<span id="cb4-10">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"y"</span>, </span>
<span id="cb4-11">    p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pm.math.invlogit(linpred),</span>
<span id="cb4-12">    observed <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.array(y)</span>
<span id="cb4-13">  )</span>
<span id="cb4-14">  posterior <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pm.sample(</span>
<span id="cb4-15">    tune<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>, </span>
<span id="cb4-16">    draws<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>, </span>
<span id="cb4-17">    chains<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, </span>
<span id="cb4-18">    cores <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb4-19"></span>
<span id="cb4-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># I would like to apologize for the following pandas code.</span></span>
<span id="cb4-21">tmp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pm.summary(posterior)</span>
<span id="cb4-22">tmp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tmp.assign(</span>
<span id="cb4-23">  laplace_mean <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> post_mean, </span>
<span id="cb4-24">  laplace_sd <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.sqrt(np.diag(post_cov)), </span>
<span id="cb4-25">  Variable <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tmp.index</span>
<span id="cb4-26">)[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Variable"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mean"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"laplace_mean"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sd"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"laplace_sd"</span>]]</span>
<span id="cb4-27"></span>
<span id="cb4-28"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">with</span> pd.option_context(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'display.precision'</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>):</span>
<span id="cb4-29">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(tmp)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (4 chains in 1 job)
NUTS: [beta]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 5 seconds.</code></pre>
</div>
<div class="cell-output cell-output-display">

<style>
    /* Turns off some styling */
    progress {
        /* gets rid of default border in Firefox and Opera. */
        border: none;
        /* Needs to be in here for Safari polyfill so background images work as expected. */
        background-size: auto;
    }
    progress:not([value]), progress:not([value])::-webkit-progress-bar {
        background: repeating-linear-gradient(45deg, #7e7e7e, #7e7e7e 10px, #5c5c5c 10px, #5c5c5c 20px);
    }
    .progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar {
        background: #F44336;
    }
</style>
</div>
<div class="cell-output cell-output-display">

    <div>
      <progress value="2000" class="" max="2000" style="width:300px; height:20px; vertical-align: middle;"></progress>
      100.00% [2000/2000 00:01&lt;00:00 Sampling chain 0, 0 divergences]
    </div>
    
</div>
<div class="cell-output cell-output-display">

    <div>
      <progress value="2000" class="" max="2000" style="width:300px; height:20px; vertical-align: middle;"></progress>
      100.00% [2000/2000 00:01&lt;00:00 Sampling chain 1, 0 divergences]
    </div>
    
</div>
<div class="cell-output cell-output-display">

    <div>
      <progress value="2000" class="" max="2000" style="width:300px; height:20px; vertical-align: middle;"></progress>
      100.00% [2000/2000 00:01&lt;00:00 Sampling chain 2, 0 divergences]
    </div>
    
</div>
<div class="cell-output cell-output-display">

    <div>
      <progress value="2000" class="" max="2000" style="width:300px; height:20px; vertical-align: middle;"></progress>
      100.00% [2000/2000 00:01&lt;00:00 Sampling chain 3, 0 divergences]
    </div>
    
</div>
<div class="cell-output cell-output-stdout">
<pre><code>        Variable   mean  laplace_mean     sd  laplace_sd
beta[0]  beta[0]  0.249         0.234  0.235       0.229
beta[1]  beta[1] -0.964        -0.914  0.435       0.428
beta[2]  beta[2] -1.710        -1.616  0.490       0.470
beta[3]  beta[3] -0.975        -0.926  0.423       0.416
beta[4]  beta[4] -0.739        -0.716  0.470       0.457
beta[5]  beta[5]  0.637         0.609  0.481       0.475</code></pre>
</div>
</div>
<p>Well that’s just dandy! Everything is pretty<sup>10</sup> close. With 1000 observations, the MCMC and Laplace estimates agree to about one decimal place.</p>
</section>
<section id="speeding-up-the-computation" class="level3">
<h3 class="anchored" data-anchor-id="speeding-up-the-computation">Speeding up the computation</h3>
<p>So that is all well and dandy. Let’s see how long it takes. I am interested in big models, so for this demonstration, I’m going to take <img src="https://latex.codecogs.com/png.latex?p%20=%205000">. That said, I’m not enormously interested in seeing how this scales in <img src="https://latex.codecogs.com/png.latex?n"> (linearly), so I’m going to keep that at the fairly unrealistic value of <img src="https://latex.codecogs.com/png.latex?n=1000">.</p>
<div id="4fda5153" class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb7-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> timeit</span>
<span id="cb7-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> hess_test(key, n, p):</span>
<span id="cb7-3">  y, X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_data(key, n , p)</span>
<span id="cb7-4">  inpu <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.ones(p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb7-5">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> hess():</span>
<span id="cb7-6">    f <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> partial(log_posterior, X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X, y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> y)</span>
<span id="cb7-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jacfwd(grad(f))(inpu)</span>
<span id="cb7-8">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> hess</span>
<span id="cb7-9"></span>
<span id="cb7-10">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span></span>
<span id="cb7-11">p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5000</span></span>
<span id="cb7-12">key, sub <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jrandom.split(key)</span>
<span id="cb7-13">hess <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> hess_test(sub, n , p)</span>
<span id="cb7-14">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(hess, number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb7-15"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Autodiff: The average time with p = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>p<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>mean(times)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .3f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">(+/-</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>std(times)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .3f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Autodiff: The average time with p = 5000 is  3.222(+/- 0.379)</code></pre>
</div>
</div>
<p>That doesn’t seem too bad, but the thing is that I know quite a lot about logistic regression. It is, after all, logistic regression. In particular, I know that the Hessian has the form <img src="https://latex.codecogs.com/png.latex?%0AH%20=%20X%5ET%20D(%5Cbeta)%20X,%0A"> where <img src="https://latex.codecogs.com/png.latex?D(%5Cbeta)"> is a <em>diagonal</em> <img src="https://latex.codecogs.com/png.latex?n%20%5Ctimes%20n"> matrix that has a known form.</p>
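<p>Since everything that follows leans on this identity, here is a quick sanity check (my own toy data, not the post’s <code>make_data</code>) that the known diagonal form of the Hessian matches the autodiff one:</p>

```python
# Sketch: for a logistic-regression negative log-likelihood (no prior term),
# the Hessian is X^T D X with D_ii = p_i (1 - p_i). Toy data, illustrative only.
import jax.numpy as jnp
import numpy as np
from jax import grad, jacfwd

X = jnp.array([[1.0, 0.5], [1.0, -1.2], [1.0, 0.3]])
y = jnp.array([1.0, 0.0, 1.0])
beta = jnp.array([0.2, -0.4])

def nll(beta):
    p = 1.0 / (1.0 + jnp.exp(-X @ beta))
    return -jnp.sum(y * jnp.log(p) + (1 - y) * jnp.log1p(-p))

p = 1.0 / (1.0 + jnp.exp(-X @ beta))
H_symbolic = X.T @ (X * (p * (1 - p))[:, None])  # known diagonal form of D
H_autodiff = jacfwd(grad(nll))(beta)             # brute-force autodiff Hessian
assert np.allclose(H_symbolic, H_autodiff, atol=1e-5)
```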
<p>This means that the appropriate comparison is between the speed of the autodiff Hessian and how long it takes to compute <img src="https://latex.codecogs.com/png.latex?X%5ETDX"> for some diagonal matrix <img src="https://latex.codecogs.com/png.latex?D">.</p>
<p>Now you might be worried here that I didn’t explicitly save <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?y">, so the comparison might not be fair. But my friends, I have good news! All of that awkward <code>key, sub = jrandom.split(key)</code> malarkey has the singular advantage that if I pass the same key into <code>make_data</code> that I used for <code>hess_test</code>, I will get <em>the exact same generated data</em>! So let’s do that. For <img src="https://latex.codecogs.com/png.latex?D"> I’m just going to pick a random matrix. This will give a <em>minimum</em> achievable time for computing the Hessian (as it doesn’t do the extra derivatives to compute <img src="https://latex.codecogs.com/png.latex?D"> properly).</p>
<p>If you look at that code and say <em>but Daniel you used the wrong multiplication operator</em>, you can convince yourself that <code>X * d[:, None]</code> gives the same result as <code>jnp.diag(d) @ X</code>. But it will be faster, because it never materializes the <img src="https://latex.codecogs.com/png.latex?n%20%5Ctimes%20n"> diagonal matrix. And it uses such beautiful<sup>11</sup> broadcasting rules.</p>
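<p>If you’d rather check that broadcasting claim directly, a two-line NumPy experiment (toy sizes of my own choosing) does it:</p>

```python
# Row-scaling by broadcasting equals multiplying by an explicit diagonal
# matrix, without ever forming the n x n diagonal. Toy sizes, illustrative only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
d = rng.normal(size=4)

assert np.allclose(X * d[:, None], np.diag(d) @ X)
```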
<div id="252d7828" class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb9-1">y, X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_data(key, n , p)</span>
<span id="cb9-2">key, sub <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jrandom.split(key)</span>
<span id="cb9-3">d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jrandom.normal(sub, shape <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (n,))</span>
<span id="cb9-4">mm <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: X.T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> (X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> d[:, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>])</span>
<span id="cb9-5">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(mm, number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb9-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Symbolic (minimum possible): The average time with p = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>p<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>mean(times)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .3f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">(+/-</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>std(times)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .3f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Symbolic (minimum possible): The average time with p = 5000 is  0.766(+/- 0.014)</code></pre>
</div>
</div>
<p>Oh dear. The symbolic derivative<sup>12</sup> is <em>a lot</em> faster.</p>
<p>Speeding this up is going to take a little work. The first thing we can try is to explicitly factor out the linear transformation. Instead of passing in the function <img src="https://latex.codecogs.com/png.latex?f">, we could pass in <img src="https://latex.codecogs.com/png.latex?g"> such that <img src="https://latex.codecogs.com/png.latex?%0Af(x)%20=%20g(Ax),%0A"> for some matrix <img src="https://latex.codecogs.com/png.latex?A">. In our case <img src="https://latex.codecogs.com/png.latex?g"> would have a diagonal Hessian. Let’s convince ourselves of that with a small example. As well as dropping the intercept, I’ve also dropped the prior term.</p>
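<p>The identity underlying this factorization is the chain rule: if <code>f(x) = g(Ax)</code>, then the Hessian satisfies <code>Hess f(x) = A.T @ Hess g(Ax) @ A</code>. A toy check of my own (an elementwise <code>g</code>, so its Hessian is diagonal):</p>

```python
# Chain-rule check: for f(x) = g(Ax), Hess f(x) = A^T Hess g(Ax) A.
# g acts elementwise before summing, so Hess g is diagonal. Toy example only.
import jax.numpy as jnp
import numpy as np
from jax import grad, jacfwd

A = jnp.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])
g = lambda b: jnp.sum(jnp.sin(b))   # elementwise, then summed
f = lambda x: g(A @ x)

x = jnp.array([0.3, -0.7])
H_f = jacfwd(grad(f))(x)            # Hessian of the composed function
H_g = jacfwd(grad(g))(A @ x)        # diagonal Hessian of g
assert np.allclose(H_f, A.T @ H_g @ A, atol=1e-5)
```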
<div id="3a085e1e" class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb11-1">g <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> prob: jnp.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jnp.log(prob) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>y) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jnp.log1p(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>prob))</span>
<span id="cb11-2">key, sub2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jrandom.split(key)</span>
<span id="cb11-3">y, X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_data(sub2, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb11-4">b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> jnp.ones(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb11-5">D <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jacfwd(grad(g))(b)</span>
<span id="cb11-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(D, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[[0.7 0.  0.  0.  0. ]
 [0.  3.7 0.  0.  0. ]
 [0.  0.  7.8 0.  0. ]
 [0.  0.  0.  4.9 0. ]
 [0.  0.  0.  0.  0.3]]</code></pre>
</div>
</div>
<p>Wonderfully diagonal!</p>
<div id="9afd020c" class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb13-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> hess2(g, A, x):</span>
<span id="cb13-2">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># factor f(x) = g(Ax): Hess f = A^T Hess g(Ax) A</span></span>
<span id="cb13-3">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> x</span>
<span id="cb13-4">  D <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jacfwd(grad(g))(b)</span>
<span id="cb13-5">  H <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A.T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> (A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jnp.diag(D)[:, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>])</span>
<span id="cb13-6">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> H</span>
<span id="cb13-7"></span>
<span id="cb13-8">y, X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_data(sub, n, p)</span>
<span id="cb13-9">g <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> prob: jnp.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jnp.log(prob) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>y) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jnp.log1p(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>prob))</span>
<span id="cb13-10">x0 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.ones(p)</span>
<span id="cb13-11">h2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: hess2(g, X, x0)</span>
<span id="cb13-12">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(h2, number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb13-13"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Separated Hessian: The average time with p = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>p<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>mean(times)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .3f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">(+/-</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>std(times)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .3f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Separated Hessian: The average time with p = 5000 is  0.975(+/- 0.163)</code></pre>
</div>
</div>
<p>Well that’s definitely better.</p>
<p>Now, we might be able to do even better than that if we notice that if we <em>know</em> that <img src="https://latex.codecogs.com/png.latex?D"> is diagonal, then we don’t need to compute the entire Hessian, we can simply compute the Hessian-vector product <img src="https://latex.codecogs.com/png.latex?%0A%5Coperatorname%7Bdiag%7D(H)%20=%20H%201%20%5Cqquad%20%5Ctext%7Biff%20%7DH%5Ctext%7B%20is%20diagonal%7D,%0A"> where <img src="https://latex.codecogs.com/png.latex?1"> is the vector of ones. Just as we computed the Hessian by computing the Jacobian of the gradient, it turns out that we can compute a Hessian-vector product by computing a Jacobian-vector product <code>jvp</code> of the gradient. The syntax in JAX is, honestly, a little bit gross here<sup>13</sup>, but if you want to read up about how it works <a href="https://jax.readthedocs.io/en/latest/notebooks/autodiff_cookbook.html#hessian-vector-products-using-both-forward-and-reverse-mode">the docs are really nice</a><sup>14</sup>.</p>
<p>This observation is useful because <code>jacfwd</code> computes the Jacobian by computing <img src="https://latex.codecogs.com/png.latex?n"> Jacobian-vector products, so computing a single one instead saves us <em>a lot</em> of work.</p>
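<p>In isolation, the jvp-of-grad trick looks like this (a toy function of my own, not the post’s model):</p>

```python
# A Hessian-vector product via one jvp of the gradient, compared against
# building the full Hessian first. Toy function, illustrative only.
import jax.numpy as jnp
import numpy as np
from jax import grad, jacfwd, jvp

f = lambda x: jnp.sum(x ** 3)       # Hessian is diag(6 x)
x = jnp.array([1.0, 2.0, 3.0])
v = jnp.ones(3)

hvp = jvp(grad(f), (x,), (v,))[1]   # one forward pass over grad(f)
full = jacfwd(grad(f))(x) @ v       # n forward passes, then a matvec
assert np.allclose(hvp, full)
```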
<div id="279128f8" class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb15-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> jvp</span>
<span id="cb15-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> hess3(g, A, x):</span>
<span id="cb15-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># same factorization, but get diag(Hess g) from a single jvp</span></span>
<span id="cb15-4">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> x</span>
<span id="cb15-5">  D <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jvp(grad(g), (b,), (jnp.ones(n),))[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb15-6">  H <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A.T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> (A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> D[:, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>])</span>
<span id="cb15-7">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> H</span>
<span id="cb15-8"></span>
<span id="cb15-9">h3 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: hess3(g, X, x0)</span>
<span id="cb15-10">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(h3, number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb15-11"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Compressed Hessian: The average time with p = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>p<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>mean(times)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .3f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">(+/-</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>std(times)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .3f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Compressed Hessian: The average time with p = 5000 is  0.879(+/- 0.082)</code></pre>
</div>
</div>
<p>This is very nearly as fast as the lower bound for the symbolic Hessian. There must be a way to use this.</p>
</section>
</section>
<section id="can-we-automate-this-parsing-jax-expressions" class="level2">
<h2 class="anchored" data-anchor-id="can-we-automate-this-parsing-jax-expressions">Can we automate this? Parsing JAX expressions</h2>
<p>So that was all lovely and shiny. But the problem is that it was very labor intensive. I had to recognize both that you could write <img src="https://latex.codecogs.com/png.latex?f(x)%20=%20g(Ax)"> <em>and</em> that <img src="https://latex.codecogs.com/png.latex?g"> would have a diagonal Hessian. That is, frankly, hard to do in general.</p>
<p>If I was building a system like <a href="https://bambinos.github.io/bambi/"><code>bambi</code></a> or <a href="https://paul-buerkner.github.io/brms/"><code>brms</code></a> or <a href="https://www.r-inla.org/">INLA</a><sup>15</sup>, where the model classes are relatively constrained, it’s possible to automate both of these steps by analyzing the formula. But all I get is a function. So I need to work out how I can automatically parse the code for <img src="https://latex.codecogs.com/png.latex?f"> to find <img src="https://latex.codecogs.com/png.latex?g"> and <img src="https://latex.codecogs.com/png.latex?A"> (if they exist) and to determine if <img src="https://latex.codecogs.com/png.latex?g"> would have a sparse Hessian.</p>
<p>We can’t do this easily with a standard Python program, but we can do it with JAX because it traces through the code and provides an <em>intermediate representation</em> (IR) of the code. This is, incidentally, the first step that any compiler takes. The beauty of an IR is that it abstracts away all of the specific user choices and provides a clean, logical representation of the program that can then be executed or, in our case, manipulated. These manipulations are, for example, key to how JAX computes gradients, how it JIT-compiles code, and how it does <code>vmap</code> and <code>pmap</code> operations.</p>
<p>But we can do more types of manipulations. In particular, we can take the IR and transform it into another IR that produces the same output in a more efficient way. Anyone who’s familiar with compiled programming languages should know that this happens under the hood. They also probably know that compiler writers are small gods and I’m definitely not going to approach anywhere near that level of complexity in a blog post.</p>
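<p>A quick way to see “IR in, IR out” in action: differentiation itself maps one jaxpr-representable program to another. A small sketch (the function here is a hypothetical example):</p>

```python
import jax
from jax import make_jaxpr

# A hypothetical one-liner: differentiation maps one jaxpr-representable
# program to another jaxpr-representable program.
f = lambda x: x * x * x

print(make_jaxpr(f)(2.0))            # the IR of f
print(make_jaxpr(jax.grad(f))(2.0))  # the IR of a new program computing f'
```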
<p>So what are our tasks? First of all, we need to trace our way through the JAX code. We can do this by using the intermediate representation that JAX uses when transforming functions: the <code>jaxpr</code>s.</p>
<section id="getting-to-know-jaxprs" class="level3">
<h3 class="anchored" data-anchor-id="getting-to-know-jaxprs">Getting to know jaxprs</h3>
<p>A <code>jaxpr</code> is a transformation of the Python code for evaluating a JAX function into a human-readable language that maps typed primitives through the code. We can view it using the <code>jax.make_jaxpr</code> function.</p>
<p>Let’s look at the log-posterior function after partial evaluation to make it a single-input function.</p>
<div id="bd5d6df2" class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb17-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> make_jaxpr</span>
<span id="cb17-2"></span>
<span id="cb17-3">lp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> partial(log_posterior, X<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>X, y<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>y)</span>
<span id="cb17-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(make_jaxpr(lp)(jnp.ones(p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>{ lambda a:f32[1000,5000] b:bool[1000]; c:f32[5001]. let
    d:f32[1] = dynamic_slice[slice_sizes=(1,)] c 0
    e:f32[] = squeeze[dimensions=(0,)] d
    f:f32[5000] = dynamic_slice[slice_sizes=(5000,)] c 1
    g:f32[1000] = dot_general[dimension_numbers=(([1], [0]), ([], []))] a f
    h:f32[1000] = add e g
    i:f32[1000] = logistic h
    j:f32[1000] = log i
    k:f32[1000] = convert_element_type[new_dtype=float32 weak_type=False] b
    l:f32[1000] = mul k j
    m:i32[1000] = convert_element_type[new_dtype=int32 weak_type=True] b
    n:i32[1000] = sub 1 m
    o:f32[1000] = neg i
    p:f32[1000] = log1p o
    q:f32[1000] = convert_element_type[new_dtype=float32 weak_type=False] n
    r:f32[1000] = mul q p
    s:f32[1000] = add l r
    t:f32[] = reduce_sum[axes=(0,)] s
    u:f32[] = dot_general[dimension_numbers=(([0], [0]), ([], []))] c c
    v:f32[] = mul 0.5 u
    w:f32[] = sub t v
  in (w,) }</code></pre>
</div>
</div>
<p>This can be a bit tricky to read the first time you see it, but it’s waaaay easier than x86 assembly or the LLVM IR. Basically it says that to compute <code>lp(jnp.ones(p+1))</code> you need to run through this program. The first line gives the inputs (with types and shapes). Then, after the <code>let</code> statement, come the commands that need to be executed in order. A single command looks like</p>
<pre><code>d:f32[1] = dynamic_slice[slice_sizes=(1,)] c 0</code></pre>
<p>This can be read as <em>take a slice of vector <code>c</code> starting at <code>0</code> of shape <code>(1,)</code> and store it in <code>d</code>, which is a one-dimensional 32-bit float array</em>. (The line after turns it into a scalar.)</p>
<p>All of the other lines can be read similarly. A good trick, if you don’t recognize the primitive<sup>16</sup>, is to <a href="https://jax.readthedocs.io/en/latest/jax.lax.html">look it up</a> in the <code>jax.lax</code> sub-module.</p>
<p>Even a cursory read of this suggests that we could probably save a couple of tedious operations by passing in an integer <code>y</code>, rather than a Boolean <code>y</code>, but hey. That really shouldn’t cost much.</p>
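<p>Incidentally, a <code>jaxpr</code> isn’t just for printing: its equations can be walked programmatically, which is what the rest of this post relies on. A small sketch, using a toy stand-in for the log-posterior (the names here are assumptions, not the post’s code):</p>

```python
import jax
import jax.numpy as jnp
from jax import make_jaxpr

# A toy stand-in with the same kinds of operations as the log-posterior:
# a slice, a matrix-vector product, and elementwise non-linearities.
A = jnp.ones((3, 2))

def toy(c):
    # jax.lax.logistic is the sigmoid primitive seen in the jaxpr above
    return jnp.sum(jnp.log(jax.lax.logistic(A @ c[1:])))

jpr = make_jaxpr(toy)(jnp.ones(3))

# Every equation carries its primitive plus its input/output variables,
# so the jaxpr can be walked programmatically as well as printed.
prims = [eqn.primitive.name for eqn in jpr.jaxpr.eqns]
print(prims)
```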
<p>While the <code>jaxpr</code> is lovely, it’s a whole lot easier to reason about if you see it graphically. We can plot the <em>expression graph</em> using<sup>17</sup> the <code>haiku</code><sup>18</sup> package from DeepMind.</p>
<div id="bf3d6463" class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb20-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> haiku.experimental <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> to_dot</span>
<span id="cb20-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> graphviz</span>
<span id="cb20-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> re</span>
<span id="cb20-4">f <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> partial(log_posterior, X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X, y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> y)</span>
<span id="cb20-5">dot <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> to_dot(f)(jnp.ones(p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb20-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#Strip out an obnoxious autogen title</span></span>
<span id="cb20-7">dot <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> re.sub(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;&lt;.*&gt;&gt;;"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\"</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\"</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>, dot, count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, flags<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>re.DOTALL)</span>
<span id="cb20-8">graphviz.Source(dot)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="11">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2024-05-08-laplace/laplace_files/figure-html/cell-12-output-1.svg" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>To understand this graph, the orange-y boxes represent the input for <code>lp</code>. In this case it’s an array of floating point numbers of length <img src="https://latex.codecogs.com/png.latex?p+1%20=%205001">. The purple boxes are constants that are used in the function. Some of these are signed integers (s32), there’s a matrix (f32[1000, 5000]), and there is even a literal (0.5). The blue box is the output. That leaves the yellow boxes, which have all of the operations, with inward arrows indicating the inputs and outward arrows indicating the outputs.</p>
</section>
<section id="splitting-the-expression-graph-into-linear-and-non-linear-subgraphs" class="level3">
<h3 class="anchored" data-anchor-id="splitting-the-expression-graph-into-linear-and-non-linear-subgraphs">Splitting the expression graph into linear and non-linear subgraphs</h3>
<p>Looking at the graph, we can split it into three sub-graphs. The first sub-graph can be found by tracing an input value through the graph until it hits either a non-linear operation or the end of the graph. The sub-graph is created by making the penultimate node in that sequence an output node. This sub-graph represents a linear transformation.</p>
<div id="a5a99eed" class="cell" data-execution_count="12">
<div class="cell-output cell-output-display" data-execution_count="12">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2024-05-08-laplace/laplace_files/figure-html/cell-13-output-1.svg" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Once we have reached the end of the linear portion, we can link the output from this operation to the input of the non-linear sub-graph.</p>
<div id="56993076" class="cell" data-execution_count="13">
<div class="cell-output cell-output-display" data-execution_count="13">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2024-05-08-laplace/laplace_files/figure-html/cell-14-output-1.svg" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Finally, we have one more trace of <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> through the graph that is non-linear. We could couple this into the non-linear graph at the cost of having to reason about a bivariate Hessian (which will become complex).</p>
<div id="96b1b043" class="cell" data-execution_count="14">
<div class="cell-output cell-output-display" data-execution_count="14">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2024-05-08-laplace/laplace_files/figure-html/cell-15-output-1.svg" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>The two non-linear portions of the graph are merged through a trivial linear combination.</p>
<div id="b473bcd8" class="cell" data-execution_count="15">
<div class="cell-output cell-output-display" data-execution_count="15">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2024-05-08-laplace/laplace_files/figure-html/cell-16-output-1.svg" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="step-right-up-to-play-the-game-of-the-year-is-it-linear" class="level3">
<h3 class="anchored" data-anchor-id="step-right-up-to-play-the-game-of-the-year-is-it-linear">Step right up to play the game of the year: Is it linear?</h3>
<p>So we need to trace through these jaxprs and keep a record of which of the sub-graphs they are in (and we do not know how many sub-graphs there will be!). We also need to note if an operation is linear or not. This is not something that is automatically provided. We need to store this information ourselves.</p>
<p>The only way I can think to do this is to make a set of all of the JAX operations that I know to be linear. Many of them are just index or type stuff. Unfortunately, there is a more complex class of operations that are only <em>sometimes</em> linear.</p>
<p>The first example we see of this is</p>
<pre><code>g:f32[1000] = dot_general[
      dimension_numbers=(((1,), (0,)), ((), ()))
      precision=None
      preferred_element_type=None
    ] a f</code></pre>
<p>This line represents the general tensor dot product between <code>a</code> and <code>f</code>. In this case, <code>a</code> is a constant input (the matrix <img src="https://latex.codecogs.com/png.latex?X">) while <code>f</code> is a linear transformation of the input (<code>beta[1:]</code>), so the resulting step is linear. However, there is a second <code>dot_general</code> in the code, which occurs at</p>
<pre><code>u:f32[] = dot_general[
      dimension_numbers=(((0,), (0,)), ((), ()))
      precision=None
      preferred_element_type=None
    ] c c</code></pre>
<p>Here, <code>c</code> is a linear transformation of the input (it’s just <code>beta</code>), but <code>dot(c,c)</code> is a quadratic function. Hence in this case, <code>dot_general</code> is not linear.</p>
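<p>Both situations can be seen side by side by asking, for each argument of the <code>dot_general</code> equation, whether it is one of the constants bound to the <code>jaxpr</code>. A sketch with hypothetical functions:</p>

```python
import jax.numpy as jnp
from jax import make_jaxpr

A = jnp.ones((4, 3))

# Hypothetical stand-ins: the same primitive, dot_general, appears once
# linearly (one operand is a constant) and once quadratically.
linear = make_jaxpr(lambda x: A @ x)(jnp.ones(3))
quadratic = make_jaxpr(lambda x: jnp.dot(x, x))(jnp.ones(3))

def input_dependent_args(jpr):
    # Count how many arguments of the dot_general equation are *not*
    # constants bound to the jaxpr, i.e. depend on the function input.
    eqn = next(e for e in jpr.jaxpr.eqns if e.primitive.name == "dot_general")
    return sum(v not in jpr.jaxpr.constvars for v in eqn.invars)

print(input_dependent_args(linear))     # one operand traces back to x
print(input_dependent_args(quadratic))  # both operands trace back to x
```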
<p>We are going to need to work out how to handle this case. In the folded code is a partial<sup>19</sup> list of the <code>jax.lax</code> primitives that are linear or occasionally linear. All in all there are 69 linear or no-op primitives and 7 sometimes linear primitives.</p>
<div id="de3a709f" class="cell" data-execution_count="16">
<details class="code-fold">
<summary>jax.lax linear and sometimes linear primitives</summary>
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb23-1">jax_linear <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {</span>
<span id="cb23-2">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'add'</span>,</span>
<span id="cb23-3">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'bitcast_convert_type'</span>,</span>
<span id="cb23-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'broadcast'</span>,</span>
<span id="cb23-5">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'broadcast_in_dim'</span>,</span>
<span id="cb23-6">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'broadcast_shapes'</span>,</span>
<span id="cb23-7">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'broadcast_to_rank'</span>,</span>
<span id="cb23-8">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'clz'</span>,</span>
<span id="cb23-9">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'collapse'</span>,</span>
<span id="cb23-10">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'complex'</span>,</span>
<span id="cb23-11">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'concatenate'</span>,</span>
<span id="cb23-12">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'conj'</span>,</span>
<span id="cb23-13">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'convert_element_type'</span>,</span>
<span id="cb23-14">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'dtype'</span>,</span>
<span id="cb23-15">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'dtypes'</span>,</span>
<span id="cb23-16">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'dynamic_slice'</span>,</span>
<span id="cb23-17">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'expand_dims'</span>,</span>
<span id="cb23-18">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'full'</span>,</span>
<span id="cb23-19">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'full_like'</span>,</span>
<span id="cb23-20">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'imag'</span>,</span>
<span id="cb23-21">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'neg'</span>,</span>
<span id="cb23-22">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pad'</span>,</span>
<span id="cb23-23">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'padtype_to_pads'</span>,</span>
<span id="cb23-24">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'real'</span>,</span>
<span id="cb23-25">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'reduce'</span>,</span>
<span id="cb23-26">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'reshape'</span>,</span>
<span id="cb23-27">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rev'</span>,</span>
<span id="cb23-28">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rng_bit_generator'</span>,</span>
<span id="cb23-29">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rng_uniform'</span>,</span>
<span id="cb23-30">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'select'</span>,</span>
<span id="cb23-31">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'select_n'</span>,</span>
<span id="cb23-32">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'squeeze'</span>,</span>
<span id="cb23-33">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sub'</span>,</span>
<span id="cb23-34">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'transpose'</span>,</span>
<span id="cb23-35">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'zeros_like_array'</span>,</span>
<span id="cb23-36">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'GatherDimensionNumbers'</span>,</span>
<span id="cb23-37">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'GatherScatterMode'</span>,</span>
<span id="cb23-38">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ScatterDimensionNumbers'</span>,</span>
<span id="cb23-39">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'dynamic_index_in_dim'</span>,</span>
<span id="cb23-40">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'dynamic_slice'</span>,</span>
<span id="cb23-41">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'dynamic_slice_in_dim'</span>,</span>
<span id="cb23-42">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'dynamic_update_index_in_dim'</span>,</span>
<span id="cb23-43">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'dynamic_update_slice'</span>,</span>
<span id="cb23-44">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'dynamic_update_slice_in_dim'</span>,</span>
<span id="cb23-45">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'gather'</span>,</span>
<span id="cb23-46">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'index_in_dim'</span>,</span>
<span id="cb23-47">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'index_take'</span>,</span>
<span id="cb23-48">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'reduce_sum'</span>,</span>
<span id="cb23-49">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'scatter'</span>,</span>
<span id="cb23-50">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'scatter_add'</span>,</span>
<span id="cb23-51">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'slice'</span>,</span>
<span id="cb23-52">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'slice_in_dim'</span>,</span>
<span id="cb23-53">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'conv'</span>,</span>
<span id="cb23-54">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'conv_dimension_numbers'</span>,</span>
<span id="cb23-55">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'conv_general_dilated'</span>,</span>
<span id="cb23-56">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'conv_general_permutations'</span>,</span>
<span id="cb23-57">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'conv_general_shape_tuple'</span>,</span>
<span id="cb23-58">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'conv_shape_tuple'</span>,</span>
<span id="cb23-59">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'conv_transpose'</span>,</span>
<span id="cb23-60">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'conv_transpose_shape_tuple'</span>,</span>
<span id="cb23-61">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'conv_with_general_padding'</span>,</span>
<span id="cb23-62">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'cumsum'</span>,</span>
<span id="cb23-63">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'fft'</span>,</span>
<span id="cb23-64">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'all_gather'</span>,</span>
<span id="cb23-65">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'all_to_all'</span>,</span>
<span id="cb23-66">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'axis_index'</span>,</span>
<span id="cb23-67">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ppermute'</span>,</span>
<span id="cb23-68">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pshuffle'</span>,</span>
<span id="cb23-69">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'psum'</span>,</span>
<span id="cb23-70">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'psum_scatter'</span>,</span>
<span id="cb23-71">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pswapaxes'</span>,</span>
<span id="cb23-72">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'xeinsum'</span></span>
<span id="cb23-73">}</span>
<span id="cb23-74"></span>
<span id="cb23-75">jax_sometimes_linear <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> { </span>
<span id="cb23-76">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'batch_matmul'</span>,</span>
<span id="cb23-77">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'dot'</span>,</span>
<span id="cb23-78">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'dot_general'</span>,</span>
<span id="cb23-79">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mul'</span></span>
<span id="cb23-80"> }</span>
<span id="cb23-81">jax_first_linear <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {</span>
<span id="cb23-82">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'div'</span></span>
<span id="cb23-83"> }</span>
<span id="cb23-84">jax_last_linear <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {</span>
<span id="cb23-85">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'custom_linear_solve'</span>,</span>
<span id="cb23-86">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'triangular_solve'</span>,</span>
<span id="cb23-87">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tridiagonal_solve'</span></span>
<span id="cb23-88"> }</span></code></pre></div>
</details>
</div>
<p>All of the <em>sometimes linear</em> operations are linear as long as only one of their arguments depends on the function inputs. For <code>div</code>, the input-dependent argument must additionally sit in the first position (the numerator), while for the various linear solves it must sit in the last position (the right-hand side).</p>
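<p>Given those sets, classifying a single equation is a small function. This is only a sketch: the set contents below repeat just a few representatives of the folded listing above, and the dependence information (<code>arg_depends</code>) is assumed to come from tracing the graph.</p>

```python
# A sketch of classifying one jaxpr equation. The sets repeat only a few
# representatives of the folded listing; arg_depends[i] records whether
# the i-th argument of the equation depends on the function input.
jax_linear = {"add", "sub", "reduce_sum", "slice", "squeeze",
              "convert_element_type", "broadcast_in_dim"}
jax_sometimes_linear = {"batch_matmul", "dot", "dot_general", "mul"}
jax_first_linear = {"div"}
jax_last_linear = {"custom_linear_solve", "triangular_solve",
                   "tridiagonal_solve"}

def eqn_is_linear(primitive_name, arg_depends):
    if not any(arg_depends):
        # The equation only touches constants, so it contributes a
        # constant: fine to treat as (trivially) linear.
        return True
    if primitive_name in jax_linear:
        return True
    if primitive_name in jax_sometimes_linear:
        # Linear only when exactly one argument depends on the input.
        return sum(arg_depends) == 1
    if primitive_name in jax_first_linear:
        # div is linear only in its first argument (the numerator).
        return arg_depends[0] and not any(arg_depends[1:])
    if primitive_name in jax_last_linear:
        # The solves are linear only in their last argument (the RHS).
        return arg_depends[-1] and not any(arg_depends[:-1])
    return False  # unknown or non-linear primitive fed by the input

print(eqn_is_linear("dot_general", [True, False]))  # like X @ beta
print(eqn_is_linear("dot_general", [True, True]))   # like dot(beta, beta)
```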
<div class="callout callout-style-simple callout-note">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>A more JAX-native way to deal with this is to think of how the <code>transpose</code> operation works. Essentially, it has the same dimension as the function argument, but evaluates to <code>None</code> when the operation isn’t linear in that variable. But I had already done all of this before I got there and at some point truly you’ve gotta stop making your blog post more complicated.</p>
</div>
</div>
</div>
</section>
<section id="tracing-through-the-jaxprs" class="level3">
<h3 class="anchored" data-anchor-id="tracing-through-the-jaxprs">Tracing through the jaxprs</h3>
<p>In order to split our graph into appropriate sub-graphs we need to trace through the <code>jaxpr</code> and keep track of every variable and if it depends on linear or non-linear parts.</p>
<p>For simplicity, consider the following expression graph for computing <code>lambda x, y: 0.5*(x+y)</code>.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2024-05-08-laplace/execution_graph.png" class="img-fluid figure-img"></p>
<figcaption>An expression graph for computing <code>lambda x, y: 0.5*(x+y)</code>. The blue rectangles are input variables, the square is a literal constant, and the green oval is the output node. (Yes I know the haiku colours are different. Sue me.)</figcaption>
</figure>
</div>
<p>This figure corresponds roughly to the jaxpr</p>
<div id="2d3dd5e6" class="cell" data-execution_count="17">
<div class="cell-output cell-output-stdout">
<pre><code>{ lambda ; a:f32[] b:f32[]. let c:f32[] = add a b; d:f32[] = mul 0.5 c in (d,) }</code></pre>
</div>
</div>
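<p>For reference, this jaxpr can be reproduced with a couple of lines (a sketch using the standard <code>make_jaxpr</code> API):</p>

```python
from jax import make_jaxpr

# Reproduce the toy jaxpr for lambda x, y: 0.5 * (x + y).
jpr = make_jaxpr(lambda x, y: 0.5 * (x + y))(1.0, 1.0)
print(jpr)

names = [eqn.primitive.name for eqn in jpr.jaxpr.eqns]
```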
<p>For each node, the graph tells us</p>
<ul>
<li>its unique identifier (internally<sup>20</sup> JAX uses integers)</li>
<li>which equation generated the value</li>
<li>which nodes are its parents in the graph (the input(s) to the equation)</li>
<li>whether or not this node depends on the inputs. This is useful for ignoring non-linearities that just apply to the constants bound to the jaxpr.</li>
</ul>
<p>We can record this information in a dataclass.</p>
<div id="ebcee419" class="cell" data-execution_count="18">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb25-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> dataclasses <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> dc</span>
<span id="cb25-2"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@dc.dataclass</span></span>
<span id="cb25-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">class</span> Node:</span>
<span id="cb25-4">  number: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span>
<span id="cb25-5">  eqn: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span>
<span id="cb25-6">  parents: List[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dc.field(default_factory<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>)</span>
<span id="cb25-7">  depends_on_input: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">bool</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span></code></pre></div>
</div>
<p>Now we can build up our graph with all of the side information we need. The format of a <code>jaxpr</code> lists the constant inputs first, followed by the non-constant inputs (which I’m calling the input variables). For simplicity, I am assuming that there is only one input variable.</p>
<div class="callout callout-style-simple callout-note">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>You’re going to look at this code and say <em>girl why are you using a dictionary, this is clearly a list</em>. And you would be correct except for one little thing: I can’t guarantee that the <code>count</code> variables begin at <code>0</code>. They usually do. But one time they didn’t. What is <em>probably</em> true is that we could subtract off the first count from <code>constvars</code> or <code>invars</code> and we would have an ordinary list with the <code>count</code> variable corresponding to the input. But I’m not spelunking in the source code to ensure that <code>Literal</code> <code>Var</code>s can’t be reused etc. And anyway, this is not a performance-critical data structure.</p>
<p>I’m also relying heavily on dictionaries remembering key entry order, as the nodes are topologically sorted.</p>
</div>
</div>
</div>
<div id="39e76b59" class="cell" data-execution_count="19">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb26-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> jax.core <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> jcore</span>
<span id="cb26-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> make_jaxpr</span>
<span id="cb26-3"></span>
<span id="cb26-4">jpr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_jaxpr(lp)(jnp.ones(p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb26-5"></span>
<span id="cb26-6">node_list <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {</span>
<span id="cb26-7">  const.count: Node(</span>
<span id="cb26-8">    number<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>const.count, </span>
<span id="cb26-9">    depends_on_input<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span></span>
<span id="cb26-10">  ) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> const <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> jpr.jaxpr.constvars</span>
<span id="cb26-11">}</span>
<span id="cb26-12"></span>
<span id="cb26-13">node_list <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|=</span> {</span>
<span id="cb26-14">  inval.count: Node(number<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>inval.count) </span>
<span id="cb26-15">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> inval <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> jpr.jaxpr.invars</span>
<span id="cb26-16">}</span>
<span id="cb26-17"></span>
<span id="cb26-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## For later, we need to know the node numbers that correspond</span></span>
<span id="cb26-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## to the constants and inputs</span></span>
<span id="cb26-20"></span>
<span id="cb26-21">consts_and_inputs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {node.number <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> node <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> node_list.values()}</span>
<span id="cb26-22"></span>
<span id="cb26-23">node_list <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|=</span> {</span>
<span id="cb26-24">  node.count: Node(</span>
<span id="cb26-25">    number<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>node.count,</span>
<span id="cb26-26">    eqn<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>j,</span>
<span id="cb26-27">    parents<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[</span>
<span id="cb26-28">      invar.count <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> invar <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> eqn.invars <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">isinstance</span>(invar, jcore.Literal)</span>
<span id="cb26-29">    ],</span>
<span id="cb26-30">  )</span>
<span id="cb26-31">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j, eqn <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(jpr.jaxpr.eqns)</span>
<span id="cb26-32">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> node <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> eqn.outvars</span>
<span id="cb26-33">}</span>
<span id="cb26-34"></span>
<span id="cb26-35"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> node <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> node_list.values():</span>
<span id="cb26-36">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(node.parents) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:</span>
<span id="cb26-37">    node.depends_on_input <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>  <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">any</span>(</span>
<span id="cb26-38">      node_list[i].depends_on_input <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> node.parents</span>
<span id="cb26-39">    )</span>
<span id="cb26-40"></span>
<span id="cb26-41">node_list</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="19">
<pre><code>{0: Node(number=0, eqn=None, parents=[], depends_on_input=False),
 1: Node(number=1, eqn=None, parents=[], depends_on_input=False),
 2: Node(number=2, eqn=None, parents=[], depends_on_input=True),
 3: Node(number=3, eqn=0, parents=[2], depends_on_input=True),
 4: Node(number=4, eqn=1, parents=[3], depends_on_input=True),
 5: Node(number=5, eqn=2, parents=[2], depends_on_input=True),
 6: Node(number=6, eqn=3, parents=[0, 5], depends_on_input=True),
 7: Node(number=7, eqn=4, parents=[4, 6], depends_on_input=True),
 8: Node(number=8, eqn=5, parents=[7], depends_on_input=True),
 9: Node(number=9, eqn=6, parents=[8], depends_on_input=True),
 10: Node(number=10, eqn=7, parents=[1], depends_on_input=False),
 11: Node(number=11, eqn=8, parents=[10, 9], depends_on_input=True),
 12: Node(number=12, eqn=9, parents=[1], depends_on_input=False),
 13: Node(number=13, eqn=10, parents=[12], depends_on_input=False),
 14: Node(number=14, eqn=11, parents=[8], depends_on_input=True),
 15: Node(number=15, eqn=12, parents=[14], depends_on_input=True),
 16: Node(number=16, eqn=13, parents=[13], depends_on_input=False),
 17: Node(number=17, eqn=14, parents=[16, 15], depends_on_input=True),
 18: Node(number=18, eqn=15, parents=[11, 17], depends_on_input=True),
 19: Node(number=19, eqn=16, parents=[18], depends_on_input=True),
 20: Node(number=20, eqn=17, parents=[2, 2], depends_on_input=True),
 21: Node(number=21, eqn=18, parents=[20], depends_on_input=True),
 22: Node(number=22, eqn=19, parents=[19, 21], depends_on_input=True)}</code></pre>
</div>
</div>
<p>Now let’s identify which equations are linear and which aren’t.</p>
<div id="9cfeb635" class="cell" data-execution_count="20">
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb28-1">linear_eqn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(jpr.jaxpr.eqns)</span>
<span id="cb28-2"></span>
<span id="cb28-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> node <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> node_list.values():</span>
<span id="cb28-4">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> node.eqn <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb28-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">continue</span></span>
<span id="cb28-6"></span>
<span id="cb28-7">  prim <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jpr.jaxpr.eqns[node.eqn].primitive.name</span>
<span id="cb28-8">  </span>
<span id="cb28-9">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> prim <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> jax_linear:</span>
<span id="cb28-10">    linear_eqn[node.eqn] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb28-11">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">elif</span> prim <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> jax_sometimes_linear:</span>
<span id="cb28-12">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># this is a check for being called once</span></span>
<span id="cb28-13">    linear_eqn[node.eqn] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (</span>
<span id="cb28-14">      <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(</span>
<span id="cb28-15">        node_list[i].depends_on_input <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> node.parents</span>
<span id="cb28-16">      ) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb28-17">    )</span>
<span id="cb28-18">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">elif</span> prim <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> jax_first_linear:</span>
<span id="cb28-19">    linear_eqn[node.eqn] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (</span>
<span id="cb28-20">      node_list[node.parents[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]].depends_on_input </span>
<span id="cb28-21">      <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">and</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">any</span>(node_list[pa].depends_on_input <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> pa <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> node.parents[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:])</span>
<span id="cb28-22">    )</span>
<span id="cb28-23">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">elif</span> prim <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> jax_last_linear:</span>
<span id="cb28-24">    linear_eqn[node.eqn] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (</span>
<span id="cb28-25">      node_list[node.parents[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]].depends_on_input </span>
<span id="cb28-26">      <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">and</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">any</span>(node_list[pa].depends_on_input <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> pa <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> node.parents[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb28-27">    )</span>
<span id="cb28-28">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">elif</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> node_list[i].depends_on_input <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> node.parents):</span>
<span id="cb28-29">    linear_eqn[node.eqn] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Constants are linear</span></span></code></pre></div>
</div>
<p>The only messy thing<sup>21</sup> in here is dealing with the sometimes-linear primitives. If I were sure that every JAX primitive was guaranteed to have only two inputs, this could be simplified, but sadly I don’t know that.</p>
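<p>For concreteness, here is the sometimes-linear rule in isolation (a hedged sketch, not the real classifier): a primitive like <code>mul</code> is linear precisely when exactly one of its operands depends on the function input.</p>

```python
# x * const is linear in x; x * x is quadratic; const * const is constant.
def sometimes_linear(parents_depend_on_input):
    return sum(parents_depend_on_input) == 1

assert sometimes_linear([True, False])        # x * c: linear
assert not sometimes_linear([True, True])     # x * x: nonlinear
assert not sometimes_linear([False, False])   # c * c: constant
```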
</section>
<section id="partitioning-the-graph" class="level3">
<h3 class="anchored" data-anchor-id="partitioning-the-graph">Partitioning the graph</h3>
<p>Now it’s time for the fun: partitioning the problem into sub-graphs. To do this, we need to think about what rules we want to encode.</p>
<p>The <em>first rule</em> is that every input for an equation or sub-graph needs to be either a constant, the function input, or the output of some other sub-graph that has already been computed. This means that if we find an equation with an input that doesn’t satisfy these conditions, we need to split the sub-graph that it’s in into two sub-graphs.</p>
<p>The <em>second rule</em> is the only exception to the first rule. A sub-graph can have inputs from non-linear sub-graphs if and only if it contains a sequence of <code>sum</code> or <code>sub</code> terms and finishes with the terminal node. This covers the common case where the function we are taking the Hessian of is a linear combination of independent functions. For instance, <code>log_posterior(beta) = log_likelihood(beta) + log_prior(beta)</code>. In this case we can compute the Hessians for the non-linear sub-expressions separately and then combine them.</p>
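<p>That additivity is easy to verify numerically (a sketch with made-up densities, independent of the machinery in this post): the Hessian of a sum is the sum of the Hessians.</p>

```python
import jax
import jax.numpy as jnp

# Toy stand-ins for a log-likelihood and a log-prior.
def log_likelihood(beta):
    return -jnp.sum(beta**4)

def log_prior(beta):
    return -0.5 * jnp.sum(beta**2)

def log_posterior(beta):
    return log_likelihood(beta) + log_prior(beta)

beta = jnp.array([1.0, 2.0])
H_parts = jax.hessian(log_likelihood)(beta) + jax.hessian(log_prior)(beta)
H_whole = jax.hessian(log_posterior)(beta)
print(jnp.allclose(H_parts, H_whole))  # True
```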
<p>The <em>third rule</em> is that every independent use of the function input is an opportunity to start a new tree. (It may merge with a known tree.)</p>
<p>And that’s it. Should be simple enough to implement.</p>
<p>I’m feeling like running this bad boy backwards, so let’s do that. One of the assumptions we have made is that the function we are tracing has a single output, which is always in the last node and defined in the last equation. So first off, let’s get our terminal combination expressions.</p>
<div id="bf20c90a" class="cell" data-execution_count="21">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb29-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Find the terminal combination expressions</span></span>
<span id="cb29-2">terminal_expressions <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sum"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sub"</span>}</span>
<span id="cb29-3">comb_eqns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb29-4"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> eqn <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> jpr.jaxpr.eqns[::<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]:</span>
<span id="cb29-5">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">any</span>(</span>
<span id="cb29-6">    node_list[a.count].depends_on_input </span>
<span id="cb29-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> a <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> eqn.invars </span>
<span id="cb29-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">isinstance</span>(a, jcore.Literal)</span>
<span id="cb29-9">  )  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">and</span> (</span>
<span id="cb29-10">    eqn.primitive.name <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> terminal_expressions</span>
<span id="cb29-11">  ):</span>
<span id="cb29-12">    comb_eqns.append(eqn)</span>
<span id="cb29-13">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb29-14">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">break</span></span>
<span id="cb29-15"></span>
<span id="cb29-16"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(comb_eqns)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[a:f32[] = sub b c]</code></pre>
</div>
</div>
<p>Now for each of the terminal combination expressions, we will trace their parents back until we run out of tree. While we are doing this, we can also keep track of runs of linear operations. We also only want to visit each equation once, so we need to keep track of our visited equations. This is, whether we like it or not, a depth-first search. It’s always a bloody depth-first search, isn’t it.</p>
<p>So what we are going to do is go through each of the combiner nodes, trace the graph down from it, and note the path and its parent. If we run into a portion of the graph we have already traced, we will note that for later. These paths will either be merged or, if the ancestral path from that point is all linear, will be used as a linear sub-graph.</p>
<div id="d1c69405" class="cell" data-execution_count="22">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb31-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> dfs(visited, graph, subgraph, to_check, node):</span>
<span id="cb31-2">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> node <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> visited:</span>
<span id="cb31-3">    to_check.add(node)</span>
<span id="cb31-4">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb31-5">    visited.add(node)</span>
<span id="cb31-6">    subgraph.add(graph[node].eqn)</span>
<span id="cb31-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> neighbour <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> graph[node].parents:</span>
<span id="cb31-8">      dfs(visited, graph, subgraph, to_check, neighbour)</span>
<span id="cb31-9">  </span>
<span id="cb31-10"></span>
<span id="cb31-11">visited <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> consts_and_inputs</span>
<span id="cb31-12">to_check <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>()</span>
<span id="cb31-13">subgraphs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb31-14"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> ce <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> comb_eqns:</span>
<span id="cb31-15">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> v <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> (a <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> a <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> ce.invars <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">isinstance</span>(a, jcore.Literal)):</span>
<span id="cb31-16">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> v.count <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> visited:</span>
<span id="cb31-17">      subgraphs.append(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>())</span>
<span id="cb31-18">      dfs(visited, node_list, subgraphs[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], to_check, v.count)</span>
<span id="cb31-19"></span>
<span id="cb31-20">to_check <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> to_check.difference(consts_and_inputs)</span>
<span id="cb31-21"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Subgraphs: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>subgraphs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb31-22"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Danger nodes: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>to_check<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Subgraphs: [{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}, {17, 18}]
Danger nodes: set()</code></pre>
</div>
</div>
<p>The <code>to_check</code> nodes are only dangerous insofar as we need to make sure that if they are in one of the linear sub-graphs they are terminal nodes of a sub-graph. To that end, let’s make the linear sub-graphs.</p>
<div id="594c17e8" class="cell" data-execution_count="23">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb33-1">linear_subgraph <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb33-2">nonlin_subgraph <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb33-3">n_eqns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(jpr.jaxpr.eqns)</span>
<span id="cb33-4"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> subgraph <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> subgraphs:</span>
<span id="cb33-5">  <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(subgraph)</span>
<span id="cb33-6">  split <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">next</span>(</span>
<span id="cb33-7">    (</span>
<span id="cb33-8">      i <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i, lin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(linear_eqn) </span>
<span id="cb33-9">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> lin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">and</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> subgraph</span>
<span id="cb33-10">    )</span>
<span id="cb33-11">  )</span>
<span id="cb33-12">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">any</span>(chk <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> subgraph <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> chk <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> to_check):</span>
<span id="cb33-13">    split <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>(</span>
<span id="cb33-14">      split, </span>
<span id="cb33-15">      <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>(chk <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> chk <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> to_check <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> chk <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> subgraph)</span>
<span id="cb33-16">    )</span>
<span id="cb33-17"></span>
<span id="cb33-18">  linear_subgraph.append(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(subgraph.intersection(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(split)))))</span>
<span id="cb33-19">  nonlin_subgraph.append(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(subgraph.intersection(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(split, n_eqns)))))</span>
<span id="cb33-20"></span>
<span id="cb33-21"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Linear subgraphs: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>linear_subgraph<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb33-22"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Nonlinear subgraphs: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>nonlin_subgraph<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
{17, 18}
Linear subgraphs: [[0, 1, 2, 3, 4], []]
Nonlinear subgraphs: [[5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], [17, 18]]</code></pre>
</div>
</div>
<p>The only interesting thing here is making sure that if there is a linear node in the graph that was visited twice, it is the terminal node of the linear graph. The better thing would be to actually split the linear graph, but I’m getting a little bit sick of this post and I don’t really want to deal with multiple linear sub-graphs. So I shan’t. But hopefully it’s relatively clear how you would do that.</p>
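<p>If you did want multiple linear sub-graphs, one way to do it (a hedged sketch, not wired into the rest of this code) is to split the sorted linear run at every revisited node, so that each node in <code>to_check</code> terminates its own piece.</p>

```python
def split_linear_run(linear_run, revisited):
    """Split a topologically sorted run of linear equations so that
    every revisited node is the terminal node of its own piece."""
    pieces, current = [], []
    for eqn in linear_run:
        current.append(eqn)
        if eqn in revisited:
            # A revisited node must end a sub-graph so its output
            # is available to the other sub-graphs that consume it.
            pieces.append(current)
            current = []
    if current:
        pieces.append(current)
    return pieces

print(split_linear_run([0, 1, 2, 3, 4], {2}))  # [[0, 1, 2], [3, 4]]
```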
<p>In this case it’s pretty clear that we are ok.</p>
<div id="c986a424" class="cell" data-execution_count="24">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb35-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">any</span>(linear_eqn[node_list[j].eqn] <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> to_check)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="24">
<pre><code>False</code></pre>
</div>
</div>
</section>
<section id="putting-it-together" class="level3">
<h3 class="anchored" data-anchor-id="putting-it-together">Putting it together</h3>
<p>Well that’s a nice script that does what I want. Now let’s put it together in a function. I’m going to give it the <em>very</em> unspecific name <code>transform_jaxpr</code> because sometimes you’ve gotta annoy your future self.</p>
<div id="c0f4a607" class="cell" data-execution_count="25">
<details class="code-fold">
<summary>Show the code</summary>
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb37-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> transform_jaxpr(</span>
<span id="cb37-2">  jaxpr: jcore.ClosedJaxpr</span>
<span id="cb37-3">) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> Tuple[List[Set[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>]], List[Set[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>]], List[jcore.JaxprEqn]]:</span>
<span id="cb37-4">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(jpr.in_avals) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb37-5">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(jpr.out_avals) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb37-6"></span>
<span id="cb37-7">  <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> core <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> jcore</span>
<span id="cb37-8"></span>
<span id="cb37-9">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## 1. Extract the tree and its relevant behavior</span></span>
<span id="cb37-10">  node_list <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {</span>
<span id="cb37-11">    const.count: Node(</span>
<span id="cb37-12">      number<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>const.count, </span>
<span id="cb37-13">      depends_on_input<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span></span>
<span id="cb37-14">    ) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> const <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> jpr.jaxpr.constvars</span>
<span id="cb37-15">  }</span>
<span id="cb37-16"></span>
<span id="cb37-17">  node_list <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|=</span> {</span>
<span id="cb37-18">    inval.count: Node(number<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>inval.count) </span>
<span id="cb37-19">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> inval <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> jpr.jaxpr.invars</span>
<span id="cb37-20">  }</span>
<span id="cb37-21"></span>
<span id="cb37-22">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## For later, we need to know the node numbers that correspond</span></span>
<span id="cb37-23">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## to the constants and inputs</span></span>
<span id="cb37-24"></span>
<span id="cb37-25">  consts_and_inputs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {node.number <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> node <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> node_list.values()}</span>
<span id="cb37-26"></span>
<span id="cb37-27">  node_list <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|=</span> {</span>
<span id="cb37-28">    node.count: Node(</span>
<span id="cb37-29">      number<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>node.count,</span>
<span id="cb37-30">      eqn<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>j,</span>
<span id="cb37-31">      parents<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[</span>
<span id="cb37-32">        invar.count <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> invar <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> eqn.invars <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">isinstance</span>(invar, jcore.Literal)</span>
<span id="cb37-33">      ],</span>
<span id="cb37-34">    )</span>
<span id="cb37-35">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j, eqn <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(jpr.jaxpr.eqns)</span>
<span id="cb37-36">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> node <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> eqn.outvars</span>
<span id="cb37-37">  }</span>
<span id="cb37-38"></span>
<span id="cb37-39">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> node <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> node_list.values():</span>
<span id="cb37-40">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(node.parents) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:</span>
<span id="cb37-41">      node.depends_on_input <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>  <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">any</span>(</span>
<span id="cb37-42">        node_list[i].depends_on_input <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> node.parents</span>
<span id="cb37-43">      )</span>
<span id="cb37-44"></span>
<span id="cb37-45">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## 2. Identify which equations are linear_eqn</span></span>
<span id="cb37-46"></span>
<span id="cb37-47">  linear_eqn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(jpr.jaxpr.eqns)</span>
<span id="cb37-48"></span>
<span id="cb37-49">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> node <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> node_list.values():</span>
<span id="cb37-50">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> node.eqn <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb37-51">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">continue</span></span>
<span id="cb37-52"></span>
<span id="cb37-53">    prim <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jpr.jaxpr.eqns[node.eqn].primitive.name</span>
<span id="cb37-54">    </span>
<span id="cb37-55">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> prim <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> jax_linear:</span>
<span id="cb37-56">      linear_eqn[node.eqn] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb37-57">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">elif</span> prim <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> jax_sometimes_linear:</span>
<span id="cb37-58">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># this is a check for being called once</span></span>
<span id="cb37-59">      linear_eqn[node.eqn] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (</span>
<span id="cb37-60">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(</span>
<span id="cb37-61">          node_list[i].depends_on_input <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> node.parents</span>
<span id="cb37-62">        ) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb37-63">      )</span>
<span id="cb37-64">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">elif</span> prim <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> jax_first_linear:</span>
<span id="cb37-65">      linear_eqn[node.eqn] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (</span>
<span id="cb37-66">        node_list[node.parents[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]].depends_on_input </span>
<span id="cb37-67">        <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">and</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">any</span>(node_list[pa].depends_on_input <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> pa <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> node.parents[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:])</span>
<span id="cb37-68">      )</span>
<span id="cb37-69">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">elif</span> prim <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> jax_last_linear:</span>
<span id="cb37-70">      linear_eqn[node.eqn] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (</span>
<span id="cb37-71">        node_list[node.parents[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]].depends_on_input </span>
<span id="cb37-72">        <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">and</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">any</span>(node_list[pa].depends_on_input <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> pa <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> node.parents[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb37-73">      )</span>
<span id="cb37-74">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">elif</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> node_list[i].depends_on_input <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> node.parents):</span>
<span id="cb37-75">      linear_eqn[node.eqn] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Constants are linear</span></span>
<span id="cb37-76"></span>
<span id="cb37-77">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">##3. Find all the terminal expressions</span></span>
<span id="cb37-78">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Find the terminal combination expressions</span></span>
<span id="cb37-79">  terminal_expressions <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sum"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sub"</span>}</span>
<span id="cb37-80">  comb_eqns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb37-81">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> eqn <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> jpr.jaxpr.eqns[::<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]:</span>
<span id="cb37-82">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">any</span>(</span>
<span id="cb37-83">      node_list[a.count].depends_on_input </span>
<span id="cb37-84">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> a <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> eqn.invars </span>
<span id="cb37-85">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">isinstance</span>(a, jcore.Literal)</span>
<span id="cb37-86">    )  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">and</span> (</span>
<span id="cb37-87">      eqn.primitive.name <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> terminal_expressions</span>
<span id="cb37-88">    ):</span>
<span id="cb37-89">      comb_eqns.append(eqn)</span>
<span id="cb37-90">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb37-91">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">break</span></span>
<span id="cb37-92">  </span>
<span id="cb37-93">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## 4. Identify the sub-graphs </span></span>
<span id="cb37-94">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> dfs(visited, graph, subgraph, to_check, node):</span>
<span id="cb37-95">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> node <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> visited:</span>
<span id="cb37-96">      to_check.add(node)</span>
<span id="cb37-97">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb37-98">      visited.add(node)</span>
<span id="cb37-99">      subgraph.add(graph[node].eqn)</span>
<span id="cb37-100">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> neighbour <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> graph[node].parents:</span>
<span id="cb37-101">        dfs(visited, graph, subgraph, to_check, neighbour)</span>
<span id="cb37-102">    </span>
<span id="cb37-103"></span>
<span id="cb37-104">  visited <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> consts_and_inputs</span>
<span id="cb37-105">  to_check <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>()</span>
<span id="cb37-106">  subgraphs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb37-107">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> ce <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> comb_eqns:</span>
<span id="cb37-108">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> v <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> (a <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> a <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> ce.invars <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">isinstance</span>(a, jcore.Literal)):</span>
<span id="cb37-109">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> v.count <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> visited:</span>
<span id="cb37-110">        subgraphs.append(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>())</span>
<span id="cb37-111">        dfs(visited, node_list, subgraphs[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], to_check, v.count)</span>
<span id="cb37-112"></span>
<span id="cb37-113">  to_check <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> to_check.difference(consts_and_inputs)</span>
<span id="cb37-114"></span>
<span id="cb37-115">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## 5. Find the linear sub-graphs</span></span>
<span id="cb37-116">  linear_subgraph <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb37-117">  nonlin_subgraph <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb37-118">  n_eqns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(jaxpr.eqns)</span>
<span id="cb37-119">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> subgraph <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> subgraphs:</span>
<span id="cb37-120">    split <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">next</span>(</span>
<span id="cb37-121">      (</span>
<span id="cb37-122">        i <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i, lin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(linear_eqn) </span>
<span id="cb37-123">        <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> lin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">and</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> subgraph</span>
<span id="cb37-124">      )</span>
<span id="cb37-125">    )</span>
<span id="cb37-126">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">any</span>(chk <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> subgraph <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> chk <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> to_check):</span>
<span id="cb37-127">      split <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>(</span>
<span id="cb37-128">        split, </span>
<span id="cb37-129">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>(chk <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> chk <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> to_check <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> chk <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> subgraph)</span>
<span id="cb37-130">      )</span>
<span id="cb37-131"></span>
<span id="cb37-132">    linear_subgraph.append(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(subgraph.intersection(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(split)))))</span>
<span id="cb37-133">    nonlin_subgraph.append(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(subgraph.intersection(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(split, n_eqns)))))</span>
<span id="cb37-134">  </span>
<span id="cb37-135">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (linear_subgraph, nonlin_subgraph, comb_eqns)</span></code></pre></div>
</details>
</div>
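<p>As a self-contained illustration of step 5, here is the split computation run on toy stand-ins for <code>linear_eqn</code> and the sub-graphs that mirror the earlier output (hypothetical data, not a real jaxpr):</p>

```python
# Equations 0-4 are linear, 5-18 are not (mirrors the example above).
linear_eqn = [i < 5 for i in range(19)]
n_eqns = len(linear_eqn)

def split_subgraph(subgraph, to_check=frozenset()):
    """Split a sub-graph at the first nonlinear equation it contains,
    or at a twice-visited node, whichever comes first."""
    split = next(i for i, lin in enumerate(linear_eqn)
                 if not lin and i in subgraph)
    if any(chk in subgraph for chk in to_check):
        split = min(split, min(chk for chk in to_check if chk in subgraph))
    lin = sorted(subgraph & set(range(split)))
    nonlin = sorted(subgraph & set(range(split, n_eqns)))
    return lin, nonlin

# Reproduces the Linear/Nonlinear partition printed earlier:
print(split_subgraph(set(range(17))))
print(split_subgraph({17, 18}))
```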
<p>For one final sense check, let’s compare these outputs to the original jaxpr.</p>
<div id="22f15cd4" class="cell" data-execution_count="26">
<div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb38-1"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j, lin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(linear_subgraph):</span>
<span id="cb38-2">  <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Linear: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>j<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb38-3">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> lin:</span>
<span id="cb38-4">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(jpr.eqns[i])</span>
<span id="cb38-5"></span>
<span id="cb38-6"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j, nlin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(nonlin_subgraph):</span>
<span id="cb38-7">  <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Nonlinear: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>j<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb38-8">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> nlin:</span>
<span id="cb38-9">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(jpr.eqns[i])</span>
<span id="cb38-10"></span>
<span id="cb38-11"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Combination equations"</span>)</span>
<span id="cb38-12"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> eqn <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> comb_eqns:</span>
<span id="cb38-13">  <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(eqn)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Linear: 0
a:f32[1] = dynamic_slice[slice_sizes=(1,)] b 0
a:f32[] = squeeze[dimensions=(0,)] b
a:f32[5000] = dynamic_slice[slice_sizes=(5000,)] b 1
a:f32[1000] = dot_general[dimension_numbers=(([1], [0]), ([], []))] b c
a:f32[1000] = add b c
Linear: 1
Nonlinear: 0
a:f32[1000] = logistic b
a:f32[1000] = log b
a:f32[1000] = convert_element_type[new_dtype=float32 weak_type=False] b
a:f32[1000] = mul b c
a:i32[1000] = convert_element_type[new_dtype=int32 weak_type=True] b
a:i32[1000] = sub 1 b
a:f32[1000] = neg b
a:f32[1000] = log1p b
a:f32[1000] = convert_element_type[new_dtype=float32 weak_type=False] b
a:f32[1000] = mul b c
a:f32[1000] = add b c
a:f32[] = reduce_sum[axes=(0,)] b
Nonlinear: 1
a:f32[] = dot_general[dimension_numbers=(([0], [0]), ([], []))] b b
a:f32[] = mul 0.5 b
Combination equations
a:f32[] = sub b c</code></pre>
</div>
</div>
<p>Comparing this to the original jaxpr, we see that it contains the same information (the formatting is a bit unfortunate, as the original <code>__repr__</code> keeps track of the links between equations, but what can you do?).</p>
<div id="c85d5f8b" class="cell" data-execution_count="27">
<div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb40-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(jpr)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>{ lambda a:f32[1000,5000] b:bool[1000]; c:f32[5001]. let
    d:f32[1] = dynamic_slice[slice_sizes=(1,)] c 0
    e:f32[] = squeeze[dimensions=(0,)] d
    f:f32[5000] = dynamic_slice[slice_sizes=(5000,)] c 1
    g:f32[1000] = dot_general[dimension_numbers=(([1], [0]), ([], []))] a f
    h:f32[1000] = add e g
    i:f32[1000] = logistic h
    j:f32[1000] = log i
    k:f32[1000] = convert_element_type[new_dtype=float32 weak_type=False] b
    l:f32[1000] = mul k j
    m:i32[1000] = convert_element_type[new_dtype=int32 weak_type=True] b
    n:i32[1000] = sub 1 m
    o:f32[1000] = neg i
    p:f32[1000] = log1p o
    q:f32[1000] = convert_element_type[new_dtype=float32 weak_type=False] n
    r:f32[1000] = mul q p
    s:f32[1000] = add l r
    t:f32[] = reduce_sum[axes=(0,)] s
    u:f32[] = dot_general[dimension_numbers=(([0], [0]), ([], []))] c c
    v:f32[] = mul 0.5 u
    w:f32[] = sub t v
  in (w,) }</code></pre>
</div>
</div>
</section>
<section id="making-sub-functions" class="level3">
<h3 class="anchored" data-anchor-id="making-sub-functions">Making sub-functions</h3>
<p>Now that we have the graph partitioned, let’s make our sub-functions. We do this by manipulating the <code>jaxpr</code> and then <em>closing</em> over the literals.</p>
<p>There are a few ways we can do this. We could build completely new <a href="https://github.com/google/jax/blob/c3f5af7d46b803da346aa7644cbeea3cb73b4c10/jax/_src/core.py#L297"><code>JaxprEqn</code></a> objects from the existing Jaxpr. But honestly, that is just annoying, so instead I’m going to modify the <a href="https://jax.readthedocs.io/en/latest/notebooks/Writing_custom_interpreters_in_Jax.html">basic, but incomplete, parser</a><sup>22</sup>.</p>
<p>The only modification from the standard <code>eval_jaxpr</code> is that we explicitly specify the <code>invars</code> in order to overwrite the standard ones. This relies on the topological ordering of the equations in the jaxpr expression graph.</p>
<div id="20360fac" class="cell" data-execution_count="28">
<div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb42-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> typing <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Callable</span>
<span id="cb42-2"></span>
<span id="cb42-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> core <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> jcore</span>
<span id="cb42-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> lax</span>
<span id="cb42-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax._src.util <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> safe_map</span>
<span id="cb42-6"></span>
<span id="cb42-7"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> eval_subjaxpr(</span>
<span id="cb42-8">  <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>args,</span>
<span id="cb42-9">  jaxpr: jcore.Jaxpr, </span>
<span id="cb42-10">  consts: List[jcore.Literal], </span>
<span id="cb42-11">  subgraph: List[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>], </span>
<span id="cb42-12">  invars: List[jcore.Var]</span>
<span id="cb42-13">):</span>
<span id="cb42-14"></span>
<span id="cb42-15"></span>
<span id="cb42-16">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(invars) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(args)</span>
<span id="cb42-17">  </span>
<span id="cb42-18">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Mapping from variable -&gt; value</span></span>
<span id="cb42-19">  env <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {}</span>
<span id="cb42-20">  </span>
<span id="cb42-21">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> read(var):</span>
<span id="cb42-22">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Literals are values baked into the Jaxpr</span></span>
<span id="cb42-23">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(var) <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> jcore.Literal:</span>
<span id="cb42-24">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> var.val</span>
<span id="cb42-25"></span>
<span id="cb42-26">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> env[var]</span>
<span id="cb42-27"></span>
<span id="cb42-28">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> write(var, val):</span>
<span id="cb42-29">    env[var] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb42-30"></span>
<span id="cb42-31">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># We need to bind the input to the sub-function</span></span>
<span id="cb42-32">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># to the environment.</span></span>
<span id="cb42-33">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># We only need to write the consts that appear</span></span>
<span id="cb42-34">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># in our sub-graph, but that's more bookkeeping</span></span>
<span id="cb42-35">  safe_map(write, invars, args)</span>
<span id="cb42-36">  safe_map(write, jaxpr.constvars, consts)</span>
<span id="cb42-37"></span>
<span id="cb42-38">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Loop through equations and evaluate primitives using `bind`</span></span>
<span id="cb42-39">  outvars <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb42-40">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> subgraph:</span>
<span id="cb42-41">    eqn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jaxpr.eqns[j]</span>
<span id="cb42-42">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Read inputs to equation from environment</span></span>
<span id="cb42-43">    invals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> safe_map(read, eqn.invars)  </span>
<span id="cb42-44">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># `bind` is how a primitive is called</span></span>
<span id="cb42-45">    outvals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> eqn.primitive.bind(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>invals, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span>eqn.params)</span>
<span id="cb42-46">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Primitives may return multiple outputs or not</span></span>
<span id="cb42-47">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> eqn.primitive.multiple_results: </span>
<span id="cb42-48">      outvals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [outvals]</span>
<span id="cb42-49">    outvars <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> [eqn.outvars]</span>
<span id="cb42-50">    safe_map(write, eqn.outvars, outvals) </span>
<span id="cb42-51">  </span>
<span id="cb42-52">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> safe_map(read, outvars[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span></code></pre></div>
</div>
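<p>The environment-based evaluation pattern above is easier to see without any JAX machinery. Here is a minimal pure-Python sketch of the same idea; the <code>Eqn</code> tuple, the <code>PRIMS</code> table, and the variable names are illustrative stand-ins, not JAX objects.</p>
<pre class="sourceCode python"><code># Toy version of eval_subjaxpr: walk a subset of a topologically
# ordered equation list, reading inputs from an environment and
# writing each output back into it.
from collections import namedtuple

Eqn = namedtuple("Eqn", ["prim", "invars", "outvar"])
PRIMS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

# Graph for f(x) = (x + 1) * x, in topological order.
eqns = [Eqn("add", ["x", 1.0], "t"), Eqn("mul", ["t", "x"], "out")]

def eval_subgraph(subgraph, eqns, env):
    # Strings are variables (looked up in env); anything else is a literal.
    read = lambda v: env[v] if isinstance(v, str) else v
    for j in subgraph:
        eqn = eqns[j]
        env[eqn.outvar] = PRIMS[eqn.prim](*map(read, eqn.invars))
    # Like eval_subjaxpr, return the output of the last equation evaluated.
    return env[eqns[subgraph[-1]].outvar]

print(eval_subgraph([0, 1], eqns, {"x": 3.0}))  # 12.0</code></pre>
<p>Evaluating only the sub-graph <code>[0]</code> with the same environment returns the intermediate value <code>4.0</code>, which is exactly how the linear and nonlinear pieces get chained together.</p>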
<p>The final thing we should do is combine our transformation with this evaluation module to convert a function into a sequence of callable sub-functions. I am making <em>liberal</em> use of <code>partial</code> to close over variables that the user should never see (like the sub-graph!). Jesus loves closures and so do I.</p>
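<p>For anyone who hasn’t seen the trick: <code>functools.partial</code> bakes the bookkeeping arguments into a new callable, so the caller only ever supplies the data. A tiny sketch (the names here are made up for illustration):</p>
<pre class="sourceCode python"><code>from functools import partial

def eval_sub(x, *, scale, offset):
    # Stand-in for eval_subjaxpr: positional data, keyword-only bookkeeping.
    return scale * x + offset

# Close over the bookkeeping; the resulting function takes only x,
# just like the entries of lin_funs and nlin_funs.
f = partial(eval_sub, scale=2.0, offset=1.0)
print(f(3.0))  # 7.0</code></pre>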
<!-- There is some cost here. Because I am too lazy to work out the minimal set of 
inputs for each sub-expression, I'm going to just make sure that all of the computed
values are available to every function. This is obviously inefficient, but sometimes
you just need to write blog-quality code.

The other thing that comes out a bit tricky^[And relies _very_ heavily on the topological ordering of the equations!] here is that each returned here 
has a different number of arguments. The first linear function takes `jpr.invars` as 
its input. The second takes those _and_ the output of the first linear function.
For each subsequent function, this list becomes longer. This is partly unavoidable,
but with some more clever bookkeeping it wouldn't be too hard to produce minimal
input sets. But once again: blog code. 

But if you're going to write code that does weird shit like this, the least you can
do is remember to catch it and throw a useful error down the line. -->
<div id="827d3dec" class="cell" data-execution_count="29">
<div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb43-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> decompose(fun: Callable, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>args) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> Tuple[List[Callable], List[Callable], List[jcore.Var]]:</span>
<span id="cb43-2">  <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> functools <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> partial</span>
<span id="cb43-3">  <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> make_jaxpr</span>
<span id="cb43-4"></span>
<span id="cb43-5">  jpr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_jaxpr(fun)(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>args)</span>
<span id="cb43-6">  linear_subgraph, nonlin_subgraph, comb_eqns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> transform_jaxpr(jpr)</span>
<span id="cb43-7"></span>
<span id="cb43-8">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(linear_subgraph) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(nonlin_subgraph)</span>
<span id="cb43-9">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(jpr.jaxpr.invars) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Functions must only have one input"</span></span>
<span id="cb43-10"></span>
<span id="cb43-11">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> get_invars(sub: List[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>]) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> List[jcore.Var]:</span>
<span id="cb43-12">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># There is an implicit assumption everywhere in this post </span></span>
<span id="cb43-13">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># that each sub-function only has one non-constant input</span></span>
<span id="cb43-14">    </span>
<span id="cb43-15">    min_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jpr.jaxpr.eqns[sub[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]].outvars[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].count</span>
<span id="cb43-16">    literal_ceil <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jpr.jaxpr.invars[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].count</span>
<span id="cb43-17">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> sub:</span>
<span id="cb43-18">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> v <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> jpr.jaxpr.eqns[j].invars:</span>
<span id="cb43-19">        <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> (</span>
<span id="cb43-20">          <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">isinstance</span>(v, jcore.Literal) <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">and</span></span>
<span id="cb43-21">          v.count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> literal_ceil <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">and</span> </span>
<span id="cb43-22">          v.count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> min_count</span>
<span id="cb43-23">        ):</span>
<span id="cb43-24">          <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> [v]</span>
<span id="cb43-25">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">raise</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">Exception</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Somehow you can't find any invars"</span>)</span>
<span id="cb43-26">    </span>
<span id="cb43-27"></span>
<span id="cb43-28">  lin_funs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb43-29">  nlin_funs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb43-30">  nlin_inputs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb43-31">  lin_outputs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb43-32">  nlin_outputs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb43-33"></span>
<span id="cb43-34">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> lin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> linear_subgraph:</span>
<span id="cb43-35">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(lin) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:</span>
<span id="cb43-36">      lin_funs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> [<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>]</span>
<span id="cb43-37">      lin_outputs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> [jpr.jaxpr.invars[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].count]</span>
<span id="cb43-38">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">elif</span> jpr.jaxpr.eqns[lin[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]].primitive.multiple_results:</span>
<span id="cb43-39">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">raise</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">Exception</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"This code doesn't deal with multiple outputs from subgraph </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>lin<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb43-40">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb43-41">      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># find </span></span>
<span id="cb43-42">      lin_outputs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> [jpr.jaxpr.eqns[lin[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]].outvars[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].count]</span>
<span id="cb43-43">      lin_funs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> [</span>
<span id="cb43-44">        partial(eval_subjaxpr,</span>
<span id="cb43-45">          jaxpr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jpr.jaxpr, </span>
<span id="cb43-46">          consts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jpr.literals, </span>
<span id="cb43-47">          subgraph <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lin, </span>
<span id="cb43-48">          invars <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_invars(lin)</span>
<span id="cb43-49">        )</span>
<span id="cb43-50">      ]</span>
<span id="cb43-51">      </span>
<span id="cb43-52">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> nlin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> nonlin_subgraph:</span>
<span id="cb43-53">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(nlin) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:</span>
<span id="cb43-54">      nlin_funs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> [<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>]</span>
<span id="cb43-55">      nlin_inputs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb43-56">      nlin_outputs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> [<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>]</span>
<span id="cb43-57">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">elif</span> jpr.jaxpr.eqns[nlin[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]].primitive.multiple_results:</span>
<span id="cb43-58">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">raise</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">Exception</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"This code doesn't deal with multiple outputs from subgraph </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>nlin<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb43-59">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb43-60">      invar <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_invars(nlin)[0]</span>
<span id="cb43-61">      nlin_inputs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> [lin_outputs.index(invar.count) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> invar.count <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> lin_outputs <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb43-62">      nlin_outputs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> [jpr.jaxpr.eqns[nlin[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]].outvars[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].count]</span>
<span id="cb43-63">      nlin_funs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> [</span>
<span id="cb43-64">        partial(eval_subjaxpr,</span>
<span id="cb43-65">          jaxpr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jpr.jaxpr, </span>
<span id="cb43-66">          consts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jpr.literals, </span>
<span id="cb43-67">          subgraph <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nlin, </span>
<span id="cb43-68">          invars <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_invars(nlin)</span>
<span id="cb43-69">        )</span>
<span id="cb43-70">      ]</span>
<span id="cb43-71"></span>
<span id="cb43-72">  combine <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(linear_subgraph)</span>
<span id="cb43-74">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> eqn <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> comb_eqns:</span>
<span id="cb43-75">    combine[nlin_outputs.index(eqn.invars[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].count)] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span></span>
<span id="cb43-76">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> eqn.primitive.name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sub"</span>:</span>
<span id="cb43-77">      combine[nlin_outputs.index(eqn.invars[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>].count)] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span></span>
<span id="cb43-78">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb43-79">      combine[nlin_outputs.index(eqn.invars[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>].count)] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span></span>
<span id="cb43-80"></span>
<span id="cb43-81"></span>
<span id="cb43-82">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> lin_funs, nlin_funs, nlin_inputs, combine</span></code></pre></div>
</div>
</section>
</section>
<section id="making-the-hessian" class="level2">
<h2 class="anchored" data-anchor-id="making-the-hessian">Making the Hessian</h2>
<p>After <em>all</em> of this work, we can finally make a function that builds a Hessian!</p>
<p>Recall that if <img src="https://latex.codecogs.com/png.latex?f(x)%20=%20g(h(x))">, where <img src="https://latex.codecogs.com/png.latex?h(x)"> is linear and <img src="https://latex.codecogs.com/png.latex?g(x)"> is nonlinear, then the Hessian of <img src="https://latex.codecogs.com/png.latex?f"> is</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AH_f(x)%20=%20J_h%5ET%20H_g%20J_h,%0A"> where <img src="https://latex.codecogs.com/png.latex?J_h"> is the (constant) Jacobian of <img src="https://latex.codecogs.com/png.latex?h"> and <img src="https://latex.codecogs.com/png.latex?H_g"> is the Hessian of <img src="https://latex.codecogs.com/png.latex?g"> evaluated at <img src="https://latex.codecogs.com/png.latex?h(x)">. Because <img src="https://latex.codecogs.com/png.latex?h"> is linear, its second derivative vanishes, so the usual extra chain-rule term drops out.</p>
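<p>As a quick sanity check of that identity, here is a small self-contained example (the matrix <code>A</code> and the function <code>g</code> are arbitrary choices, and everything is done with plain Python lists rather than JAX):</p>
<pre class="sourceCode python"><code># h(x) = A x is linear, g(y) = y0**3 + y1**2 is nonlinear,
# so f = g(h(x)) should have Hessian A^T H_g(A x) A.
A = [[1.0, 2.0], [3.0, 4.0]]

x = (1.0, 1.0)
y0 = A[0][0] * x[0] + A[0][1] * x[1]  # first component of h(x)

# H_g(y) is diagonal: [[6*y0, 0], [0, 2]].
Hg = [[6.0 * y0, 0.0], [0.0, 2.0]]

# H_f = A^T Hg A, written elementwise (using that Hg is diagonal here).
Hf = [[sum(A[k][i] * Hg[k][k] * A[k][j] for k in range(2))
       for j in range(2)] for i in range(2)]
print(Hf)  # [[36.0, 60.0], [60.0, 104.0]]</code></pre>
<p>Differentiating <code>f(x0, x1) = (x0 + 2*x1)**3 + (3*x0 + 4*x1)**2</code> by hand gives the same matrix, which is reassuring.</p>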
<div id="d7f21525" class="cell" data-execution_count="30">
<div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb44-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> smarter_hessian(fun: Callable) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> Callable:</span>
<span id="cb44-2">  <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> jacfwd</span>
<span id="cb44-3">  <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> hessian</span>
<span id="cb44-4">  <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> jnp</span>
<span id="cb44-5">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> hess(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>args):</span>
<span id="cb44-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(args) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"This only works for functions with one input"</span></span>
<span id="cb44-7">    </span>
<span id="cb44-8">    lin_funs, nlin_funs, nlin_inputs, combine <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> decompose(fun, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>args)</span>
<span id="cb44-9">    n_in <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> args[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb44-10">    part <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.zeros((n_in, n_in))</span>
<span id="cb44-11"></span>
<span id="cb44-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> lin, nlin, nimp, comb <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(lin_funs, nlin_funs, nlin_inputs, combine):</span>
<span id="cb44-13">      </span>
<span id="cb44-14">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> lin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb44-15">        lin_val <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lin(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>args)</span>
<span id="cb44-16">        jac <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jacfwd(lin)(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>args)</span>
<span id="cb44-17">      </span>
<span id="cb44-18"></span>
<span id="cb44-19">      h_args <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (lin_val,) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> lin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span> args</span>
<span id="cb44-20">      hess <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> hessian(nlin)(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>h_args) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> nlin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span>
<span id="cb44-21"></span>
<span id="cb44-22">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> lin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">and</span> nlin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb44-23">        part <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> comb <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (jac.T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> (hess <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> jac))</span>
<span id="cb44-24">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">elif</span> lin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb44-25">        part <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> comb <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jac.T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> jac</span>
<span id="cb44-26">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">elif</span> nlin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb44-27">        part <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> comb <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> hess</span>
<span id="cb44-28">      </span>
<span id="cb44-29">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> part</span>
<span id="cb44-30">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> hess</span></code></pre></div>
</div>
<p>After all of that, let’s see if this works!</p>
<div id="431cf1b2" class="cell" data-execution_count="31">
<div class="sourceCode cell-code" id="cb45" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb45-1">mode_jax, H_jax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> laplace(</span>
<span id="cb45-2">  partial(log_posterior, X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X, y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> y),</span>
<span id="cb45-3">  x0 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>jnp.zeros(X.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb45-4">)</span>
<span id="cb45-5"></span>
<span id="cb45-6">H_smarter <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> smarter_hessian(partial(log_posterior, X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X, y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> y))(mode_jax)</span>
<span id="cb45-7"></span>
<span id="cb45-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"The error is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>jnp<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(H_jax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> H_smarter)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>tolist()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">!"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>The error is 2.3684690404479625e-06!</code></pre>
</div>
</div>
<p>In single precision, that is good enough for government work.</p>
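<p>For context, that error is only a handful of units in the last place for single precision, which resolves roughly seven decimal digits:</p>

```python
import jax.numpy as jnp

# float32 machine epsilon (2**-23): the resolution single precision gives us.
# An error of ~2.4e-6 between two O(1) matrices is only a few dozen ulps.
eps = float(jnp.finfo(jnp.float32).eps)
print(eps)  # ≈ 1.19e-07
```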
<section id="but-is-it-faster" class="level3">
<h3 class="anchored" data-anchor-id="but-is-it-faster">But is it faster?</h3>
<p>Now let’s take a look at whether we have actually saved any time.</p>
<div id="7907d5b4" class="cell" data-execution_count="32">
<div class="sourceCode cell-code" id="cb47" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb47-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> jax</span>
<span id="cb47-2">times_hess <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: jax.hessian(partial(log_posterior, X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X, y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> y))(mode_jax), number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb47-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Full Hessian: The average time with p = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>p<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>mean(times_hess)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .3f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">(+/-</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>std(times_hess)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .3f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span>)</span>
<span id="cb47-4"></span>
<span id="cb47-5">times_smarter <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: smarter_hessian(partial(log_posterior, X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X, y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> y))(mode_jax), number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb47-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Smarter Hessian: The average time with p = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>p<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>mean(times_smarter)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .3f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">(+/-</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>std(times_smarter)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .3f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Full Hessian: The average time with p = 5000 is  3.444(+/- 0.201)
Smarter Hessian: The average time with p = 5000 is  3.569(+/- 0.024)</code></pre>
</div>
</div>
<p>Well, that didn’t make much of a difference. If anything, it’s a little bit slower. The overhead likely comes from dispatching each decomposed piece separately; it could probably be compiled away by lowering (jit-compiling) the whole computation at once.</p>
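<p>As a sketch of what “lowering the whole thing” could look like, wrapping the Hessian computation in <code>jax.jit</code> hands the entire graph to XLA in one piece. The <code>log_post</code> here is an illustrative stand-in, not the post’s <code>log_posterior</code>:</p>

```python
import jax
import jax.numpy as jnp

# Illustrative stand-in for the post's log_posterior: a Gaussian log-density
# whose Hessian is simply -I, so the result is easy to check.
def log_post(beta):
    return -0.5 * jnp.sum(beta ** 2)

# jit-compile the full Hessian computation so XLA can fuse it end to end
hess_fn = jax.jit(jax.hessian(log_post))

H = hess_fn(jnp.zeros(3))
H.block_until_ready()  # force execution so any timing measures the compiled kernel
```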
</section>
<section id="but-you-forgot-the-diagonal-trick" class="level3">
<h3 class="anchored" data-anchor-id="but-you-forgot-the-diagonal-trick">But you forgot the diagonal trick</h3>
<p>That said, the decomposition into linear and non-linear parts was <em>not</em> the real source of the savings. If we assume the Hessian of the likelihood is diagonal, then we can indeed do a lot better!</p>
<p>The problem here is that while <code>smarter_hessian</code> worked for any<sup>23</sup> JAX-traceable function, we are now making a structural assumption. In theory, we could go through the JAX primitives and mark all of the ones that would (conditionally) lead to diagonal Hessians, but honestly I kinda want this bit of the post to be done. So I will leave that as an <em>exercise to the interested reader</em>.</p>
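<p>The trick itself is a one-liner: if a function has a diagonal Hessian, multiplying that Hessian by the ones vector returns exactly its diagonal, so a single <code>jvp</code> of the gradient replaces a full Hessian computation. A minimal sketch (the function <code>g</code> is illustrative):</p>

```python
import jax
import jax.numpy as jnp

# A sum of elementwise nonlinearities has a diagonal Hessian.
def g(z):
    return jnp.sum(jnp.log1p(jnp.exp(z)))

z = jnp.linspace(-1.0, 1.0, 4)

# H @ 1 picks out the diagonal when the off-diagonal entries are zero,
# and one forward-over-reverse jvp computes it without ever forming H.
D = jax.jvp(jax.grad(g), (z,), (jnp.ones_like(z),))[1]
```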
<div id="f8675103" class="cell" data-execution_count="33">
<div class="sourceCode cell-code" id="cb49" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb49-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> smart_hessian(fun: Callable) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> Callable:</span>
<span id="cb49-2">  <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> jacfwd</span>
<span id="cb49-3">  <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> hessian</span>
<span id="cb49-4">  <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> jnp</span>
<span id="cb49-5">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> hess(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>args):</span>
<span id="cb49-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(args) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"This only works for functions with one input"</span></span>
<span id="cb49-7">    </span>
<span id="cb49-8">    lin_funs, nlin_funs, nlin_inputs, combine <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> decompose(fun, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>args)</span>
<span id="cb49-9">    n_in <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> args[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb49-10">    part <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.zeros((n_in, n_in))</span>
<span id="cb49-11"></span>
<span id="cb49-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> lin, nlin, nimp, comb <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(lin_funs, nlin_funs, nlin_inputs, combine):</span>
<span id="cb49-13">      </span>
<span id="cb49-14">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> lin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb49-15">        lin_val <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lin(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>args)</span>
<span id="cb49-16">        jac <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jacfwd(lin)(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>args)</span>
<span id="cb49-17">      </span>
<span id="cb49-18"></span>
<span id="cb49-19">      h_args <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (lin_val,) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> lin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span> args</span>
<span id="cb49-20">      D <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(grad(nlin), h_args, (jnp.ones_like(h_args[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]),))[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> nlin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span>
<span id="cb49-21"></span>
<span id="cb49-22">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> lin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">and</span> nlin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb49-23"></span>
<span id="cb49-24">        part <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> comb <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (jac.T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> (jac <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> D[:,<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>]))</span>
<span id="cb49-25">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">elif</span> lin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb49-26">        part <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> comb <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jac.T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> jac</span>
<span id="cb49-27">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">elif</span> nlin <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb49-28">        part <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> comb <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jnp.diag(D)</span>
<span id="cb49-29">      </span>
<span id="cb49-30">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> part</span>
<span id="cb49-31">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> hess</span>
<span id="cb49-32"></span>
<span id="cb49-33"></span>
<span id="cb49-34"></span>
<span id="cb49-35">H_smart <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> smart_hessian(partial(log_posterior, X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X, y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> y))(mode_jax)</span>
<span id="cb49-36"></span>
<span id="cb49-37"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"The error is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>jnp<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(H_jax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> H_smart)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>tolist()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">!"</span>)</span>
<span id="cb49-38"></span>
<span id="cb49-39">times_smart <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: smart_hessian(partial(log_posterior, X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X, y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> y))(mode_jax), number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb49-40"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Smart (diagonal-aware) Hessian: The average time with p = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>p<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>mean(times_smart)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .3f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">(+/-</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>std(times_smart)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .3f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>The error is 2.3684690404479625e-06!
Smart (diagonal-aware) Hessian: The average time with p = 5000 is  2.269(+/- 0.031)</code></pre>
</div>
</div>
<p>That is a proper saving!</p>
</section>
</section>
<section id="some-concluding-thoughts" class="level2">
<h2 class="anchored" data-anchor-id="some-concluding-thoughts">Some concluding thoughts</h2>
<p>Well, this post got out of control. I swear when I sat down I was just going to write a quick post about Laplace approximations. Oops.</p>
<section id="the-power-of-compiler-optimizations" class="level3">
<h3 class="anchored" data-anchor-id="the-power-of-compiler-optimizations">The power of compiler optimizations</h3>
<p>I think what I’ve shown here is that one of the really powerful things about <em>compiled</em> frameworks like JAX is that you can perform a pile of code optimizations that can greatly improve performance.</p>
<p>In the ideal world, this type of optimization should be <em>invisible</em> to the end user. Were I to do this seriously<sup>24</sup>, I would make sure that if the assumptions of the optimized code weren’t met, the behaviour would revert back to the standard <code>jax.hessian</code>.</p>
<p>Recognizing when to perform an optimization is, in reality, the whole art of this type of process. And it’s very hard. For this post, I was able to automatically recognize the linear operation, but I didn’t try to find conditions that ensured the Hessian would be diagonal.</p>
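<p>To make the “recognize the linear operation” step concrete, the detection can be sketched by tracing the function to a jaxpr and looking for the primitives that linear algebra lowers to. This is a toy version of the idea, with illustrative names:</p>

```python
import jax
import jax.numpy as jnp

A = jnp.ones((3, 2))

def fun(x):
    # a linear map followed by a nonlinearity: the f(Ax) pattern
    return jnp.sum(jnp.tanh(A @ x))

# trace the function to a jaxpr and list the primitives it uses
jaxpr = jax.make_jaxpr(fun)(jnp.zeros(2))
prims = [eqn.primitive.name for eqn in jaxpr.jaxpr.eqns]
print(prims)  # the matrix-vector product shows up as 'dot_general'
```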
</section>
<section id="sparsity-detection-and-sparse-autodiff" class="level3">
<h3 class="anchored" data-anchor-id="sparsity-detection-and-sparse-autodiff">Sparsity detection and sparse autodiff</h3>
<p>Would you believe that people have spent a lot of time studying the efficiency gains when you have things like sparse Hessians? There is, in fact, a massive literature on <em>sparse autodiff</em> and it is implemented in several autodiff libraries, including <a href="https://github.com/JuliaDiff/SparseDiffTools.jl">in Julia</a>.</p>
<p>Sparsity exploiting autodiff uses symbolic analysis of the expression tree for a function to identify when certain derivatives are going to be zero. For Hessians, it needs to identify when two variables have at most linear dependencies.</p>
<p>Once you have worked out the sparsity pattern, you need to do something with it. In the logistic case, it is diagonal, but in a lot of cases it will depend on more than one element of the latent representation. That is, the Hessian will be sparse<sup>25</sup>, but it won’t be diagonal.</p>
<p>I guess the question is: <em>can we generalize the observation that if the Hessian is diagonal we only need to compute a single Hessian-vector product</em> to general sparsity structures?</p>
<p>In general, we won’t be able to get away with a single product and will instead need a specially constructed set of <img src="https://latex.codecogs.com/png.latex?k"> <em>probing vectors</em>, where <img src="https://latex.codecogs.com/png.latex?k"> is a number to be determined (that is hopefully <em>much</em> smaller than <img src="https://latex.codecogs.com/png.latex?n">). This set of vectors <img src="https://latex.codecogs.com/png.latex?s_k"> will have the special property that <img src="https://latex.codecogs.com/png.latex?%0A%5Csum_%7Bj=1%7D%5Ek%20s_j%20=%201.%0A"> This means that the non-zero elements of each probing vector correspond to a disjoint grouping of the variables.</p>
<p>To do this, we need to construct our set of probing vectors in a very special way. Each <img src="https://latex.codecogs.com/png.latex?s_k"> will be a vector containing zeros and ones. The set of indices with <img src="https://latex.codecogs.com/png.latex?%5Bs_k%5D_j%20=%201"> have color <img src="https://latex.codecogs.com/png.latex?k">. The aim is to associate each index with a unique color in such a way that we can recover the Hessian entries. We can do this with a structurally symmetric orthogonal partition, which is detailed in <a href="http://www.ii.uib.no/~fredrikm/fredrik/papers/sirev2005.pdf">Section 4 of this great review article</a>.</p>
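<p>As a toy illustration of the recovery step (my own sketch, not taken from the review article): for a tridiagonal Hessian, colouring the indices modulo three makes columns of the same colour structurally orthogonal, so three Hessian-vector products determine every non-zero entry:</p>

```python
import jax
import jax.numpy as jnp

# Neighbour-coupling terms give a tridiagonal Hessian.
def f(x):
    return jnp.sum(jnp.cos(x[1:] - x[:-1]))

n = 9
x = jnp.linspace(0.0, 1.0, n)

def hvp(v):
    # forward-over-reverse Hessian-vector product; no dense Hessian is formed
    return jax.jvp(jax.grad(f), (x,), (v,))[1]

# three probing vectors: index i gets colour i mod 3; they sum to the ones vector
probes = [(jnp.arange(n) % 3 == c).astype(x.dtype) for c in range(3)]
cols = [hvp(s) for s in probes]

# recovery: within a colour class at most one column touches each row,
# so H[r, i] can be read straight off the probe with colour i mod 3
H_rec = jnp.zeros((n, n))
for i in range(n):
    for r in range(max(0, i - 1), min(n, i + 2)):
        H_rec = H_rec.at[r, i].set(cols[i % 3][r])
```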
<p>Implementing<sup>26</sup> sparsity-aware autodiff Hessians does require some graph algorithms, and is frankly beyond the scope of my patience here. But it certainly is possible and would give quite general performance gains.</p>
<p>Critically, because it replaces the computation of a <img src="https://latex.codecogs.com/png.latex?p%20%5Ctimes%20p"> dense Hessian matrix with <img src="https://latex.codecogs.com/png.latex?k"> Hessian-vector products, it is extremely well suited to modern GPU acceleration techniques!</p>
</section>
<section id="could-we-do-more" class="level3">
<h3 class="anchored" data-anchor-id="could-we-do-more">Could we do more?</h3>
<p>There are so many many many ways to improve the very simple symbolic reduction of the autodiff beyond the simple “identify <img src="https://latex.codecogs.com/png.latex?f(Ax)">” strategy. For more complex cases, it might be necessary to relax the <em>only one input and only one output</em> assumption.</p>
<p>It also might be possible to chain multiple instances of this, although this would require a more complex Hessian chain rule. Nevertheless, the extra complexity might be balanced by savings from the applicable instances of sparse autodiff.</p>
<p>But probably the thing that <em>actually</em> annoys me in all of this is that we are constantly recomputing the Jacobian for the linear equation, which is fixed. A better implementation would perform symbolic differentiation on linear sub-graphs, which should lead to even more savings.</p>
</section>
<section id="but-is-jax-the-right-framework-for-this" class="level3">
<h3 class="anchored" data-anchor-id="but-is-jax-the-right-framework-for-this">But is JAX the right framework for this?</h3>
<p>All of this was a fair bit of work so I’m tempted to throw myself at the sunk-cost fallacy and just declare it to be good. But there is a problem. Because JAX doesn’t do a symbolic transformation of the program (only a trace through paths associated with specific values), there is no guarantee that the sparsity pattern for <img src="https://latex.codecogs.com/png.latex?H"> remains the same at each step. And there is nothing wrong with that. It’s an expressive, exciting language.</p>
<p>But all of the code transformation to make a sparsity-exploiting Hessian doesn’t come for free. And the idea of having to do it again every time a Hessian is needed is … troubling. If we could guarantee that the sparsity pattern was static, then we could factor all of this complex parsing and coloring code away and just run it once for each problem.</p>
<p>Theoretically, we could do something like hashing on the jaxpr, but I’m not sure how much that would help.</p>
<p>Ideally, we could do this in a library that performs <em>symbolic</em> manipulations and can compile them into an expression graph. JAX is not quite<sup>27</sup> that language. An option for this type of symbolic manipulation would be <a href="https://aesara.readthedocs.io/en/latest/">Aesara</a>. It may even be possible to do it in <a href="https://github.com/stan-dev/stanc3">Stan</a>, but even my wandering mind doesn’t want to work out how to do this in OCaml.</p>


</section>
</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>I will never reveal how much. But it was most of it.↩︎</p></li>
<li id="fn2"><p>or on↩︎</p></li>
<li id="fn3"><p>that probably converges. Think of it like <img src="https://latex.codecogs.com/png.latex?f_n(%5Ctheta)%20=%20n%5E%7B-1%7D%20%5Cleft(%5Csum_%7Bi=1%7D%5En%20p(y_i%20%5Cmid%20%5Ctheta)%20+%20p(%5Ctheta)%5Cright)">, where <img src="https://latex.codecogs.com/png.latex?p(y_i%20%5Cmid%20%5Ctheta)"> is the likelihood and <img src="https://latex.codecogs.com/png.latex?p(%5Ctheta)"> is the prior.↩︎</p></li>
<li id="fn4"><p>The first-order term disappears because at the mode <img src="https://latex.codecogs.com/png.latex?x%5E*"> <img src="https://latex.codecogs.com/png.latex?%5Cnabla%20f(x%5E*)=0">↩︎</p></li>
<li id="fn5"><p>or has one dominant mode↩︎</p></li>
<li id="fn6"><p>Something isn’t always better than nothing but sometimes it is↩︎</p></li>
<li id="fn7"><p>You could say reproducible code but I won’t because that word means something pretty specific. I mean, this is not the place for a rant, but it is <em>very</em> difficult to write strictly reproducible code and I am frankly not even going to try to take a bite out of that particular onion.↩︎</p></li>
<li id="fn8"><p>The maths under this is very interesting and surprisingly accessible (in a very advanced sort of way). I guess it depends on what you think of as accessible, but it’s certainly much nicer than entropy and VC-classes. A lovely set of notes that cover everything you’ve ever wanted to know is <a href="https://arxiv.org/abs/1011.3027">here</a>↩︎</p></li>
<li id="fn9"><p>Unless someone’s been doing their design of experiments↩︎</p></li>
<li id="fn10"><p>With 100 observations, we expect our data-driven variation (aka the frequentist version) to be about one decimal place, so the Laplace approximation is accurate within that tolerance. In fact, clever maths types can analyse the error in the Laplace approximation and show that the error is about <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(n%5E%7B-1%7D)">, which is asymptotically much smaller than the sampling variability of <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(n%5E%7B-1/2%7D)">, which suggests that the error introduced by the Laplace approximation isn’t catastrophic. At least with enough data.↩︎</p></li>
<li id="fn11"><p>Be still my beating heart.↩︎</p></li>
<li id="fn12"><p>Ok. You caught me. They’re not technically the same model. The symbolic code doesn’t include an intercept. I just honestly cannot be arsed to do the very minor matrix algebra to add it in. Nor can I be arsed to add a column of ones to <code>X</code>.↩︎</p></li>
<li id="fn13"><p>So many tuples↩︎</p></li>
<li id="fn14"><p>This is in pretty stark contrast to the pytorch docs, which are shit. Be more like JAX.↩︎</p></li>
<li id="fn15"><p>INLA does this. Very explicitly. And a lot of other cool stuff. It doesn’t use autodiff though.↩︎</p></li>
<li id="fn16"><p>For example, there’s no call to <code>logistic</code> in the code, but a quick look at <code>jax.lax.logistic</code> shows that it’s the same thing as <code>expit</code>.↩︎</p></li>
<li id="fn17"><p>This basically <em>just works</em> as long as you’ve got <code>graphviz</code> installed on your system. And once you find the right regex to strip out the <em>terrible</em> auto-generated title.↩︎</p></li>
<li id="fn18"><p>You need to install the dev version, or else it renders a lot of <code>pjit</code>s where the <code>sum</code> and <code>sub</code>s are supposed to be.↩︎</p></li>
<li id="fn19"><p>If I wasn’t sure, I deleted them from the linear list. There were also <code>scatter_mul</code>, <code>reduce_window</code>, and <code>reduce_window_shape_tuple</code>, which are all sometimes linear but frankly I didn’t want to work out the logic.↩︎</p></li>
<li id="fn20"><p>The letters are <code>__repr__</code> magic↩︎</p></li>
<li id="fn21"><p>Lord I hate a big ‘if’/‘elif’ block. Just terrible. I should refactor but this is a weekend blog post not a work thing↩︎</p></li>
<li id="fn22"><p>It is very little extra work to deal with eg JIT’d primitives and that sort of stuff, but for the purpose of this post, let’s keep things as simple as possible.↩︎</p></li>
<li id="fn23"><p>With input/output restrictions↩︎</p></li>
<li id="fn24"><p>I am currently dressed like a sexy clown.↩︎</p></li>
<li id="fn25"><p>Most of the entries will be zero↩︎</p></li>
<li id="fn26"><p>The previous article goes for ease of implementation over speed. A faster and better algorithm, and a <em>very</em> detailed comparison of all of the available options can be found <a href="http://www.ii.uib.no/~fredrikm/fredrik/papers/SISC2007.pdf">here</a>. And I am not implementing that for a fucking blog.↩︎</p></li>
<li id="fn27"><p>And it’s not trying to. Their bread and butter is autodiff and what they’re doing is absolutely natural for that.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2024,
  author = {Simpson, Dan},
  title = {An Unexpected Detour into Partially Symbolic,
    Sparsity-Exploiting Autodiff; or {Lord} Won’t You Buy Me a {Laplace}
    Approximation},
  date = {2024-05-08},
  url = {https://dansblog.netlify.app/posts/2024-05-08-laplace/laplace.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2024" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2024. <span>“An Unexpected Detour into Partially Symbolic,
Sparsity-Exploiting Autodiff; or Lord Won’t You Buy Me a Laplace
Approximation.”</span> May 8, 2024. <a href="https://dansblog.netlify.app/posts/2024-05-08-laplace/laplace.html">https://dansblog.netlify.app/posts/2024-05-08-laplace/laplace.html</a>.
</div></div></section></div> ]]></description>
  <category>JAX</category>
  <category>Laplace approximation</category>
  <category>Sparse matrices</category>
  <category>Autodiff</category>
  <guid>https://dansblog.netlify.app/posts/2024-05-08-laplace/laplace.html</guid>
  <pubDate>Tue, 07 May 2024 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2024-05-08-laplace/hat.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Diffusion models; or Yet another way to sample from an arbitrary distribution</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2023-01-30-diffusion/diffusion.html</link>
  <description><![CDATA[ 





<p>The other day I went to the cinema and watched M3GAN, a true movie masterpiece<sup>1</sup> about the death and carnage that ensues when you simply train your extremely complex ML model and don’t do proper ethics work. And that, of course, made me want to write a little bit about something relatively hip, hop, and happening<sup>2</sup> in the ML/AI space. But, like, I’m not gonna be <em>that</em> on trend<sup>3</sup> because fuck that noise, so I’m gonna talk about diffusion models.</p>
<p>It’s worth noting that I know bugger all about diffusion models. But when they first came out, I had a quick look at how they worked and then promptly forgot about them because, let’s face it, I work on different things. But hey. If that’s not enough<sup>4</sup> knowledge to write a blog post, I don’t know what is.</p>
<p>And here’s the thing. Most of the time when I blog about something I know a lot about it. Sometimes too much. But this is not one of those times. There are <em>plenty</em> of resources on the internet if you want to learn about diffusions models from an expert. Oodles. But where else but here can you read the barely proof-read writing of a man who read a couple of papers yesterday?</p>
<p>And who doesn’t want<sup>5</sup> that?</p>
<section id="a-prelude-measure-transport-for-sampling-from-arbitrary-distributions" class="level2">
<h2 class="anchored" data-anchor-id="a-prelude-measure-transport-for-sampling-from-arbitrary-distributions">A prelude: Measure transport for sampling from arbitrary distributions</h2>
<p>One of the fundamental tasks in computational statistics is to sample from a probability distribution. There are millions of ways of doing this, but the most popular generic method is Markov chain Monte Carlo. But this is not the post about MCMC methods. I’ve already made <a href="https://dansblog.netlify.app/posts/2022-11-23-wrong-mcmc/wrong-mcmc.html">a post about MCMC methods</a>.</p>
<p>Instead, let’s focus on stranger ways to do it. In particular, let’s think about methods that create a mapping <img src="https://latex.codecogs.com/png.latex?T:%20%5Cmathbb%7BR%7D%5Ed%20%5Crightarrow%20%5Cmathbb%7BR%7D%5Ed">, possibly depending on some properties of the target distribution, such that the following procedure constructs a sample <img src="https://latex.codecogs.com/png.latex?x%20%5Csim%20p(x)">:</p>
<ol type="1">
<li>Sample <img src="https://latex.codecogs.com/png.latex?u%20%5Csim%20q(u)"> for some known distribution <img src="https://latex.codecogs.com/png.latex?q(u)"></li>
<li>Set <img src="https://latex.codecogs.com/png.latex?x%20=%20T(u)"></li>
</ol>
<p>The general problem of starting with a distribution <img src="https://latex.codecogs.com/png.latex?q(%5Ccdot)"> and mapping it to another distribution <img src="https://latex.codecogs.com/png.latex?p(%5Ccdot)"> is an example of a problem known as <em>measure transport</em>. Transport problems have been studied by mathematicians for yonks. It turns out that there are an infinite number of mappings <img src="https://latex.codecogs.com/png.latex?T"> that will do the job, so it’s up to us to choose a good one.</p>
<p>Probably the most famous<sup>6</sup> transport problem is the <em>optimal transport problem</em>, first studied by Monge and Kantorovich, which tries to find a mapping <img src="https://latex.codecogs.com/png.latex?T"> that minimises <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D_%7Bx%20%5Csim%20q%7D(c(x,%20T(x)))%0A"> subject to the constraint that <img src="https://latex.codecogs.com/png.latex?T(x)%20%5Csim%20p"> whenever <img src="https://latex.codecogs.com/png.latex?x%20%5Csim%20q">, where <img src="https://latex.codecogs.com/png.latex?c(x,y)"> is some sort of cost function. There are canonical choices of cost function, but for the most part we are free to choose something that is convenient.</p>
<p>The measure transport concept underpins the method of <a href="https://arxiv.org/abs/1908.09257">normalising flows</a>, but the presentation that I’m most familiar with is due to <a href="https://arxiv.org/abs/1109.1516">Youssef Marzouk and his collaborators</a> in 2011 and predates the big sexy normalising flow papers by a few years.</p>
<section id="continuous-distributions-in-1d" class="level3">
<h3 class="anchored" data-anchor-id="continuous-distributions-in-1d">Continuous distributions in 1D</h3>
<p>If <img src="https://latex.codecogs.com/png.latex?p"> and <img src="https://latex.codecogs.com/png.latex?q"> are both continuous, univariate distributions, it is pretty easy to construct a transport map. In particular, if <img src="https://latex.codecogs.com/png.latex?F_p"> is the cumulative distribution function of <img src="https://latex.codecogs.com/png.latex?p">, then <img src="https://latex.codecogs.com/png.latex?%0AT(x)%20=%20F_p%5E%7B-1%7D(F_q(x))%0A"> is a transport map. This works because, if <img src="https://latex.codecogs.com/png.latex?x%20%5Csim%20q">, then <img src="https://latex.codecogs.com/png.latex?F_q(x)%20%5Csim%20%5Ctext%7BUnif%7D(0,1)">. From this, we can use everyone’s favourite result that you can sample from a continuous univariate random variable <img src="https://latex.codecogs.com/png.latex?p"> by evaluating the quantile function at a uniform random value.</p>
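<p>As a concrete (and deliberately trivial) instance of this map: take <img src="https://latex.codecogs.com/png.latex?q"> to be uniform, so <img src="https://latex.codecogs.com/png.latex?F_q"> is the identity on <img src="https://latex.codecogs.com/png.latex?%5B0,1%5D">, and <img src="https://latex.codecogs.com/png.latex?p"> to be the unit exponential, whose quantile function we know in closed form:</p>

```python
import numpy as np

# T(x) = F_p^{-1}(F_q(x)) with q = Unif(0, 1) (so F_q is the identity
# on [0, 1]) and p = Exp(1), whose quantile function is -log(1 - u).
rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)   # samples from the reference q
x = -np.log1p(-u)               # transported samples, distributed Exp(1)
```

The sample mean should come out near 1 and the median near <img src="https://latex.codecogs.com/png.latex?%5Clog%202">, as an Exp(1) demands.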
<p>There are, of course, two problems with this: it only works in one dimension and we usually don’t know <img src="https://latex.codecogs.com/png.latex?F%5E%7B-1%7D"> explicitly.</p>
<p>The second of these isn’t really a problem if we are willing to do something splendifferously dumb. And I am. Because I’m gay and frivolous<sup>7</sup>.</p>
<p>If I write <img src="https://latex.codecogs.com/png.latex?Q(t)%20=%20F%5E%7B-1%7D(t)"> then I can differentiate this to get <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7BdQ%7D%7Bdt%7D%20=%20%5Cfrac%7B1%7D%7Bp(Q)%7D,%5Cqquad%20Q(0)%20=%20-%5Cinfty.%0A"> This is a <em>very</em> non-linear differential equation. We can make it even more non-linear by differentiating again to get <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7Bd%5E2Q%7D%7Bdt%5E2%7D%20=%20-%5Cfrac%7B1%7D%7Bp(Q)%5E2%7D%20p'(Q)%5Cfrac%7BdQ%7D%7Bdt%7D.%0A"> Noting that <img src="https://latex.codecogs.com/png.latex?Q'%20=%201/p(Q)"> we get <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7Bd%5E2%20Q%7D%7Bdt%5E2%7D%20=%20-%5Cfrac%7Bp'(Q)%7D%7Bp(Q)%7D%20%5Cleft(%5Cfrac%7BdQ%7D%7Bdt%7D%5Cright)%5E2.%0A"> This is a rubbish differential equation, but it has the singular advantage that it doesn’t depend<sup>8</sup> on the normalising constant for <img src="https://latex.codecogs.com/png.latex?p">, which can be useful. The downside is that the boundary conditions are infinite on both ends.</p>
<p>Regardless of that particular challenge, we could use this to build a generic algorithm.</p>
<ol type="1">
<li><p>Sample <img src="https://latex.codecogs.com/png.latex?u%20%5Csim%20%5Ctext%7BUnif%7D(0,1)"></p></li>
<li><p>Use a numerical differential equation solver to solve the equation with boundary conditions <img src="https://latex.codecogs.com/png.latex?%0AQ(0)%20=%20-M,%20%5Cquad%20Q(1)%20=%20M%0A"> for some sufficiently large number <img src="https://latex.codecogs.com/png.latex?M"> and return <img src="https://latex.codecogs.com/png.latex?x%20=%20Q(u)"></p></li>
</ol>
<p>This will sample from <img src="https://latex.codecogs.com/png.latex?p(x)"> truncated to <img src="https://latex.codecogs.com/png.latex?%5B-M,%20M%5D">.</p>
<p>I was going to write some python code to do this, but honestly it hurts my soul. So I shan’t.</p>
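<p>Well, maybe a tiny sketch. This version dodges the differential equation entirely: it tabulates the unnormalised density on a grid, builds the CDF with the trapezoid rule, and inverts it by interpolation — the same computation the boundary-value problem would do, only less glamorous:</p>

```python
import numpy as np

def sample_truncated(log_p_unnorm, M=8.0, n_grid=4001, n_samples=10_000, seed=None):
    """Sample from a 1D density known up to a constant, truncated to
    [-M, M], by tabulating the CDF on a grid and inverting it."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-M, M, n_grid)
    lp = log_p_unnorm(x)
    p = np.exp(lp - lp.max())  # the normalising constant cancels below
    # cumulative trapezoid rule -> unnormalised CDF; rescale so F(M) = 1
    F = np.concatenate([[0.0], np.cumsum(0.5 * (p[1:] + p[:-1]) * np.diff(x))])
    F /= F[-1]
    u = rng.uniform(size=n_samples)
    return np.interp(u, F, x)  # Q(u) = F^{-1}(u) by linear interpolation

# Standard normal, supplied only up to its normalising constant.
samples = sample_truncated(lambda x: -0.5 * x**2, seed=0)
```

With <img src="https://latex.codecogs.com/png.latex?M%20=%208"> the truncation is invisible for a standard normal, so the samples should have mean near 0 and standard deviation near 1.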
</section>
<section id="transport-maps-a-less-terrible-method-that-works-on-general-densities" class="level3">
<h3 class="anchored" data-anchor-id="transport-maps-a-less-terrible-method-that-works-on-general-densities">Transport maps: A less terrible method that works on general densities</h3>
<p>Outside of one dimension, there is (to the best of my knowledge) no direct solution to the transport problem. That means that we need to construct our own. Thankfully, the glorious <a href="https://aeroastro.mit.edu/people/youssef-m-marzouk/">Youssef Marzouk</a> and a bunch of his collaborators have spent some quality time mapping out this idea. A really nice survey of their results can be found <a href="https://arxiv.org/pdf/1602.05023.pdf">in this paper</a>.</p>
<p>Essentially the idea is that we can try to find the most convenient transport map available to us. In particular, it’s useful to minimise the <em>Kullback-Leibler</em> divergence between the transported version of <img src="https://latex.codecogs.com/png.latex?q"> and the target <img src="https://latex.codecogs.com/png.latex?p">. After a little bit<sup>9</sup> of maths, this is equivalent to maximising <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D_%7Bx%20%5Csim%20q%7D%5Cleft(%5Clog%20p(T(x))%20+%20%5Clog%20%5Cdet%20%5Cnabla%20T(x)%5Cright),%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Cnabla%20T(x)"> is the Jacobian of <img src="https://latex.codecogs.com/png.latex?T">. To finish the specification of the optimisation problem, it’s enough to consider <em>triangular</em> maps<sup>10</sup> <img src="https://latex.codecogs.com/png.latex?%0AT(x)%20=%20%5Cbegin%7Bpmatrix%7D%20T_1(x_1)%20%5C%5C%20T_2(x_1,x_2)%20%5C%5C%20%5Cvdots%20%5C%5C%20T_d(x_1,%20%5Cldots,%20x_d)%20%5Cend%7Bpmatrix%7D%0A"> with the additional constraint that their Jacobians have positive determinants. Using a triangular map has two distinct advantages: it’s parsimonious and it makes the positive determinant constraint <em>much</em> easier to deal with. Triangular maps are also sufficient for the problem (my man Bogachev <a href="https://iopscience.iop.org/article/10.1070/SM2005v196n03ABEH000882/meta">showed it in 2005</a>).</p>
<p>That said, this can be a somewhat tricky optimisation problem. Youssef and his friends have spilt a lot of ink on this topic. And if you’re the sort of person who just fucking loves a weird optimisation problem, I’m sure you’ve got thoughts. With and without the triangular constraint, this can be parameterised as the composition of a sequence of simple functions, in which case you turn three times and scream <em>neural net</em> and a normalising<sup>11</sup> flow appears.</p>
</section>
<section id="what-if-we-only-have-samples-from-the-target-density" class="level3">
<h3 class="anchored" data-anchor-id="what-if-we-only-have-samples-from-the-target-density">What if we only have samples from the target density</h3>
<p>All of that is very lovely. And quite nice in its context. But what happens if you don’t actually have access to the (unnormalised) log density of the target? What if you only have samples?</p>
<p>The good news is that you’re not shit out of luck. But it’s a bit tricky. And once again, that <a href="https://arxiv.org/pdf/1602.05023.pdf">lovely review paper</a> by Youssef and friends will tell us how to do it.</p>
<p>In particular, they noticed that if you swap the direction of the KL divergence, you get the optimisation problem for the inverse mapping <img src="https://latex.codecogs.com/png.latex?S(x)%20=%20T%5E%7B-1%7D(x)"> that aims to maximise <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D_%7Bx%20%5Csim%20p%7D%5Cleft(%5Clog%20q(S(x))%20+%20%5Clog%20%5Cdet%20%5Cnabla%20S(x)%5Cright)%0A"> where <img src="https://latex.codecogs.com/png.latex?S"> is once again a triangular map subject to the monotonicity constraints <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Cpartial%20S_k%7D%7B%5Cpartial%20x_k%7D%20%3E%200.%0A"> Because we have the freedom to choose the reference density <img src="https://latex.codecogs.com/png.latex?q(x)">, we can choose it to be iid standard normals, in which case we get the optimisation problem <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A&amp;%5Cmin_S%20%5Cmathbb%7BE%7D_%7Bx%20%5Csim%20p%7D%5Cleft%5B%5Csum_%7Bk%20=%201%7D%5Ed%20%5Cfrac%7B1%7D%7B2%7D%5Cleft(S_k(x_1,%20%5Cldots,%20x_k)%5Cright)%5E2%20-%20%20%5Clog%20%5Cfrac%7B%5Cpartial%20S_k%7D%7B%5Cpartial%20x_k%7D%20%5Cright%5D%5C%5C%0A&amp;%5Ctext%7Bs.t.%7D&amp;%20%5C%5C%0A&amp;%5Cquad%20%5Cfrac%7B%5Cpartial%20S_k%7D%7B%5Cpartial%20x_k%7D%20%3E0%20%5C%5C%0A&amp;%5Cquad%20S%20%5Ctext%7B%20is%20triangular%7D,%0A%5Cend%7Balign*%7D"> which is a convex, separable optimisation problem that can be solved<sup>12</sup> using, for instance, a stochastic gradient method. This can be turned into an unconstrained optimisation problem by <a href="https://joss.theoj.org/papers/10.21105/joss.04843">explicitly parameterising the monotonicity constraint</a>.</p>
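<p>To get a feel for the objective, here is a hypothetical one-dimensional version where the map is restricted to be affine, <img src="https://latex.codecogs.com/png.latex?S(x)%20=%20ax%20+%20b"> with <img src="https://latex.codecogs.com/png.latex?a%20=%20e%5Et"> keeping it monotone. The minimiser is the map that standardises the samples:</p>

```python
import numpy as np

def fit_affine_map(xs, steps=2000, lr=0.1):
    """Fit S(x) = a*x + b with a = exp(t) > 0 (so S is monotone) by
    gradient descent on the sample average of 0.5*S(x)^2 - log S'(x),
    i.e. the map-from-samples objective with a N(0, 1) reference."""
    t, b = 0.0, 0.0
    for _ in range(steps):
        a = np.exp(t)
        s = a * xs + b
        grad_t = np.mean(s * xs) * a - 1.0  # d/dt, since log S'(x) = t
        grad_b = np.mean(s)                 # d/db
        t -= lr * grad_t
        b -= lr * grad_b
    return np.exp(t), b

rng = np.random.default_rng(1)
xs = rng.normal(loc=3.0, scale=2.0, size=50_000)
a, b = fit_affine_map(xs)
# The minimiser is a = 1/sd(xs), b = -mean(xs)/sd(xs): the map that
# standardises the samples, pushing them towards N(0, 1).
```

In higher dimensions each component <img src="https://latex.codecogs.com/png.latex?S_k"> gets its own term like this, which is where the separability comes from.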
<p>The monotonicity of <img src="https://latex.codecogs.com/png.latex?S"> makes the resulting nonlinear solve to compute <img src="https://latex.codecogs.com/png.latex?T%20=%20S%5E%7B-1%7D"> relatively straightforward. In fact, if <img src="https://latex.codecogs.com/png.latex?d"> isn’t too big you can solve this sequentially dimension-by-dimension. But, of course, when you’ve got a lot of parameters this is a poor method and it would make more sense<sup>13</sup> to attack it with some sort of gradient descent method. It might even be worth taking the time to learn the inverse function <img src="https://latex.codecogs.com/png.latex?T%20=%20S%5E%7B-1%7D"> so that it can be applied for, essentially, free.</p>
</section>
<section id="so-does-it-work" class="level3">
<h3 class="anchored" data-anchor-id="so-does-it-work">So does it work?</h3>
<p>To some extent, the answer is <em>yes</em>. This is <em>very much</em> normalising flows in its most embryonic form. They work to some extent. And this presentation makes some of the problems fairly obvious:</p>
<ol type="1">
<li><p>There’s no real guarantee that <img src="https://latex.codecogs.com/png.latex?T"> is going to be a nice smooth map, which means that we may have problems moving beyond the training sample.</p></li>
<li><p>The most natural way to organise the computations is sequential, involving sweeps across the <img src="https://latex.codecogs.com/png.latex?d"> parameters. This can be difficult to parallelise efficiently on modern architectures.</p></li>
<li><p>The complexity of the triangular map is going to depend on the order of variables. This is fine if you’re processing something that is inherently sequential, but if you’re working with image data, this can be challenging.</p></li>
</ol>
<p>Of course, there are a <em>pile</em> of ways that these problems can be overcome in whole or in part. I’d point you to the last five years of ML conference papers. You’re welcome.</p>
</section>
</section>
<section id="continuous-normalising-flows-making-the-problem-easier-by-making-it-harder" class="level2">
<h2 class="anchored" data-anchor-id="continuous-normalising-flows-making-the-problem-easier-by-making-it-harder">Continuous normalising flows: Making the problem easier by making it harder</h2>
<p>A really clever idea, which is related to normalising flows, is to ask <em>what if, instead of looking for a single</em><sup>14</sup> <em>map</em> <img src="https://latex.codecogs.com/png.latex?S(x)%20=%20T%5E%7B-1%7D(x)">, <em>we tried to find a sequence of maps</em> <img src="https://latex.codecogs.com/png.latex?S(x,t)"> <em>that smoothly move from the identity map to the transport map</em>.</p>
<p>This seems like it would be a harder problem. And it is. You need to make an infinite number of maps. But the saving grace is that as <img src="https://latex.codecogs.com/png.latex?t"> changes slightly, the map <img src="https://latex.codecogs.com/png.latex?S(%5Ccdot,%20t)"> is also only going to change slightly. This means that we can parameterise the <em>change</em> relatively simply.</p>
<p>To this end, we write <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Cpartial%20S%7D%7B%5Cpartial%20t%7D%20=%20f(S,%20t),%0A"> for some relatively simple function <img src="https://latex.codecogs.com/png.latex?f"> that models the infinitesimal change in the transport map as we move along the path. The hope is that learning the vector field <img src="https://latex.codecogs.com/png.latex?f"> will be <em>easier</em> than learning <img src="https://latex.codecogs.com/png.latex?S"> directly. To finish the specification, we require that <img src="https://latex.codecogs.com/png.latex?%0AS(x,0)%20=%20x.%0A"></p>
<p>The question is <em>can we learn the function <img src="https://latex.codecogs.com/png.latex?f"> from data?</em> If we can, it will be (relatively) easy to evaluate the transport map for any sample by just solving<sup>15</sup> the differential equation.</p>
<p>It turns out that the map <img src="https://latex.codecogs.com/png.latex?S"> is most useful for <em>training</em> the normalising flow, while <img src="https://latex.codecogs.com/png.latex?T"> is useful for generating samples from the trained model. If we were using the methods in the previous section, we would have had to commit to <em>either</em> modelling <img src="https://latex.codecogs.com/png.latex?S"> <em>or</em> <img src="https://latex.codecogs.com/png.latex?T">. One of the real advantages of the continuous formulation is that we can just as easily solve the equation with the <em>terminal condition</em><sup>16</sup> <img src="https://latex.codecogs.com/png.latex?%0AS(x,1)%20=%20u%0A"> and solve the equation backwards in time to calculate <img src="https://latex.codecogs.com/png.latex?T(u)%20=%20S(x,%200)">! The dynamics of both equations are driven by the vector field <img src="https://latex.codecogs.com/png.latex?f">!</p>
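<p>A minimal sketch of that forwards/backwards trick, with a hand-rolled RK4 integrator and a toy linear vector field standing in for a learned <img src="https://latex.codecogs.com/png.latex?f">:</p>

```python
import numpy as np

def flow(x0, f, t0, t1, n_steps=100):
    """Integrate dS/dt = f(S, t) from t0 to t1 with classic RK4.
    Calling it with t1 < t0 runs the same dynamics backwards in time,
    which is how the inverse map comes along for free."""
    h = (t1 - t0) / n_steps
    s, t = np.asarray(x0, dtype=float), t0
    for _ in range(n_steps):
        k1 = f(s, t)
        k2 = f(s + 0.5 * h * k1, t + 0.5 * h)
        k3 = f(s + 0.5 * h * k2, t + 0.5 * h)
        k4 = f(s + h * k3, t + h)
        s = s + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return s

vec_field = lambda s, t: -s            # toy stand-in for a learned f
x = np.array([1.0, -2.0, 3.0])
u = flow(x, vec_field, 0.0, 1.0)       # S(x, 1) = x * exp(-1)
x_back = flow(u, vec_field, 1.0, 0.0)  # backwards in time recovers x
```

The forward solve plays the role of <img src="https://latex.codecogs.com/png.latex?S"> and the backward solve the role of <img src="https://latex.codecogs.com/png.latex?T">, both driven by the one vector field.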
<section id="a-very-quick-introduction-to-inverse-problems" class="level3">
<h3 class="anchored" data-anchor-id="a-very-quick-introduction-to-inverse-problems">A very quick introduction to inverse problems</h3>
<p>It turns out that learning the parameters of differential equations (and other physical models) has a long and storied history in applied mathematics under the name of <em>inverse problems</em>. If that sounds like statistics, you’d be right. It’s statistics, except with no interest in measurement or, classically, uncertainty.</p>
<p>The classic inverse problem framing involves a <em>forward map</em> <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BF%7D(f)(t,%20x)"> that takes as its input some parameters (often a function) and returns the full state of a system (often another function). For instance, the forwards map could be the solution of a partial differential equation like <img src="https://latex.codecogs.com/png.latex?%0A%20%20%5Cfrac%7B%5Cpartial%20S%7D%7B%5Cpartial%20t%7D%20=%20f(S,%20t),%20%5Cqquad%20S(0)%20=%20x.%0A"> The thing that you should notice about this is that the forward map is a) possibly expensive to compute, b) not explicitly known, and c) extremely<sup>17</sup> non-linear.</p>
<p>The problem is specified with <img src="https://latex.codecogs.com/png.latex?n"> data points <img src="https://latex.codecogs.com/png.latex?(t_1,%20x_1,%20y_1),%20%5Cldots,%20(t_n,%20x_n,%20y_n)"> and the aim is to find the function <img src="https://latex.codecogs.com/png.latex?f"> that best fits the data. The traditional choice is to minimise the mean-square error <img src="https://latex.codecogs.com/png.latex?%0A%20%20f%20=%20%5Carg%20%5Cmin_f%20%5Csum_%7Bi=1%7D%5En%20%5Cleft(y_i%20-%20%5Cmathcal%7BF%7D(f)(t_i,x_i)%5Cright)%5E2.%0A"></p>
<p>Now every single one of you will know immediately that this question is both vague and ill-posed. There are <em>many</em> functions <img src="https://latex.codecogs.com/png.latex?f"> that will fit the data. This means that we need to enforce<sup>18</sup> some sort of complexity penalty on <img src="https://latex.codecogs.com/png.latex?f">. This leads to the method known as Tikhonov regularisation<sup>19</sup> <img src="https://latex.codecogs.com/png.latex?%0A%20%20f%20=%20%5Carg%20%5Cmin_%7Bf%20%5Cin%20B%7D%20%5Csum_%7Bi=1%7D%5En%20%5Cleft(y_i%20-%20%5Cmathcal%7BF%7D(f)(t_i,x_i)%5Cright)%5E2%20+%20%5Clambda%5C%7Cf%5C%7C_B%5E2,%0A%20%20"> where <img src="https://latex.codecogs.com/png.latex?B"> is some Banach space and <img src="https://latex.codecogs.com/png.latex?%5Clambda%3E0"> is some tuning parameter.</p>
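<p>When the forward map happens to be linear, the Tikhonov problem has a closed form (ordinary ridge regression; the function-space version swaps the Euclidean norm for the <img src="https://latex.codecogs.com/png.latex?B">-norm), which makes the regularisation easy to poke at:</p>

```python
import numpy as np

def tikhonov(X, y, lam):
    """Closed-form Tikhonov (ridge) estimate for a linear forward map
    F(theta) = X @ theta: argmin ||y - X @ theta||^2 + lam * ||theta||^2."""
    n_params = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_params), X.T @ y)

# An ill-posed setup: 20 observations, 50 unknowns, so least squares
# has infinitely many minimisers. The penalty picks out a unique,
# stable one.
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 50))
y = X @ rng.normal(size=50)
theta_hat = tikhonov(X, y, lam=1e-6)
```

Cranking <img src="https://latex.codecogs.com/png.latex?%5Clambda"> up trades data fit for a smaller-norm solution, which is the whole bargain.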
<p>As you can imagine, there’s a lot of maths under this about when there is a unique minimum, how the reconstruction behaves as <img src="https://latex.codecogs.com/png.latex?n%5Crightarrow%20%5Cinfty"> and <img src="https://latex.codecogs.com/png.latex?%5Clambda%20%5Crightarrow%200">, and how the choice of <img src="https://latex.codecogs.com/png.latex?B"> affects the estimation of <img src="https://latex.codecogs.com/png.latex?f">. There is also quite a lot of work<sup>20</sup> looking at how to actually solve these sorts of optimisation problems.</p>
<p>Eventually, the field evolved and people started to realise that it’s actually fairly important to quantify the uncertainty in the estimate. This is … tricky under the Tikhonov regularisation framework, which became a big motivation for <em>Bayesian</em> inverse problems.</p>
<p>As with all Bayesianifications, we just need to turn the above into a likelihood and a prior. Easy. Well, the likelihood part, at least, is easy. If we want to line up with Tikhonov regularisation, we can choose a Gaussian likelihood <img src="https://latex.codecogs.com/png.latex?%0Ay_i%20%5Cmid%20f,%20x_i,%20t_i,%20%5Csigma%20%5Csim%20N(%5Cmathcal%7BF%7D(f)(t_i,x_i),%20%5Csigma%5E2).%0A"></p>
<p>This is familiar to statisticians: the forward model essentially works as a non-standard link function in a generalised linear model. There are two big practical differences. The first one is that <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BF%7D"> is <em>very</em> non-linear and almost certainly not monotone. The second problem is that evaluations of <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BF%7D"> are typically very<sup>21</sup> expensive. For instance, you may need to solve a system of differential equations. This means that any computational method<sup>22</sup> is going to need to minimise the number of likelihood evaluations.</p>
<p>The choice of prior on <img src="https://latex.codecogs.com/png.latex?f"> can, however, be a bit tricky. The problem is that in most traditional inverse problems <img src="https://latex.codecogs.com/png.latex?f"> is a function<sup>23</sup> and so we need to put a carefully specified prior on it. And there is a lot of really interesting work on what this means in a Bayesian setting. This is really the topic for another blogpost, but it’s certainly an area where you need to be aware of the limitations of different high-dimensional priors and how they perform in various contexts. For instance, if the function you are trying to reconstruct is likely to have a lot of sharp boundaries<sup>24</sup> then you need to make sure that your prior can support functions with sharp boundaries. My little soldier bois<sup>25</sup> don’t, so you need to get more<sup>26</sup> creative.</p>
</section>
<section id="the-likelihood-for-a-normalising-flow" class="level3">
<h3 class="anchored" data-anchor-id="the-likelihood-for-a-normalising-flow">The likelihood for a normalising flow</h3>
<p>Our aim now is to cast the normalising flow idea into the inverse problems framework. To do this, we remember that we begin our flow from a sample from <img src="https://latex.codecogs.com/png.latex?p(x)"> and we then deform it until it becomes a sample from <img src="https://latex.codecogs.com/png.latex?q(u)"> at some known time (which I’m going to choose as <img src="https://latex.codecogs.com/png.latex?t=1">). This means that if <img src="https://latex.codecogs.com/png.latex?x_i%20%5Csim%20p">, then <img src="https://latex.codecogs.com/png.latex?%0AS(x_i,%201)%20%5Csim%20q.%0A"></p>
<p>We can now derive a relationship between <img src="https://latex.codecogs.com/png.latex?p"> and <img src="https://latex.codecogs.com/png.latex?q"> using the change of variables formula. In particular, <img src="https://latex.codecogs.com/png.latex?%0Ap(x%20%5Cmid%20f)%20=%20q(S(x,1))%5Cleft%7C%5Cdet%20%5Cleft(%20%5Cfrac%7Bd%20S(x,1)%7D%7Bdx%20%7D%5Cright)%5Cright%7C,%0A"> which means that our log likelihood will be <img src="https://latex.codecogs.com/png.latex?%0A%5Clog%20p(x%20%5Cmid%20f)%20=%20%5Clog%20q(S(x,1))%20+%20%5Clog%20%5Cleft%7C%5Cdet%20%5Cleft(%20%5Cfrac%7Bd%20S(x,1)%7D%7Bdx%20%7D%5Cright)%5Cright%7C.%0A"></p>
<p>The log-determinant term looks like it might cause some trouble. If <img src="https://latex.codecogs.com/png.latex?S"> is parameterised as a triangular map it can be written explicitly, but there is, of course, another route.</p>
<p>For notational ease, let’s consider <img src="https://latex.codecogs.com/png.latex?z_t%20=%20S(x,%20t)">, for some <img src="https://latex.codecogs.com/png.latex?t%20%3C1">. Then <img src="https://latex.codecogs.com/png.latex?%0A%5Clog%20p(z_t%20%5Cmid%20f)%20=%20%5Clog%20q(S(x,1))%20+%20%5Clog%20%5Cleft%7C%5Cdet%20%5Cleft(%20%5Cfrac%7Bd%20S(x,t)%7D%7Bdx%20%7D%5Cright)%5Cright%7C.%0A"> We can differentiate this with respect to <img src="https://latex.codecogs.com/png.latex?t"> to get<sup>27</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Cpartial%20%5Clog%20p(z_t%20%5Cmid%20f)%7D%7B%5Cpartial%20t%7D%20=%20%5Coperatorname%7Btr%7D%5Cleft(%5Cfrac%7Bdf%7D%7Bdx%7D(z_t,t)%5Cright),%0A"> where I used one of those <em>magical</em> vector calculus identities to get that trace. Remembering that <img src="https://latex.codecogs.com/png.latex?S(x,0)%20=%20x">, the log-determinant of the Jacobian at zero is zero and so we get the initial condition <img src="https://latex.codecogs.com/png.latex?%0A%5Clog%20p(z_0%20%5Cmid%20f)%20=%20%5Clog%20q(S(x,1)).%0A"></p>
<p>The likelihood can be evaluated<sup>28</sup> by solving the system of differential equations <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Cfrac%7Bd%20z_t%7D%7Bdt%7D%20&amp;=%20f(z_t,%20t)%20%5C%5C%0A%5Cfrac%7Bd%20%5Cell%7D%7Bdt%7D%20&amp;=%5Coperatorname%7Btr%7D%5Cleft(%5Cfrac%7Bdf%7D%7Bdx%7D(z_t,t)%5Cright)%20%5C%5C%0Az_0%20&amp;=%20x%20%5C%5C%0A%5Cell(0)%20&amp;=%200,%0A%5Cend%7Balign*%7D"> and the log likelihood is evaluated as <img src="https://latex.codecogs.com/png.latex?%0A%5Clog%20p(x%20%5Cmid%20f)%20=%20%5Clog%20q(z_1)%20+%20%5Cell(1).%0A"></p>
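<p>To make this concrete, here’s a minimal numerical sketch of that augmented system. The vector field <code>f(z, t) = -z</code> is a made-up stand-in for a learned flow, chosen precisely because the answer is also available in closed form; everything else follows the equations above.</p>

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.stats import multivariate_normal

d = 2  # dimension of the state

def f(z, t):
    # Toy vector field standing in for a learned flow: f(z, t) = -z.
    return -z

def trace_df_dx(z, t):
    # For f(z, t) = -z the Jacobian is -I, so its trace is -d.
    return -float(d)

def log_lik(x):
    """log p(x | f) = log q(z_1) + ell(1), via the augmented ODE system."""
    def rhs(t, state):
        z = state[:d]
        return np.concatenate([f(z, t), [trace_df_dx(z, t)]])

    sol = solve_ivp(rhs, (0.0, 1.0), np.concatenate([x, [0.0]]),
                    rtol=1e-10, atol=1e-10)
    z1, ell1 = sol.y[:d, -1], sol.y[d, -1]
    base = multivariate_normal(mean=np.zeros(d), cov=np.eye(d))
    return base.logpdf(z1) + ell1  # log q(z_1) + ell(1)

x = np.array([0.3, -1.2])
numeric = log_lik(x)
# For this linear field everything is closed form: z_1 = x e^{-1}, ell(1) = -d.
base = multivariate_normal(mean=np.zeros(d), cov=np.eye(d))
analytic = base.logpdf(x * np.exp(-1.0)) - d
```

For anything more interesting than this toy field you would, of course, swap in the learned vector field and (usually) a stochastic estimator of the trace.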
<p>It turns out that you can take gradients of the log-likelihood efficiently by solving <a href="https://papers.nips.cc/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf">an augmented system of differential equations</a> that’s twice the size of the original. This allows for all kinds of gradient-driven inferential shenanigans.</p>
</section>
<section id="but-oh-that-complexity" class="level3">
<h3 class="anchored" data-anchor-id="but-oh-that-complexity">But oh that complexity</h3>
<p>One big problem with normalising flows as written is that we only have two pieces of information about the entire trajectory <img src="https://latex.codecogs.com/png.latex?z_t">:</p>
<ol type="1">
<li><p>we know that <img src="https://latex.codecogs.com/png.latex?z(1)%20%5Csim%20q">, and</p></li>
<li><p>we know that <img src="https://latex.codecogs.com/png.latex?z(0)%20%5Csim%20p">.</p></li>
</ol>
<p>We know <em>absolutely nothing</em> about <img src="https://latex.codecogs.com/png.latex?z_t"> outside of those boundary conditions. This means that our model for <img src="https://latex.codecogs.com/png.latex?f"> basically gets to freestyle in those areas.</p>
<p>We can avoid this to some extent by choosing appropriate neural network architectures and/or appropriate penalties in the classical case or priors in the Bayesian case. There’s a whole mini-literature on choosing appropriate penalties.</p>
<p>Just to show how complex it is, let me quickly sketch what <a href="https://arxiv.org/abs/2002.02798">Finlay et al.</a> suggest as a way to keep the dynamics as boring as possible in the information desert. They lean into the literature on optimal transport theory to come up with the double penalty <img src="https://latex.codecogs.com/png.latex?%0A%5Cmin_f%20%5Csum_%7Bi=1%7D%5En%20%5Cleft(-%5Clog%20p(x_i)%20+%20%5Clambda_1%20%5Cint_0%5ET%20%5C%7Cf(S(x_i,s),s)%5C%7C_2%5E2%5C,ds%20+%20%5Clambda_2%20%5Cint_0%5ET%5Cleft%5C%7C%5Cfrac%7Bd%20f(S(x_i,s))%7D%7Bds%7D%5Cright%5C%7C_F%5E2%5C,ds%5Cright),%0A"> where the first term minimises the kinetic energy and, essentially, finds the least exciting path from <img src="https://latex.codecogs.com/png.latex?p"> to <img src="https://latex.codecogs.com/png.latex?q">, while the second term ensures that the Jacobian of <img src="https://latex.codecogs.com/png.latex?f"> doesn’t get too big<sup>29</sup>, which means that the mapping doesn’t have many sharp changes. Both of these penalty terms are designed both to aid generalisation and to make sure the differential equation isn’t unnecessarily difficult for an ODE solver.</p>
<p>A slightly odd feature of these penalties is that they are both data dependent. That suggests that a prior would, probably, require an <em>amount</em> of work. This is work that I don’t feel like doing today. Especially because this blog post isn’t about bloody normalising flows.</p>
</section>
</section>
<section id="diffusion-models" class="level2">
<h2 class="anchored" data-anchor-id="diffusion-models">Diffusion models</h2>
<p>Ok, so normalising flows are cool, but there are a couple of places where they could potentially be improved. There is a <em>long</em> literature on diffusion models, but the one I’m mostly stealing from is <a href="https://arxiv.org/abs/2011.13456">this one by Song et al.&nbsp;(2021)</a>.</p>
<p>Firstly, the vector field <img src="https://latex.codecogs.com/png.latex?f"> <em>directly</em> affects how easy the differential equations are to solve. This means that if <img src="https://latex.codecogs.com/png.latex?f"> is too complicated, it can take a long time to both train the model and generate samples from the trained model. To get around this you need to put fairly strict penalties<sup>30</sup> and/or structural assumptions on <img src="https://latex.codecogs.com/png.latex?f">.</p>
<p>Secondly, we only have information<sup>31</sup> at two ends of the flow. The problem would become <em>a lot</em> easier if we could somehow get information about intermediate states. In the inverse problems literature, there’s a concept of <em>value of information</em> that talks about how useful sampling a particular time point can be in terms of reducing model uncertainty. In general this, or other criteria, can be used to design a set of useful sampling times. I don’t particularly feel like working any of this out but one thing I am fairly certain of is that no optimal design would only have information at <img src="https://latex.codecogs.com/png.latex?t=0"> and <img src="https://latex.codecogs.com/png.latex?t=1">!</p>
<p>Diffusion models fix these two aspects of normalising flows at the cost of both a more complex mathematical formulation and some inexactness<sup>32</sup> around the base distribution <img src="https://latex.codecogs.com/png.latex?q"> when generating new samples.</p>
<section id="diffusions-and-stochastic-differential-equations" class="level3">
<h3 class="anchored" data-anchor-id="diffusions-and-stochastic-differential-equations">Diffusions and stochastic differential equations</h3>
<p>Diffusions are to applied mathematicians what gaffer tape is to<sup>33</sup> a roadie. They are ubiquitous and convenient, and they hold down the fort when nothing else works.</p>
<p>There are a number of diffusions that are familiar in statistics and machine learning. The most famous one is probably the Langevin diffusion <img src="https://latex.codecogs.com/png.latex?%0AdX_t%20=%20%5Cfrac%7B1%7D%7B2%7D%5Cnabla%20%5Clog%20p(x)%20dt%20+%20%5Csigma%20dW_t,%0A"> which is asymptotically distributed according to <img src="https://latex.codecogs.com/png.latex?p">. This forms the basis of a bunch of MCMC methods as well as some faster, less adjusted methods.</p>
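<p>For the curious, here’s what the unadjusted version of those Langevin methods looks like in practice. This is a quick sketch with a standard normal target (so <code>grad_log_p(x) = -x</code>); the step size and chain length are arbitrary illustrative choices, and there is no Metropolis correction, so the stationary distribution is only approximately the target.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_p(x):
    # Target p = N(0, 1), so grad log p(x) = -x.
    return -x

eps, n = 0.05, 200_000   # step size and chain length, picked for illustration
x = 0.0
samples = np.empty(n)
for i in range(n):
    # One Euler-Maruyama step of the Langevin diffusion (no MH correction).
    x = x + 0.5 * eps * grad_log_p(x) + np.sqrt(eps) * rng.standard_normal()
    samples[i] = x
```

Adding a Metropolis-Hastings accept/reject step to each proposal turns this into MALA, which removes the discretisation bias at the cost of an extra density evaluation.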
<p>But that is not the only diffusion. Today’s friend is the Ornstein-Uhlenbeck (OU) process, which is the Gaussian process satisfying <img src="https://latex.codecogs.com/png.latex?%0AdX_t%20=%20-%20%5Cfrac%7B1%7D%7B2%7D%20X_t%20%5C,dt%20+%20%5Csigma%20dW_t.%0A"> The OU process can be thought of as a mean-reverting Brownian motion. As such, it has continuous but nowhere differentiable sample paths.</p>
<p>The stationary distribution of <img src="https://latex.codecogs.com/png.latex?X_t"> is <img src="https://latex.codecogs.com/png.latex?X_%5Cinfty%20%5Csim%20N(0,%20%5Csigma%5E2I)">, where <img src="https://latex.codecogs.com/png.latex?I"> is the identity matrix. In fact, if we <em>start</em> the diffusion at stationarity by setting <img src="https://latex.codecogs.com/png.latex?%0AX_0%20%5Csim%20N(0,%20%5Csigma%5E2I),%0A"> then X_t is a <em>stationary</em> Gaussian process with covariance function <img src="https://latex.codecogs.com/png.latex?%0Ac(t,%20t')%20=%20%5Csigma%5E2e%5E%7B-%5Cfrac%7B1%7D%7B2%7D%20%7Ct-t'%7C%7DI.%0A"></p>
<p>More interesting in our context, however, is what happens if we start the diffusion from a fixed point <img src="https://latex.codecogs.com/png.latex?x"> that will eventually be a sample from <img src="https://latex.codecogs.com/png.latex?p(x)">. In that case, we can solve the linear stochastic differential equation exactly to get <img src="https://latex.codecogs.com/png.latex?%0AX_t%20=%20xe%5E%7B-%5Cfrac%7B1%7D%7B2%7Dt%7D%20+%20%5Csigma%20%5Cint_0%5Et%20e%5E%7B%5Cfrac%7B1%7D%7B2%7D(s-t)%7D%5C,dW_s,%0A"> where the integral on the right hand side can be interpreted<sup>34</sup> as a <a href="https://dansblog.netlify.app/posts/2023-01-21-markov/markov.html#white-noise-and-its-associated-things">white noise integral</a> and so <img src="https://latex.codecogs.com/png.latex?%0AX_t%20%5Csim%20N%5Cleft(xe%5E%7B-%5Cfrac%7B1%7D%7B2%7Dt%7D,%20%5Csigma%5E2%5Cint_0%5Et%20e%5E%7Bs-t%7D%5C,ds%5Cright),%0A"> and the variance is <img src="https://latex.codecogs.com/png.latex?%0A%5Csigma%5E2%5Cint_0%5Et%20e%5E%7Bs-t%7D%5C,ds%20=%20%5Csigma%5E2%20e%5E%7B-t%7D%5Cleft(e%5E%7Bt%7D%20-%201%5Cright)%20=%20%5Csigma%5E2(1-e%5E%7B-t%7D).%0A"> From these equations, we see that the mean of the diffusion hurtles exponentially fast towards zero and the variance moves at the same speed towards <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2">.</p>
<p>More importantly, this means that, given a starting point <img src="https://latex.codecogs.com/png.latex?X_0%20=%20x">, we can generate data from any part of the diffusion <img src="https://latex.codecogs.com/png.latex?X_t">! If we want a sequence of observations from the same trajectory, we can generate them sequentially using the fact that an OU process is a Markov<sup>35</sup> process. This means that we are no longer limited to information at just two points along the trajectory.</p>
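<p>Here’s a sketch of that sequential sampling, using the exact conditional distribution derived above. The starting point and the (deliberately irregular) grid of times are arbitrary choices for illustration:</p>

```python
import numpy as np

def ou_step(x, dt, sigma, rng):
    """Exact draw of X_{t+dt} | X_t = x for dX_t = -X_t/2 dt + sigma dW_t."""
    mean = x * np.exp(-dt / 2.0)
    var = sigma**2 * (1.0 - np.exp(-dt))
    return mean + np.sqrt(var) * rng.standard_normal(np.shape(x))

rng = np.random.default_rng(1)
sigma, x0 = 1.0, 2.0
times = [0.0, 0.3, 0.9, 2.5]            # any grid we like, regular or not
x = np.full(100_000, x0, dtype=float)   # many independent trajectories
for t_prev, t_next in zip(times[:-1], times[1:]):
    x = ou_step(x, t_next - t_prev, sigma, rng)

# By the Markov property, the composed steps must reproduce the marginal
# X_t | X_0 = x0 ~ N(x0 e^{-t/2}, sigma^2 (1 - e^{-t})) at the final time.
t = times[-1]
target_mean = x0 * np.exp(-t / 2.0)
target_var = sigma**2 * (1.0 - np.exp(-t))
```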
</section>
<section id="reversing-the-diffusion" class="level3">
<h3 class="anchored" data-anchor-id="reversing-the-diffusion">Reversing the diffusion</h3>
<p>So far, there is nothing to learn here. The OU process has a known drift and variance, so everything is splendid. It’s even easy to simulate from. The challenge pops up when we try to reverse the diffusion, that is, when we try to <em>remove</em> noise from a sample rather than add noise to it.</p>
<p>In some sense, this shouldn’t be too disgusting. A diffusion is a Markov process and, if we run the Markov process back in time, we still get a Markov process. In fact, we are going to get another diffusion process.</p>
<p>The twist is that the new diffusion process is going to be quite a bit more complex than the original one. The problem is that unless <img src="https://latex.codecogs.com/png.latex?X_0"> comes from a Gaussian distribution, this process will be non-Gaussian, and thus somewhat tricky to find the reverse trajectory of.</p>
<p>To see this, consider <img src="https://latex.codecogs.com/png.latex?s%3Et"> and recall that <img src="https://latex.codecogs.com/png.latex?%0Ap(X_0,%20X_t,%20X_s)%20=%20p(X_s%20%5Cmid%20X_t)p(X_t%20%5Cmid%20X_0)p(X_0)%0A"> and <img src="https://latex.codecogs.com/png.latex?%0Ap(X_t,%20X_s)%20=%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20p(X_s%20%5Cmid%20X_t)%20p(X_t%20%5Cmid%20X_0)%20p(X_0)%5C,dX_0.%0A"> The first two terms in that integrand are Gaussian densities and thus their product is a bivariate Gaussian density <img src="https://latex.codecogs.com/png.latex?%0AX_t,%20X_s%20%5Cmid%20X_0%20%5Csim%20N%5Cleft(X_0%5Cbegin%7Bpmatrix%7De%5E%7B-%5Cfrac%7Bt%7D%7B2%7D%7D%5C%5Ce%5E%7B-%5Cfrac%7Bs%7D%7B2%7D%7D%5Cend%7Bpmatrix%7D,%20%5Csigma%5E2%20%5Cbegin%7Bpmatrix%7D%201-e%5E%7B-t%7D%20&amp;%20e%5E%7B-%5Cfrac%7Bs-t%7D%7B2%7D%7D%20-%20e%5E%7B-%5Cfrac%7Bs+t%7D%7B2%7D%7D%20%5C%5C%20e%5E%7B-%5Cfrac%7Bs-t%7D%7B2%7D%7D%20-%20e%5E%7B-%5Cfrac%7Bs+t%7D%7B2%7D%7D%20&amp;%201-e%5E%7B-s%7D%5Cend%7Bpmatrix%7D%5Cright).%0A"> Unfortunately, as <img src="https://latex.codecogs.com/png.latex?X_0"> is not Gaussian, the marginal distribution will be non-Gaussian. This means that our reverse time transition density <img src="https://latex.codecogs.com/png.latex?%0Ap(X_t%20%5Cmid%20X_s)%20=%20%5Cfrac%7B%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20p(X_t,X_s%20%5Cmid%20X_0)%20p(X_0)%5C,dX_0%7D%7B%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20p(X_s%20%5Cmid%20X_0)%20p(X_0)%5C,dX_0%7D%0A"> is also going to be <em>very</em> non-Gaussian.</p>
<p>In order to work out a stochastic differential equation that runs backwards in time and generates the same trajectory, we need a little bit of theory on how the unconditional density <img src="https://latex.codecogs.com/png.latex?p(X_t)"> and the transition density <img src="https://latex.codecogs.com/png.latex?p(X_t%20%5Cmid%20X_s)"> evolve in time <img src="https://latex.codecogs.com/png.latex?t"> (here, and everywhere else, <img src="https://latex.codecogs.com/png.latex?s%3Et">). These are related through the Kolmogorov equations.</p>
<p>To introduce these, we need to briefly consider the more general diffusion <img src="https://latex.codecogs.com/png.latex?%0AdX_t%20=%20f(X_t,%20t)dt%20+%20g(X_t,t)dW_t%0A"> for nice<sup>36</sup> vector/matrix-valued functions <img src="https://latex.codecogs.com/png.latex?f"> and <img src="https://latex.codecogs.com/png.latex?g">. Kolmogorov showed that the unconditional density <img src="https://latex.codecogs.com/png.latex?p(X_t)%20=%20p(x,t)"> evolves according to the partial differential equation <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Cpartial%20p(x,t)%7D%7B%5Cpartial%20t%7D%20=%20-%20%5Csum_%7Bi=1%7D%5Ed%20%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%20x_i%7D%5Cleft(f_i(x,t)p(x,t)%5Cright)%20+%20%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bi,j,k%20=%201%7D%5Ed%5Cfrac%7B%5Cpartial%5E2%7D%7B%5Cpartial%20x_i%20%5Cpartial%20x_j%7D%5Cleft(%20g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)p(x,t)%5Cright)%0A"> subject to the initial condition <img src="https://latex.codecogs.com/png.latex?%0Ap(x,0)%20=p(x).%0A"> This is known as Kolmogorov’s forward equation or the Fokker-Planck equation.</p>
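<p>As a quick numerical sanity check, the OU transition density from earlier should satisfy this forward equation with <code>f(x, t) = -x/2</code> and <code>g = sigma</code>. The following sketch verifies that at a single point with finite differences; the starting point, evaluation point, time, and step size are all just illustrative choices.</p>

```python
import numpy as np

sigma, x0 = 1.0, 2.0

def m(t):
    return x0 * np.exp(-t / 2.0)            # mean of X_t | X_0 = x0

def v(t):
    return sigma**2 * (1.0 - np.exp(-t))    # variance of X_t | X_0 = x0

def p(x, t):
    # Gaussian transition density of the OU process started at x0.
    return np.exp(-(x - m(t))**2 / (2 * v(t))) / np.sqrt(2 * np.pi * v(t))

x, t, h = 0.7, 1.3, 1e-4
# Left-hand side: time derivative of the density.
lhs = (p(x, t + h) - p(x, t - h)) / (2 * h)
# Right-hand side: -d/dx[f p] + (1/2) d^2/dx^2[g^2 p], with f = -x/2, g = sigma.
flux = lambda y: (-y / 2.0) * p(y, t)
d_flux = (flux(x + h) - flux(x - h)) / (2 * h)
d2_p = (p(x + h, t) - 2 * p(x, t) + p(x - h, t)) / h**2
rhs = -d_flux + 0.5 * sigma**2 * d2_p
```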
<p>The other key result is about the density of <img src="https://latex.codecogs.com/png.latex?X_t"> <em>conditioned on some future value</em> <img src="https://latex.codecogs.com/png.latex?X_s%20=%20u">, <img src="https://latex.codecogs.com/png.latex?s%20%5Cgeq%20t">. We write this density as <img src="https://latex.codecogs.com/png.latex?p(X_s%20=u%5Cmid%20X_t%20=x)%20=%20q(x,t;%20u,s)"> and it satisfies the partial differential equation <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Cpartial%20q(x,t;u,s)%7D%7B%5Cpartial%20t%7D%20=%20-%5Csum_%7Bi=1%7D%5Ed%20f_i(x,t)%5Cfrac%7B%5Cpartial%20q(x,t;u,s)%7D%7B%5Cpartial%20x_i%7D%20-%20%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bi,j,k=1%7D%5Ed%20g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%5Cfrac%7B%5Cpartial%5E2%20q(x,t;u,s)%7D%7B%5Cpartial%20x_i%5Cpartial%20x_j%7D%0A"> subject to the <em>terminal</em> condition <img src="https://latex.codecogs.com/png.latex?%0Aq(x,s;u,s)%20=%20%5Cdelta(x-u).%0A"> This is known as the Kolmogorov backward equation. Great names. Beautiful names.</p>
<p>Let’s consider a differential equation for the joint density <img src="https://latex.codecogs.com/png.latex?%0Ap(X_t%20=%20x,%20X_s=%20y)%20=%20p(x,t,u,s)%20=%20q(x,t;u,s)p(x,t).%0A"> Going ham with the product rule gives <span id="eq-diff1"><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Balign*%7D%0A%5Cfrac%7B%5Cpartial%20p(x,t,u,s)%7D%7B%5Cpartial%20t%7D%20&amp;=%20p(x,%20t)%5Cfrac%7B%5Cpartial%20q(x,t;u,s)%7D%7B%5Cpartial%20t%7D%20+%20q(x,t;u,s)%20%5Cfrac%7B%5Cpartial%20p(x,t)%7D%7B%5Cpartial%20t%7D%20%5C%5C%0A&amp;=-%5Csum_%7Bi=1%7D%5Ed%20p(x,t)f_i(x,t)%5Cfrac%7B%5Cpartial%20q(x,t;u,s)%7D%7B%5Cpartial%20x_i%7D%20-%20%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bijk%7D%20p(x,t)g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%20%5Cfrac%7B%5Cpartial%5E2%20q(x,t;u,s)%7D%7B%5Cpartial%20x_i%20%5Cpartial%20x_j%7D%20%5C%5C%20&amp;%5Cqquad-%5Csum_%7Bi=1%7D%5Edq(x,t;u,s)%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%20x_i%7D(p(x,t)f(x,t))%20+%20%5Cfrac%7B1%7D%7B2%7D%20%5Csum_%7Bijk%7Dq(x,t;u,s)%5Cfrac%7B%5Cpartial%5E2%7D%7B%5Cpartial%20x_i%20%5Cpartial%20x_j%7D(g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)p(x,t))%20.%0A%5Cend%7Balign*%7D%0A%5Ctag%7B1%7D"></span> The first-order derivatives simplify, using the product rule, to <img src="https://latex.codecogs.com/png.latex?%0A-%5Csum_%7Bi=1%7D%5Ed%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%20x_i%7D(p(x,t,u,s)f(x,t))%0A"></p>
<p>Staring at this for a moment, we notice that this has the same structure as the first-order term in the forward equation. In that case, the second-order term would be <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Balign*%7D%0A&amp;%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bi,j,k=1%7D%5Ed%5Cfrac%7B%5Cpartial%5E2%7D%7B%5Cpartial%20x_i%20x_j%7D%5Bp(x,t,u,s)%20g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%5D%20=%20%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bi,j,k=1%7D%5Ed%5Cfrac%7B%5Cpartial%5E2%7D%7B%5Cpartial%20x_i%20x_j%7D%5Bq(x,t;u,s)%20(p(x,t)g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t))%5D%20%5C%5C%0A&amp;%5Cqquad%5Cqquad=%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bi,j,k=1%7D%5Ed%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%20x_i%7D%5Cleft%5B%20q(x,t;u,s)%5Cfrac%7B%5Cpartial%20%7D%7B%5Cpartial%20x_j%7D%5Cleft(p(x,t)g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%5Cright)%20+%20p(x,t)g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%20%5Cfrac%7B%5Cpartial%20q(x,t;u,s)%7D%7B%5Cpartial%20x_j%7D%5Cright%5D%0A%5Cend%7Balign*%7D%0A"></p>
<p>If we notice that <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Balign*%7D%0A%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%20x_i%7D%5Cleft%5Bp(x,t)g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%20%5Cfrac%7B%5Cpartial%20q(x,t;u,s)%7D%7B%5Cpartial%20x_j%7D%5Cright%5D%20=&amp;%20%20p(x,t)g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%20%5Cfrac%7B%5Cpartial%5E2%20q(x,t;u,s)%7D%7B%5Cpartial%20x_i%20%5Cpartial%20x_j%7D%20%5C%5C%0A&amp;%5Cquad+%20%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%20x_i%7D%20%5Cleft%5Bp(x,t)g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%5Cright%5D%5Cleft%5B%20%5Cfrac%7B%5Cpartial%20q(x,t;u,s)%7D%7B%20%5Cpartial%20x_j%7D%5Cright%5D%0A%5Cend%7Balign*%7D%0A"> and <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Balign*%7D%0A%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%20x_i%7D%5Cleft%5B%20q(x,t;u,s)%5Cfrac%7B%5Cpartial%20%7D%7B%5Cpartial%20x_j%7D%5Cleft(p(x,t)g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%5Cright)%5Cright%5D%20=&amp;%20%20q(x,t;u,s)%5Cfrac%7B%5Cpartial%5E2%20%7D%7B%5Cpartial%20x_i%20%5Cpartial%20x_j%7D%20p(x,t)g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%20%5C%5C%0A&amp;%5Cquad+%20%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%20x_i%7D%20%5Cleft%5Bp(x,t)g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%5Cright%5D%5Cleft%5B%20%5Cfrac%7B%5Cpartial%20q(x,t;u,s)%7D%7B%20%5Cpartial%20x_j%7D%5Cright%5D%0A%5Cend%7Balign*%7D%0A"> we can re-write the second-order derivative terms in Equation&nbsp;1 as <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bi,j,k=1%7D%5Ed%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%20x_i%7D%5Cleft%5B%20q(x,t;u,s)%5Cfrac%7B%5Cpartial%20%7D%7B%5Cpartial%20x_j%7D%5Cleft(p(x,t)g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%5Cright)%20-%20p(x,t)g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%20%5Cfrac%7B%5Cpartial%20q(x,t;u,s)%7D%7B%5Cpartial%20x_j%7D%5Cright%5D%0A"></p>
<p>This is almost, but not quite, what we want. We are a single minus sign away. Remembering that <img src="https://latex.codecogs.com/png.latex?q(x,t;u,s)%20=%20p(x,t,u,s)/p(x,t)"> we probably don’t want it to turn up in any derivatives<sup>37</sup>. To this end, let’s make the substitution <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Balign*%7D%0A%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bi,j,k=1%7D%5Ed%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%20x_i%7D%5Cleft%5B%20p(x,t)g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%20%5Cfrac%7B%5Cpartial%20q(x,t;u,s)%7D%7B%5Cpartial%20x_j%7D%5Cright%5D%0A=&amp;%20%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bi,j,k=1%7D%5Ed%5Cfrac%7B%5Cpartial%5E2%7D%7B%5Cpartial%20x_i%5Cpartial%20x_j%7D%5Bp(x,t,u,s)%20g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%5D%5C%5C%0A&amp;%20-%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bi,j,k=1%7D%5Ed%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%20x_i%7D%5Cleft%5B%20q(x,t;u,s)%5Cfrac%7B%5Cpartial%20%7D%7B%5Cpartial%20x_j%7D%5Cleft(p(x,t)g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%5Cright)%20%5Cright%5D.%0A%5Cend%7Balign*%7D%0A"> With this substitution the second order terms are <img src="https://latex.codecogs.com/png.latex?%0A%5Csum_%7Bi=1%7D%5Ed%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%20x_i%7D%5Cleft%5B%20p(x,t,u,s)%20h_i(x,t)%5Cright%5D%20-%20%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bi,j,k=1%7D%5Ed%5Cfrac%7B%5Cpartial%5E2%7D%7B%5Cpartial%20x_i%5Cpartial%20x_j%7D%5Bp(x,t,u,s)%20g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%5D,%0A"> where <img src="https://latex.codecogs.com/png.latex?%0Ah_i(x,t)%20=%20%5Cfrac%7B1%7D%7Bp(x,t)%7D%5Csum_%7Bj,k=1%7D%5Ed%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%20x_j%7D%5Cleft%5Bp(x,t)g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%5Cright%5D.%0A"></p>
<p>If we write <img src="https://latex.codecogs.com/png.latex?%0A%5B%5Cbar%7Bf%7D(x,t)%5D_i%20=%20f_i(x,t)%20-%20h_i(x,t)%20=%20f_i(x,t)%20-%20%5Cfrac%7B1%7D%7Bp(x,t)%7D%5Csum_%7Bj,k=1%7D%5Ed%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%20x_j%7D%5Cleft%5Bp(x,t)g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%5Cright%5D,%0A"> we get the joint PDE <span id="eq-diff2"><img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Cpartial%20p(x,t,u,s)%7D%7B%5Cpartial%20t%7D%20=%20-%5Csum_%7Bi=1%7D%5Ed%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%20x_i%7D%5Bp(x,t,u,s)%5Cbar%7Bf%7D(x,t)%5D%20-%20%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bi,j,k=1%7D%5Ed%5Cfrac%7B%5Cpartial%5E2%7D%7B%5Cpartial%20x_i%5Cpartial%20x_j%7D%5Bp(x,t,u,s)%20g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%5D.%0A%5Ctag%7B2%7D"></span></p>
<p>In order to identify the reverse time diffusion, we are going to find the reverse time backward equation, which, confusingly, is for <img src="https://latex.codecogs.com/png.latex?%0Aq(u,s;%20x,t)%20=%5Cfrac%7Bp(X_t%20=%20x,%20X_s%20=u)%7D%7Bp(X_s%20=u)%7D%20=%5Cfrac%7Bp(x,t,u,s)%7D%7Bp(u,s)%7D.%0A"> As <img src="https://latex.codecogs.com/png.latex?p(u,s)"> is a constant in both <img src="https://latex.codecogs.com/png.latex?x"> and <img src="https://latex.codecogs.com/png.latex?t">, we can divide both sides of Equation&nbsp;2 by it to get <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Cpartial%20q(x,t;u,s)%7D%7B%5Cpartial%20t%7D%20=%20-%5Csum_%7Bi=1%7D%5Ed%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%20x_i%7D%5Bq(x,t;u,s)%5Cbar%7Bf%7D(x,t)%5D%20-%20%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bi,j,k=1%7D%5Ed%5Cfrac%7B%5Cpartial%5E2%7D%7B%5Cpartial%20x_i%5Cpartial%20x_j%7D%5Bq(x,t;u,s)%20g_%7Bik%7D(x,t)g_%7Bjk%7D(x,t)%5D.%0A"> where again <img src="https://latex.codecogs.com/png.latex?s%3Et"> and <img src="https://latex.codecogs.com/png.latex?s"> and <img src="https://latex.codecogs.com/png.latex?u"> are known.</p>
<p>This is the forward Kolmogorov equation for the time-reversed<sup>38</sup> diffusion <img src="https://latex.codecogs.com/png.latex?%0AdX_t%20=%20%5Cbar%7Bf%7D(X_t,%20t)dt%20+%20g(X_t,%20t)d%5Ctilde%7BW%7D_t,%20%5Cqquad%20X_s%20=%20u,%0A"> where <img src="https://latex.codecogs.com/png.latex?d%20%5Ctilde%7BW%7D_t"> is another white noise. <a href="https://core.ac.uk/download/pdf/82826666.pdf">Anderson (1982)</a> shows how to connect the white noise <img src="https://latex.codecogs.com/png.latex?dW_t"> that’s driving the forward dynamics with the white noise that’s driving the reverse dynamics <img src="https://latex.codecogs.com/png.latex?d%5Ctilde%7BW%7D_t">, but that’s overkill for our present situation.</p>
<p>In the context of an OU process, we get the reverse equation <img src="https://latex.codecogs.com/png.latex?%0AdX_t=%20-%5Cleft%5B%5Cfrac%7B1%7D%7B2%7D%20X_t%20+%20%5Csigma%5E2%20%5Cnabla%20%20%5Clog%20p(X_t,%20t)%5Cright%5D%5C,dt%20+%20%5Csigma%5C,%20dW_t,%0A"> where time runs backwards and I’ve used the formula for the logarithmic derivative.</p>
<p>Unlike the forward process, the reverse process is the solution to a <em>non-linear</em> stochastic differential equation. In general, this cannot be solved in closed form and we need to use a numerical SDE solver to generate a sample.</p>
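<p>To see the reverse SDE in action without learning anything, here’s a sketch for the one case where the score is available exactly: a Gaussian starting distribution. The N(1.5, 0.25) “data” distribution, the horizon, and the solver settings are all invented for illustration. A plain Euler–Maruyama scheme run backwards from (approximately) the stationary distribution should recover the starting distribution:</p>

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, m0, v0 = 1.0, 1.5, 0.25       # Gaussian "data" distribution N(m0, v0)
T, n_steps, n = 8.0, 800, 50_000

def marginal(t):
    # With a Gaussian start, p(x, t) stays Gaussian with these moments.
    return m0 * np.exp(-t / 2.0), v0 * np.exp(-t) + sigma**2 * (1.0 - np.exp(-t))

def score(x, t):
    mt, vt = marginal(t)
    return -(x - mt) / vt            # exact score of the Gaussian marginal

dt = T / n_steps
x = rng.normal(0.0, sigma, size=n)   # X_T is approximately N(0, sigma^2)
for k in range(n_steps):
    t = T - k * dt
    # Euler-Maruyama step of the reverse SDE, with time running from T to 0:
    # dX = -[X/2 + sigma^2 * score(X, t)] dt + sigma dW, dt < 0.
    x = (x + (0.5 * x + sigma**2 * score(x, t)) * dt
         + sigma * np.sqrt(dt) * rng.standard_normal(n))
```

With a learned score this same loop is exactly how samples get generated; the Gaussian case just lets us check that the samples really do end up distributed like the data.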
<p>It’s worth noting that the OU process is an overly simple cartoon of a diffusion model. In practice, <img src="https://latex.codecogs.com/png.latex?%5Csigma%20=%20%5Csigma_t"> is usually an increasing function of time so the system injects more noise as the diffusion moves along. This changes some of the exact equations slightly, but you can still sample <img src="https://latex.codecogs.com/png.latex?X_t%20%5Cmid%20X_0"> analytically for any <img src="https://latex.codecogs.com/png.latex?t"> (as long as you choose a fairly simple function for <img src="https://latex.codecogs.com/png.latex?%5Csigma_t">). There is a <em>large</em> literature on these choices and, to be honest, I can’t be bothered going through them here. But obviously if you want to implement a diffusion model yourself you should look this stuff up.</p>
</section>
<section id="estimating-the-score" class="level3">
<h3 class="anchored" data-anchor-id="estimating-the-score">Estimating the score</h3>
<p>The reverse dynamics are driven by the score function <img src="https://latex.codecogs.com/png.latex?%0As_t(x)%20=%20%5Cnabla%20%5Clog(p(x,t)).%0A"> Typically, we do not know the density <img src="https://latex.codecogs.com/png.latex?p(x,t)%20=%20p(X_t=%20x%20%5Cmid%20X_0%20=%20x_0)"> and while we could solve the forward equation in order to estimate it, that is wildly inefficient in high dimensions.</p>
<p>If we can assume that for each <img src="https://latex.codecogs.com/png.latex?t">, <img src="https://latex.codecogs.com/png.latex?X_t%20%5Cmid%20X_0=x_0"> is approximately <img src="https://latex.codecogs.com/png.latex?N(%5Cmu_t,%20%5CSigma_t)">, then the resulting reverse diffusion is linear <img src="https://latex.codecogs.com/png.latex?%0AdX_t%20=%20%5Cleft%5B%5CSigma_t%5E%7B-1%7D%5Cmu_t%20-%5Cleft(%5Cfrac%7B1%7D%7B2%7D%20I%20+%20%5Csigma%5E2%5CSigma_t%5E%7B-1%7D%20%5Cright)X_t%5Cright%5Ddt%20+%20%5Csigma%20dW_t,%20%5Cqquad%20X_T%20=%20u.%0A"> In this case <img src="https://latex.codecogs.com/png.latex?X_t%20%5Cmid%20X_T%20=%20u"> is Gaussian with a mean and covariance that have closed-form solutions in terms of <img src="https://latex.codecogs.com/png.latex?%5CSigma_t"> and <img src="https://latex.codecogs.com/png.latex?%5Cmu_t"> (perhaps after some numerical quadrature and matrix exponentials).</p>
<p>Unfortunately, as discussed above, this is not true. A better approximation would be a mixture of Gaussians but, in general, we can use <em>any</em> method to approximate <img src="https://latex.codecogs.com/png.latex?%0As_t(x).%0A"> There are no particular constraints on it, except we expect it to be fairly smooth<sup>39</sup> in both <img src="https://latex.codecogs.com/png.latex?t"> and <img src="https://latex.codecogs.com/png.latex?x">. Hence, we can just learn the score.</p>
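<p>Here’s a small sketch of what “just learn the score” can look like in the simplest possible setting: one-dimensional Gaussian data, a linear score model, and the standard denoising trick of regressing the <em>conditional</em> score on the noised sample (the least-squares minimiser of that regression is the marginal score). All of the numbers are invented for illustration, and the linear model is only exact because the data are Gaussian.</p>

```python
import numpy as np

rng = np.random.default_rng(3)
m0, v0, sigma, t, n = 1.5, 0.25, 1.0, 0.7, 400_000

# Forward OU samples: x_t = x0 e^{-t/2} + sqrt(v_t) eps, v_t = sigma^2(1 - e^{-t}).
x0 = rng.normal(m0, np.sqrt(v0), size=n)
vt = sigma**2 * (1.0 - np.exp(-t))
eps = rng.standard_normal(n)
xt = x0 * np.exp(-t / 2.0) + np.sqrt(vt) * eps

# Denoising score matching: regress the conditional score -eps/sqrt(v_t) on x_t.
# The least-squares minimiser of this regression is the marginal score s_t(x).
target = -eps / np.sqrt(vt)
A = np.column_stack([xt, np.ones(n)])
(a, b), *_ = np.linalg.lstsq(A, target, rcond=None)

# For Gaussian data the marginal is N(m_t, V_t), so s_t(x) = -(x - m_t) / V_t;
# the fitted slope and intercept should recover -1/V_t and m_t/V_t.
Vt = v0 * np.exp(-t) + vt
mt = m0 * np.exp(-t / 2.0)
```

In a real diffusion model the linear regression is replaced by a neural network fit jointly over <code>(x, t)</code>, but the objective is the same.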
<p>As we are going to solve the SDE numerically, we only need to estimate the score at a finite set of locations. In every application that I’ve seen, these are pre-specified; however, it would also be possible to use a basis function expansion to interpolate to arbitrary time points. But, to be honest, I think every single example I’ve seen just uses a regularly spaced grid.</p>
<p>So how do we estimate <img src="https://latex.codecogs.com/png.latex?s_t">? Well, just like every other situation, we need to define a likelihood (or, I guess, an optimisation criterion). One way to think about this would be to note that you’ll never <em>perfectly</em> recover the initial signal. This is because we need to solve a non-linear stochastic differential equation and there will, inherently, be noise in that solution. So instead, assume that we have an initial sample <img src="https://latex.codecogs.com/png.latex?x_0%20%5Csim%20p(X_0)"> and that after solving the backward equation we have an unbiased estimator of <img src="https://latex.codecogs.com/png.latex?x_0"> with standard deviation <img src="https://latex.codecogs.com/png.latex?%5Ctau_N">, where <img src="https://latex.codecogs.com/png.latex?N"> is the number of time steps. We know a lot about how the error of SDE solvers scales with <img src="https://latex.codecogs.com/png.latex?N"> and so we can use that to set an appropriate scale for <img src="https://latex.codecogs.com/png.latex?%5Ctau_N">. For instance, if you’re using the Euler–Maruyama method, then it has strong order <img src="https://latex.codecogs.com/png.latex?1/2"> and <img src="https://latex.codecogs.com/png.latex?%5Ctau_N%20=%20%5Cmathcal%7BO%7D(N%5E%7B-1/2%7D)"> would likely be an appropriate scaling.</p>
<p>This strongly suggests a likelihood that looks like <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7BX%7D_0(x_0,%20t)%20%5Cmid%20s_t,%20x_0,%20t%20%5Csim%20N(x_0,%20%5Ctau_N%5E2),%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Chat%7BX%7D_0(x_0,t)"> is the estimate of <img src="https://latex.codecogs.com/png.latex?X_0"> you get by running the reverse diffusion conditioned on <img src="https://latex.codecogs.com/png.latex?%5Chat%7BX%7D_t%20=%20X_t(x_0)">, where <img src="https://latex.codecogs.com/png.latex?X_t(x_0)"> is an exact sample at time <img src="https://latex.codecogs.com/png.latex?t"> from the forward diffusion started at <img src="https://latex.codecogs.com/png.latex?X_0%20=%20x_0">.</p>
<p>This is the key to the success of diffusion models: given our training sample <img src="https://latex.codecogs.com/png.latex?%5C%7Bx_0%5E%7B(i)%7D%5C%7D_%7Bi=1%7D%5En">, we generate new data <img src="https://latex.codecogs.com/png.latex?x_t(x_0)"> and we can generate as much of that data as we want. Furthermore, we can choose any set of <img src="https://latex.codecogs.com/png.latex?t">s we want. We can sample a single <img src="https://latex.codecogs.com/png.latex?(t,%20x_0)"> pair multiple times, or we can spread our sampling across a diverse range of pairs.</p>
<p>We can even try to recover an intermediate state <img src="https://latex.codecogs.com/png.latex?%5Chat%7BX%7D_%7Bt_1%7D(x_0,t_2)"> from information about a future state <img src="https://latex.codecogs.com/png.latex?X_%7Bt_2%7D(x_0)">, <img src="https://latex.codecogs.com/png.latex?t_2%20%3Et_1%20%5Cgeq%200">. This gives us quite the opportunity to target our learning to areas of the <img src="https://latex.codecogs.com/png.latex?(t,x)"> space where we have relatively poor estimates of the score function.</p>
<p>Of course, that’s not what people do. They do stochastic gradient descent to minimise <img src="https://latex.codecogs.com/png.latex?%0A%5Cmin_%7Bs_t%7D%5Cmathbb%7BE%7D_%7Bx_0%20%5Csim%20p(X_0),%20t%20%5Csim%20%5Ctext%7BUnif%7D%5B0,1%5D%7D%5Cleft(%5C%7Cx_0%20-%20%5Chat%7BX%7D_0(x_0,t)%5C%7C%5E2%5Cright)%0A"> possibly subject to some penalties on <img src="https://latex.codecogs.com/png.latex?s_t">. In fact, the distribution on <img src="https://latex.codecogs.com/png.latex?t"> is usually a discrete uniform. As with any sufficiently complex task, there is a lot of detailed work on exactly how to best parameterise, solve, and evaluate this optimisation procedure.</p>
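<p>To make that training loop concrete: most implementations minimise the closely related <em>denoising score matching</em> objective, where the regression target comes from the forward noise rather than a full reverse solve. Here is a minimal numpy sketch (all variable names are my own, and the linear score model is a stand-in for a neural net) that fits the score at a single time point for Gaussian data, where the answer is known in closed form.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: p(X_0) = N(2, 0.5^2). Forward OU process dX = -X dt + sqrt(2) dW,
# so X_t | X_0 = x0 ~ N(x0 * exp(-t), 1 - exp(-2t)).
m0, s0 = 2.0, 0.5
x0 = rng.normal(m0, s0, size=50_000)

t = 0.3
alpha = np.exp(-t)
sigma = np.sqrt(1.0 - np.exp(-2.0 * t))
z = rng.normal(size=x0.shape)
xt = alpha * x0 + sigma * z

# Denoising score matching at fixed t: min_s E[(s(x_t) + z / sigma)^2].
# With a linear model s(x) = a*x + b this is just ordinary least squares.
X = np.column_stack([xt, np.ones_like(xt)])
y = -z / sigma
a, b = np.linalg.lstsq(X, y, rcond=None)[0]

# The marginal X_t ~ N(m_t, v_t) has score s_t(x) = -(x - m_t) / v_t,
# so the regression should recover a ~ -1/v_t and b ~ m_t/v_t.
m_t = m0 * alpha
v_t = (s0 * alpha) ** 2 + sigma**2
print(a, -1.0 / v_t)
print(b, m_t / v_t)
```

<p>Because the data here are Gaussian the marginal score is exactly linear, so least squares recovers it; with real data you would replace the linear model with a neural net and the exact solve with stochastic gradient steps over random <img src="https://latex.codecogs.com/png.latex?(t,%20x_0,%20z)"> draws.</p>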
</section>
<section id="generating-samples" class="level3">
<h3 class="anchored" data-anchor-id="generating-samples">Generating samples</h3>
<p>Once the model is trained and we have an estimate <img src="https://latex.codecogs.com/png.latex?%5Chat%7Bs%7D_t"> of the score function, we can generate new samples by first sampling <img src="https://latex.codecogs.com/png.latex?u%20%5Csim%20N(0,%20%5Csigma%5E2)"> and running the reverse diffusion starting from <img src="https://latex.codecogs.com/png.latex?X_t%20=%20u"> for some sufficiently large <img src="https://latex.codecogs.com/png.latex?t">. One of the advantages of using a variant of the OU process with a non-constant <img src="https://latex.codecogs.com/png.latex?%5Csigma"> is that we can choose <img src="https://latex.codecogs.com/png.latex?t"> to be smaller. Nevertheless, there will always be a little bit of error introduced by the fact that <img src="https://latex.codecogs.com/png.latex?X_t"> is only <em>approximately</em> <img src="https://latex.codecogs.com/png.latex?N(0,%20%5Csigma%5E2)">. But really, in the context of all of the other errors, this one is pretty small.</p>
<p>Anyway, run the diffusion backwards and if you’ve estimated <img src="https://latex.codecogs.com/png.latex?s_t(x)"> well for the entire trajectory, you will get something that looks a lot like a new sample from <img src="https://latex.codecogs.com/png.latex?p(X_0)">.</p>
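<p>As a sanity check on the whole recipe, here is a numpy sketch (a toy example of my own) that runs Euler–Maruyama on the reverse-time SDE for the OU forward process, using the <em>exact</em> score of a Gaussian <img src="https://latex.codecogs.com/png.latex?p(X_0)"> so that the only errors left are the time discretisation and the approximate Gaussian start.</p>

```python
import numpy as np

rng = np.random.default_rng(1)

# Target p(X_0) = N(2, 0.5^2). The forward OU process dX = -X dt + sqrt(2) dW
# has an analytic marginal X_t ~ N(m_t, v_t), so the score is known exactly.
m0, s0 = 2.0, 0.5

def score(x, t):
    m_t = m0 * np.exp(-t)
    v_t = s0**2 * np.exp(-2.0 * t) + 1.0 - np.exp(-2.0 * t)
    return -(x - m_t) / v_t

T, n_steps, n_samples = 5.0, 2000, 20_000
dt = T / n_steps

# Start from the (approximate) stationary distribution N(0, 1)...
x = rng.normal(0.0, 1.0, size=n_samples)

# ...and run the reverse-time SDE dX = [X + 2 s_{T - tau}(X)] dtau + sqrt(2) dW
# forward in tau using Euler-Maruyama.
for k in range(n_steps):
    t = T - k * dt
    x = x + (x + 2.0 * score(x, t)) * dt + np.sqrt(2.0 * dt) * rng.normal(size=n_samples)

print(x.mean(), x.std())  # should be near (2.0, 0.5)
```

<p>With an estimated score the same loop applies; the quality of the samples then depends on how well <img src="https://latex.codecogs.com/png.latex?%5Chat%7Bs%7D_t"> tracks the true score along the whole trajectory.</p>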
</section>
</section>
<section id="some-closing-thoughts" class="level2">
<h2 class="anchored" data-anchor-id="some-closing-thoughts">Some closing thoughts</h2>
<p>So there you have it, a very high-level mathematical introduction to diffusion models. Along the way, I accidentally put them in some sort of historical context, which hopefully helped make some things clearer.</p>
<p>Obviously there are <em>a lot</em> of cool things that can happen. The ability to, essentially, design our training trajectories should definitely be utilised. To do that, we would need some measure of uncertainty in the recovery of <img src="https://latex.codecogs.com/png.latex?s_t">. A possible way to do this would be to insert a <a href="https://arxiv.org/abs/1812.03973">probabilistic layer</a> into the neural net architecture. If this isn’t the final layer in the network, it should be possible to clean up any artifacts it introduces with further layers, but the uncertainty estimates from this hidden layer would still be indicative of the uncertainty in the recovery of the scores. Assuming, of course, that this is successful, it would be possible to target the training at improving the uncertainty.</p>
<p>Beyond the possibility of using a non-uniform distribution for <img src="https://latex.codecogs.com/png.latex?t">, these uncertainty estimates might also help indicate the reliability of the generated sample. If the reverse diffusion spends too much time in areas with highly uncertain scores, it is unlikely that the generated data will be a good sample.</p>
<p>I am also somewhat curious about whether or not this type of system could be a reasonable alternative to bootstrap resampling in some contexts. I mean image creation is cool, but it’s not the only time people want to sample from a distribution that we only know empirically.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Maybe my favourite running gag was Ronny Chieng refusing to use the American pronunciation of Megan. ↩︎</p></li>
<li id="fn2"><p>I mean, my last post was recounting literature on the Markov property from the 70s and 80s. My only desire for this blog is for it to be very difficult to guess the topic of the next post.↩︎</p></li>
<li id="fn3"><p>I can’t stress enough that I made that tomato and feta tiktok pasta for dinner. Because that’s exactly how on trend I am.↩︎</p></li>
<li id="fn4"><p>I am very much managing expectations here↩︎</p></li>
<li id="fn5"><p>I cannot stress enough that this post will not help you implement a diffusion model. It might help you understand what is being implemented, but it also might not.↩︎</p></li>
<li id="fn6"><p>Really fucking relative.↩︎</p></li>
<li id="fn7"><p>Find a lesbian and follow her blog. Then you’ll get the good shit. There are tonnes of queer women in statistics. If you don’t know any it’s because they probably hate you.↩︎</p></li>
<li id="fn8"><p>The wokerati among you will notice that the quotient is the derivative of <img src="https://latex.codecogs.com/png.latex?%5Clog%20p(Q)">.↩︎</p></li>
<li id="fn9"><p>Look. I love you all. But I don’t want to introduce measure push-forwards. So if you want the maths read the damn paper.↩︎</p></li>
<li id="fn10"><p>This is the Knothe-Rosenblatt rearrangement of the optimal transport problem if you’re curious. And let’s face it, you’re not curious.↩︎</p></li>
<li id="fn11"><p>The normalising flow literature also has a lot of nice chats about how to model the <img src="https://latex.codecogs.com/png.latex?T_j">s using masked versions of the same neural net.↩︎</p></li>
<li id="fn12"><p>If you don’t have too much data, you could just replace that expectation with its empirical approximation. But when there is a lot of data, that will be expensive and stochastic gradient methods will perform better.↩︎</p></li>
<li id="fn13"><p>And be more likely to appropriately use your computational resources↩︎</p></li>
<li id="fn14"><p>We will see later that it doesn’t matter if we model <img src="https://latex.codecogs.com/png.latex?T"> or <img src="https://latex.codecogs.com/png.latex?S">, but the likelihood calculations come out nicer if we map from <img src="https://latex.codecogs.com/png.latex?p(x)"> to <img src="https://latex.codecogs.com/png.latex?q(u)"> rather than the other way around↩︎</p></li>
<li id="fn15"><p>There is a tonne of excellent software for efficiently solving differential equations!↩︎</p></li>
<li id="fn16"><p>My notation here is a bit awkward. The <img src="https://latex.codecogs.com/png.latex?x"> in <img src="https://latex.codecogs.com/png.latex?S(x,t)"> is keeping track of the <em>initial condition</em>, which in this case we do not know. But hey. Whatever.↩︎</p></li>
<li id="fn17"><p>Potentially even multi-modal↩︎</p></li>
<li id="fn18"><p>Classically this is done with a penalty, but you could also do it with things like early stopping and specific representations of the function. Which is nice because the continuous nomalising flow people use neural nets↩︎</p></li>
<li id="fn19"><p>The square on the norm isn’t always there↩︎</p></li>
<li id="fn20"><p>This was a big-sexy area in optimisation.↩︎</p></li>
<li id="fn21"><p>or at least a lot more expensive than, say, evaluating an exponential!↩︎</p></li>
<li id="fn22"><p>If you’re familiar with scalable ML methods, you might think <em>well we have solved this problem</em>. But I promise that it is not solved. The problem is that there’s no convenient analogue to subsampling the data. You can’t be half pregnant and you can’t half evaluate the forward map. There are, however, a pile of fabulous techniques that do their best to use multiple resolutions to get something that resembles a sensible MCMC scheme.↩︎</p></li>
<li id="fn23"><p>In our context, it’s a vector-valued function↩︎</p></li>
<li id="fn24"><p>Examples abound, but they include image reconstruction, tomographic inversion, and really anything where you’re estimating diffusivity↩︎</p></li>
<li id="fn25"><p>Gaussian processes↩︎</p></li>
<li id="fn26"><p>But not necessarily too creative. Not every transformation of a penalty makes a sensible prior. I’m looking at you <a href="http://www.siltanen-research.net/publ/LassasSiltanen2004.pdf">lasso on increments</a>.↩︎</p></li>
<li id="fn27"><p>Using the “well known” fact that the derivative of the log-determinant is the trace ↩︎</p></li>
<li id="fn28"><p>There are some complexities in practice around computing that trace. A straightforward implementation would require <img src="https://latex.codecogs.com/png.latex?d"> autodiff sweeps, which would make the model totally impractical. There are basically two options: <a href="https://arxiv.org/abs/1912.03579">massively simplify</a> <img src="https://latex.codecogs.com/png.latex?f"> to be something like <img src="https://latex.codecogs.com/png.latex?f(x)%20=%20h(Ax%20+%20b)"> for a smooth function <img src="https://latex.codecogs.com/png.latex?h"> or use a stochastic trace estimator.↩︎</p></li>
<li id="fn29"><p>Measured in the Frobenius norm, of course↩︎</p></li>
<li id="fn30"><p>or priors↩︎</p></li>
<li id="fn31"><p>data + distributional assumptions = information↩︎</p></li>
<li id="fn32"><p><img src="https://latex.codecogs.com/png.latex?q"> will be the asymptotic distribution of the diffusion, but it isn’t achieved at finite time.↩︎</p></li>
<li id="fn33"><p>Arguably, gradient descent is to machine learners what arse crack is to roadies. It’s always present, but with just enough variation to make it interesting.↩︎</p></li>
<li id="fn34"><p>Technically it’s an Ito integral, but because the integrand is deterministic it reduces to a white noise integral↩︎</p></li>
<li id="fn35"><p>The Markov property implies that <img src="https://latex.codecogs.com/png.latex?p(X_%7Bt_1%7D,%20X_%7Bt_2%7D%5Cmid%20X_0%20=%20x)%20=%20p(X_%7Bt_1%7D%5Cmid%20X_0%20=%20x)p(X_%7Bt_2%7D%20%5Cmid%20X_%7Bt_1%7D)">. ↩︎</p></li>
<li id="fn36"><p>Lipschitz and bounded↩︎</p></li>
<li id="fn37"><p>I hate the quotient rule↩︎</p></li>
<li id="fn38"><p>This is why the signs don’t seem to match the forwards equation from before, but you can convince yourself if you do the change of variables <img src="https://latex.codecogs.com/png.latex?%5Ctau%20=%20s%20-%20t">, the new variable <img src="https://latex.codecogs.com/png.latex?%5Ctau"> runs forward in time and <img src="https://latex.codecogs.com/png.latex?%5Cbar%7Bf%7D"> switches signs, which gives the right forwards equations (with different signs on the first and second order terms) in <img src="https://latex.codecogs.com/png.latex?(%5Ctau,x)">.↩︎</p></li>
<li id="fn39"><p>If the <img src="https://latex.codecogs.com/png.latex?p(X_0)"> is very rough, then, for very small <img src="https://latex.codecogs.com/png.latex?t">, <img src="https://latex.codecogs.com/png.latex?p(x,t)"> will also be quite rough but it will quickly become infinitely differentiable. It turns out that mathematicians know quite a lot about parabolic equations!↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2023,
  author = {Simpson, Dan},
  title = {Diffusion Models; or {Yet} Another Way to Sample from an
    Arbitrary Distribution},
  date = {2023-02-09},
  url = {https://dansblog.netlify.app/posts/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2023" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2023. <span>“Diffusion Models; or Yet Another Way to
Sample from an Arbitrary Distribution.”</span> February 9, 2023. <a href="https://dansblog.netlify.app/posts/">https://dansblog.netlify.app/posts/</a>.
</div></div></section></div> ]]></description>
  <category>Diffusion model</category>
  <category>Introductions</category>
  <guid>https://dansblog.netlify.app/posts/2023-01-30-diffusion/diffusion.html</guid>
  <pubDate>Wed, 08 Feb 2023 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2023-01-30-diffusion/megan.jpeg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Markovian Gaussian processes: A lot of theory and some practical stuff</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2023-01-21-markov/markov.html</link>
  <description><![CDATA[ 





<p>Gaussian processes are lovely things. I’m a big fan. They are, however, thirsty. They will take your memory, your time, and anything else they can. Basically, the art of fitting Gaussian process models is the fine art of reducing the GP model until it’s simple enough to fit while still being flexible enough to be useful.</p>
<p>There’s a long literature on effective approximations to Gaussian processes that don’t turn out to be computational nightmares. I’m definitely not going to summarise them here, but I’ll point to an <a href="https://dansblog.netlify.app/posts/2021-11-24-getting-into-the-subspace/getting-into-the-subspace.html">earlier (quite technical) post</a> that mentioned some of them. The particular computational approximation that I am most fond of makes use of the Markov property and efficient sparse matrix computations to reduce memory use and make the linear algebra operations significantly faster.</p>
<p>One of the odder challenges with Markov models is that information about how Markov structures work in more than one dimension can be quite difficult to find. So in this post I am going to lay out some of the theory.</p>
<p>A much more practical (and readable) introduction to this topic can be found in this <a href="https://arxiv.org/abs/2111.01084">lovely paper by Finn, David, and Håvard</a>. So don’t feel the burning urge to read this post if you don’t want to. I’m approaching the material from a different viewpoint and, to be very frank with you, I was writing something else and this section just became extremely long so I decided to pull it out into a blog post.</p>
<p>So please enjoy today’s entry in <em>Dan writes about the weird corners of Gaussian processes</em>. I promise that even though this post doesn’t make it seem like this stuff is useful, it really is. If you want to know anything else about this topic, essentially all of the Markov property parts of this post come from Rozanov’s excellent book <a href="https://link.springer.com/book/10.1007/978-1-4613-8190-7">Markov Random Fields</a>.</p>
<section id="gaussian-processes-via-the-covariance-operator" class="level2">
<h2 class="anchored" data-anchor-id="gaussian-processes-via-the-covariance-operator">Gaussian processes via the covariance operator</h2>
<p>By the end of today’s post we will have defined<sup>1</sup> a Markovian process in terms of its reproducing kernel Hilbert space (RKHS), that is, the space of functions that contains the posterior mean<sup>2</sup> when there are Gaussian observations. This space always exists and its inner product is entirely determined by the covariance function of the GP. That said, for a given covariance function, the RKHS can be difficult to find. Furthermore, the problem with basing our modelling on an RKHS is that it is not immediately obvious how we will do the associated computations. This is in contrast to a covariance function approach, where it is quite easy<sup>3</sup> to work out how to convert the model specification into something you can attack with a computer. By the end of this post we will have tackled that.</p>
<p>The extra complexity of the RKHS pays off in modelling flexibility, both in terms of the types of model that can be built and the spaces<sup>4</sup> you can build them on. I am telling you this now because things are about to get a little mathematical.</p>
<p>To motivate the technique, let’s consider the covariance operator <img src="https://latex.codecogs.com/png.latex?%0A%5B%5Cmathcal%7BC%7Df%5D(s)%20=%20%5Cint_T%20c(s,%20s')%20f(s')%20%5C,%20ds',%0A"> where <img src="https://latex.codecogs.com/png.latex?T"> is the domain over which the GP is defined (usually <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed"> but maybe you’re feeling frisky).</p>
<p>To see how this could be useful, we are going to need to think a little bit about how we can simulate a multivariate Gaussian random variable <img src="https://latex.codecogs.com/png.latex?N(0,%20%5CSigma)">. To do this, we first compute the square root<sup>5</sup> <img src="https://latex.codecogs.com/png.latex?L%20=%20%5CSigma%5E%7B1/2%7D"> and sample a vector of iid standard normal variables <img src="https://latex.codecogs.com/png.latex?z%20%5Csim%20N(0,I)">. Then <img src="https://latex.codecogs.com/png.latex?u%20=%20Lz%20%5Csim%20N(0,%20%5CSigma)">. You can verify this by computing the covariance. (It’s ok. I’ll wait.)</p>
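<p>The finite-dimensional recipe is a couple of lines of numpy. A quick sketch (the covariance matrix here is an arbitrary choice of mine); note that any matrix with <img src="https://latex.codecogs.com/png.latex?LL%5ET%20=%20%5CSigma">, for instance the Cholesky factor, works as the square root.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# A small SPD covariance matrix (squared-exponential kernel on 4 points).
s = np.linspace(0.0, 1.0, 4)
Sigma = np.exp(-0.5 * (s[:, None] - s[None, :]) ** 2 / 0.3**2)

L = np.linalg.cholesky(Sigma)          # one valid square root: Sigma = L L^T
z = rng.standard_normal((4, 100_000))  # columns of iid N(0, I) vectors
u = L @ z                              # each column is a draw from N(0, Sigma)

# The promised check: the empirical covariance matches Sigma.
print(np.abs(np.cov(u) - Sigma).max())
```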
<p>While the square root of the covariance operator <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BC%7D%5E%7B1/2%7D"> is a fairly straightforward mathematical object<sup>6</sup>, the analogue of the iid vector of standard normal random variables is a bit more complex.</p>
<section id="white-noise-and-its-associated-things" class="level3">
<h3 class="anchored" data-anchor-id="white-noise-and-its-associated-things">White noise and its associated things</h3>
<p>Thankfully I’ve covered this <a href="https://dansblog.netlify.app/posts/2022-09-07-priors5/priors5.html#part-2-an-invitation-to-the-theory-of-stationary-gaussian-processes">in a previous blog</a>. The engineering definition of white noise as a GP <img src="https://latex.codecogs.com/png.latex?w(%5Ccdot)"> such that for every <img src="https://latex.codecogs.com/png.latex?s">, <img src="https://latex.codecogs.com/png.latex?w(s)"> is an iid <img src="https://latex.codecogs.com/png.latex?N(0,1)"> random variable is not good enough for our purposes. Such a process is hauntingly irregular<sup>7</sup> and it’s fairly difficult to actually do anything with it. Instead, we consider white noise as a random function defined on the subsets of our domain. This feels like it’s just needless technicality, but it turns out to actually be very very useful.</p>
<div id="def-white-noise" class="theorem definition">
<p><span class="theorem-title"><strong>Definition 1 (White noise)</strong></span> A (complex) Gaussian white noise is a random measure<sup>8</sup> <img src="https://latex.codecogs.com/png.latex?W(%5Ccdot)"> such that every<sup>9</sup> pair of disjoint<sup>10</sup> sets <img src="https://latex.codecogs.com/png.latex?A,%20B"> satisfies the following properties:</p>
<ol type="1">
<li><img src="https://latex.codecogs.com/png.latex?W(A)%20%5Csim%20N(0,%20%7CA%7C)"></li>
<li>If <img src="https://latex.codecogs.com/png.latex?A"> and <img src="https://latex.codecogs.com/png.latex?B"> are disjoint then <img src="https://latex.codecogs.com/png.latex?W(A%5Ccup%20B)%20=%20W(A)%20+%20W(B)"></li>
<li>If <img src="https://latex.codecogs.com/png.latex?A"> and <img src="https://latex.codecogs.com/png.latex?B"> are disjoint then <img src="https://latex.codecogs.com/png.latex?W(A)"> and <img src="https://latex.codecogs.com/png.latex?W(B)"> are uncorrelated<sup>11</sup>, ie <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(W(A)%20%5Coverline%7BW(B)%7D)%20=%200">.</li>
</ol>
</div>
<p>This doesn’t feel like we are helping very much because how on <em>earth</em> am I going to define the product <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BC%7D%5E%7B1/2%7D%20W">? Well the answer, you may be shocked to discover, requires a little bit more maths. We need to define an integral, which turns out to not be <em>shockingly</em> difficult to do. The trick is to realise that if I have an indicator function <img src="https://latex.codecogs.com/png.latex?%0A1_A(s)%20=%20%5Cbegin%7Bcases%7D%201,%20%5Cqquad%20&amp;s%20%5Cin%20A%20%5C%5C%200,%20&amp;%20s%20%5Cnot%20%5Cin%20A%20%5Cend%7Bcases%7D%0A"> then<sup>12</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5Cint_T%201_A(s)%5C,%20dW(s)%20=%20%5Cint_A%20dW(s)%20=%20W(A)%20%5Csim%20N(0,%20%7CA%7C).%0A"> In that calculation, I just treated <img src="https://latex.codecogs.com/png.latex?W(s)"> like I would any other measure. (If you’re more of a probability type of girl, it’s the same thing as noticing <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(1_A(X))%20=%20%5CPr(X%20%5Cin%20A)">.)</p>
<p>We can extend the above by taking the sum of two indicator functions <img src="https://latex.codecogs.com/png.latex?%0Af(s)%20=%20f_1%201_%7BA_1%7D(s)%20+%20f_2%201_%7BA_2%7D(s),%0A"> where <img src="https://latex.codecogs.com/png.latex?A_1"> and <img src="https://latex.codecogs.com/png.latex?A_2"> are disjoint and <img src="https://latex.codecogs.com/png.latex?f_1"> and <img src="https://latex.codecogs.com/png.latex?f_2"> are any real numbers. By the same reasoning above, and using the linearity of the integral, we get that <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Cint_T%20f(s)%20%5C,%20dW(s)%20&amp;=%20f_1%20%5Cint_%7BA_1%7D%20%5C,d%20W(s)%20+%20f_2%20%5Cint_%7BA_2%7D%20%5C,d%20W(s)%20%5C%5C%0A&amp;=%20N(0,%20f_1%5E2%20%7CA_1%7C%20+%20f_2%5E2%20%7CA_2%7C)%20%5C%5C%0A&amp;=%20N%5Cleft(0,%20%5Cint_T%20f(s)%5E2%20%5C,ds%5Cright),%0A%5Cend%7Balign*%7D"> where the last line follows by doing the ordinary<sup>13</sup> integral of <img src="https://latex.codecogs.com/png.latex?f(s)%5E2">.</p>
<p>It turns out that every interesting function can be written as the limit of piecewise constant functions<sup>14</sup> and we can therefore <em>define</em> for any function<sup>15</sup> <img src="https://latex.codecogs.com/png.latex?f%5Cin%20L%5E2(T)"> <img src="https://latex.codecogs.com/png.latex?%0A%5Cint%20f(s)%20%5C,%20dW(s)%20%5Csim%20N%5Cleft(0,%20%5Cint_T%20f(s)%5E2%20%5C,ds%5Cright).%0A"></p>
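<p>This definition is easy to check numerically. A short sketch (grid and sample sizes are my own choices) that builds <img src="https://latex.codecogs.com/png.latex?%5Cint_0%5E1%20f(s)%5C,dW(s)"> from independent increments over small disjoint cells, exactly as in the piecewise-constant construction above:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Approximate I = \int_0^1 f(s) dW(s) with f frozen on a fine grid: the
# white-noise increments over disjoint cells are independent N(0, |cell|).
n_cells, n_reps = 1000, 50_000
ds = 1.0 / n_cells
s = (np.arange(n_cells) + 0.5) * ds
f = s  # any L^2 function will do; here \int_0^1 f(s)^2 ds = 1/3

dW = rng.normal(0.0, np.sqrt(ds), size=(n_reps, n_cells))
I = dW @ f  # one realisation of the integral per row

print(I.mean(), I.var())  # theory says I ~ N(0, 1/3)
```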
<p>With this notion in hand, we can finally define the action of an operator on white noise.</p>
<div id="def-operator-on-noise" class="theorem definition">
<p><span class="theorem-title"><strong>Definition 2 (The action of an operator on white noise)</strong></span> Let <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BA%7D"> be an operator on some Hilbert space of functions <img src="https://latex.codecogs.com/png.latex?H"> with adjoint <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BA%7D%5E*">, then we define <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BA%7DW"> to be the random measure that satisfies, for every <img src="https://latex.codecogs.com/png.latex?f%20%5Cin%20%5Coperatorname%7BDom%7D(%5Cmathcal%7BA%5E*%7D)">, <img src="https://latex.codecogs.com/png.latex?%0A%5Cint_T%20f(s)%20%5C,%20d%20(%5Cmathcal%7BA%7DW)(s)%20=%20%5Cint_T%20%5Cmathcal%7BA%7D%5E*f(s)%20%5C,%20dW(s).%0A"></p>
</div>
</section>
<section id="the-generalised-gaussian-process-eta-mathcalc12w" class="level3">
<h3 class="anchored" data-anchor-id="the-generalised-gaussian-process-eta-mathcalc12w">The generalised Gaussian process <img src="https://latex.codecogs.com/png.latex?%5Ceta%20=%20%5Cmathcal%7BC%7D%5E%7B1/2%7DW"></h3>
<p>One of those inconvenient things that you may have noticed from above is that <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BC%7D%5E%7B1/2%7DW"> is <em>not</em> going to be a function. It is going to be a measure or, as it is more commonly known, a <em>generalised Gaussian process</em>. This is the GP analogue of a generalised function and, as such, only gives an actual value when you integrate it against some sufficiently smooth function.</p>
<div id="def-generalised-gp" class="theorem definition">
<p><span class="theorem-title"><strong>Definition 3 (Generalised Gaussian Process)</strong></span> A generalised Gaussian process <img src="https://latex.codecogs.com/png.latex?%5Cxi"> is a random signed measure (or a random generalised function) such that, for any <img src="https://latex.codecogs.com/png.latex?f%20%5Cin%20C%5E%5Cinfty_0(T)">, <img src="https://latex.codecogs.com/png.latex?%5Cint_T%20f(s)%5C,d%5Cxi(s)"> is Gaussian. We will often write <img src="https://latex.codecogs.com/png.latex?%0A%5Cxi(f)%20=%20%5Cint_T%20f(s)%5C,d%5Cxi(s),%0A"> which helps us understand that a generalised GP is indexed by functions.</p>
</div>
<p>In order to separate this out from the ordinary GP <img src="https://latex.codecogs.com/png.latex?u(s)">, we will write it as <img src="https://latex.codecogs.com/png.latex?%0A%5Ceta%20=%20%5Cmathcal%7BC%7D%5E%7B1/2%7DW.%0A"> These two ideas coincide in the special case where <img src="https://latex.codecogs.com/png.latex?%0A%5Ceta%20=%20u(s)%5C,ds,%0A"> which will occur when <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BC%7D%5E%7B1/2%7D"> smooths the white noise sufficiently. In all of the cases we really care about today, this happens. But there are plenty of Gaussian processes that can only be considered as generalised GPs<sup>16</sup>.</p>
</section>
<section id="approximating-gps-when-mathcalc-12-is-a-differential-operator" class="level3">
<h3 class="anchored" data-anchor-id="approximating-gps-when-mathcalc-12-is-a-differential-operator">Approximating GPs when <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BC%7D%5E%7B-1/2%7D"> is a differential operator</h3>
<p>This type of construction for <img src="https://latex.codecogs.com/png.latex?%5Ceta"> is used in two different situations: kernel convolution methods directly use the representation, and the SPDE methods of <a href="https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2011.00777.x">Lindgren, Lindström and Rue</a> use it indirectly.</p>
<p>I’m interested in the SPDE method, as it ties into today’s topic. Also because it works really well. This method uses a slightly modified version of the above equation <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathcal%7BC%7D%5E%7B-1/2%7D%5Ceta%20=%20W,%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BC%7D%5E%7B-1/2%7D"> is the (left) inverse of <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BC%7D%5E%7B1/2%7D">. I have covered this method <a href="https://dansblog.netlify.app/posts/2021-11-24-getting-into-the-subspace/getting-into-the-subspace.html#example-3-the-spde-method">in a previous post</a>, but to remind you the SPDE method in its simplest form involves three steps:</p>
<ol type="1">
<li><p>Approximate <img src="https://latex.codecogs.com/png.latex?%5Ceta%20=%20%5Csum_%7Bj=1%7D%5En%20u_j%20%5Cpsi_j(s)%5C,ds"> for some set of weights <img src="https://latex.codecogs.com/png.latex?u%20%5Csim%20N(0,%20Q%5E%7B-1%7D)"> and a set of deterministic functions <img src="https://latex.codecogs.com/png.latex?%5Cpsi_j"> that we are going to use to approximate the GP</p></li>
<li><p>Approximate<sup>17</sup> the <em>test function</em> <img src="https://latex.codecogs.com/png.latex?f%20=%20%5Csum_%7Bk=1%7D%5En%20f_k%20%5Cpsi_k(s)"> for some set of deterministic weights <img src="https://latex.codecogs.com/png.latex?f_j"></p></li>
<li><p>Plug these approximations into the equation <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BC%7D%5E%7B-1/2%7D%20%5Ceta%20=%20W"> to get the equation <img src="https://latex.codecogs.com/png.latex?%0A%5Csum_%7Bk,j=1%7D%5En%20u_j%20f_k%20%5Cint_T%20%5Cpsi_k(s)%20%5Cmathcal%7BC%7D%5E%7B-1/2%7D%20%5Cpsi_j(s)%5C,ds%20%5Csim%20N%5Cleft(0,%20%5Csum_%7Bj,k=1%7D%5En%20f_j%20f_k%20%5Cint_T%20%5Cpsi_j(s)%5Cpsi_k(s)%5C,ds%5Cright)%0A"></p></li>
</ol>
<p>As this has to be true for <em>every</em> vector <img src="https://latex.codecogs.com/png.latex?f">, this is equivalent to the linear system <img src="https://latex.codecogs.com/png.latex?%0AK%20u%20%5Csim%20N(0,%20C),%0A"> where <img src="https://latex.codecogs.com/png.latex?K_%7Bkj%7D%20=%20%20%5Cint_T%20%5Cpsi_k(s)%20%5Cmathcal%7BC%7D%5E%7B-1/2%7D%20%5Cpsi_j(s)%5C,ds"> and <img src="https://latex.codecogs.com/png.latex?C_%7Bkj%7D%20=%20%5Cint_T%20%5Cpsi_k(s)%5Cpsi_j(s)%5C,ds">.</p>
<p>Obviously this method is only going to be useful if it’s possible to compute the elements of <img src="https://latex.codecogs.com/png.latex?K"> and <img src="https://latex.codecogs.com/png.latex?C"> efficiently. In the special case where <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BC%7D%5E%7B-1/2%7D"> is a differential operator<sup>18</sup> and the basis functions are chosen to have compact support<sup>19</sup>, these calculations form the basis of the finite element method for solving partial differential equations.</p>
<p>The most important thing, however, is that if <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BC%7D%5E%7B-1/2%7D"> is a differential operator <em>and</em> the basis functions have compact support, the matrix <img src="https://latex.codecogs.com/png.latex?K"> is sparse and the matrix <img src="https://latex.codecogs.com/png.latex?C"> can be made<sup>20</sup> diagonal, which means that <img src="https://latex.codecogs.com/png.latex?%0Au%20%5Csim%20N(0,%20K%5E%7B-1%7D%20C%20K%5E%7B-T%7D)%0A"> has a sparse precision matrix. This can be used to make inference with these GPs very efficient and is the basis for GPs in the <a href="http://r-inla.org">INLA software</a>.</p>
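<p>A minimal sketch of what those matrices look like in one dimension, assuming the operator <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BC%7D%5E%7B-1/2%7D%20=%20%5Ckappa%5E2%20-%20%5CDelta"> and piecewise-linear hat functions on a uniform mesh (boundary conditions are ignored and all names are mine):</p>

```python
import numpy as np

# A 1D sketch: take C^{-1/2} = kappa^2 - d^2/ds^2 on [0, 1] with piecewise
# linear "hat" basis functions on a uniform mesh.
n, kappa = 50, 3.0
h = 1.0 / (n - 1)

# Mass matrix M_kj = \int psi_k psi_j and stiffness matrix G_kj =
# \int psi_k' psi_j' are tridiagonal: only neighbouring hats overlap.
m_main = np.full(n, 2.0 * h / 3.0); m_main[[0, -1]] = h / 3.0
m_off = np.full(n - 1, h / 6.0)
M = np.diag(m_main) + np.diag(m_off, 1) + np.diag(m_off, -1)

g_main = np.full(n, 2.0 / h); g_main[[0, -1]] = 1.0 / h
g_off = np.full(n - 1, -1.0 / h)
G = np.diag(g_main) + np.diag(g_off, 1) + np.diag(g_off, -1)

K = kappa**2 * M + G               # K u ~ N(0, C) with C = M
C_lumped = np.diag(M.sum(axis=1))  # "mass lumping" makes C diagonal

# Precision of u ~ N(0, K^{-1} C K^{-T}) is Q = K^T C^{-1} K: banded, so sparse.
Q = K.T @ np.linalg.solve(C_lumped, K)
bandwidth = max(abs(i - j) for i in range(n) for j in range(n)
                if abs(Q[i, j]) > 1e-12)
print(bandwidth)  # 2: each weight only interacts with near neighbours
```

<p>In real implementations <img src="https://latex.codecogs.com/png.latex?K"> and <img src="https://latex.codecogs.com/png.latex?Q"> would be stored as sparse matrices; dense numpy arrays are used here only to keep the sketch short.</p>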
<p>A natural question to ask is <em>when will we end up with a sparse precision matrix</em>? The answer is not quite when <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BC%7D%5E%7B-1/2%7D"> is a differential operator. Although that will lead to a sparse precision matrix (and a Markov process), it is not required. So the purpose of the rest of this post is to quantify all of the cases where a GP has the Markov property and we can make use of the resulting computational savings.</p>
</section>
</section>
<section id="the-markov-property-for-on-abstract-spaces" class="level2">
<h2 class="anchored" data-anchor-id="the-markov-property-for-on-abstract-spaces">The Markov property on abstract spaces</h2>
<p>Part of the reason why I introduced the notion of a generalised Gaussian process is that it is useful in the definition of the Markov process. Intuitively, we know what this definition is going to be: if I split my space into three disjoint sets <img src="https://latex.codecogs.com/png.latex?A">, <img src="https://latex.codecogs.com/png.latex?%5CGamma"> and <img src="https://latex.codecogs.com/png.latex?B"> in such a way that you can’t get from <img src="https://latex.codecogs.com/png.latex?A"> to <img src="https://latex.codecogs.com/png.latex?B"> without passing through <img src="https://latex.codecogs.com/png.latex?%5CGamma">, then the Markov property should say, roughly, that every random variable <img src="https://latex.codecogs.com/png.latex?%5C%7Bx(s):%20s%5Cin%20A%5C%7D"> is conditionally independent of every random variable <img src="https://latex.codecogs.com/png.latex?%5C%7Bx(s):%20s%20%5Cin%20B%5C%7D"> <em>given</em> (or conditional on) knowing the values of the entire set <img src="https://latex.codecogs.com/png.latex?%5C%7Bx(s):%20s%20%5Cin%20%5CGamma%5C%7D">.</p>
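<p>Before the measure theory, here is the finite-dimensional version of that intuition as a numpy sketch (toy numbers mine): for a Gaussian vector with a tridiagonal precision matrix, conditioning on a separating index renders the two sides conditionally independent.</p>

```python
import numpy as np

# A Markov (AR(1)-style) Gaussian vector: tridiagonal precision matrix Q.
n = 7
Q = (np.diag(np.full(n, 2.0))
     + np.diag(np.full(n - 1, -0.9), 1)
     + np.diag(np.full(n - 1, -0.9), -1))

# Split the indices into A | Gamma | B, where Gamma = {3} separates the sides.
A, Gamma, B = [0, 1, 2], [3], [4, 5, 6]

# For a Gaussian, the conditional precision of (x_A, x_B) given x_Gamma is the
# corresponding sub-block of Q, so the conditional covariance is its inverse.
idx = A + B
cond_cov = np.linalg.inv(Q[np.ix_(idx, idx)])

# The A-B cross block vanishes: x_A and x_B are conditionally independent
# given the separator x_Gamma.
cross = cond_cov[:len(A), len(A):]
print(np.abs(cross).max())
```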
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2023-01-21-markov/markov.png" class="img-fluid figure-img"></p>
<figcaption>A graphical illustration of the three sets used in the Markov property.</figcaption>
</figure>
</div>
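<p>Before the measure-theoretic machinery kicks in, the discrete version of this picture is easy to check numerically. The following sketch (a hypothetical Gaussian vector with tridiagonal precision, so a Markov chain) verifies that the variables to the left of a single separating point are conditionally uncorrelated with the variables to the right:</p>

```python
# Sketch: the intuitive Markov property on a finite index set. With a
# tridiagonal precision matrix Q, the separator G = {3} splits A = {0,1,2}
# from B = {4,5,6}. The conditional covariance of (A, B) given G, computed
# by the Schur complement formula, vanishes.
import numpy as np

n = 7
Q = np.diag([2.0] * n) - 0.9 * (np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1))
Sigma = np.linalg.inv(Q)

A, G, B = [0, 1, 2], [3], [4, 5, 6]
cond_cov = Sigma[np.ix_(A, B)] - Sigma[np.ix_(A, G)] @ np.linalg.solve(
    Sigma[np.ix_(G, G)], Sigma[np.ix_(G, B)]
)
print(np.max(np.abs(cond_cov)))  # zero up to floating point
```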
<p>That definition is all well and good for a hand-wavey approach, but unfortunately it doesn’t quite hold up to mathematics. In particular, if we try to make <img src="https://latex.codecogs.com/png.latex?%5CGamma"> a line<sup>21</sup>, we will hit a few problems. So instead let’s do this properly.</p>
<p>All of the material here is covered in Rozanov’s excellent but unimaginatively named book <em>Markov Random Fields</em>.</p>
<p>To set us up, we should consider the types of sets we have. There are three main sets that we are going to be using: the open<sup>22</sup> set <img src="https://latex.codecogs.com/png.latex?S_1%20%5Csubset%20T">, a set <img src="https://latex.codecogs.com/png.latex?%5CGamma%20%5Csupseteq%20%5Cpartial%20S_1"> containing its boundary<sup>23</sup>, and its open complement <img src="https://latex.codecogs.com/png.latex?S_2%20=%20S_1%5EC%20%5Cbackslash%20%5CGamma">. For a 2D example, if <img src="https://latex.codecogs.com/png.latex?T%20=%20%5Cmathbb%7BR%7D%5E2"> and <img src="https://latex.codecogs.com/png.latex?S_1"> is the <em>interior</em> of the unit circle, then <img src="https://latex.codecogs.com/png.latex?%5CGamma"> could be the unit circle, and <img src="https://latex.codecogs.com/png.latex?S_2"> would be the <em>exterior</em> of the unit circle.</p>
<p>One problem with these sets, is that while <img src="https://latex.codecogs.com/png.latex?S_1"> will be a 2D set, <img src="https://latex.codecogs.com/png.latex?%5CGamma"> is only one dimensional (it’s a circle, so it’s a line!). This causes some troubles mathematically, which we need to get around by using the <img src="https://latex.codecogs.com/png.latex?%5Cepsilon"> fattening of <img src="https://latex.codecogs.com/png.latex?%5CGamma">, which is the set <img src="https://latex.codecogs.com/png.latex?%0A%5CGamma%5E%5Cepsilon%20=%20%5C%7Bs%20%5Cin%20T%20:%20d(s,%20%5CGamma)%20%3C%20%5Cepsilon%5C%7D,%0A"> where <img src="https://latex.codecogs.com/png.latex?d(s,%20%5CGamma)"> is the distance from <img src="https://latex.codecogs.com/png.latex?s"> to the nearest point in <img src="https://latex.codecogs.com/png.latex?%5CGamma">.</p>
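<p>The fattened set is easy to picture in code. A throwaway sketch for the unit-circle example (where the distance to <img src="https://latex.codecogs.com/png.latex?%5CGamma"> has a closed form):</p>

```python
# Sketch: membership in the epsilon-fattening of Gamma = the unit circle.
# The fattening is a 2D annulus of width 2*epsilon, so it has positive
# area even though Gamma itself does not.
import numpy as np

def in_fattening(s, eps):
    # d(s, Gamma) for the unit circle is | ||s|| - 1 |
    return abs(np.linalg.norm(s) - 1.0) < eps

print(in_fattening(np.array([1.05, 0.0]), 0.1))  # inside the annulus
print(in_fattening(np.array([2.0, 0.0]), 0.1))   # well outside it
```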
<p>With all of this in hand we can now give a general definition of the Markov property.</p>
<div id="def-markov" class="theorem definition">
<p><span class="theorem-title"><strong>Definition 4 (The Markov property for a generalised Gaussian process)</strong></span> Consider a zero mean generalised GP<sup>24</sup> <img src="https://latex.codecogs.com/png.latex?%5Cxi">. For any<sup>25</sup> subset <img src="https://latex.codecogs.com/png.latex?A%20%5Csubset%20T">, we define the collection of random variables<sup>26</sup> <img src="https://latex.codecogs.com/png.latex?%0AH(A)%20=%20%5Coperatorname%7Bspan%7D%5C%7B%5Cxi(f):%20%5Coperatorname%7Bsupp%7D(f)%20%5Csubseteq%20A%5C%7D.%0A"> We will call <img src="https://latex.codecogs.com/png.latex?%5C%7BH(A);%20A%20%5Csubseteq%20T%5C%7D"> the <em>random field</em><sup>27</sup> associated with <img src="https://latex.codecogs.com/png.latex?%5Cxi">.</p>
<p>Let <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D"> be a system of domains<sup>28</sup> in <img src="https://latex.codecogs.com/png.latex?T">. We say that <img src="https://latex.codecogs.com/png.latex?%5Cxi"> has the Markov<sup>29</sup> property (with respect to <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D">) if, for all <img src="https://latex.codecogs.com/png.latex?S_1%20%5Cin%20%5Cmathcal%7BG%7D"> and for any sufficiently small <img src="https://latex.codecogs.com/png.latex?%5Cepsilon%20%3E%200">, <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(xy%20%5Cmid%20H(%5CGamma%5E%5Cepsilon))%20=%200,%20%5Cqquad%20x%20%5Cin%20H(S_1),%20y%20%5Cin%20H(S_2),%0A"> where <img src="https://latex.codecogs.com/png.latex?%5CGamma%20=%20%5Cpartial%20S_1"> and <img src="https://latex.codecogs.com/png.latex?S_2%20=%20S_1%5EC%20%5Cbackslash%20%5CGamma">.</p>
</div>
<section id="rewriting-the-markov-property-i-splitting-spaces" class="level3">
<h3 class="anchored" data-anchor-id="rewriting-the-markov-property-i-splitting-spaces">Rewriting the Markov property I: Splitting spaces</h3>
<p>The Markov property defined above is great and everything, but in order to manipulate it, we need to think carefully about the how the domains <img src="https://latex.codecogs.com/png.latex?S_1">, <img src="https://latex.codecogs.com/png.latex?%5CGamma%5E%5Cepsilon"> and <img src="https://latex.codecogs.com/png.latex?S_2"> can be used to divide up the space <img src="https://latex.codecogs.com/png.latex?H(T)">. To do this, we need to basically localise the Markov property to one set of <img src="https://latex.codecogs.com/png.latex?S_1">, <img src="https://latex.codecogs.com/png.latex?%5CGamma">, <img src="https://latex.codecogs.com/png.latex?S_2">. This concept is called a <em>splitting</em><sup>30</sup> of <img src="https://latex.codecogs.com/png.latex?H(S_1)"> and <img src="https://latex.codecogs.com/png.latex?H(S_2)"> by <img src="https://latex.codecogs.com/png.latex?H(%5CGamma%5E%5Cepsilon)">.</p>
<div id="def-splitting" class="theorem definition">
<p><span class="theorem-title"><strong>Definition 5</strong></span> For some domain <img src="https://latex.codecogs.com/png.latex?S_1"> and <img src="https://latex.codecogs.com/png.latex?%5CGamma%20%5Csupseteq%20%5Cpartial%20S_1">, set <img src="https://latex.codecogs.com/png.latex?S_2%20=%20(S_1%20%5Ccup%20%5CGamma)%5Ec">. The space <img src="https://latex.codecogs.com/png.latex?H(%5CGamma%5E%5Cepsilon)"> splits <img src="https://latex.codecogs.com/png.latex?H(S_1)"> and <img src="https://latex.codecogs.com/png.latex?H(S_2)"> if <img src="https://latex.codecogs.com/png.latex?%0AH(T)%20=%20H(S_1%20%5Cominus%20%5CGamma%5E%5Cepsilon)%20%5Coplus%20H(%5CGamma%5E%5Cepsilon)%20%5Coplus%20H(S_2%20%5Cominus%20%5CGamma%5E%5Cepsilon),%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Coplus"> is the sum of orthogonal components<sup>31</sup> and <img src="https://latex.codecogs.com/png.latex?x%5Cin%20H(S%20%5Cominus%20%5CGamma%5E%5Cepsilon)"> if and only if there is some <img src="https://latex.codecogs.com/png.latex?y%20%5Cin%20H(S)"> such that<sup>32</sup> <img src="https://latex.codecogs.com/png.latex?%0Ax%20=%20y%20-%20%5Cmathbb%7BE%7D(y%20%5Cmid%20H(%5CGamma%5E%5Cepsilon)).%0A"></p>
</div>
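<p>In the finite-dimensional Gaussian world, the <img src="https://latex.codecogs.com/png.latex?%5Cominus"> operation is just ordinary residualisation: subtract the linear projection onto the conditioning variables. A quick sketch (using an arbitrary covariance matrix; no Markov structure is needed for this part):</p>

```python
# Sketch: for a Gaussian vector, E(y | H(G)) is the linear projection
# Sigma_{yG} Sigma_{GG}^{-1} x_G, and the residual y - E(y | H(G)) is
# uncorrelated with every variable in H(G) -- the defining property of
# the "ominus" operation above.
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
Sigma = M @ M.T + np.eye(6)  # a generic covariance matrix

y, G = [0, 1], [2, 3]
beta = np.linalg.solve(Sigma[np.ix_(G, G)], Sigma[np.ix_(G, y)])
# Cov(x_y - beta^T x_G, x_G) = Sigma_{yG} - beta^T Sigma_{GG}
resid_cov = Sigma[np.ix_(y, G)] - beta.T @ Sigma[np.ix_(G, G)]
print(np.max(np.abs(resid_cov)))  # zero up to floating point
```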
<p>This emphasizes that we can split our space into three separate components: inside <img src="https://latex.codecogs.com/png.latex?S_1">, outside <img src="https://latex.codecogs.com/png.latex?S_1">, and on the boundary of <img src="https://latex.codecogs.com/png.latex?S_1">. The ability to do that for any<sup>33</sup> domain is the key part of the Markov<sup>34</sup> property.</p>
<p>A slightly more convenient way to deal with splitting spaces is the case where we have overlapping sets <img src="https://latex.codecogs.com/png.latex?A">, <img src="https://latex.codecogs.com/png.latex?B"> that cover the domain (ie <img src="https://latex.codecogs.com/png.latex?A%20%5Ccup%20B%20=%20T">) and the splitting set is their intersection <img src="https://latex.codecogs.com/png.latex?S%20=%20A%20%5Ccap%20B">. In this case, the splitting equation becomes <img src="https://latex.codecogs.com/png.latex?%0AH(A)%5E%5Cperp%20%5Cperp%20H(B)%5E%5Cperp.%0A"> I shan’t lie: that looks wild. But it makes sense when you take <img src="https://latex.codecogs.com/png.latex?A%20=%20S_1%20%5Ccup%20%5CGamma%5E%5Cepsilon"> and <img src="https://latex.codecogs.com/png.latex?B%20=%20S_2%20%5Ccup%20%5CGamma%5E%5Cepsilon">, in which case <img src="https://latex.codecogs.com/png.latex?H(A)%5E%5Cperp%20=%20H(S_2)"> and <img src="https://latex.codecogs.com/png.latex?H(B)%5E%5Cperp%20=%20H(S_1)">.</p>
<p>The final thing to add before we can get to business is a way to get rid of all of the annoying <img src="https://latex.codecogs.com/png.latex?%5Cepsilon">s. The idea is to take the intersection of all of the <img src="https://latex.codecogs.com/png.latex?H(%5CGamma%5E%5Cepsilon)"> as the splitting space. If we define <img src="https://latex.codecogs.com/png.latex?%0AH_+(%5CGamma)%20=%20%5Cbigcap_%7B%5Cepsilon%3E0%7D%20H(%5CGamma%5E%5Cepsilon)%0A"> we can re-write<sup>35</sup> the splitting equation as <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A&amp;H_+(%5CGamma)%20=%20H_+(S_1%20%5Ccup%20%5CGamma)%20%5Ccap%20H_+(S_2%20%5Ccup%20%5CGamma)%20%5C%5C%0A&amp;%20H_+(S_1%20%5Ccup%20%5CGamma)%5E%5Cperp%20%5Cperp%20H_+(S_2%20%5Ccup%20%5CGamma)%5E%5Cperp.%0A%5Cend%7Balign*%7D"></p>
<p>This gives the following statement of the Markov property.</p>
<div id="def-markov2" class="theorem definition">
<p><span class="theorem-title"><strong>Definition 6</strong></span> Let <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D"> be a system of domains<sup>36</sup> in <img src="https://latex.codecogs.com/png.latex?T">. We say that <img src="https://latex.codecogs.com/png.latex?%5Cxi"> has the Markov property (with respect to <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D">) if, for all <img src="https://latex.codecogs.com/png.latex?S_1%20%5Cin%20%5Cmathcal%7BG%7D">, <img src="https://latex.codecogs.com/png.latex?%5CGamma%5Csupseteq%20%5Cpartial%20S_1"> ,<img src="https://latex.codecogs.com/png.latex?S_2%20=%20S_1%5EC%20%5Cbackslash%20%5CGamma">, we have, for some <img src="https://latex.codecogs.com/png.latex?%5Cepsilon%20%3E%200"> <img src="https://latex.codecogs.com/png.latex?%0AH_+(%5CGamma%5E%5Cepsilon)%20=%20H_+(S_1%20%5Ccup%20%5CGamma%5E%5Cepsilon)%20%5Ccap%20H_+(S_1%20%5Ccup%20%5CGamma%5E%5Cepsilon)%0A"> and <img src="https://latex.codecogs.com/png.latex?%0AH_+(S_1%20%5Ccup%20%5CGamma)%5E%5Cperp%20%5Cperp%20H_+(S_2%20%5Ccup%20%5CGamma)%5E%5Cperp.%0A"></p>
</div>
</section>
<section id="rewriting-the-markov-property-ii-the-dual-random-field-ha" class="level3">
<h3 class="anchored" data-anchor-id="rewriting-the-markov-property-ii-the-dual-random-field-ha">Rewriting the Markov property II: The dual random field <img src="https://latex.codecogs.com/png.latex?H%5E*(A)"></h3>
<p>We are going to fall further down the abstraction rabbit hole in the hope of ending up somewhere useful. In this case, we are going to invent an object that has no reason to exist and we will show that it can be used to compactly restate the Markov property. It will turn out in the next section that this object is actually useful and will lead (finally) to an operational characterisation of a Markovian Gaussian process.</p>
<div id="def-dual-field" class="theorem definition">
<p><span class="theorem-title"><strong>Definition 7 (Dual random field)</strong></span> Let <img src="https://latex.codecogs.com/png.latex?%5Cxi"> be a generalised Gaussian process with an associated random field <img src="https://latex.codecogs.com/png.latex?H(A)">, <img src="https://latex.codecogs.com/png.latex?A%20%5Csubseteq%20T"> and let <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D"> be a complete system of open domains in <img src="https://latex.codecogs.com/png.latex?T">. The <em>dual</em> to the random field <img src="https://latex.codecogs.com/png.latex?H(A)">, <img src="https://latex.codecogs.com/png.latex?A%20%5Csubseteq%20T"> on the system <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D"> is the random field <img src="https://latex.codecogs.com/png.latex?H%5E*(A)">, <img src="https://latex.codecogs.com/png.latex?A%20%5Csubseteq%20T"> that satisfies <img src="https://latex.codecogs.com/png.latex?%0AH%5E*(T)%20=%20H(T)%0A"> and <img src="https://latex.codecogs.com/png.latex?%0AH%5E*(A)%20=%20H_+(A%5Ec)%5E%5Cperp,%20%5Cqquad%20A%20%5Cin%20%5Cmathcal%7BG%7D.%0A"></p>
</div>
<p>This definition looks frankly a bit wild, but I promise you, we will use it.</p>
<p>The reason for its structure is that it directly relates to the Markov property. In particular, the existence of a dual field implies that, if we have any <img src="https://latex.codecogs.com/png.latex?S_1%20%5Cin%20%5Cmathcal%7BG%7D">, then <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0AH_+(S_1%20%5Ccup%20%5Cbar%7B%5CGamma%5E%5Cepsilon%7D)%20%5Ccap%20H_+(S_2%20%5Ccup%20%5Cbar%7B%5CGamma%5E%5Cepsilon%7D)%20&amp;=%20H%5E*((S_1%20%5Ccup%20%5Cbar%7B%5CGamma%5E%5Cepsilon%7D)%5Ec)%5E%5Cperp%20%5Ccap%20H%5E*((S_2%20%5Ccup%20%5Cbar%7B%5CGamma%5E%5Cepsilon%7D)%5Ec)%5E%5Cperp%20%5C%5C%0A&amp;=%20H%5E*((S_1%20%5Ccup%20%5Cbar%7B%5CGamma%5E%5Cepsilon%7D)%5Ec%20%5Ccup%20(S_2%20%5Ccup%20%5Cbar%7B%5CGamma%5E%5Cepsilon%7D)%5Ec)%5E%5Cperp%20%5C%5C%0A&amp;=%20H_+((S_1%20%5Ccup%20%5Cbar%7B%5CGamma%5E%5Cepsilon%7D)%20%5Ccap%20(S_2%20%5Ccup%20%5Cbar%7B%5CGamma%5E%5Cepsilon%7D))%20%5C%5C%0A&amp;=%20H_+(%5CGamma%5E%5Cepsilon).%0A%5Cend%7Balign*%7D"> That’s the first thing we need to show to demonstrate the Markov property.</p>
<p>The second part is much easier. If we note that <img src="https://latex.codecogs.com/png.latex?(S_1%20%5Ccup%20%5CGamma)%5Ec%20=%20S_2%20%5Cbackslash%20%5CGamma">, it follows that <img src="https://latex.codecogs.com/png.latex?%0AH_+(S_1%20%5Ccup%20%5CGamma)%5E%5Cperp%20=%20H%5E*(S_2%20%5Cbackslash%20%5CGamma).%0A"></p>
<p>This gives us our third (and final) characterisation of the (second-order) Markov property.</p>
<div id="def-markov3" class="theorem definition">
<p><span class="theorem-title"><strong>Definition 8</strong></span> Let <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D"> be a system of domains<sup>37</sup> in <img src="https://latex.codecogs.com/png.latex?T">. Assume that the random field <img src="https://latex.codecogs.com/png.latex?H(%5Ccdot)"> has an associated dual random field <img src="https://latex.codecogs.com/png.latex?H%5E*(%5Ccdot)">.</p>
<p>We say that <img src="https://latex.codecogs.com/png.latex?H(A)">, <img src="https://latex.codecogs.com/png.latex?A%20%5Cin%20%5Cmathcal%7BG%7D"> has the Markov property (with respect to<sup>38</sup> <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D">) if and only if for all <img src="https://latex.codecogs.com/png.latex?S_1%20%5Cin%20%5Cmathcal%7BG%7D">, <img src="https://latex.codecogs.com/png.latex?%0AH%5E*(S_1%20%5Cbackslash%20%5CGamma)%20%5Cperp%20H%5E*(S_2%20%5Cbackslash%20%5CGamma).%0A"> When this holds, we say that the dual field is <em>orthogonal</em> with respect to <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D">.</p>
</div>
<p>There is probably more to say about dual fields. For instance, the dual of the dual field is the original field. Neat, huh. But really, all we need to do is know that an orthogonal dual field implies the Markov property. Because next we are going to construct a dual field, which will give us an actually useful characterisation of Markovian GPs.</p>
</section>
<section id="building-out-our-toolset-with-the-conjugate-gp" class="level3">
<h3 class="anchored" data-anchor-id="building-out-our-toolset-with-the-conjugate-gp">Building out our toolset with the conjugate GP</h3>
<p>In this section, our job is to construct a dual random field. To do this, we are going to exploit the notion of a <em>conjugate<sup>39</sup> Gaussian process</em>, which is a generalised<sup>40</sup> GP <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E*"> such that<sup>41</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(%5Cxi(f)%5Cxi%5E*(g))%20=%20%5Cint_T%20f(s)g(s)%5C,ds.%0A"> It is going to turn out that <img src="https://latex.codecogs.com/png.latex?H%5E*(%5Ccdot)"> is the random field generated by <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E*">. The condition that <img src="https://latex.codecogs.com/png.latex?H(T)%20=%20H%5E*(T)"> can be assumed <em>a fortiori</em>. What we need to show is that the existence of a conjugate Gaussian process implies that, for all <img src="https://latex.codecogs.com/png.latex?S%20%5Cin%20%5Cmathcal%7BG%7D">, <img src="https://latex.codecogs.com/png.latex?H%5E*(S)%20=%20H_+(S%5Ec)%5E%5Cperp">.</p>
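<p>A finite-dimensional stand-in makes the conjugate process less mysterious. If <img src="https://latex.codecogs.com/png.latex?x%20%5Csim%20N(0,%20%5CSigma)">, then the natural conjugate vector is <img src="https://latex.codecogs.com/png.latex?x%5E*%20=%20%5CSigma%5E%7B-1%7Dx">: its cross-covariance with <img src="https://latex.codecogs.com/png.latex?x"> is the identity, which is the discrete analogue of the integral condition above. A sketch:</p>

```python
# Sketch: the finite-dimensional conjugate of x ~ N(0, Sigma) is
# x* = Sigma^{-1} x. Then Cov(x, x*) = Sigma Sigma^{-1} = I (the analogue
# of E(xi(f) xi*(g)) = int f g ds) and Cov(x*) = Sigma^{-1}, the precision.
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
Sigma = M @ M.T + 5 * np.eye(5)  # a well-conditioned covariance
Q = np.linalg.inv(Sigma)         # covariance of the conjugate vector

print(np.max(np.abs(Sigma @ Q - np.eye(5))))  # Cov(x, x*) = I
print(np.max(np.abs(Q @ Sigma @ Q - Q)))      # Cov(x*) = Sigma^{-1}
```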
<p>We will return to the issue of whether or not <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E*"> actually exists later, but assuming it does, let’s see how its associated random field <img src="https://latex.codecogs.com/png.latex?H%5E*(S)"> relates to <img src="https://latex.codecogs.com/png.latex?H_+(S%5Ec)%5E%5Cperp"> for <img src="https://latex.codecogs.com/png.latex?S%5Cin%20%5Cmathcal%7BG%7D">. While it is not always true that these things are equal, it <em>is</em> always true that <img src="https://latex.codecogs.com/png.latex?%0AH%5E*(S)%20%5Csubseteq%20H_+(S%5Ec)%5E%5Cperp.%0A"> We will consider when equality holds in the next section. But first let’s show the inclusion.</p>
<p>The space <img src="https://latex.codecogs.com/png.latex?H%5E*(S)"> contains all random variables of the form <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E*(u)">, where the support of <img src="https://latex.codecogs.com/png.latex?u"> is compact in <img src="https://latex.codecogs.com/png.latex?S">, which means that it is a positive distance from <img src="https://latex.codecogs.com/png.latex?S%5EC">. That means that, for some <img src="https://latex.codecogs.com/png.latex?%5Cepsilon%20%3E%200">, the support of <img src="https://latex.codecogs.com/png.latex?u"> is outside<sup>42</sup> of <img src="https://latex.codecogs.com/png.latex?(S%5Ec)%5E%5Cepsilon">. So if we fix that <img src="https://latex.codecogs.com/png.latex?u"> and consider any smooth <img src="https://latex.codecogs.com/png.latex?v"> with support in<sup>43</sup> <img src="https://latex.codecogs.com/png.latex?(S%5Ec)%5E%5Cepsilon">, then, from the definition of the conjugate GP, we have<sup>44</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(%5Cxi(v)%5Cxi%5E*(u))%20=%20%5Cint_T%20u(s)%20v(s)%5C,%20ds%20=%200.%0A"> This means that <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E*(u)"> is perpendicular to <img src="https://latex.codecogs.com/png.latex?%5Cxi(v)"> and, therefore, <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E*(u)%20%5Cin%20H((S%5Ec)%5E%5Cepsilon)%5E%5Cperp">. 
Now, <img src="https://latex.codecogs.com/png.latex?H_+(S%5Ec)"> is defined as the intersection of these spaces, but it turns out that<sup>45</sup> for any spaces <img src="https://latex.codecogs.com/png.latex?A"> and <img src="https://latex.codecogs.com/png.latex?B">, <img src="https://latex.codecogs.com/png.latex?%0A(A%20%5Ccap%20B)%5E%5Cperp%20=%20A%5E%5Cperp%20%5Ccup%20B%5E%5Cperp.%0A"> This is because <img src="https://latex.codecogs.com/png.latex?A%5Ccap%20B%20%5Csubset%20A"> and so every function that’s orthogonal to functions in <img src="https://latex.codecogs.com/png.latex?A"> is also orthogonal to functions in <img src="https://latex.codecogs.com/png.latex?A%5Ccap%20B">. The same goes for <img src="https://latex.codecogs.com/png.latex?B">. We have shown that <img src="https://latex.codecogs.com/png.latex?%0AH_+(S%5Ec)%5E%5Cperp%20=%20%5Cbigcup_%7B%5Cepsilon%20%3E%200%7D%20H((S%5Ec)%5E%5Cepsilon)%5E%5Cperp%0A"> and every <img src="https://latex.codecogs.com/png.latex?%5Ceta%5E*%20%5Cin%20H%5E*(S)"> is in <img src="https://latex.codecogs.com/png.latex?H((S%5Ec)%5E%5Cepsilon)%5E%5Cperp"> for some <img src="https://latex.codecogs.com/png.latex?%5Cepsilon%20%3E0">. This gives the inclusion <img src="https://latex.codecogs.com/png.latex?%0AH%5E*(S)%20%5Csubseteq%20H_+(S%5Ec)%5E%5Cperp.%0A"></p>
<p>To give conditions for when it’s an actual equality is a bit more difficult. It, maybe surprisingly, involves thinking carefully about the reproducing kernel Hilbert space of <img src="https://latex.codecogs.com/png.latex?%5Cxi">. We are going to take this journey together in two steps. First we will give a condition on the RKHS that guarantees that <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E*"> exists. Then we will look at when <img src="https://latex.codecogs.com/png.latex?H%5E*(S)%20=%20H_+(S%5Ec)%5E%5Cperp">.</p>
</section>
<section id="when-does-xi-exits-or-a-surprising-time-with-the-reproducing-kernel-hilbert-space" class="level3">
<h3 class="anchored" data-anchor-id="when-does-xi-exits-or-a-surprising-time-with-the-reproducing-kernel-hilbert-space">When does <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E*"> exits? or, A surprising time with the reproducing kernel Hilbert space</h3>
<p>First off, though, we need to make sure that <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E*"> exists. Obviously<sup>46</sup> if it exists then it is unique and <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E%7B**%7D%20=%20%5Cxi">.</p>
<p>But does it exist? The answer turns out to be <em>sometimes</em>. But also <em>usually</em>. To show this, we need to do something that is, frankly, just a little bit fancy. We need to deal with the reproducing kernel Hilbert space<sup>47</sup>. This feels somewhat surprising, but it turns out that it is a fundamental object<sup>48</sup> and intrinsically tied to the space <img src="https://latex.codecogs.com/png.latex?H(T)">.</p>
<p>The reproducing kernel Hilbert space, which we will now<sup>49</sup> call <img src="https://latex.codecogs.com/png.latex?V(T)"> because we are using <img src="https://latex.codecogs.com/png.latex?H"> for something else in this section, is a set of deterministic generalised functions <img src="https://latex.codecogs.com/png.latex?%5Cpsi"> that can be evaluated at <img src="https://latex.codecogs.com/png.latex?C_0%5E%5Cinfty(T)"> functions<sup>50</sup> as <img src="https://latex.codecogs.com/png.latex?%0A%5Cpsi(u)%20=%20%5Cint_T%20u(s)%5C,d%5Cpsi(s),%20%5Cqquad%20u%20%5Cin%20C_0%5E%5Cinfty(T).%0A"> A generalised function <img src="https://latex.codecogs.com/png.latex?%5Cpsi%20%5Cin%20V(T)"> if and only if there is a corresponding random variable <img src="https://latex.codecogs.com/png.latex?%5Ceta%20%5Cin%20H(T)"> that satisfies <img src="https://latex.codecogs.com/png.latex?%0A%5Cpsi(u)%20=%20%5Cmathbb%7BE%7D%5Cleft%5B%5Cxi(u)%20%5Ceta%5Cright%5D,%20%5Cqquad%20u%20%5Cin%20C_0%5E%5Cinfty(T).%0A"> It can be shown<sup>51</sup> that there is a one-to-one correspondence between <img src="https://latex.codecogs.com/png.latex?H(T)"> and <img src="https://latex.codecogs.com/png.latex?V(T)">, in the sense that for every <img src="https://latex.codecogs.com/png.latex?%5Cpsi"> there is a unique <img src="https://latex.codecogs.com/png.latex?%5Ceta%20=%20%5Ceta(%5Cpsi)%20%5Cin%20H(T)">.</p>
<p>We can use this correspondence to endow <img src="https://latex.codecogs.com/png.latex?V(T)"> with an inner product <img src="https://latex.codecogs.com/png.latex?%0A%5Clangle%20%5Cpsi_1,%20%5Cpsi_2%5Crangle_%7BV(T)%7D%20=%20%5Cmathbb%7BE%7D(%5Ceta(%5Cpsi_1)%5Ceta(%5Cpsi_2)).%0A"></p>
<p>So far, so abstract. The point of the conjugate GP is that it gives us an explicit construction of the<sup>52</sup> mapping <img src="https://latex.codecogs.com/png.latex?%5Ceta">. And, importantly for the discussion of existence, if there is a conjugate GP then the RKHS has a particular relationship with <img src="https://latex.codecogs.com/png.latex?C_0%5E%5Cinfty(T)">.</p>
<p>To see this, let’s assume <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E*"> exists. Then, for each <img src="https://latex.codecogs.com/png.latex?v%20%5Cin%20C_0%5E%5Cinfty(T)">, the generalised function <img src="https://latex.codecogs.com/png.latex?%0A%5Cpsi_v(u)%20=%20%5Cint_T%20u(s)%20v(s)%5C,ds%0A"> is in <img src="https://latex.codecogs.com/png.latex?V(T)"> because, by the definition of <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E*">, we have that <img src="https://latex.codecogs.com/png.latex?%0A%5Cpsi_v(u)%20=%20%5Cmathbb%7BE%7D(%5Cxi(u)%5Cxi%5E*(v))%20=%20%5Cint_T%20u(s)%20v(s)%5C,ds.%0A"> Hence, the embedding is given by <img src="https://latex.codecogs.com/png.latex?%5Ceta(v)%20=%20%5Cxi%5E*(v)">.</p>
<p>Now, if we do a bit of mathematical trickery and equate things that are isomorphic, <img src="https://latex.codecogs.com/png.latex?C_0%5E%5Cinfty(T)%20%5Csubseteq%20V(T)">. On its face, that doesn’t make much sense because on the left we have a space of actual functions and on the right we have a space of generalised functions. To make it work, we associate each smooth function <img src="https://latex.codecogs.com/png.latex?v"> with the generalised function <img src="https://latex.codecogs.com/png.latex?%5Cpsi_v"> defined above.</p>
<p>This makes <img src="https://latex.codecogs.com/png.latex?V(T)"> the closure<sup>53</sup> of <img src="https://latex.codecogs.com/png.latex?C_0%5E%5Cinfty(T)"> under the norm <img src="https://latex.codecogs.com/png.latex?%0A%5C%7Cv%5C%7C%5E2_%7BV(T)%7D%20=%20%5Cmathbb%7BE%7D%5Cleft(%5Cxi%5E*(v)%5E2%5Cright).%0A"> Hence we have shown that if there is a conjugate GP, then <img src="https://latex.codecogs.com/png.latex?%0AC_0%5E%5Cinfty(T)%20%5Csubseteq%20V(T),%20%5Cqquad%20%5Coverline%7BC_0%5E%5Cinfty(T)%7D%20=%20V(T).%0A"> It turns out that if <img src="https://latex.codecogs.com/png.latex?C_0%5E%5Cinfty(T)"> is dense in <img src="https://latex.codecogs.com/png.latex?V(T)">, then there exists a conjugate GP defined through the isomorphism <img src="https://latex.codecogs.com/png.latex?%5Ceta(%5Ccdot)">. This is because <img src="https://latex.codecogs.com/png.latex?H(T)%20=%20%5Ceta(V(T))"> and <img src="https://latex.codecogs.com/png.latex?%5Ceta"> is continuous. Hence if we choose <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E*(v)%20=%20%5Ceta(v)"> then <img src="https://latex.codecogs.com/png.latex?H%5E*(T)%20=%20H(T)">.</p>
<p>We have shown the following.</p>
<div id="thm-conjugate-exist" class="theorem">
<p><span class="theorem-title"><strong>Theorem 1</strong></span> A conjugate GP exists if and only if <img src="https://latex.codecogs.com/png.latex?C_0%5E%5Cinfty(T)"> is dense in <img src="https://latex.codecogs.com/png.latex?V(T)">.</p>
</div>
<p>This is our first step towards turning statements about the stochastic process <img src="https://latex.codecogs.com/png.latex?%5Cxi"> into statements about the RKHS. We shall continue along this road.</p>
<p>You might, at this point, be wondering if that condition ever actually holds. The answer is yes. It does fairly often. For instance, if <img src="https://latex.codecogs.com/png.latex?%5Cxi"> is a <a href="https://dansblog.netlify.app/posts/2022-09-07-priors5/priors5.html#part-2-an-invitation-to-the-theory-of-stationary-gaussian-processes">stationary GP</a> with spectral density <img src="https://latex.codecogs.com/png.latex?f(%5Comega)">, the conjugate GP exists if and only if there is some <img src="https://latex.codecogs.com/png.latex?k%3E0"> such that <img src="https://latex.codecogs.com/png.latex?%0A%5Cint%20(1%20+%20%7C%5Comega%7C%5E2)%5E%7B-k%7Df(%5Comega)%5E%7B-1%7D%5C,d%5Comega%20%3C%20%5Cinfty.%0A"> This basically says that the theory we are developing doesn’t work for GPs with extremely smooth sample paths (like a GP with the squared exponential covariance function). This is not a restriction that bothers me at all.</p>
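<p>If you want to see the integrability condition in action, here is a throwaway numerical check for two standard 1D spectral densities (the parameterisations below are schematic; constants are dropped):</p>

```python
# Sketch: checking int (1 + w^2)^{-k} / f(w) dw < infinity for two spectral
# densities. Matern: f decays polynomially, so a large enough k works.
# Squared exponential: f decays like exp(-w^2), so the integrand explodes.
import numpy as np
from scipy.integrate import quad

def matern_f(w, nu=1.5, kappa=1.0):
    # Matern spectral density, up to constants
    return (kappa**2 + w**2) ** (-(nu + 0.5))

def sq_exp_f(w, ell=1.0):
    # Squared exponential spectral density, up to constants
    return np.exp(-((ell * w) ** 2) / 2.0)

k = 3  # enough damping: 1/f grows like w^4 for nu = 1.5
val, _ = quad(lambda w: (1 + w**2) ** (-k) / matern_f(w), -np.inf, np.inf)
print(np.isfinite(val))  # the condition holds: a conjugate GP exists

# For the squared exponential, 1/f already dwarfs any polynomial damping:
print((1 + 30.0**2) ** (-k) / sq_exp_f(30.0))  # astronomically large
```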
<p>For non-stationary GPs that aren’t too smooth, this will also hold as long as nothing too bizarre is happening at infinity.</p>
</section>
<section id="but-when-does-hs-h_scperp" class="level3">
<h3 class="anchored" data-anchor-id="but-when-does-hs-h_scperp">But when does <img src="https://latex.codecogs.com/png.latex?H%5E*(S)%20=%20H_+(S%5Ec)%5E%5Cperp">?</h3>
<p>We have shown already<sup>54</sup> that <img src="https://latex.codecogs.com/png.latex?%0AH((S%5Ec)%5E%5Cepsilon)%5E%5Cperp%20=%20%5Cleft%5C%7B%5Cxi%5E*(u):%20u%20%5Cin%20V(T),%5C,%20%5Coperatorname%7Bsupp%7D(u)%20%5Csubseteq%20%5B(S%5Ec)%5E%5Cepsilon%5D%5Ec%5Cright%5C%7D%0A"> (that last bit with all the complements can be read as “the support of <img src="https://latex.codecogs.com/png.latex?u"> is inside <img src="https://latex.codecogs.com/png.latex?S"> and always more than <img src="https://latex.codecogs.com/png.latex?%5Cepsilon"> from the boundary”). It follows then that <img src="https://latex.codecogs.com/png.latex?%0AH_+(S%5Ec)%5E%5Cperp%20=%20%5Cbigcup_%7B%5Cepsilon%3E0%7D%5Cleft%5C%7B%5Cxi%5E*(u):%20%20u%20%5Cin%20V(T),%5C,%20%5Coperatorname%7Bsupp%7D(u)%20%5Csubseteq%20%5B(S%5Ec)%5E%5Cepsilon%5D%5Ec%5Cright%5C%7D.%0A"> This is nice because it shows that <img src="https://latex.codecogs.com/png.latex?H_+(S%5Ec)%5E%5Cperp"> is related to the space <img src="https://latex.codecogs.com/png.latex?%0AV(S)%20=%20%5Cbigcup_%7B%5Cepsilon%3E0%7D%5Cleft%5C%7B%20%20u%20%5Cin%20V(T),%5C,%20%5Coperatorname%7Bsupp%7D(u)%20%5Csubseteq%20%5B(S%5Ec)%5E%5Cepsilon%5D%5Ec%5Cright%5C%7D,%0A"> that is, if <img src="https://latex.codecogs.com/png.latex?v%5Cin%20V(T)"> is a function that is the limit of a sequence of functions <img src="https://latex.codecogs.com/png.latex?v_n%20%5Cin%20V(T)"> with <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7Bsupp%7D(v_n)%20%5Csubseteq%20%5B(S%5Ec)%5E%5Cepsilon%5D%5Ec"> for some <img src="https://latex.codecogs.com/png.latex?%5Cepsilon%3E0">, then <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E*(v)%20%5Cin%20H_+(S%5Ec)%5E%5Cperp"> and <em>every</em> such random variable has an associated <img src="https://latex.codecogs.com/png.latex?v">.</p>
<p>So, in the sense<sup>55</sup> of isomorphisms these are equivalent, that is <img src="https://latex.codecogs.com/png.latex?%0AH_+(S%5Ec)%5E%5Cperp%20%5Ccong%20V(S).%0A"></p>
<p>This means that if we can show that <img src="https://latex.codecogs.com/png.latex?H%5E*(S)%20%5Ccong%20V(S)">, then we have two spaces that are isomorphic to the same space <em>and</em> use the same isomorphism <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E*">. This would mean that the spaces are equivalent.</p>
<p>This can also be placed in the language of function spaces. Recall that <img src="https://latex.codecogs.com/png.latex?%0AH%5E*(S)%20=%20%5Coverline%5C%7B%5Cxi%5E*(u):%20u%20%5Cin%20C_0%5E%5Cinfty(S)%5C%7D.%0A"> Hence <img src="https://latex.codecogs.com/png.latex?H%5E*(S)"> will be isomorphic to <img src="https://latex.codecogs.com/png.latex?V(S)"> if and only if <img src="https://latex.codecogs.com/png.latex?%0AV(S)%20=%20%5Coverline%7BC_0%5E%5Cinfty(S)%7D,%0A"> that is, if and only if every <img src="https://latex.codecogs.com/png.latex?v%20%5Cin%20V(S)"> is the limit of a sequence of smooth functions compactly supported within <img src="https://latex.codecogs.com/png.latex?S">.</p>
<p>This turns out to not <em>always</em> be true, but it’s true in the situations that we most care about. In particular, we get the following theorem, which I am certainly not going to prove.</p>
<div class="{thm-conjugate-dual}">
<p>Assume that the conjugate GP <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E*"> exists. Assume that <em>either</em> of the following holds:</p>
<ol type="1">
<li><p>Multiplication by a function <img src="https://latex.codecogs.com/png.latex?w%20%5Cin%20C_0%5E%5Cinfty"> is bounded in <img src="https://latex.codecogs.com/png.latex?V(T)">, ie <img src="https://latex.codecogs.com/png.latex?%0A%5C%7Cwu%20%5C%7C_%7BV(T)%7D%20%5Cleq%20C(w)%20%5C%7Cu%5C%7C_%7BV(T)%7D,%20%5Cqquad%20u%20%5Cin%20C_0%5E%5Cinfty%20(T).%0A"></p></li>
<li><p>The shift operator is bounded under both the RKHS norm and the covariance<sup>56</sup> norm for small <img src="https://latex.codecogs.com/png.latex?s_0">, ie <img src="https://latex.codecogs.com/png.latex?%0A%5C%7Cu(%5Ccdot%20-%20s_0)%5C%7C%20%5Cleq%20C%20%5C%7Cu%5C%7C,%20%5Cqquad%20u%20%5Cin%20C_0%5E%5Cinfty(T)%0A"> holds in both norms for all <img src="https://latex.codecogs.com/png.latex?s_0%20%5Cleq%20s_%5Cmax">, <img src="https://latex.codecogs.com/png.latex?s_%5Cmax%20%3E0"> sufficiently small.</p></li>
</ol>
<p>Then <img src="https://latex.codecogs.com/png.latex?H%5E*(%5Ccdot)"> is the dual of <img src="https://latex.codecogs.com/png.latex?H(%5Ccdot)"> over the system of sets that are bounded or have bounded complements in <img src="https://latex.codecogs.com/png.latex?T">.</p>
</div>
<p>The second condition is particularly important because it <em>always</em> holds for stationary GPs with <img src="https://latex.codecogs.com/png.latex?C=1"> as their covariance structure is shift invariant. It’s not impossible to come up with examples of generalised GPs that don’t satisfy this condition, but they’re all a bit weird (eg the “derivative” of white noise). So as long as your GP is not too weird, you should be fine.</p>
</section>
<section id="at-long-last-an-rkhs-characterisation-of-the-markov-property" class="level3">
<h3 class="anchored" data-anchor-id="at-long-last-an-rkhs-characterisation-of-the-markov-property">At long last, an RKHS characterisation of the Markov property</h3>
<p>And with that, we are finally here! We have that <img src="https://latex.codecogs.com/png.latex?H%5E*(S)"> is the dual random field to <img src="https://latex.codecogs.com/png.latex?H(S)">, <img src="https://latex.codecogs.com/png.latex?S%5Cin%20G"> <em>and</em> we have a lovely characterisation of <img src="https://latex.codecogs.com/png.latex?H%5E*(S)"> in terms of the RKHS <img src="https://latex.codecogs.com/png.latex?V(S)">. We can combine this with our definition of a Markov property for GPs with a dual random field and get that a GP <img src="https://latex.codecogs.com/png.latex?%5Cxi"> is Markovian if and only if <img src="https://latex.codecogs.com/png.latex?%0AH%5E*(S_1%20%5Cbackslash%20%5CGamma)%20%5Cperp%20H%5E*(S_2%20%5Cbackslash%20%5CGamma).%0A"> We can use the isomorphism to say that if <img src="https://latex.codecogs.com/png.latex?%5Ceta_j%20%5Cin%20H%5E*(S_j%20%5Cbackslash%20%5CGamma)">, <img src="https://latex.codecogs.com/png.latex?j=1,2">, then there is a <img src="https://latex.codecogs.com/png.latex?v_j%20%5Cin%20V(S_j%20%5Cbackslash%20%5CGamma)"> such that <img src="https://latex.codecogs.com/png.latex?%0A%5Ceta_j%20=%20%5Cxi%5E*(v_j).%0A"> Moreover, this isomorphism is unitary (aka it preserves the inner product) and so <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(%5Ceta_1%20%5Ceta_2)%20=%20%5Clangle%20v_1,%20v_2%5Crangle_%7BV(T)%7D.%0A"> Hence, <img src="https://latex.codecogs.com/png.latex?%5Cxi"> has the Markov property if and only if <img src="https://latex.codecogs.com/png.latex?%0A%5Clangle%20v_1,%20v_2%5Crangle_%7BV(T)%7D%20=%200,%20%5Cqquad%20v_j%20%5Cin%20V(S_j%20%5Cbackslash%20%5CGamma),%5C,S_1%20%5Cin%20%5Cmathcal%7BG%7D,%5C,%20S_2%20=%20S_1%5Ec,%5C,%20j=1,2.%0A"></p>
<p>Let’s memorialise this as a theorem.</p>
<div id="thm-markov-rkhs" class="theorem">
<p><span class="theorem-title"><strong>Theorem 2</strong></span> A GP <img src="https://latex.codecogs.com/png.latex?%5Cxi"> with a conjugate GP <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E*"> is Markov if and only if its RKHS is local, ie if <img src="https://latex.codecogs.com/png.latex?v_1"> and <img src="https://latex.codecogs.com/png.latex?v_2"> have disjoint supports, then <img src="https://latex.codecogs.com/png.latex?%0A%5Clangle%20v_1,%20v_2%5Crangle_%7BV(T)%7D%20=%200.%0A"></p>
</div>
<p>This result is <em>particularly</em> nice because it entirely characterises the RKHS inner product of a Markovian GP. The reason for this is a deep result from functional analysis called Peetre’s Theorem, which states, in our context, that locality implies that the inner product has the form <img src="https://latex.codecogs.com/png.latex?%0A%5Clangle%20v_1,%20v_2%5Crangle_%7BV(T)%7D%20=%20%5Csum_%7B%5Cmathbf%7Bk%7D,%20%5Cmathbf%7Bj%7D%7D%20%5Cint_T%20a_%7B%5Cmathbf%7Bk%7D%5Cmathbf%7Bj%7D%7D(s)%5Cfrac%7B%5Cpartial%5E%7B%7C%5Cmathbf%7Bk%7D%7C%7Dv_1%7D%7B%5Cpartial%20s_%5Cmathbf%7Bk%7D%7D%20%5Cfrac%7B%5Cpartial%5E%7B%7C%5Cmathbf%7Bj%7D%7C%7Dv_2%7D%7B%5Cpartial%20s_%5Cmathbf%7Bj%7D%7D%5C,ds,%0A"> where<sup>57</sup> <img src="https://latex.codecogs.com/png.latex?a_%7B%5Cmathbf%7Bk%7D%5Cmathbf%7Bj%7D%7D(s)"> are integrable functions and only a finite number of them are non-zero at any point <img src="https://latex.codecogs.com/png.latex?s">.</p>
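<p>To make Peetre’s form concrete, here is a sketch of the simplest worked instance, taking the standard operator from the footnotes (the example itself is mine, not the post’s):</p>

```latex
% For C^{-1} = \kappa^2 - \Delta, integrating by parts (the v_j are
% compactly supported, so there are no boundary terms) puts the inner
% product in the local Peetre form, with a_{00} = \kappa^2 and a_{jj} = 1:
\langle v_1, v_2 \rangle_{V(T)}
  = \int_T v_1 \left(\kappa^2 - \Delta\right) v_2 \, ds
  = \kappa^2 \int_T v_1 v_2 \, ds
    + \sum_{j=1}^d \int_T
        \frac{\partial v_1}{\partial s_j}
        \frac{\partial v_2}{\partial s_j} \, ds.
```

<p>Every term is an integral of a product of derivatives of <img src="https://latex.codecogs.com/png.latex?v_1"> and <img src="https://latex.codecogs.com/png.latex?v_2">, so it vanishes whenever the two functions have disjoint supports, exactly as the theorem requires.</p>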
<p>This connection between the RKHS and the dual space also gives the following result for stationary GPs.</p>
<div id="thm-stationary-gp" class="theorem">
<p><span class="theorem-title"><strong>Theorem 3</strong></span> Let <img src="https://latex.codecogs.com/png.latex?%5Cxi"> be a stationary Gaussian process. Then <img src="https://latex.codecogs.com/png.latex?%5Cxi"> has the Markov property if and only if its spectral density is the inverse of a non-negative, symmetric polynomial.</p>
</div>
<p>This follows from the characterisation of the RKHS inner product as <img src="https://latex.codecogs.com/png.latex?%0A%5Clangle%20v_1,%20v_2%5Crangle_%7BV(T)%7D%20=%20%5Cint_T%20%5Chat%7Bv_1%7D(%5Comega)%20%5Chat%7Bv_2%7D(%5Comega)%20f(%5Comega)%5E%7B-1%7D%5C,d%5Comega,%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Chat%7Bv_1%7D"> is the Fourier transform of <img src="https://latex.codecogs.com/png.latex?v_1">, and the fact that a differential operator is transformed into a polynomial in Fourier space.</p>
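<p>For a concrete sanity check (a standard example, not taken from the post): the Whittle field from the Matérn family passes the test, while the squared-exponential covariance fails it:</p>

```latex
% Whittle field on R^2 (Matern with smoothness 1): the spectral density
f(\omega) \propto \left(\kappa^2 + \|\omega\|^2\right)^{-2}
% is the inverse of a non-negative symmetric polynomial, so the GP is
% Markov. By contrast, a squared-exponential covariance has
f(\omega) \propto \exp\!\left(-\tfrac{\ell^2 \|\omega\|^2}{2}\right),
% which is not the reciprocal of any polynomial, so that GP is not Markov.
```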
</section>
<section id="putting-this-all-in-terms-of-eta" class="level3">
<h3 class="anchored" data-anchor-id="putting-this-all-in-terms-of-eta">Putting this all in terms of <img src="https://latex.codecogs.com/png.latex?%5Ceta"></h3>
<p><em>Waaaay</em> back near the top of the post I described a way to write a (generalised) GP in terms of its covariance operator and the white noise process <img src="https://latex.codecogs.com/png.latex?%0A%5Ceta%20=%20%5Cmathcal%7BC%7D%5E%7B1/2%7DW.%0A"> From the discussions above, it follows that the corresponding conjugate GP is given by <img src="https://latex.codecogs.com/png.latex?%0A%5Ceta%5E*%20=%20%5Cmathcal%7BC%7D%5E%7B-1/2%7DW.%0A"> This means that the RKHS inner product is given by <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Clangle%20v_1,%20v_2%20%5Crangle_%7BV(T)%7D%20&amp;=%20%5Cmathbb%7BE%7D(%5Ceta%5E*(v_1)%5Ceta%5E*(v_2))%5C%5C%0A&amp;=%20%5Cmathbb%7BE%7D%5Cleft%5B%5Cint_T%20%5Cmathcal%7BC%7D%5E%7B-1/2%7Dv_1(s)%5C,dW(s)%5Cint_T%20%5Cmathcal%7BC%7D%5E%7B-1/2%7Dv_2(s)%5C,dW(s)%5Cright%5D%20%5C%5C%0A&amp;=%20%5Cint_T%20v_1(s)%5Cmathcal%7BC%7D%5E%7B-1%7Dv_2(s)%5C,ds%0A%5Cend%7Balign*%7D"> From the discussion above, if <img src="https://latex.codecogs.com/png.latex?%5Ceta"> is Markovian, then <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BC%7D%5E%7B-1%7D"> is<sup>58</sup> a differential<sup>59</sup> operator.</p>
</section>
</section>
<section id="using-the-rkhs-to-build-computationally-efficient-approximations-to-markovian-gps" class="level2">
<h2 class="anchored" data-anchor-id="using-the-rkhs-to-build-computationally-efficient-approximations-to-markovian-gps">Using the RKHS to build computationally efficient approximations to Markovian GPs</h2>
<p>To close out this post, let’s look at how we can use the RKHS to build an approximation to a Markovian GP. This is equivalent<sup>60</sup> to the SPDE method that was very briefly sketched above, but it only requires knowledge of the RKHS inner product.</p>
<p>In particular, if we have a set of basis functions <img src="https://latex.codecogs.com/png.latex?%5Cpsi_j">, <img src="https://latex.codecogs.com/png.latex?j=1,%5Cldots,n">, we can define the approximate RKHS <img src="https://latex.codecogs.com/png.latex?V_n(T)"> as the space of all functions <img src="https://latex.codecogs.com/png.latex?%0Af(s)%20=%20%5Csum_%7Bj=1%7D%5En%20f_j%20%5Cpsi_j(s)%0A"> equipped with the inner product <img src="https://latex.codecogs.com/png.latex?%0A%5Clangle%20f,%20g%20%5Crangle_%7BV_n(T)%7D%20=%20f%5ET%20Q%20g,%0A"> where on the left-hand side <img src="https://latex.codecogs.com/png.latex?f"> and <img src="https://latex.codecogs.com/png.latex?g"> are functions, while on the right they are the corresponding vectors of weights, and <img src="https://latex.codecogs.com/png.latex?%0AQ_%7Bij%7D%20=%20%5Clangle%20%5Cpsi_i,%20%5Cpsi_j%5Crangle_%7BV(T)%7D.%0A"></p>
<p>For a finite dimensional GP, the matrix that defines the RKHS inner product is<sup>61</sup> the inverse of the covariance matrix. Hence the finite dimensional GP <img src="https://latex.codecogs.com/png.latex?u%5E%7B(n)%7D(%5Ccdot)"> associated with the RKHS <img src="https://latex.codecogs.com/png.latex?V_n(T)"> is the random function <img src="https://latex.codecogs.com/png.latex?%0Au%5E%7B(n)%7D(s)%20=%20%5Csum_%7Bj%20=%201%7D%5En%20u_j%20%5Cpsi_j(s),%0A"> where the weights <img src="https://latex.codecogs.com/png.latex?u%20%5Csim%20N(0,%20Q%5E%7B-1%7D)">.</p>
<p>If the GP is Markovian <em>and</em> the basis functions have compact support, then <img src="https://latex.codecogs.com/png.latex?Q"> is a sparse matrix and maybe he’ll love me again.</p>
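<p>As an illustrative sketch of that sparsity (the grid, the value of kappa, and the choice of taking the inverse covariance operator to be the simple elliptic operator from the footnotes are all assumptions of mine for this example), the 1-d tent-function version can be computed directly:</p>

```python
import numpy as np
from scipy import sparse

# Illustrative sketch: tent-function basis on a uniform grid on (0, 1),
# with the RKHS inner product <v1, v2> = \int v1 (kappa^2 - d^2/ds^2) v2 ds.
# The grid size and kappa are arbitrary choices for the example.
n, kappa = 100, 2.0
h = 1.0 / (n + 1)

# Mass matrix M_ij = \int psi_i psi_j ds: tridiagonal, because tent
# functions only overlap their immediate neighbours.
M = sparse.diags(
    [np.full(n - 1, h / 6), np.full(n, 2 * h / 3), np.full(n - 1, h / 6)],
    offsets=[-1, 0, 1], format="csc")

# Stiffness matrix G_ij = \int psi_i' psi_j' ds: also tridiagonal.
G = sparse.diags(
    [np.full(n - 1, -1 / h), np.full(n, 2 / h), np.full(n - 1, -1 / h)],
    offsets=[-1, 0, 1], format="csc")

# Q_ij = <psi_i, psi_j>_{V(T)} is sparse: only 3n - 2 non-zeros.
Q = kappa**2 * M + G

# Sample the weights u ~ N(0, Q^{-1}) via a Cholesky factor Q = L L^T:
# if z ~ N(0, I), then u = L^{-T} z has covariance (L L^T)^{-1} = Q^{-1}.
rng = np.random.default_rng(0)
L = np.linalg.cholesky(Q.toarray())
u = np.linalg.solve(L.T, rng.standard_normal(n))
```

<p>In more than one dimension the same construction goes through with tent functions built on a triangulation, and <code>Q</code> stays sparse as long as the basis functions have compact support.</p>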


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>or redefined if you’ve read <a href="https://dansblog.netlify.app/posts/2021-11-03-yes-but-what-is-a-gaussian-process-or-once-twice-three-times-a-definition-or-a-descent-into-madness/yes-but-what-is-a-gaussian-process-or-once-twice-three-times-a-definition-or-a-descent-into-madness.html">my other post</a>↩︎</p></li>
<li id="fn2"><p>For other observation models it contains the posterior mode↩︎</p></li>
<li id="fn3"><p>Step 1: Open Rasmussen and Williams.↩︎</p></li>
<li id="fn4"><p>For example, the process I’m about to describe is not meaningfully different for a process on a sphere. Whereas if you want to use a covariance function on a sphere you are stuck trying to find a whole new class of positive definite functions. It’s frankly very annoying. Although if you want to build a career out of characterising positive definite functions on increasingly exotic spaces, you probably don’t find it annoying.↩︎</p></li>
<li id="fn5"><p>Or the Cholesky factor if you add a bunch of transposes in the right places, but let’s not kid ourselves this is not a practical discussion of how to do it↩︎</p></li>
<li id="fn6"><p>Albeit a bit advanced. It’s straightforward in the sense that for an infinite-dimensional operator it happens to work a whole like a symmetric positive semi-definite matrix. It is not straightforward in the sense that your three year old could do it. Your three year old can’t do it. But it will keep them quiet in the back seat of the car while you pop into the store for some fags. It’s ok. The window’s down.↩︎</p></li>
<li id="fn7"><p>For any subset <img src="https://latex.codecogs.com/png.latex?B">, <img src="https://latex.codecogs.com/png.latex?%5Csup_%7Bs%5Cin%20B%7D%20w(s)%20=%20%5Cinfty"> <em>and</em> <img src="https://latex.codecogs.com/png.latex?%5Cinf_%7Bs%20%5Cin%20B%7D%20w(s)%20=%20-%5Cinfty">↩︎</p></li>
<li id="fn8"><p>Countably additive set-valued function taking any value in <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BC%7D">↩︎</p></li>
<li id="fn9"><p>measurable↩︎</p></li>
<li id="fn10"><p><img src="https://latex.codecogs.com/png.latex?A%20%5Ccap%20B%20=%20%5Cemptyset">↩︎</p></li>
<li id="fn11"><p>If <img src="https://latex.codecogs.com/png.latex?W(A)"> is also Gaussian then this is the same as them being independent↩︎</p></li>
<li id="fn12"><p>Recall that <img src="https://latex.codecogs.com/png.latex?T"> is our whole space. Usually <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed">, but it doesn’t matter here.↩︎</p></li>
<li id="fn13"><p>A bit of a let down really.↩︎</p></li>
<li id="fn14"><p>like <img src="https://latex.codecogs.com/png.latex?f(s)"> but with more subsets↩︎</p></li>
<li id="fn15"><p><img src="https://latex.codecogs.com/png.latex?L%5E2(T)"> is the space of functions with the property that <img src="https://latex.codecogs.com/png.latex?%5Cint_T%20f(s)%5E2%5C,ds%20%3C%20%5Cinfty">.↩︎</p></li>
<li id="fn16"><p>eg the Gaussian free field in physics, or the de Wijs process.↩︎</p></li>
<li id="fn17"><p>You can use a separate set of basis functions here, but I’m focusing on simplicity↩︎</p></li>
<li id="fn18"><p>The standard example is <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathcal%7BC%7D%5E%7B-1/2%7D%20=%20%5Ckappa%5E2%20-%20%5Csum_%7Bj=1%7D%5Ed%20%5Cfrac%7B%5Cpartial%5E2%7D%7B%5Cpartial%20s_j%5E2%7D.%0A">↩︎</p></li>
<li id="fn19"><p>In particular piecewise linear tent functions build on a triangulation↩︎</p></li>
<li id="fn20"><p>Read the paper, it’s a further approximation but the error is negligible↩︎</p></li>
<li id="fn21"><p>(<img src="https://latex.codecogs.com/png.latex?d-1">)-dimensional sub-manifold↩︎</p></li>
<li id="fn22"><p>This set does not include its boundary↩︎</p></li>
<li id="fn23"><p>This is defined as the set <img src="https://latex.codecogs.com/png.latex?%5Cpartial%20S_1%20=%20%5Cbar%7BS_1%7D%20%5Cbackslash%20S_1">, where <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BS_1%7D"> is the closure of <img src="https://latex.codecogs.com/png.latex?S_1">. But let’s face it. It’s the fucking boundary. It means what you think it means.↩︎</p></li>
<li id="fn24"><p>I’m using <img src="https://latex.codecogs.com/png.latex?%5Cxi"> here as a <em>generic</em> generalised GP, rather than <img src="https://latex.codecogs.com/png.latex?%5Ceta">, which is built using an ordinary GP. This doesn’t really make much of a difference (the Markov property for one is the same as the other), but it makes me feel better.↩︎</p></li>
<li id="fn25"><p>measurable↩︎</p></li>
<li id="fn26"><p>Here <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7Bsupp%7D(f)"> is the support of <img src="https://latex.codecogs.com/png.latex?f">, that is the values of <img src="https://latex.codecogs.com/png.latex?s"> such that <img src="https://latex.codecogs.com/png.latex?f(s)%20%5Cneq%200">.↩︎</p></li>
<li id="fn27"><p>This is the terminology of Rozanov. Random Field is also another term for stochastic process. Why only let words mean one thing?↩︎</p></li>
<li id="fn28"><p>non-empty connected open sets↩︎</p></li>
<li id="fn29"><p>Strictly, this is the <em>weak</em> or <em>second-order</em> Markov property↩︎</p></li>
<li id="fn30"><p>If you’re curious, this is basically the same thing as a splitting <img src="https://latex.codecogs.com/png.latex?%5Csigma">-algebra. But, you know, sans the <img src="https://latex.codecogs.com/png.latex?%5Csigma">-algebra bullshit.↩︎</p></li>
<li id="fn31"><p>That is, any <img src="https://latex.codecogs.com/png.latex?x%20%5Cin%20H(T)"> can be written as the sum <img src="https://latex.codecogs.com/png.latex?x%20=%20x_1%20+%20x_2%20+%20x_3">, where <img src="https://latex.codecogs.com/png.latex?x_1%20%5Cin%20%20H(S_1%20%5Cominus%20%5CGamma%5E%5Cepsilon)">, <img src="https://latex.codecogs.com/png.latex?x_2%20%5Cin%20H(%5CGamma%5E%5Cepsilon)">, and <img src="https://latex.codecogs.com/png.latex?x_3%20%5Cin%20H(S_2%20%5Cominus%20%5CGamma%5E%5Cepsilon)"> are <em>mutually orthogonal</em> (ie <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(x_1x_2)%20=%20%5Cmathbb%7BE%7D(x_1x_3)%20=%20%5Cmathbb%7BE%7D(x_2x_3)%20=0">!).↩︎</p></li>
<li id="fn32"><p>This is using the idea that the conditional expectation is a projection.↩︎</p></li>
<li id="fn33"><p>Typically any open set, or any open connected set, or any open, bounded set. A subtlety that I don’t really want to dwell on is that it is possible to have a GP that is Markov with respect to one system of domains but not another.↩︎</p></li>
<li id="fn34"><p>The Markov property can be restated in this language as for every system of complementary domains and boundary <img src="https://latex.codecogs.com/png.latex?S_1">, <img src="https://latex.codecogs.com/png.latex?%5CGamma">, <img src="https://latex.codecogs.com/png.latex?S_2">, there exists a small enough <img src="https://latex.codecogs.com/png.latex?%5Cepsilon%20%3E%200"> such that <img src="https://latex.codecogs.com/png.latex?%5CGamma%5E%5Cepsilon"> splits <img src="https://latex.codecogs.com/png.latex?S_1"> and <img src="https://latex.codecogs.com/png.latex?S_2">↩︎</p></li>
<li id="fn35"><p>Technically we are assuming that for small enough <img src="https://latex.codecogs.com/png.latex?%5Cepsilon"> <img src="https://latex.codecogs.com/png.latex?H(%5CGamma%5E%5Cepsilon)%20=%20%5Coperatorname%7Bspan%7D%5Cleft(H(%5CGamma%5E%5Cepsilon%20%5Ccap%20S_1)%20%5Ccup%20H_+(%5CGamma)%20%5Ccup%20H(%5CGamma%5E%5Cepsilon%20%5Ccap%20S_2)%5Cright)">. This is not a particularly onerous assumption.↩︎</p></li>
<li id="fn36"><p>non-empty connected open sets↩︎</p></li>
<li id="fn37"><p>non-empty connected open sets↩︎</p></li>
<li id="fn38"><p>The result works with some subsystem <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG_0%7D">. To prove it for <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D"> it’s enough to prove it for some subset <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D_0"> that separates points of <img src="https://latex.codecogs.com/png.latex?T">. This is a wildly technical aside and if it makes no sense to you, that’s very much ok. Frankly I’m impressed you’ve hung in this long.↩︎</p></li>
<li id="fn39"><p>Rozanov also calls this the <em>biorthogonal</em> GP. I like conjugate more.↩︎</p></li>
<li id="fn40"><p>Up to this point, it hasn’t been technically necessary for the GP to be generalised. However, here is very much is. It turns out that if realisations of <img src="https://latex.codecogs.com/png.latex?%5Cxi"> are almost surely continuous, then realisations of <img src="https://latex.codecogs.com/png.latex?%5Cxi%5E*"> are almost surely generalised functions.↩︎</p></li>
<li id="fn41"><p>I’m writing this as if all of these GPs are real valued, but for full generality, we should be dealing with complex GPs. Just imagine I put complex conjugates in all the correct places. I can’t stop you.↩︎</p></li>
<li id="fn42"><p>That is, inside <img src="https://latex.codecogs.com/png.latex?S"> and more than <img src="https://latex.codecogs.com/png.latex?%5Cepsilon"> from the boundary↩︎</p></li>
<li id="fn43"><p><img src="https://latex.codecogs.com/png.latex?v"> can be non-zero inside <img src="https://latex.codecogs.com/png.latex?S"> but only if it’s less than <img src="https://latex.codecogs.com/png.latex?%5Cepsilon"> away from the boundary.↩︎</p></li>
<li id="fn44"><p>It’s zero because the two functions are never non-zero at the same time, so their product is zero.↩︎</p></li>
<li id="fn45"><p>Here, and probably in a lot of other places, we are taking the union of spaces to be the span of their sum. Sorry.↩︎</p></li>
<li id="fn46"><p>Really Daniel. Really. (It’s an isomorphism so if you do enough analysis courses this is obvious. If that’s not clear to you, you should just trust me. Trust issues aren’t sexy. Unless you have cum gutters. In which case, I’ll just spray my isomorphisms on them and you can keep scrolling TikTok.)↩︎</p></li>
<li id="fn47"><p>This example is absolutely why I hate that we’ve settled on RKHS as a name for this object because the thing that we are about to construct does not always have a reproducing kernel property. Cameron-Martin space is less confusing. But hey. Whatever. The RKHS for the rest of this section is not always a Hilbert space with a reproducing kernel. We are just going to have to be ok with that.↩︎</p></li>
<li id="fn48"><p>Nothing about this analysis relies on Gaussianity. So this is a general characterisation of a Markov property for <em>any</em> stochastic process with second moments.↩︎</p></li>
<li id="fn49"><p>In previous blogs, this was denoted <img src="https://latex.codecogs.com/png.latex?H_c(T)"> and truly it was too confusing when I tried to do it here. And by that point I wasn’t going back and re-naming <img src="https://latex.codecogs.com/png.latex?H(T)">.↩︎</p></li>
<li id="fn50"><p><img src="https://latex.codecogs.com/png.latex?C_0%5E%5Cinfty(T)"> is the space of all infinitely differentiable compactly supported functions on <img src="https://latex.codecogs.com/png.latex?T">↩︎</p></li>
<li id="fn51"><p>The trick is to notice that the set of all possible <img src="https://latex.codecogs.com/png.latex?%5Cxi(u)"> is dense in <img src="https://latex.codecogs.com/png.latex?H(T)">.↩︎</p></li>
<li id="fn52"><p>unitary↩︎</p></li>
<li id="fn53"><p>the space containing the limits (in the <img src="https://latex.codecogs.com/png.latex?V(T)">-norm) of all sequences in <img src="https://latex.codecogs.com/png.latex?v_n%20%5Cin%20C_0%5E%5Cinfty(T)">↩︎</p></li>
<li id="fn54"><p>If you take some limits↩︎</p></li>
<li id="fn55"><p>I mean, really. Basically we say that <img src="https://latex.codecogs.com/png.latex?A%20%5Ccong%20B"> if there is an isomorphism between <img src="https://latex.codecogs.com/png.latex?A"> and <img src="https://latex.codecogs.com/png.latex?B">. Could I be more explicit? Yes. Would that make this unreadable? Also yes.↩︎</p></li>
<li id="fn56"><p><img src="https://latex.codecogs.com/png.latex?%5C%7Cu%5C%7C%5E2%20=%20%5Cmathbb%7BE%7D(%5Cxi(u)%5E2)">.↩︎</p></li>
<li id="fn57"><p><img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7Bj%7D%20=%20(j_1,%20j_2,%20%5Cldots)"> is a multi-index, which can be interpreted as <img src="https://latex.codecogs.com/png.latex?%7C%5Cmathbf%7Bj%7D%7C%20=%20%5Csum_%7B%5Cell%5Cgeq%201%20%7Dj_%5Cell">, and <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Cpartial%5E%7B%7C%5Cmathbf%7Bj%7D%7C%7Du%7D%7B%5Cpartial%20s_%5Cmathbf%7Bj%7D%7D%20=%20%5Cfrac%7B%5Cpartial%5E%7B%7C%5Cmathbf%7Bj%7D%7C%7Du%7D%7B%5Cpartial%5E%7Bj_1%7Ds_%7B1%7D%5Cpartial%5E%7Bj_2%7Ds_%7B2%7D%5Ccdots%7D.%0A">↩︎</p></li>
<li id="fn58"><p>in every local coordinate system↩︎</p></li>
<li id="fn59"><p>Because <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BC%7D%5E%7B-1%7D"> defines an inner product, it’s actually a symmetric elliptic differential operator↩︎</p></li>
<li id="fn60"><p>Technically, you need to choose different basis functions for <img src="https://latex.codecogs.com/png.latex?f">. In particular, you need to choose <img src="https://latex.codecogs.com/png.latex?f%20=%20%5Csum_%7Bj=1%7D%5En%20f_j%20%5Cphi_j"> where <img src="https://latex.codecogs.com/png.latex?%5Cphi_j%20=%20%5Cmathcal%7BC%7D%5E%7B-1/2%7D%20%5Cpsi_j">. This is then called a Petrov-Galerkin approximation and truly we don’t need to think about it at all. Also I am completely eliding issues of smoothness in all of this. It matters, but it doesn’t matter too much. So let’s just assume everything exists.↩︎</p></li>
<li id="fn61"><p>If you don’t believe me you are welcome to read <a href="https://dansblog.netlify.app/posts/2021-11-03-yes-but-what-is-a-gaussian-process-or-once-twice-three-times-a-definition-or-a-descent-into-madness/yes-but-what-is-a-gaussian-process-or-once-twice-three-times-a-definition-or-a-descent-into-madness.html">the monster blog post</a>, where it’s an example.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2023,
  author = {Simpson, Dan},
  title = {Markovian {Gaussian} Processes: {A} Lot of Theory and Some
    Practical Stuff},
  date = {2023-01-21},
  url = {https://dansblog.netlify.app/posts/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2023" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2023. <span>“Markovian Gaussian Processes: A Lot of Theory
and Some Practical Stuff.”</span> January 21, 2023. <a href="https://dansblog.netlify.app/posts/">https://dansblog.netlify.app/posts/</a>.
</div></div></section></div> ]]></description>
  <category>Gaussian processes</category>
  <category>Fundamentals</category>
  <category>Theory</category>
  <category>Deep Dives</category>
  <guid>https://dansblog.netlify.app/posts/2023-01-21-markov/markov.html</guid>
  <pubDate>Fri, 20 Jan 2023 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2023-01-21-markov/gays.png" medium="image" type="image/png" height="165" width="144"/>
</item>
<item>
  <title>Sparse matrices part 7a: Another shot at JAX-ing the Cholesky decomposition</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2022-11-27-sparse7/sparse7.html</link>
  <description><![CDATA[ 





<p>The time has come once more to resume my journey into sparse matrices. There’s been a bit of a pause, mostly because I realised that I didn’t know how to implement the sparse Cholesky factorisation in a JAX-traceable way. But now the time has come. It is time for me to get on top of JAX’s weird control-flow constructs.</p>
<p>And, along the way, I’m going to re-do the sparse Cholesky factorisation to make it, well, better.</p>
<p>In order to temper expectations, I will tell you that this post does not do the numerical factorisation, only the symbolic one. Why? Well I wrote most of it on a long-haul flight and I didn’t get to the numerical part. And this was long enough. So hold your breaths for Part 7b, which will come as soon as I write it.</p>
<p>You can consider this a <em>much</em> better re-do of <a href="https://dansblog.netlify.app/posts/2022-03-23-getting-jax-to-love-sparse-matrices/getting-jax-to-love-sparse-matrices.html">Part 2</a>. This is no longer my first python coding exercise in a decade, so hopefully the code is better. And I’m definitely trying a lot harder to think about the limitations of JAX.</p>
<p>Before I start, I should probably say why I’m doing this. JAX is a truly magical thing that will compute gradients and everything else just by clever processing of the Jacobian-vector product code. Unfortunately, this is only possible if the Jacobian-vector product code is JAX traceable and this code is structurally extremely similar<sup>1</sup> to the code for the sparse Cholesky factorisation.</p>
<p>I am doing this in the hope of (eventually getting to) autodiff. But that won’t be this blog post. This blog post is complicated enough.</p>
<section id="control-flow-of-the-damned" class="level2">
<h2 class="anchored" data-anchor-id="control-flow-of-the-damned">Control flow of the damned</h2>
<p>The first and most important rule of programming with JAX is that loops will break your heart. I mean, whatever, I guess they’re fine. But there’s a problem. Imagine the following function</p>
<div class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> f(x: jax.Array, n: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> jax.Array:</span>
<span id="cb1-2">  out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.zeros_like(x)</span>
<span id="cb1-3">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n):</span>
<span id="cb1-4">    out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> x</span>
<span id="cb1-5">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> out</span></code></pre></div>
</div>
<p>This is, basically, the worst implementation of multiplication by an integer that you can possibly imagine. This code will run fine in Python, but if you try to JIT compile it, JAX is gonna get <em>angry</em>. It will produce the machine code equivalent of</p>
<div class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> f_n(x):</span>
<span id="cb2-2">  out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> x</span>
<span id="cb2-3">  out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> x</span>
<span id="cb2-4">  out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> x</span>
<span id="cb2-5">  <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#</span> do this n times</span>
<span id="cb2-6">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> out</span></code></pre></div>
</div>
<p>There are two bad things happening here. First, note that the “compiled” code depends on <code>n</code> and will have to be compiled anew each time <code>n</code> changes. Secondly, the loop has been replaced by <code>n</code> copies of the loop body. This is called <em>loop unrolling</em> and, when used judiciously by a clever compiler, is a great way to speed up code. When done completely for <em>every</em> loop this is a nightmare and the corresponding code will take a geological amount of time to compile.</p>
<p>A similar thing<sup>2</sup> happens when you need to run autodiff on <code>f(x,n)</code>. For each <code>n</code> an expression graph is constructed that contains the unrolled for loop. This suggests that autodiff might also end up being quite slow (or, more problematically, more memory-hungry).</p>
<p>So the first rule of JAX is to avoid for loops. But if you can’t do that, there are three built-in loop structures that play nicely with JIT compilation and sometimes<sup>3</sup> differentiation. These three constructs are</p>
<ol type="1">
<li>A while loop <code>jax.lax.while_loop(cond_func, body_func, init)</code></li>
<li>An accumulator <code>jax.lax.scan(body_func, init, xs)</code></li>
<li>A for loop <code>jax.lax.fori_loop(lower, upper, body_fun, init)</code></li>
</ol>
<p>Of those three, the first and third work mostly as you’d expect, while the second is a bit more hairy. The <code>while_loop</code> function is roughly equivalent to</p>
<div class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb3-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> jax_lax_while_loop(cond_func, body_func, init):</span>
<span id="cb3-3">  x  <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> init</span>
<span id="cb3-4">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">while</span> cond_func(x):</span>
<span id="cb3-5">    x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> body_func(x)</span>
<span id="cb3-6">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> x</span></code></pre></div>
</div>
<p>So basically it’s just a while loop. The thing that’s important is that it compiles down to a single XLA operation<sup>4</sup> instead of some unrolled mess.</p>
<p>One thing that is important to realise is that while loops are only forwards-mode differentiable, which means that it is <em>very</em> expensive<sup>5</sup> to compute gradients. The reason for this is that we simply do not know how long that loop actually is and so it’s impossible to build a fixed-size expression graph.</p>
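<p>To make this concrete, here is a hypothetical <code>while_loop</code> whose trip count depends on the data, which is exactly the situation where a fixed-size unrolled graph is impossible:</p>

```python
import jax
from jax import lax

# Count how many doublings it takes for x to reach at least 100.
def doublings(x):
    def cond_func(state):
        val, count = state
        return val < 100.0

    def body_func(state):
        val, count = state
        return (val * 2.0, count + 1)

    return lax.while_loop(cond_func, body_func, (x, 0))

val, count = jax.jit(doublings)(1.0)  # val = 128.0, count = 7
```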
<p>The <code>jax.lax.scan</code> function is probably the one that people will be least familiar with. That said, it’s also the one that is roughly “how a for loop should work”. The concept that’s important here is a for-loop with <em>carry over</em>. Carry over is information that changes from one step of the loop to the next. This is what separates it from a <code>map</code> statement, which would apply the same function independently to each element of a list.</p>
<p>The scan function looks like</p>
<div class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> jax_lax_scan(body_func, init, xs):</span>
<span id="cb4-2">  len_x0 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(x0)</span>
<span id="cb4-3">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(x) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> len_x0 <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> xs):</span>
<span id="cb4-4">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">raise</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">ValueError</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"All x must have the same length!!"</span>)</span>
<span id="cb4-5">  carry <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> init</span>
<span id="cb4-6">  ys <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb4-7">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> xs:</span>
<span id="cb4-8">    carry, y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> body_func(carry, x)</span>
<span id="cb4-9">    ys.append(y)</span>
<span id="cb4-10">  </span>
<span id="cb4-11">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> carry, np.stack(ys)</span></code></pre></div>
</div>
<p>A critically important limitation of <code>jax.lax.scan</code> is that every <code>x</code> in <code>xs</code> must have the same shape! This means, for example, that</p>
<div class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb5-1">xs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>], [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>], <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>]</span></code></pre></div>
</div>
<p>is not a valid argument. Like most limitations in JAX, this one exists so that the code can be transformed into efficient compiled code across different processors and accelerators.</p>
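<p>What <em>is</em> valid is anything that stacks into a single array (or a pytree of arrays sharing a leading axis). A minimal sketch, scanning over the rows of a 2-D array:</p>

```python
from jax import lax
import jax.numpy as jnp

# Three scan steps; each x passed to the body is one row of shape (2,).
xs = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

def body_func(carry, x):
    carry = carry + jnp.sum(x)
    return carry, carry  # (new carry, per-step output)

total, partials = lax.scan(body_func, 0.0, xs)
# total = 21.0, partials = [3.0, 10.0, 21.0]
```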
<p>For example, if I wanted to use <code>jax.lax.scan</code> on my example from before I would get</p>
<div class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb6-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> lax</span>
<span id="cb6-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> jnp</span>
<span id="cb6-3"></span>
<span id="cb6-4"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> f(x, n):</span>
<span id="cb6-5">  init <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.zeros_like(x)</span>
<span id="cb6-6">  xs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.repeat(x, n)</span>
<span id="cb6-7">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_func(carry, y):</span>
<span id="cb6-8">    val <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> carry <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> y</span>
<span id="cb6-9">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (val, val)</span>
<span id="cb6-10">  </span>
<span id="cb6-11">  final, journey <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.scan(body_func, init, xs)</span>
<span id="cb6-12">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (final, journey)</span>
<span id="cb6-13"></span>
<span id="cb6-14">final, journey <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> f(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>)</span>
<span id="cb6-15"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(final)</span>
<span id="cb6-16"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(journey)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>8.4
[1.2       2.4       3.6000001 4.8       6.        7.2       8.4      ]</code></pre>
</div>
</div>
<p>This translation is a bit awkward compared to the for loop but it’s the sort of thing that you get used to.</p>
<p>This function can be differentiated<sup>6</sup> and compiled. To differentiate it, I need a version that returns a scalar, which is easy enough to do with a lambda.</p>
<div class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> jit, grad</span>
<span id="cb8-2"></span>
<span id="cb8-3">f2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x, n: f(x,n)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb8-4">f2_grad <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(f2, argnums <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb8-5"></span>
<span id="cb8-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(f2_grad(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>7.0</code></pre>
</div>
</div>
<p>The <code>argnums</code> option tells JAX that we are only differentiating wrt the first argument.</p>
<p>JIT compilation is a tiny bit more delicate. If we try the natural thing, we are going to get an error.</p>
<div class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb10-1">f_jit_bad <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jit(f)</span>
<span id="cb10-2">bad <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> f_jit_bad(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>)</span></code></pre></div>
<div class="cell-output cell-output-error">
<pre><code>ConcretizationTypeError: Abstract tracer value encountered where concrete value is expected: Traced&lt;ShapedArray(int32[], weak_type=True)&gt;with&lt;DynamicJaxprTrace(level=0/1)&gt;
When jit-compiling jnp.repeat, the total number of repeats must be static. To fix this, either specify a static value for `repeats`, or pass a static value to `total_repeat_length`.
The error occurred while tracing the function f at /var/folders/08/4p5p665j4d966tr7nvr0v24c0000gn/T/ipykernel_24749/3851190413.py:4 for jit. This concrete value was not available in Python because it depends on the value of the argument 'n'.

See https://jax.readthedocs.io/en/latest/errors.html#jax.errors.ConcretizationTypeError</code></pre>
</div>
</div>
<p>In order to compile a function, JAX needs to know how big everything is. And right now it does not know what <code>n</code> is. This shows itself through the <code>ConcretizationTypeError</code>, which basically says that as JAX was looking through your code it found something it can’t manipulate. In this case, it was in the <code>jnp.repeat</code> function.</p>
<p>We can fix this problem by declaring this parameter <code>static</code>.</p>
<div class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb12-1">f_jit <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jit(f, static_argnums<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,))</span>
<span id="cb12-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(f_jit(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>8.4</code></pre>
</div>
</div>
<p>A static parameter is a parameter value that is known at compile time. If we define <code>n</code> to be static, then the first time you call <code>f_jit(x, 7)</code> it will compile and then it will reuse the compiled code for any other value of <code>x</code>. If we then call <code>f_jit(x, 9)</code>, the code will <em>compile again</em>.</p>
<p>To see this, we can make use of a JAX oddity: if a function prints something<sup>7</sup>, it is only printed when the function is traced and compiled, and never again. This means that we can’t do <em>debug by print</em>. But on the upside, it’s easy to check when things are compiling.</p>
<div class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb14-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> f2(x, n):</span>
<span id="cb14-2">  <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"compiling: n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>n<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb14-3">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> f(x,n)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb14-4"></span>
<span id="cb14-5">f2_jit <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jit(f2, static_argnums<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,))</span>
<span id="cb14-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(f2_jit(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>))</span>
<span id="cb14-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(f2_jit(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.8</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>))</span>
<span id="cb14-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(f2_jit(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>))</span>
<span id="cb14-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(f2_jit(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.8</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>compiling: n = 7
8.4
12.6
compiling: n = 9
10.799999
12.6</code></pre>
</div>
</div>
<p>This is a perfectly ok solution as long as the static parameters don’t change very often. In our context, the static information is going to be the sparsity pattern.</p>
<p>Finally, we can talk about <code>jax.lax.fori_loop</code>, the in-built for loop. This is basically a convenience wrapper for <code>jax.lax.scan</code> (when <code>lower</code> and <code>upper</code> are static) or <code>jax.lax.while_loop</code> (when they are not). The Python pseudocode is</p>
<div class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb16-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> jax_lax_fori_loop(lower, upper, body_func, init):</span>
<span id="cb16-2">  out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> init</span>
<span id="cb16-3">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(lower, upper):</span>
<span id="cb16-4">    out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> body_func(i, out)</span>
<span id="cb16-5">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> out</span></code></pre></div>
</div>
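<p>A minimal sketch of <code>jax.lax.fori_loop</code> in action; note that the body receives the loop index as well as the carried value:</p>

```python
from jax import lax

# Sum of squares 0**2 + 1**2 + ... + (n-1)**2 with a fori_loop.
def sum_squares(n):
    return lax.fori_loop(0, n, lambda i, acc: acc + i * i, 0)

total = sum_squares(5)  # 0 + 1 + 4 + 9 + 16 = 30
```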
<p>To close out this bit where I repeat the docs, there is also a traceable if/else: <code>jax.lax.cond</code> which has the pseudocode</p>
<div class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb17-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> jax_lax_cond(pred, true_fun, false_fun, val):</span>
<span id="cb17-2">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> pred:</span>
<span id="cb17-3">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> true_fun(val)</span>
<span id="cb17-4">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb17-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> false_fun(val)</span></code></pre></div>
</div>
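<p>A minimal sketch of <code>jax.lax.cond</code>; because both branches are traced, the result stays jit-able, unlike a Python <code>if</code> on a traced value:</p>

```python
import jax
from jax import lax

# A traceable absolute value: both branches are traced functions.
def abs_val(x):
    return lax.cond(x >= 0, lambda v: v, lambda v: -v, x)

y = jax.jit(abs_val)(-3.0)  # 3.0
```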
</section>
<section id="building-a-jax-traceable-symbolic-sparse-choleksy-factorisation" class="level2">
<h2 class="anchored" data-anchor-id="building-a-jax-traceable-symbolic-sparse-choleksy-factorisation">Building a JAX-traceable symbolic sparse Choleksy factorisation</h2>
<p>In order to build a JAX-traceable sparse Cholesky factorisation <img src="https://latex.codecogs.com/png.latex?A%20=%20LL%5ET">, we are going to need to build up a few moving parts.</p>
<ol type="1">
<li><p>Build the elimination tree of <img src="https://latex.codecogs.com/png.latex?A"> and find the number of non-zeros in each column of <img src="https://latex.codecogs.com/png.latex?L"></p></li>
<li><p>Build the <em>symbolic factorisation</em><sup>8</sup> of <img src="https://latex.codecogs.com/png.latex?L"> (aka the location of the non-zeros of <img src="https://latex.codecogs.com/png.latex?L">)</p></li>
<li><p>Do the actual numerical decomposition.</p></li>
</ol>
<p>In the <a href="https://dansblog.netlify.app/posts/2022-03-23-getting-jax-to-love-sparse-matrices/getting-jax-to-love-sparse-matrices.html">previous post</a> we did not explicitly form the elimination tree. Instead, I used dynamic memory allocation. This time I’m being more mature.</p>
<section id="building-the-expression-graph" class="level3">
<h3 class="anchored" data-anchor-id="building-the-expression-graph">Building the expression graph</h3>
<p>The elimination tree<sup>9</sup> <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BT%7D_A"> is a (forest of) rooted tree(s) that compactly represent the non-zero pattern of the Cholesky factor <img src="https://latex.codecogs.com/png.latex?L">. In particular, the elimination tree has the property that, for any <img src="https://latex.codecogs.com/png.latex?k%20%3E%20j"> , <img src="https://latex.codecogs.com/png.latex?L_%7Bkj%7D%20%5Cneq%200"> if and only if there is a path from <img src="https://latex.codecogs.com/png.latex?j"> to <img src="https://latex.codecogs.com/png.latex?k"> in the tree. Or, in the language of trees, <img src="https://latex.codecogs.com/png.latex?L_%7Bkj%7D%20%5Cneq%200"> if and only if <img src="https://latex.codecogs.com/png.latex?j"> is a descendant of <img src="https://latex.codecogs.com/png.latex?k"> in the tree <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BT%7D_A">.</p>
<p>We can describe<sup>10</sup> <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BT%7D_A"> by listing the parent of each node. The parent node of <img src="https://latex.codecogs.com/png.latex?j"> in the tree is the smallest <img src="https://latex.codecogs.com/png.latex?i%20%3E%20j"> with <img src="https://latex.codecogs.com/png.latex?L_%7Bij%7D%20%5Cneq%200">.</p>
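<p>As a sanity check of this definition, you can brute-force the parents from a dense Cholesky factor of a tiny matrix (illustrative only; the matrix below is a made-up positive-definite example, not one from this post):</p>

```python
import numpy as np

# parent[j] is the smallest i > j with L[i, j] != 0.
A = np.array([[4.0, 1.0, 0.0, 1.0],
              [1.0, 4.0, 1.0, 0.0],
              [0.0, 1.0, 4.0, 1.0],
              [1.0, 0.0, 1.0, 4.0]])
L = np.linalg.cholesky(A)
n = A.shape[0]
parent = [-1] * n
for j in range(n):
    below = [i for i in range(j + 1, n) if abs(L[i, j]) > 1e-12]
    if below:
        parent[j] = min(below)
# parent == [1, 2, 3, -1]; note the fill-in: A[3, 1] == 0 but L[3, 1] != 0.
```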
<p>We can turn this into an algorithm. An efficient version, described in Tim Davis’s book, takes about <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(%5Ctext%7Bnnz%7D(A))"> operations. But I’m going to program up a slower one that takes <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(%5Ctext%7Bnnz%7D(L))"> operations, but has the added benefit<sup>11</sup> of giving me the column counts for free.</p>
<p>To do this, we are going to walk the tree and dynamically add up the column counts as we go.</p>
<p>To start off, let’s do this in standard Python so that we can see what the algorithm looks like. The key concept is that if we write <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BT%7D_%7Bj-1%7D"> as the elimination tree encoding the structure of<sup>12</sup> <code>L[:j, :j]</code>, then we can ask how this tree connects with node <code>j</code>.</p>
<p>A theorem gives a very simple answer to this.</p>
<div id="thm-tree" class="theorem">
<p><span class="theorem-title"><strong>Theorem 1</strong></span> If <img src="https://latex.codecogs.com/png.latex?j%20%3E%20i">, then <img src="https://latex.codecogs.com/png.latex?A_%7Bj,i%7D%20%5Cneq%200"> implies that <img src="https://latex.codecogs.com/png.latex?i"> is a descendant of <img src="https://latex.codecogs.com/png.latex?j"> in <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BT%7D_A">. In particular, that means that there is a directed path in <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BT%7D_A"> from <img src="https://latex.codecogs.com/png.latex?i"> to <img src="https://latex.codecogs.com/png.latex?j">.</p>
</div>
<p>This tells us how <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BT%7D_%7Bj-1%7D"> connects with node <img src="https://latex.codecogs.com/png.latex?j">: for each non-zero element <img src="https://latex.codecogs.com/png.latex?i"> of the <img src="https://latex.codecogs.com/png.latex?j">th row of <img src="https://latex.codecogs.com/png.latex?A">, we can walk up the tree <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BT%7D_%7Bj-1%7D"> from <img src="https://latex.codecogs.com/png.latex?i"> until we reach a node that has no parent in <img src="https://latex.codecogs.com/png.latex?%5C%7B0,%5Cldots,%20j-1%5C%7D">. Because there <em>must</em> be a path from <img src="https://latex.codecogs.com/png.latex?i"> to <img src="https://latex.codecogs.com/png.latex?j"> in <img src="https://latex.codecogs.com/png.latex?T_j">, the parent of this terminal node must be <img src="https://latex.codecogs.com/png.latex?j">.</p>
<p>As with everything Cholesky related, this works because the algorithm proceeds from left to right, which in this case means that the node label associated with <em>any</em> descendant of <img src="https://latex.codecogs.com/png.latex?j"> is always less than <img src="https://latex.codecogs.com/png.latex?j">.</p>
<p>The algorithm is then a fairly run-of-the-mill<sup>13</sup> tree traversal, where we keep track of where we have been so we don’t double count our columns.</p>
<p>Probably the most important thing here is that I am using the <em>full</em> sparse matrix rather than just its lower triangle. This is, basically, convenience. I need access to the left half of the <img src="https://latex.codecogs.com/png.latex?j">th row of <img src="https://latex.codecogs.com/png.latex?A">, which is conveniently the same as the top half of the <img src="https://latex.codecogs.com/png.latex?j">th column. And sometimes you just don’t want to be dicking around with swapping between row- and column-based representations.</p>
<div class="cell" data-execution_count="13">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb18-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb18-2"></span>
<span id="cb18-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> etree_base(A_indices, A_indptr):</span>
<span id="cb18-4">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb18-5">  parent <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> n</span>
<span id="cb18-6">  mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> n</span>
<span id="cb18-7">  col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> n</span>
<span id="cb18-8">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n):</span>
<span id="cb18-9">    mark[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> j</span>
<span id="cb18-10">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> indptr <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(A_indptr[j], A_indptr[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]):</span>
<span id="cb18-11">      node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_indices[indptr]</span>
<span id="cb18-12">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">while</span> node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">and</span> mark[node] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> j:</span>
<span id="cb18-13">        <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> parent[node] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:</span>
<span id="cb18-14">          parent[node] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> j</span>
<span id="cb18-15">        mark[node] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> j</span>
<span id="cb18-16">        col_count[node] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb18-17">        node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> parent[node]</span>
<span id="cb18-18">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (parent, col_count)</span></code></pre></div>
</div>
<p>To convince ourselves this works, let’s run an example and compare the column counts we get to our previous method.</p>
<div class="cell" data-execution_count="14">
<details class="code-fold">
<summary>Some boilerplate from previous editions.</summary>
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb19-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> scipy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> sparse</span>
<span id="cb19-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> scipy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> sp</span>
<span id="cb19-3">    </span>
<span id="cb19-4"></span>
<span id="cb19-5"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> make_matrix(n):</span>
<span id="cb19-6">  one_d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.diags([[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>n, [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)], [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>])</span>
<span id="cb19-7">  A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (sparse.kronsum(one_d, one_d) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> sparse.eye(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>n))</span>
<span id="cb19-8">  A_csc <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A.tocsc()</span>
<span id="cb19-9">  A_csc.eliminate_zeros()</span>
<span id="cb19-10">  A_lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.tril(A_csc, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">format</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"csc"</span>)</span>
<span id="cb19-11">  A_index <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_lower.indices</span>
<span id="cb19-12">  A_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_lower.indptr</span>
<span id="cb19-13">  A_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_lower.data</span>
<span id="cb19-14">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (A_index, A_indptr, A_x, A_csc)</span>
<span id="cb19-15"></span>
<span id="cb19-16"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> _symbolic_factor(A_indices, A_indptr):</span>
<span id="cb19-17">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Assumes A_indices and A_indptr index the lower triangle of $A$ ONLY.</span></span>
<span id="cb19-18">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb19-19">  L_sym <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [np.array([], dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n)]</span>
<span id="cb19-20">  children <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [np.array([], dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n)]</span>
<span id="cb19-21">  </span>
<span id="cb19-22">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n):</span>
<span id="cb19-23">    L_sym[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_indices[A_indptr[j]:A_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb19-24">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> child <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> children[j]:</span>
<span id="cb19-25">      tmp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_sym[child][L_sym[child] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> j]</span>
<span id="cb19-26">      L_sym[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.unique(np.append(L_sym[j], tmp))</span>
<span id="cb19-27">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_sym[j]) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:</span>
<span id="cb19-28">      p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_sym[j][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb19-29">      children[p] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.append(children[p], j)</span>
<span id="cb19-30">        </span>
<span id="cb19-31">  L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb19-32">  L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum([<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> L_sym])</span>
<span id="cb19-33">  L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.concatenate(L_sym)</span>
<span id="cb19-34">  </span>
<span id="cb19-35">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_indices, L_indptr</span></code></pre></div>
</details>
</div>
<div class="cell" data-execution_count="15">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb20-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># A_indices/A_indptr are the lower triangle, A is the entire matrix</span></span>
<span id="cb20-2">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">37</span>)</span>
<span id="cb20-3">parent, col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree_base(A.indices, A.indptr)</span>
<span id="cb20-4">L_indices, L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _symbolic_factor(A_indices, A_indptr)</span>
<span id="cb20-5"></span>
<span id="cb20-6">true_parent <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indices[L_indptr[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb20-7">true_parent[np.where(np.diff(L_indptr[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb20-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> y <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (x,y) <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(parent[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], true_parent)))</span>
<span id="cb20-9"></span>
<span id="cb20-10">true_col_count  <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.diff(L_indptr)</span>
<span id="cb20-11"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(true_col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> col_count))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>True
True</code></pre>
</div>
</div>
<p>Excellent. Now we just need to convert it to JAX.</p>
<p>Or do we?</p>
<p>To be honest, this is a little pointless. This function is only run once per matrix, so we won’t see much speedup<sup>14</sup> from compilation.</p>
<p>Nevertheless, we might try.</p>
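<p>For readers who haven’t met the <code>lax</code> control-flow combinators before, their semantics can be mimicked in plain Python (this matches the reference semantics in the JAX documentation; the real versions trace the body once and compile it, which is why all loop state must be threaded explicitly through a single value):</p>

```python
# Plain-Python mimics of the lax combinators used below (illustrative;
# real lax.while_loop / lax.fori_loop trace and compile body_fun).
def while_loop(cond_fun, body_fun, val):
    # repeatedly apply body_fun while cond_fun holds, threading the state
    while cond_fun(val):
        val = body_fun(val)
    return val

def fori_loop(lower, upper, body_fun, val):
    # body_fun takes (loop index, carried state) and returns new state
    for i in range(lower, upper):
        val = body_fun(i, val)
    return val

print(while_loop(lambda v: v < 10, lambda v: v + 3, 0))  # 12
print(fori_loop(0, 10, lambda i, s: s + i, 0))           # 45
```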
<div class="cell" data-execution_count="16">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb22-1"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@jit</span></span>
<span id="cb22-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> etree(A_indices, A_indptr):</span>
<span id="cb22-3"> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># print("(Re-)compiling etree(A_indices, A_indptr)")</span></span>
<span id="cb22-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## innermost while loop</span></span>
<span id="cb22-5">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_while(val):</span>
<span id="cb22-6">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#  print(val)</span></span>
<span id="cb22-7">    j, node, parent, col_count, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb22-8">    update_parent <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].at[x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>])</span>
<span id="cb22-9">    parent <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.cond(lax.eq(parent[node], <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), update_parent, <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>], (parent, node, j))</span>
<span id="cb22-10">    mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mark.at[node].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(j)</span>
<span id="cb22-11">    col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> col_count.at[node].add(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb22-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (j, parent[node], parent, col_count, mark)</span>
<span id="cb22-13"></span>
<span id="cb22-14">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> cond_while(val):</span>
<span id="cb22-15">    j, node, parent, col_count, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb22-16">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> lax.bitwise_and(lax.lt(node, j), lax.ne(mark[node], j))</span>
<span id="cb22-17"></span>
<span id="cb22-18">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Inner for loop</span></span>
<span id="cb22-19">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_inner_for(indptr, val):</span>
<span id="cb22-20">    j, A_indices, A_indptr, parent, col_count, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb22-21">    node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_indices[indptr]</span>
<span id="cb22-22">    j, node, parent, col_count, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.while_loop(cond_while, body_while, (j, node, parent, col_count, mark))</span>
<span id="cb22-23">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (j, A_indices, A_indptr, parent, col_count, mark)</span>
<span id="cb22-24">  </span>
<span id="cb22-25">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Outer for loop</span></span>
<span id="cb22-26">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_out_for(j, val):</span>
<span id="cb22-27">     A_indices, A_indptr, parent, col_count, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb22-28">     mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mark.at[j].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(j)</span>
<span id="cb22-29">     j, A_indices, A_indptr, parent, col_count, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.fori_loop(A_indptr[j], A_indptr[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], body_inner_for, (j, A_indices, A_indptr, parent, col_count, mark))</span>
<span id="cb22-30">     <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (A_indices, A_indptr, parent, col_count, mark)</span>
<span id="cb22-31"></span>
<span id="cb22-32">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Body of code</span></span>
<span id="cb22-33">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb22-34">  parent <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.repeat(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n)</span>
<span id="cb22-35">  mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.repeat(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n)</span>
<span id="cb22-36">  col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.repeat(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,  n)</span>
<span id="cb22-37">  init <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (A_indices, A_indptr, parent, col_count, mark)</span>
<span id="cb22-38">  A_indices, A_indptr, parent, col_count, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.fori_loop(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n, body_out_for, init)</span>
<span id="cb22-39">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> parent, col_count</span></code></pre></div>
</div>
<p>Wow. That is <em>ugly</em>. But let’s see<sup>15</sup> if it works!</p>
<div class="cell" data-execution_count="17">
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb23-1">parent_jax, col_count_jax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb23-2"></span>
<span id="cb23-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> y <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (x,y) <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(parent_jax[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], true_parent)))</span>
<span id="cb23-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(true_col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> col_count_jax))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>True</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>True</code></pre>
</div>
</div>
<p>Success!</p>
<p>I guess we could ask ourselves if we gained any speed.</p>
<p>Here is the pure Python code.</p>
<div class="cell" data-execution_count="18">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb26-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> timeit</span>
<span id="cb26-2">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>)</span>
<span id="cb26-3"></span>
<span id="cb26-4">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: etree_base(A.indices, A.indptr),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb26-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb26-6"></span>
<span id="cb26-7">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>)</span>
<span id="cb26-8">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: etree_base(A.indices, A.indptr),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb26-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb26-10"></span>
<span id="cb26-11"></span>
<span id="cb26-12">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)</span>
<span id="cb26-13">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: etree_base(A.indices, A.indptr),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb26-14"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 400: [0.0, 0.0, 0.0, 0.0, 0.0]
n = 2500: [0.03, 0.03, 0.03, 0.03, 0.03]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 40000: [0.83, 0.82, 0.82, 0.82, 0.82]</code></pre>
</div>
</div>
<p>And here is our JAX’d and JIT’d code. (One caveat: <code>jit</code> specialises on input shapes, so each new matrix size triggers a fresh compilation, and the first repetition at each size will include that cost.)</p>
<div class="cell" data-execution_count="19">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb29-1">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>)</span>
<span id="cb29-2">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: etree(A.indices, A.indptr),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb29-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb29-4"></span>
<span id="cb29-5">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>)</span>
<span id="cb29-6">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: etree(A.indices, A.indptr),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb29-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb29-8"></span>
<span id="cb29-9"></span>
<span id="cb29-10">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)</span>
<span id="cb29-11">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: etree(A.indices, A.indptr),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb29-12"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb29-13"></span>
<span id="cb29-14">parent, col_count<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb29-15">L_indices, L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _symbolic_factor(A_indices, A_indptr)</span>
<span id="cb29-16"></span>
<span id="cb29-17">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>)</span>
<span id="cb29-18">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: etree(A.indices, A.indptr),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb29-19"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 400: [0.13, 0.0, 0.0, 0.0, 0.0]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 2500: [0.12, 0.0, 0.0, 0.0, 0.0]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 40000: [0.14, 0.02, 0.02, 0.02, 0.02]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 1000000: [2.24, 2.11, 2.12, 2.12, 2.12]</code></pre>
</div>
</div>
<p>You can see that there is a decent speedup. For the first three examples, the computation time is dominated by the compilation time, but when the matrix has a million unknowns the compilation time is negligible. At this scale it would probably be worth using the fancy algorithm. That said, when your problem is that big it is probably not worth sweating a three-second computation that only happens once!</p>
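<p>As a toy illustration of what these timings are measuring (my own example, nothing to do with JAX internals): with <code>number = 1</code>, <code>timeit.repeat</code> reports each repetition separately, so any one-time setup cost lands entirely in the first measurement. Here a cached pure-Python computation stands in for JIT compilation.</p>

```python
import functools
import timeit

@functools.lru_cache(maxsize=None)
def slow_setup(n):
    # stand-in for compilation: an O(n^2) pure-Python loop that only runs once per n
    return sum(i * j for i in range(n) for j in range(n))

def work(n):
    # fast once the cached "compilation" has happened
    return slow_setup(n) + n

# number=1 keeps repetitions separate: the first entry includes the one-time
# setup cost, the remaining entries do not
times = timeit.repeat(lambda: work(500), number=1, repeat=5)
```

<p>This is exactly the shape of the timings above: a large first entry followed by a stable, much smaller steady state.</p>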
</section>
<section id="the-non-zero-pattern-of-l" class="level3">
<h3 class="anchored" data-anchor-id="the-non-zero-pattern-of-l">The non-zero pattern of <img src="https://latex.codecogs.com/png.latex?L"></h3>
<p>Now that we know how many non-zeros there are, it’s time to populate them. Last time, I used some dynamic memory allocation to make this work, but JAX is certainly not going to allow me to do that. So instead I’m going to have to do the worst thing possible: think.</p>
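<p>The payoff of knowing the column counts in advance is that the whole structure of <code>L</code> can be preallocated in one shot, with no dynamic allocation at all. A minimal numpy sketch (the counts here are made up for illustration):</p>

```python
import numpy as np

# made-up column counts of L (diagonal included), e.g. from the elimination-tree pass
col_count = np.array([3, 2, 2, 1])

# CSC column pointers: column j of L occupies L_indices[L_indptr[j]:L_indptr[j+1]]
L_indptr = np.zeros(len(col_count) + 1, dtype=int)
L_indptr[1:] = np.cumsum(col_count)

# the total number of non-zeros is known up front, so the index array can be
# allocated once, with no resizing -- which is exactly what JIT compilation needs
L_indices = np.zeros(L_indptr[-1], dtype=int)
```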
<p>The way that we went about it last time was, to be honest, a bit arse-backwards. The main reason for this is that I did not have access to the elimination tree. But now that we do, we can actually use it.</p>
<p>The trick is to slightly rearrange<sup>16</sup> the order of operations to get something that is more convenient for working out the structure.</p>
<p>Recall from last time that we used the <em>left-looking</em> Cholesky factorisation, which can be written in the dense case as</p>
<div class="cell" data-execution_count="20">
<div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb34-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> dense_left_cholesky(A):</span>
<span id="cb34-2">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb34-3">  L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros_like(A)</span>
<span id="cb34-4">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n):</span>
<span id="cb34-5">    L[j,j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.sqrt(A[j,j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> np.inner(L[j, :j], L[j, :j]))</span>
<span id="cb34-6">    L[(j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>):, j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (A[(j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>):, j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> L[(j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>):, :j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> L[j, :j].transpose()) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> L[j,j]</span>
<span id="cb34-7">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L</span></code></pre></div>
</div>
<p>This is not the only way to organise those operations. An alternative is the <em>up-looking</em> Cholesky factorisation, which can be implemented in the dense case as</p>
<div class="cell" data-execution_count="21">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb35-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> dense_up_cholesky(A):</span>
<span id="cb35-2">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb35-3">  L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros_like(A)</span>
<span id="cb35-4">  L[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.sqrt(A[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb35-5">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,n):</span>
<span id="cb35-6">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#if i &gt; 0:</span></span>
<span id="cb35-7">    L[i, :i] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (np.linalg.solve(L[:i, :i], A[:i,i])).transpose()</span>
<span id="cb35-8">    L[i, i] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.sqrt(A[i,i] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> np.inner(L[i, :i], L[i, :i]))</span>
<span id="cb35-9">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L</span></code></pre></div>
</div>
<p>This is quite a different-looking beast! It scans row by row rather than column by column. And while the left-looking algorithm is based on matrix-vector multiplies, the up-looking algorithm is based on triangular solves. So maybe we should pause for a moment to check that these two algorithms really compute the same factor!</p>
<div class="cell" data-execution_count="22">
<div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb36-1">A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.rand(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>)</span>
<span id="cb36-2">A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> A.transpose()</span>
<span id="cb36-3">A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A.transpose() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>np.eye(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>)</span>
<span id="cb36-4"></span>
<span id="cb36-5">L_left <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dense_left_cholesky(A)</span>
<span id="cb36-6">L_up <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dense_up_cholesky(A)</span>
<span id="cb36-7"></span>
<span id="cb36-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>((L_left <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> L_up)[:]))),<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>0.0</code></pre>
</div>
</div>
<p>They are the same!!</p>
<p>The reason for considering the up-looking algorithm is that it gives a slightly nicer description of the non-zeros of row <code>i</code>, which will let us find the location of the non-zeros in the whole matrix. In particular, the non-zeros to the left of the diagonal on row <code>i</code> correspond to the non-zero indices of the solution to the lower triangular linear system<sup>17</sup> <img src="https://latex.codecogs.com/png.latex?%0AL_%7B1:(i-1),1:(i-1)%7D%20x%5E%7B(i)%7D%20=%20A_%7B1:i-1,%20i%7D.%0A"> Because <img src="https://latex.codecogs.com/png.latex?A"> is sparse, this is a system of <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7Bnnz%7D(A_%7B1:i-1,i%7D)"> linear equations, rather than <img src="https://latex.codecogs.com/png.latex?(i-1)"> equations that we would have in the dense case. That means that the sparsity pattern of <img src="https://latex.codecogs.com/png.latex?x%5E%7B(i)%7D"> will be the union of the sparsity patterns of the columns of <img src="https://latex.codecogs.com/png.latex?L_%7B1:(i-1),1:(i-1)%7D"> that correspond to the non-zero entries of <img src="https://latex.codecogs.com/png.latex?A_%7B1:i-1,%20i%7D">.</p>
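<p>That union can be computed purely symbolically. Here is a small sketch (my own toy example, not the post&rsquo;s code) that finds the non-zero pattern of the solution to a sparse lower-triangular solve by traversing the pattern of <code>L</code>, starting from the non-zeros of the right-hand side:</p>

```python
# Toy sketch: the non-zero pattern of x in L x = b, for a sparse lower-triangular
# L stored in CSC form (indices/indptr). Starting from nnz(b), a traversal of
# the pattern of each reached column gives the pattern of x.
def solve_pattern(Lt_indices, Lt_indptr, b_nnz):
    pattern = set()
    stack = list(b_nnz)
    while stack:
        j = stack.pop()
        if j in pattern:
            continue
        pattern.add(j)
        # every non-zero L[k, j] below the diagonal propagates a non-zero to x[k]
        for p in range(Lt_indptr[j], Lt_indptr[j + 1]):
            k = Lt_indices[p]
            if k != j:
                stack.append(k)
    return sorted(pattern)

# 4x4 lower-triangular pattern: column 0 has rows {0, 2}, column 1 has {1, 3},
# column 2 has {2, 3}, column 3 has {3}
Lt_indices = [0, 2, 1, 3, 2, 3, 3]
Lt_indptr = [0, 2, 4, 6, 7]

print(solve_pattern(Lt_indices, Lt_indptr, [0]))  # [0, 2, 3]
```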
<p>This means two things. Firstly, if <img src="https://latex.codecogs.com/png.latex?A_%7Bji%7D%5Cneq%200">, then <img src="https://latex.codecogs.com/png.latex?x%5E%7B(i)%7D_j%20%5Cneq%200">. Secondly, if <img src="https://latex.codecogs.com/png.latex?x%5E%7B(i)%7D_j%20%5Cneq%200"> <em>and</em> <img src="https://latex.codecogs.com/png.latex?L_%7Bkj%7D%5Cneq%200">, then <img src="https://latex.codecogs.com/png.latex?x%5E%7B(i)%7D_k%20%5Cneq%200">. These two facts give us a way of finding the non-zero set of <img src="https://latex.codecogs.com/png.latex?x%5E%7B(i)%7D"> if we remember just one more fact about the elimination tree: if <img src="https://latex.codecogs.com/png.latex?L_%7Bkj%7D%20%5Cneq%200">, then <img src="https://latex.codecogs.com/png.latex?k"> is an ancestor of <img src="https://latex.codecogs.com/png.latex?j"> in the elimination tree.</p>
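<p>To make the elimination tree fact concrete, here is a toy check (my own example): the parent of <code>j</code> is the smallest row index below the diagonal in column <code>j</code> of <code>L</code>, and every below-diagonal non-zero in that column lies on the path from <code>j</code> to the root.</p>

```python
# Toy check: build elimination-tree parents from the pattern of L (CSC form),
# then verify that every below-diagonal non-zero of column j is an ancestor of j.
def etree_parents_from_L(Lt_indices, Lt_indptr, n):
    parent = [-1] * n
    for j in range(n):
        # parent(j) = smallest k > j with L[k, j] != 0
        below = [k for k in Lt_indices[Lt_indptr[j]:Lt_indptr[j + 1]] if k > j]
        if below:
            parent[j] = min(below)
    return parent

def ancestors(parent, j):
    out = set()
    while parent[j] != -1:
        j = parent[j]
        out.add(j)
    return out

# 4x4 lower-triangular pattern: column 0 has rows {0, 2}, column 1 has {1, 3},
# column 2 has {2, 3}, column 3 has {3}
Lt_indices = [0, 2, 1, 3, 2, 3, 3]
Lt_indptr = [0, 2, 4, 6, 7]
parent = etree_parents_from_L(Lt_indices, Lt_indptr, 4)
print(parent)  # [2, 3, 3, -1]

# every below-diagonal non-zero of column j is an ancestor of j
for j in range(4):
    anc = ancestors(parent, j)
    assert all(k in anc for k in Lt_indices[Lt_indptr[j]:Lt_indptr[j + 1]] if k > j)
```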
<p>This reduces the problem of finding the non-zero elements of <img src="https://latex.codecogs.com/png.latex?x%5E%7B(i)%7D"> to the problem of finding all of the ancestors of <img src="https://latex.codecogs.com/png.latex?%5C%7Bj:%20A_%7Bji%7D%20%5Cneq%200%5C%7D"> in the subtree <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BT%7D_%7Bi-1%7D">. And if there is one thing that people who are ok at programming are <em>excellent</em> at, it is walking up a damn tree.</p>
<p>So let’s do that. Well, I’ve already done it. In fact, that was how I found the column counts in the first place! With this interpretation, the outer loop takes us across the rows. Once I am in row <code>j</code><sup>18</sup>, I find a starting node <code>node</code> (which is a non-zero in <img src="https://latex.codecogs.com/png.latex?A_%7B1:(i-1),i%7D">) and I walk up the tree from that node, checking each time whether I’ve actually seen that node<sup>19</sup> before. If I haven’t seen it before, I add one to the column count of column <code>node</code><sup>20</sup>.</p>
<p>To allocate the non-zero structure, I just need to replace that counter increment with an assignment.</p>
</section>
<section id="attempt-1-lord-thats-slow" class="level3">
<h3 class="anchored" data-anchor-id="attempt-1-lord-thats-slow">Attempt 1: Lord that’s slow</h3>
<p>We will do the pure Python version first.</p>
<div class="cell" data-execution_count="23">
<div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb38-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> symbolic_cholesky_base(A_indices, A_indptr, parent, col_count):</span>
<span id="cb38-2">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb38-3">  col_ptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.repeat(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb38-4">  col_ptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> np.cumsum(col_count) </span>
<span id="cb38-5">  L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(col_count), dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb38-6">  L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb38-7">  L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb38-8">  mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> n</span>
<span id="cb38-9"></span>
<span id="cb38-10">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n):</span>
<span id="cb38-11">    mark[i] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> i</span>
<span id="cb38-12">    L_indices[L_indptr[i]] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> i</span>
<span id="cb38-13"></span>
<span id="cb38-14">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> indptr <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(A_indptr[i], A_indptr[i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]):</span>
<span id="cb38-15">      node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_indices[indptr]</span>
<span id="cb38-16">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">while</span> node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">and</span> mark[node] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> i:</span>
<span id="cb38-17">        mark[node] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> i</span>
<span id="cb38-18">        L_indices[col_ptr[node]] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> i</span>
<span id="cb38-19">        col_ptr[node] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb38-20">        node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> parent[node]</span>
<span id="cb38-21">  </span>
<span id="cb38-22">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (L_indices, L_indptr)</span></code></pre></div>
</div>
<p>Does it work?</p>
<div class="cell" data-execution_count="24">
<div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb39-1">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span>)</span>
<span id="cb39-2">parent, col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree_base(A.indices, A.indptr)</span>
<span id="cb39-3"></span>
<span id="cb39-4">L_indices, L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> symbolic_cholesky_base(A.indices, A.indptr, parent, col_count)</span>
<span id="cb39-5">L_indices_true, L_indptr_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _symbolic_factor(A_indices, A_indptr)</span>
<span id="cb39-6"></span>
<span id="cb39-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>y <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (x,y) <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(L_indices, L_indices_true)))</span>
<span id="cb39-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>y <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (x,y) <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(L_indptr, L_indptr_true)))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>True
True</code></pre>
</div>
</div>
<p>Fabulosa!</p>
<p>Now let’s do the compiled version.</p>
<div class="cell" data-execution_count="25">
<div class="sourceCode cell-code" id="cb41" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb41-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> functools <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> partial</span>
<span id="cb41-2"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@partial</span>(jit, static_argnums <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,))</span>
<span id="cb41-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> symbolic_cholesky(A_indices, A_indptr, L_indptr, parent, nnz):</span>
<span id="cb41-4">  </span>
<span id="cb41-5">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## innermost while loop</span></span>
<span id="cb41-6">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_while(val):</span>
<span id="cb41-7">    i, L_indices, L_indptr, node, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb41-8">    mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mark.at[node].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(i)</span>
<span id="cb41-9">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#p = </span></span>
<span id="cb41-10">    L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indices.at[col_ptr[node]].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(i)</span>
<span id="cb41-11">    col_ptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> col_ptr.at[node].add(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb41-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (i, L_indices, L_indptr, parent[node], parent, col_ptr, mark)</span>
<span id="cb41-13"></span>
<span id="cb41-14">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> cond_while(val):</span>
<span id="cb41-15">    i, L_indices, L_indptr, node, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb41-16">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> lax.bitwise_and(lax.lt(node, i), lax.ne(mark[node], i))</span>
<span id="cb41-17"></span>
<span id="cb41-18">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Inner for loop</span></span>
<span id="cb41-19">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_inner_for(indptr, val):</span>
<span id="cb41-20">    i, A_indices, A_indptr, L_indices, L_indptr, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb41-21">    node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_indices[indptr]</span>
<span id="cb41-22">    i, L_indices, L_indptr, node, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.while_loop(cond_while, body_while, (i, L_indices, L_indptr, node, parent, col_ptr, mark))</span>
<span id="cb41-23">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (i, A_indices, A_indptr, L_indices, L_indptr, parent, col_ptr, mark)</span>
<span id="cb41-24">  </span>
<span id="cb41-25">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Outer for loop</span></span>
<span id="cb41-26">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_out_for(i, val):</span>
<span id="cb41-27">     A_indices, A_indptr, L_indices, L_indptr, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb41-28">     mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mark.at[i].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(i)</span>
<span id="cb41-29">     L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indices.at[L_indptr[i]].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(i)</span>
<span id="cb41-30">     i, A_indices, A_indptr, L_indices, L_indptr, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.fori_loop(A_indptr[i], A_indptr[i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], body_inner_for, (i, A_indices, A_indptr, L_indices, L_indptr, parent, col_ptr, mark))</span>
<span id="cb41-31">     <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (A_indices, A_indptr, L_indices, L_indptr, parent, col_ptr, mark)</span>
<span id="cb41-32"></span>
<span id="cb41-33">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Body of code</span></span>
<span id="cb41-34">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb41-35">  col_ptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb41-36">  L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.zeros(nnz, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb41-37">  </span>
<span id="cb41-38">  mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.repeat(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n)</span>
<span id="cb41-39">  </span>
<span id="cb41-40">  init <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (A_indices, A_indptr, L_indices, L_indptr, parent, col_ptr, mark)</span>
<span id="cb41-41">  A_indices, A_indptr, L_indices, L_indptr, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.fori_loop(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n, body_out_for, init)</span>
<span id="cb41-42">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_indices</span></code></pre></div>
</div>
<p>Now let’s check that it works.</p>
<div class="cell" data-execution_count="26">
<div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb42-1">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>)</span>
<span id="cb42-2">parent, col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb42-3">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb42-4">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb42-5"></span>
<span id="cb42-6"></span>
<span id="cb42-7">L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> symbolic_cholesky(A.indices, A.indptr, L_indptr, parent, nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb42-8">L_indices_true, L_indptr_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _symbolic_factor(A_indices, A_indptr)</span>
<span id="cb42-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_indices_true))</span>
<span id="cb42-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_indptr_true))</span>
<span id="cb42-11"></span>
<span id="cb42-12">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">31</span>)</span>
<span id="cb42-13"></span>
<span id="cb42-14">parent, col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb42-15">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb42-16">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb42-17"></span>
<span id="cb42-18"></span>
<span id="cb42-19">L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> symbolic_cholesky(A.indices, A.indptr, L_indptr, parent, nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb42-20">L_indices_true, L_indptr_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _symbolic_factor(A_indices, A_indptr)</span>
<span id="cb42-21"></span>
<span id="cb42-22"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_indices_true))</span>
<span id="cb42-23"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_indptr_true))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>True
True</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>True
True</code></pre>
</div>
</div>
<p>Success!</p>
<p>One <em>minor</em> problem. This is slow. as. balls.</p>
<div class="cell" data-execution_count="27">
<div class="sourceCode cell-code" id="cb45" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb45-1">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>)</span>
<span id="cb45-2">parent, col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree_base(A.indices, A.indptr)</span>
<span id="cb45-3">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: symbolic_cholesky_base(A.indices, A.indptr, parent, col_count),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb45-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb45-5"></span>
<span id="cb45-6">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)</span>
<span id="cb45-7">parent, col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree_base(A.indices, A.indptr)</span>
<span id="cb45-8">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: symbolic_cholesky_base(A.indices, A.indptr, parent, col_count),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb45-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 2500: [0.05, 0.04, 0.04, 0.04, 0.04]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 40000: [1.97, 2.09, 2.04, 2.03, 1.92]</code></pre>
</div>
</div>
<p>And here is our JAX’d and JIT’d code.</p>
<div class="cell" data-execution_count="28">
<div class="sourceCode cell-code" id="cb48" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb48-1">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>)</span>
<span id="cb48-2">parent, col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb48-3">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb48-4">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb48-5">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>:symbolic_cholesky(A.indices, A.indptr, L_indptr, parent, nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb48-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb48-7"></span>
<span id="cb48-8">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)</span>
<span id="cb48-9">parent, col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb48-10">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb48-11">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb48-12">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>:symbolic_cholesky(A.indices, A.indptr, L_indptr, parent, nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb48-13"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 2500: [0.15]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 40000: [29.19]</code></pre>
</div>
</div>
<p>Oooof. Something is going horribly wrong.</p>
</section>
<section id="why-is-it-so-slow" class="level3">
<h3 class="anchored" data-anchor-id="why-is-it-so-slow">Why is it so slow?</h3>
<p>The first thing to check is whether it’s the compile time. We can do this by explicitly <em>lowering</em> the JIT’d function to its XLA representation and then compiling it.</p>
<div class="cell" data-execution_count="29">
<div class="sourceCode cell-code" id="cb51" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb51-1">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>)</span>
<span id="cb51-2">parent, col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb51-3">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb51-4">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb51-5">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: jit(partial(symbolic_cholesky, nnz<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]))).lower(A.indices, A.indptr, L_indptr, parent).<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">compile</span>(),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb51-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Compilation time: n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb51-7"></span>
<span id="cb51-8">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)</span>
<span id="cb51-9">parent, col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb51-10">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb51-11">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb51-12">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: jit(partial(symbolic_cholesky, nnz<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]))).lower(A.indices, A.indptr, L_indptr, parent).<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">compile</span>(),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb51-13"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Compilation time: n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Compilation time: n = 2500: [0.15, 0.15, 0.15, 0.16, 0.15]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>Compilation time: n = 40000: [0.16, 0.15, 0.14, 0.16, 0.15]</code></pre>
</div>
</div>
<p>It is not the compile time.</p>
<p>And that is actually a good thing because that suggests that we aren’t having problems with the compiler unrolling all of our wonderful loops! But that does mean that we have to look a bit deeper into the code. Some smart people would probably be able to look at the <code>jaxpr</code> intermediate representation to diagnose the problem. But I couldn’t see anything there.</p>
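<p>If you do want to poke at the intermediate representation yourself, a minimal sketch (using a toy function, not the symbolic factorisation above) looks like this:</p>

```python
import jax
import jax.numpy as jnp

def toy(x):
    # a trivial indexed update plus an elementwise op,
    # just to give the jaxpr something to show
    return x.at[0].set(1.0) * 2.0

# make_jaxpr traces the function and returns its jaxpr IR
jaxpr = jax.make_jaxpr(toy)(jnp.zeros(3))
print(jaxpr)
```

<p>The printed jaxpr lists every primitive JAX traced, which is sometimes enough to spot an accidental unroll or an unexpected gather/scatter. Here it wasn’t.</p>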
<p>Instead I thought <em>if I were a clever, efficient compiler, what would I have problems with?</em> And the answer is the classic sparse matrix answer: indirect indexing.</p>
<p>The only structural difference between the <code>etree</code> function and the <code>symbolic_cholesky</code> function is this line in the <code>body_while()</code> function:</p>
<div class="cell" data-execution_count="30">
<div class="sourceCode cell-code" id="cb54" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb54-1"> L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indices.at[col_ptr[node]].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(i)</span></code></pre></div>
</div>
<p>In order to evaluate this code, the compiler has to resolve <em>two levels</em> of indirection. By contrast, the indexing in <code>etree()</code> was always direct. So let’s see what happens if we take the same function and remove that double indirection.</p>
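<p>To make the distinction concrete, here is a tiny sketch (with made-up arrays, nothing to do with the real factorisation) of a direct write versus a doubly-indirect one:</p>

```python
import jax.numpy as jnp

vals = jnp.zeros(4, dtype=int)
ptr = jnp.array([2, 0, 3, 1])  # hypothetical pointer array

node = 2
# direct indexing: the write location is the loop variable itself
direct = vals.at[node].set(7)

# double indirection: the write location must itself be gathered first
indirect = vals.at[ptr[node]].set(7)

print(direct)    # the 7 lands at position 2
print(indirect)  # the 7 lands at position ptr[2] == 3
```

<p>In the indirect case the compiler has to emit a gather to find the destination before it can emit the scatter, and it can assume much less about where the write goes.</p>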
<div class="cell" data-execution_count="31">
<div class="sourceCode cell-code" id="cb55" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb55-1"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@partial</span>(jit, static_argnums <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,))</span>
<span id="cb55-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> test_fun(A_indices, A_indptr, L_indptr, parent, nnz):</span>
<span id="cb55-3">  </span>
<span id="cb55-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## innermost while loop</span></span>
<span id="cb55-5">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_while(val):</span>
<span id="cb55-6">    i, L_indices, L_indptr, node, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb55-7">    mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mark.at[node].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(i)</span>
<span id="cb55-8">    L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indices.at[node].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(i)</span>
<span id="cb55-9">    col_ptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> col_ptr.at[node].add(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb55-10">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (i, L_indices, L_indptr, parent[node], parent, col_ptr, mark)</span>
<span id="cb55-11"></span>
<span id="cb55-12">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> cond_while(val):</span>
<span id="cb55-13">    i, L_indices, L_indptr, node, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb55-14">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> lax.bitwise_and(lax.lt(node, i), lax.ne(mark[node], i))</span>
<span id="cb55-15"></span>
<span id="cb55-16">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Inner for loop</span></span>
<span id="cb55-17">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_inner_for(indptr, val):</span>
<span id="cb55-18">    i, A_indices, A_indptr, L_indices, L_indptr, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb55-19">    node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_indices[indptr]</span>
<span id="cb55-20">    i, L_indices, L_indptr, node, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.while_loop(cond_while, body_while, (i, L_indices, L_indptr, node, parent, col_ptr, mark))</span>
<span id="cb55-21">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (i, A_indices, A_indptr, L_indices, L_indptr, parent, col_ptr, mark)</span>
<span id="cb55-22">  </span>
<span id="cb55-23">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Outer for loop</span></span>
<span id="cb55-24">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_out_for(i, val):</span>
<span id="cb55-25">     A_indices, A_indptr, L_indices, L_indptr, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb55-26">     mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mark.at[i].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(i)</span>
<span id="cb55-27">     L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indices.at[L_indptr[i]].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(i)</span>
<span id="cb55-28">     i, A_indices, A_indptr, L_indices, L_indptr, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.fori_loop(A_indptr[i], A_indptr[i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], body_inner_for, (i, A_indices, A_indptr, L_indices, L_indptr, parent, col_ptr, mark))</span>
<span id="cb55-29">     <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (A_indices, A_indptr, L_indices, L_indptr, parent, col_ptr, mark)</span>
<span id="cb55-30"></span>
<span id="cb55-31">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Body of code</span></span>
<span id="cb55-32">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb55-33">  col_ptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb55-34">  L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.zeros(nnz, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb55-35">  </span>
<span id="cb55-36">  mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.repeat(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n)</span>
<span id="cb55-37">  </span>
<span id="cb55-38">  init <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (A_indices, A_indptr, L_indices, L_indptr, parent, col_ptr, mark)</span>
<span id="cb55-39">  A_indices, A_indptr, L_indices, L_indptr, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.fori_loop(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n, body_out_for, init)</span>
<span id="cb55-40">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_indices</span>
<span id="cb55-41"></span>
<span id="cb55-42">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>)</span>
<span id="cb55-43">parent, col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb55-44">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb55-45">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb55-46">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>:test_fun(A.indices, A.indptr, L_indptr, parent, nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb55-47"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb55-48"></span>
<span id="cb55-49">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)</span>
<span id="cb55-50">parent, col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb55-51">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb55-52">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb55-53">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>:test_fun(A.indices, A.indptr, L_indptr, parent, nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb55-54"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 2500: [0.14]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 40000: [0.17]</code></pre>
</div>
</div>
<p>That isn’t conclusive, but it does indicate that this might<sup>21</sup> be the problem.</p>
<p>And this is a <em>big</em> problem for us! The sparse Cholesky algorithm has similar amounts of indirection. So we need to fix it.</p>
</section>
<section id="attempt-2-after-some-careful-thought-things-stayed-the-same" class="level3">
<h3 class="anchored" data-anchor-id="attempt-2-after-some-careful-thought-things-stayed-the-same">Attempt 2: After some careful thought, things stayed the same</h3>
<p>Now. I want to pretend that I’ve got elegant ideas about this. But I don’t. So let’s just do it. The most obvious thing to do is to use the algorithm to get the non-zero structure of the <em>rows</em> of <img src="https://latex.codecogs.com/png.latex?L">. These are the things that are being indexed by <code>col_ptr[node]</code>, so if we have these explicitly we don’t need multiple indirection. We also don’t need a while loop.</p>
<p>In fact, if we have the non-zero structure of the rows of <img src="https://latex.codecogs.com/png.latex?L">, we can turn that into the non-zero structure of the columns in linear-ish<sup>22</sup> time.</p>
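<p>That row-to-column conversion is essentially a counting sort. A sketch in plain NumPy, on a hypothetical 4-by-4 pattern (not code from this post): count the nonzeros per column, cumsum the counts into column pointers, then make one more linear pass dropping each entry into its column bucket.</p>

```python
import numpy as np

# Hypothetical CSR pattern of a 4x4 lower-triangular L:
# row i holds the column indices of its nonzeros.
row_indptr = np.array([0, 1, 3, 4, 7])
col_of_nnz = np.array([0, 0, 1, 2, 0, 1, 3])

n = len(row_indptr) - 1

# Count the nonzeros in each column, then cumsum into column pointers.
col_count = np.bincount(col_of_nnz, minlength=n)
col_indptr = np.zeros(n + 1, dtype=int)
col_indptr[1:] = np.cumsum(col_count)

# One more pass drops each entry's row index into its column bucket.
row_of_nnz = np.empty_like(col_of_nnz)
next_slot = col_indptr[:-1].copy()
for i in range(n):
    for p in range(row_indptr[i], row_indptr[i + 1]):
        j = col_of_nnz[p]
        row_of_nnz[next_slot[j]] = i
        next_slot[j] += 1

print(col_indptr)   # [0 3 5 6 7]
print(row_of_nnz)   # each column's rows, in increasing order
```

<p>Both passes touch each nonzero a constant number of times, which is where the linear-ish claim comes from.</p>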
<p>All we need to do is make sure that our <code>etree()</code> function is also counting the number of nonzeros in each row.</p>
<div class="cell" data-execution_count="32">
<div class="sourceCode cell-code" id="cb58" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb58-1"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@jit</span></span>
<span id="cb58-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> etree(A_indices, A_indptr):</span>
<span id="cb58-3"> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># print("(Re-)compiling etree(A_indices, A_indptr)")</span></span>
<span id="cb58-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## innermost while loop</span></span>
<span id="cb58-5">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_while(val):</span>
<span id="cb58-6">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#  print(val)</span></span>
<span id="cb58-7">    j, node, parent, col_count, row_count, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb58-8">    update_parent <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].at[x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>])</span>
<span id="cb58-9">    parent <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.cond(lax.eq(parent[node], <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), update_parent, <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>], (parent, node, j))</span>
<span id="cb58-10">    mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mark.at[node].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(j)</span>
<span id="cb58-11">    col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> col_count.at[node].add(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb58-12">    row_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> row_count.at[j].add(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb58-13">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (j, parent[node], parent, col_count, row_count, mark)</span>
<span id="cb58-14"></span>
<span id="cb58-15">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> cond_while(val):</span>
<span id="cb58-16">    j, node, parent, col_count, row_count, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb58-17">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> lax.bitwise_and(lax.lt(node, j), lax.ne(mark[node], j))</span>
<span id="cb58-18"></span>
<span id="cb58-19">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Inner for loop</span></span>
<span id="cb58-20">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_inner_for(indptr, val):</span>
<span id="cb58-21">    j, A_indices, A_indptr, parent, col_count, row_count, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb58-22">    node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_indices[indptr]</span>
<span id="cb58-23">    j, node, parent, col_count, row_count, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.while_loop(cond_while, body_while, (j, node, parent, col_count, row_count, mark))</span>
<span id="cb58-24">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (j, A_indices, A_indptr, parent, col_count, row_count, mark)</span>
<span id="cb58-25">  </span>
<span id="cb58-26">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Outer for loop</span></span>
<span id="cb58-27">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_out_for(j, val):</span>
<span id="cb58-28">     A_indices, A_indptr, parent, col_count, row_count, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb58-29">     mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mark.at[j].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(j)</span>
<span id="cb58-30">     j, A_indices, A_indptr, parent, col_count, row_count, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.fori_loop(A_indptr[j], A_indptr[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], body_inner_for, (j, A_indices, A_indptr, parent, col_count, row_count, mark))</span>
<span id="cb58-31">     <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (A_indices, A_indptr, parent, col_count, row_count, mark)</span>
<span id="cb58-32"></span>
<span id="cb58-33">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Body of code</span></span>
<span id="cb58-34">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb58-35">  parent <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.repeat(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n)</span>
<span id="cb58-36">  mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.repeat(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n)</span>
<span id="cb58-37">  col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.repeat(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,  n)</span>
<span id="cb58-38">  row_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.repeat(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n)</span>
<span id="cb58-39">  init <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (A_indices, A_indptr, parent, col_count, row_count, mark)</span>
<span id="cb58-40">  A_indices, A_indptr, parent, col_count, row_count, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.fori_loop(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n, body_out_for, init)</span>
<span id="cb58-41">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (parent, col_count, row_count)</span></code></pre></div>
</div>
<p>Let’s check that the code is actually doing what I want.</p>
<div class="cell" data-execution_count="33">
<div class="sourceCode cell-code" id="cb59" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb59-1">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">57</span>)</span>
<span id="cb59-2">parent, col_count, row_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb59-3">L_indices, L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _symbolic_factor(A_indices, A_indptr)</span>
<span id="cb59-4"></span>
<span id="cb59-5">true_parent <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indices[L_indptr[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb59-6">true_parent[np.where(np.diff(L_indptr[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb59-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> y <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (x,y) <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(parent[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], true_parent)))</span>
<span id="cb59-8"></span>
<span id="cb59-9">true_col_count  <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.diff(L_indptr)</span>
<span id="cb59-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(true_col_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> col_count))</span>
<span id="cb59-11"></span>
<span id="cb59-12">true_row_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.array([<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(np.where(L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> i)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">57</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)])</span>
<span id="cb59-13"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(true_row_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> row_count))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>True
True</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>True</code></pre>
</div>
</div>
<p>Excellent! With this we can modify our previous function to give us the row-indices of the non-zero pattern instead. Just for further chaos, please note that we are using a CSC representation of <img src="https://latex.codecogs.com/png.latex?A"> to get a CSR representation of <img src="https://latex.codecogs.com/png.latex?L">.</p>
<p>Once again, we will prototype in pure Python and then translate to JAX. The thing to look out for this time is that we <em>know</em> how many non-zeros there are in a row and we know where we need to put them. This suggests that we can compute these things in <code>body_inner_for</code> and then do a vectorised version of our indirect indexing. This should compile down to a single <a href="https://www.tensorflow.org/xla/operation_semantics#scatter">XLA <code>scatter</code> call</a>. This will reduce the overall number of <code>scatter</code> calls from <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7Bnnz%7D(L)"> to <img src="https://latex.codecogs.com/png.latex?n">. And hopefully this will fix things.</p>
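<p>To see why the vectorised write matters, here is a toy comparison (a sketch; the array sizes and positions are made up). A Python loop of single-element <code>.at[].set()</code> updates lowers to one scatter per element, while handing over the whole index vector at once lowers to a single scatter, and both produce the same array.</p>

```python
import jax.numpy as jnp

positions = [4, 1, 3]  # made-up slots to write into

# One scatter per element: each .at[].set() in the loop becomes
# its own XLA scatter when traced.
x_loop = jnp.zeros(6, dtype=int)
for p in positions:
    x_loop = x_loop.at[p].set(7)

# One scatter for the batch: the whole index vector goes in at once.
x_vec = jnp.zeros(6, dtype=int).at[jnp.array(positions)].set(7)
```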
<div class="cell" data-execution_count="34">
<div class="sourceCode cell-code" id="cb62" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb62-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> symbolic_cholesky2_base(A_indices, A_indptr, L_indptr, row_count, parent, nnz):</span>
<span id="cb62-2"></span>
<span id="cb62-3">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb62-4">  col_ptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb62-5">  L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb62-6">  mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> n</span>
<span id="cb62-7"></span>
<span id="cb62-8">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n):</span>
<span id="cb62-9">    mark[i] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> i</span>
<span id="cb62-10">    row_ind <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.repeat(nnz<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, row_count[i])</span>
<span id="cb62-11">    row_ind[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indptr[i]</span>
<span id="cb62-12">    counter <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb62-13">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> indptr <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(A_indptr[i], A_indptr[i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]):</span>
<span id="cb62-14">      node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_indices[indptr]</span>
<span id="cb62-15">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">while</span> node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">and</span> mark[node] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> i:</span>
<span id="cb62-16">        mark[node] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> i</span>
<span id="cb62-17">        row_ind[counter] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> col_ptr[node]</span>
<span id="cb62-18">        col_ptr[node] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb62-19">        node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> parent[node]</span>
<span id="cb62-20">        counter <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb62-21">    L_indices[row_ind] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> i</span>
<span id="cb62-22">  </span>
<span id="cb62-23">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_indices</span>
<span id="cb62-24"></span>
<span id="cb62-25"></span>
<span id="cb62-26">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span>)</span>
<span id="cb62-27">parent, col_count, row_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb62-28">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb62-29">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb62-30"></span>
<span id="cb62-31"></span>
<span id="cb62-32">L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> symbolic_cholesky2_base(A.indices, A.indptr, L_indptr, row_count, parent, L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb62-33">L_indices_true, L_indptr_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _symbolic_factor(A_indices, A_indptr)</span>
<span id="cb62-34"></span>
<span id="cb62-35"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>y <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (x,y) <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(L_indices, L_indices_true)))</span>
<span id="cb62-36"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>y <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (x,y) <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(L_indptr, L_indptr_true)))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>True
True</code></pre>
</div>
</div>
<p>Excellent. Now let’s JAX this. The JAX-heads among you will notice that we have a subtle<sup>23</sup> problem: in a <code>fori_loop</code>, JAX does not treat <code>i</code> as static, which means that the length of the repeat (<code>row_count[i]</code>) can never be static, and JAX cannot trace an operation whose output shape depends on a traced value.</p>
<p>Shit.</p>
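<p>A minimal reproduction of the problem (a sketch; the array contents are made up). Inside the loop body <code>i</code> is a tracer, so <code>row_count[i]</code> is traced, and <code>jnp.repeat</code> refuses it as a repeat count.</p>

```python
import jax
import jax.numpy as jnp
from jax import lax

row_count = jnp.array([2, 3, 1])  # made-up per-row nonzero counts

def body(i, acc):
    # i is a tracer inside fori_loop, so row_count[i] is traced too,
    # and jnp.repeat needs a concrete (static) repeat count. This
    # raises a ConcretizationTypeError; we catch it via its
    # JAXTypeError base to be safe across versions.
    pad = jnp.repeat(0, row_count[i])
    return acc

try:
    lax.fori_loop(0, 3, body, 0)
    shape_error = False
except jax.errors.JAXTypeError:
    shape_error = True
```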
<p>It is hard to think of a good option here. A few months back Junpeng Lao<sup>24</sup> <a href="https://gist.github.com/junpenglao/f5b48c34dd8ea5029202fb607806ea0f#file-sparse-cholesky-in-jax-ipynb">sent me a script</a> with his attempts at making the Cholesky stuff JAX transformable. And he hit the same problem. I was, in an act of hubris, trying very hard to not end up here. But that was tragically slow. So here we are.</p>
<p>He came up with two methods.</p>
<ol type="1">
<li><p>Pad out <code>row_ind</code> so it’s always long enough. This only costs memory. The maximum size of <code>row_ind</code> is <code>n</code>. Unfortunately, that worst case occurs whenever <img src="https://latex.codecogs.com/png.latex?A"> has a dense row. Sadly, for Bayesian<sup>25</sup> linear mixed models this will happen if we put Gaussian priors on the covariate coefficients<sup>26</sup> and we try to marginalise them out with the other multivariate Gaussian parts. It is possible to write routines that deal with dense rows and columns explicitly, but it’s a pain in the arse.</p></li>
<li><p>Do some terrifying work with <code>lax.scan</code> and dynamic slicing.</p></li>
</ol>
<p>I’m going to try the first of these options.</p>
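<p>Before the full routine, here is the padding trick in miniature (a sketch with made-up sizes; the key move is the out-of-range sentinel plus <code>mode="drop"</code>, which silently discards out-of-bounds indices instead of erroring):</p>

```python
import jax.numpy as jnp

# Hypothetical sizes: a factor with nnz = 10 stored entries, and no
# row of L longer than max_row = 4.
nnz, max_row = 10, 4
L_indices = -jnp.ones(nnz, dtype=int)

# Pretend row i = 5 of L really has two entries, destined for slots
# 3 and 7. Pad row_ind to the fixed length max_row with an
# out-of-range sentinel (nnz + 1) so its shape is always static.
row_ind = jnp.full(max_row, nnz + 1)
row_ind = row_ind.at[:2].set(jnp.array([3, 7]))

# mode="drop" discards the out-of-bounds padding, so the whole row
# is written with one fixed-size scatter.
L_indices = L_indices.at[row_ind].set(5, mode="drop")
```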
<div class="cell" data-execution_count="35">
<div class="sourceCode cell-code" id="cb64" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb64-1"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@partial</span>(jit, static_argnums <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>))</span>
<span id="cb64-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> symbolic_cholesky2(A_indices, A_indptr, L_indptr, row_count, parent, nnz, max_row):</span>
<span id="cb64-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## innermost while loop</span></span>
<span id="cb64-4">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_while(val):</span>
<span id="cb64-5">    i, counter, row_ind, node, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb64-6">    mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mark.at[node].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(i)</span>
<span id="cb64-7">    row_ind <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> row_ind.at[counter].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(col_ptr[node])</span>
<span id="cb64-8">    col_ptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> col_ptr.at[node].add(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb64-9">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (i, counter <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, row_ind, parent[node], col_ptr, mark)</span>
<span id="cb64-10"></span>
<span id="cb64-11">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> cond_while(val):</span>
<span id="cb64-12">    i, counter, row_ind, node, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb64-13">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> lax.bitwise_and(lax.lt(node, i), lax.ne(mark[node], i))</span>
<span id="cb64-14"></span>
<span id="cb64-15">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Inner for loop</span></span>
<span id="cb64-16">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_inner_for(indptr, val):</span>
<span id="cb64-17">    i, counter, row_ind, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb64-18">    node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_indices[indptr]</span>
<span id="cb64-19">    i, counter, row_ind, node, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.while_loop(cond_while, body_while, (i, counter, row_ind, node, col_ptr, mark))</span>
<span id="cb64-20">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (i, counter, row_ind, parent, col_ptr, mark)</span>
<span id="cb64-21">  </span>
<span id="cb64-22">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Outer for loop</span></span>
<span id="cb64-23">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_out_for(i, val):</span>
<span id="cb64-24">     L_indices, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb64-25">     mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mark.at[i].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(i)</span>
<span id="cb64-26">     row_ind <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.repeat(nnz<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, max_row)</span>
<span id="cb64-27">     row_ind <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> row_ind.at[row_count[i]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(L_indptr[i])</span>
<span id="cb64-28">     counter <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb64-29"></span>
<span id="cb64-30">     i, counter, row_ind, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.fori_loop(A_indptr[i], A_indptr[i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], body_inner_for, (i, counter, row_ind, parent, col_ptr, mark))</span>
<span id="cb64-31"></span>
<span id="cb64-32">     L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_indices.at[row_ind].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(i, mode <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"drop"</span>)</span>
<span id="cb64-33">     <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (L_indices, parent, col_ptr, mark)</span>
<span id="cb64-34"></span>
<span id="cb64-35">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Body of code</span></span>
<span id="cb64-36">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb64-37"></span>
<span id="cb64-38">  col_ptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb64-39">  L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.ones(nnz, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb64-40">  mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.repeat(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n)</span>
<span id="cb64-41"></span>
<span id="cb64-42">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Make everything a jnp array. Really should use jaxtyping</span></span>
<span id="cb64-43">  A_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(A_indices)</span>
<span id="cb64-44">  A_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(A_indptr)</span>
<span id="cb64-45">  L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(L_indptr)</span>
<span id="cb64-46">  row_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(row_count)</span>
<span id="cb64-47">  parent <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(parent)</span>
<span id="cb64-48"></span>
<span id="cb64-49">  init <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (L_indices, parent, col_ptr, mark)</span>
<span id="cb64-50">  L_indices, parent, col_ptr, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.fori_loop(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n, body_out_for, init)</span>
<span id="cb64-51">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_indices</span></code></pre></div>
</div>
<p>Ok. Let’s see if that worked.</p>
<div class="cell" data-execution_count="36">
<div class="sourceCode cell-code" id="cb65" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb65-1">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>)</span>
<span id="cb65-2">parent, col_count, row_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb65-3">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb65-4">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb65-5"></span>
<span id="cb65-6"></span>
<span id="cb65-7">L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> symbolic_cholesky2(A.indices, A.indptr, L_indptr, row_count, parent, nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]), max_row <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>(row_count)))</span>
<span id="cb65-8">L_indices_true, L_indptr_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _symbolic_factor(A_indices, A_indptr)</span>
<span id="cb65-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_indices_true))</span>
<span id="cb65-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_indptr_true))</span>
<span id="cb65-11"></span>
<span id="cb65-12">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">31</span>)</span>
<span id="cb65-13">parent, col_count, row_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb65-14">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb65-15">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb65-16"></span>
<span id="cb65-17"></span>
<span id="cb65-18">L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> symbolic_cholesky2(A.indices, A.indptr, L_indptr, row_count, parent, nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]), max_row <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>(row_count)))</span>
<span id="cb65-19">L_indices_true, L_indptr_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _symbolic_factor(A_indices, A_indptr)</span>
<span id="cb65-20"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_indices_true))</span>
<span id="cb65-21"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_indptr_true))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>True
True</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>True
True</code></pre>
</div>
</div>
<p>Ok. Once more into the breach. Is this any better?</p>
<div class="cell" data-execution_count="37">
<div class="sourceCode cell-code" id="cb68" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb68-1">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>)</span>
<span id="cb68-2">parent, col_count, row_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb68-3">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb68-4">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb68-5">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>:symbolic_cholesky2(A.indices, A.indptr, L_indptr, row_count, parent, nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]), max_row <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>(row_count))),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb68-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb68-7"></span>
<span id="cb68-8">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)</span>
<span id="cb68-9">parent, col_count, row_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb68-10">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb68-11">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb68-12">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>:symbolic_cholesky2(A.indices, A.indptr, L_indptr, row_count, parent, nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]), max_row <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>(row_count))),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb68-13"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb68-14"></span>
<span id="cb68-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># A_indices, A_indptr, A_x, A = make_matrix(300)</span></span>
<span id="cb68-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># parent, col_count, row_count = etree(A.indices, A.indptr)</span></span>
<span id="cb68-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># L_indptr = np.zeros(A.shape[0]+1, dtype=int)</span></span>
<span id="cb68-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># L_indptr[1:] = np.cumsum(col_count)</span></span>
<span id="cb68-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># times = timeit.repeat(lambda:symbolic_cholesky2(A.indices, A.indptr, L_indptr, row_count, parent, nnz = int(L_indptr[-1]), max_row = int(max(row_count))),number = 1, repeat = 1)</span></span>
<span id="cb68-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># print(f"n = {A.shape[0]}: {[round(t,2) for t in times]}")</span></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 2500: [0.28]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 40000: [28.31]</code></pre>
</div>
</div>
<p>Fuck.</p>
</section>
<section id="attempt-3-a-desperate-attempt-to-make-this-bloody-work" class="level3">
<h3 class="anchored" data-anchor-id="attempt-3-a-desperate-attempt-to-make-this-bloody-work">Attempt 3: A desperate attempt to make this bloody work</h3>
<p>Right. Let’s try again. What if, instead of doing all those scatters, we, idk, just store two vectors and sort? Because at this point I will try fucking anything. What if we just list out the column index and row index as we find them (aka build the sparse matrix in COO<sup>27</sup> format)? The <code>jax.experimental.sparse</code> module has support for (blocked) COO objects but doesn’t implement this transformation. <code>scipy.sparse</code> has a fast conversion routine, so I’m going to use it. In the interest of being 100% JAX, I tried a version with <code>index[1][jnp.lexsort((index[1], index[0]))]</code>, which does basically the same thing but is a lot slower.</p>
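<p>Before the full implementation, here is a minimal standalone sketch of the trick just described (the tiny matrix and index values are made up for illustration; they are not from the post): build unsorted COO triples, let <code>scipy.sparse</code> sort them into CSC order, and note that a <code>lexsort</code> with column as the primary key and row as the secondary key recovers the same ordering.</p>

```python
import numpy as np
from scipy import sparse

# Hypothetical tiny example: COO entries listed in "discovery" order,
# i.e. not grouped by column. rows[k], cols[k] is the k-th nonzero.
rows = np.array([0, 2, 1, 2, 0])
cols = np.array([0, 1, 0, 2, 2])
n = 3

# scipy's COO -> CSC conversion groups entries by column and sorts
# the row indices within each column for us.
csc = sparse.coo_array(
    (np.ones(len(rows)), (rows, cols)), shape=(n, n)
).tocsc()
print(csc.indices)  # row indices in CSC order
print(csc.indptr)   # column pointers

# The pure-array equivalent of the jnp.lexsort version: lexsort uses
# the LAST key as the primary key, so this sorts by column, then row.
order = np.lexsort((rows, cols))
print(rows[order])  # matches csc.indices
```

<p>The same <code>lexsort</code> call works with <code>jnp</code> arrays; scipy’s compiled conversion routine is just faster.</p>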
<div class="cell" data-execution_count="38">
<div class="sourceCode cell-code" id="cb71" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb71-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> symbolic_cholesky3(A_indices, A_indptr, L_indptr, parent, nnz):</span>
<span id="cb71-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@partial</span>(jit, static_argnums <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,))</span>
<span id="cb71-3">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> _inner(A_indices_, A_indptr_, L_indptr, parent, nnz):</span>
<span id="cb71-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Make everything a jnp array. Really should use jaxtyping</span></span>
<span id="cb71-5">    A_indices_ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(A_indices_)</span>
<span id="cb71-6">    A_indptr_ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(A_indptr_)</span>
<span id="cb71-7">    L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(L_indptr)</span>
<span id="cb71-8">    parent <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(parent)</span>
<span id="cb71-9"></span>
<span id="cb71-10">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## innermost while loop</span></span>
<span id="cb71-11">    <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_while(val):</span>
<span id="cb71-12">      index, i, counter,  node,  mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb71-13">      mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mark.at[node].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(i)</span>
<span id="cb71-14">      index[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> index[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].at[counter].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(node) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#column</span></span>
<span id="cb71-15">      index[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> index[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>].at[counter].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(i) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># row</span></span>
<span id="cb71-16">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (index, i, counter <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, parent[node], mark)</span>
<span id="cb71-17"></span>
<span id="cb71-18">    <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> cond_while(val):</span>
<span id="cb71-19">      index, i, counter,  node,  mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb71-20">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> lax.bitwise_and(lax.lt(node, i), lax.ne(mark[node], i))</span>
<span id="cb71-21"></span>
<span id="cb71-22">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Inner for loop</span></span>
<span id="cb71-23">    <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_inner_for(indptr, val):</span>
<span id="cb71-24">      index, i, counter, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb71-25">      node <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_indices_[indptr]</span>
<span id="cb71-26">      </span>
<span id="cb71-27">      index, i, counter,  node,  mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.while_loop(cond_while, body_while, (index, i, counter,  node,  mark))</span>
<span id="cb71-28">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (index, i, counter,  mark)</span>
<span id="cb71-29">    </span>
<span id="cb71-30">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Outer for loop</span></span>
<span id="cb71-31">    <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_out_for(i, val):</span>
<span id="cb71-32">      index, counter,  mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> val</span>
<span id="cb71-33">      mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mark.at[i].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(i)</span>
<span id="cb71-34">      index[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> index[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].at[counter].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(i)</span>
<span id="cb71-35">      index[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> index[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>].at[counter].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(i)</span>
<span id="cb71-36">      counter <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> counter <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb71-37">      index, i, counter, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.fori_loop(A_indptr_[i], A_indptr_[i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], body_inner_for, (index, i, counter,  mark))</span>
<span id="cb71-38"></span>
<span id="cb71-39">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (index, counter,  mark)</span>
<span id="cb71-40"></span>
<span id="cb71-41">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Body of code</span></span>
<span id="cb71-42">    n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr_) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb71-43">    mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.repeat(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n)</span>
<span id="cb71-44"></span>
<span id="cb71-45">    index <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [jnp.zeros(nnz, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>), jnp.zeros(nnz, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)]</span>
<span id="cb71-46">    counter <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb71-47"></span>
<span id="cb71-48">    init <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (index, counter, mark)</span>
<span id="cb71-49">    index, counter, mark <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.fori_loop(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n, body_out_for, init)</span>
<span id="cb71-50">    </span>
<span id="cb71-51">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> index</span>
<span id="cb71-52">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb71-53">  index <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _inner(A_indices, A_indptr, L_indptr, parent, nnz)</span>
<span id="cb71-54">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## return index[1][jnp.lexsort((index[1], index[0]))]</span></span>
<span id="cb71-55">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse.coo_array((np.ones(nnz), (index[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], index[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])), shape <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (n,n)).tocsc().indices</span></code></pre></div>
</div>
<p>First things first, let’s check that this gets the right answer, and then see how fast it is.</p>
<div class="cell" data-execution_count="39">
<div class="sourceCode cell-code" id="cb72" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb72-1">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>)</span>
<span id="cb72-2">parent, col_count, row_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb72-3">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb72-4">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb72-5">L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> symbolic_cholesky3(A.indices, A.indptr, L_indptr, parent, nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]))</span>
<span id="cb72-6">L_indices_true, L_indptr_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _symbolic_factor(A_indices, A_indptr)</span>
<span id="cb72-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_indices_true))</span>
<span id="cb72-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_indptr_true))</span>
<span id="cb72-9"></span>
<span id="cb72-10">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">31</span>)</span>
<span id="cb72-11">parent, col_count, row_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb72-12">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb72-13">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb72-14">L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> symbolic_cholesky3(A.indices, A.indptr, L_indptr, parent, nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]))</span>
<span id="cb72-15">L_indices_true, L_indptr_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _symbolic_factor(A_indices, A_indptr)</span>
<span id="cb72-16"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_indices_true))</span>
<span id="cb72-17"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_indptr_true))</span>
<span id="cb72-18"></span>
<span id="cb72-19">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>)</span>
<span id="cb72-20">parent, col_count, row_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb72-21">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb72-22">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb72-23">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: symbolic_cholesky3(A.indices, A.indptr, L_indptr, parent, nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb72-24"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb72-25"></span>
<span id="cb72-26">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)</span>
<span id="cb72-27">parent, col_count, row_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb72-28">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb72-29">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb72-30">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: symbolic_cholesky3(A.indices, A.indptr, L_indptr, parent, nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb72-31"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb72-32"></span>
<span id="cb72-33">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span>)</span>
<span id="cb72-34">parent, col_count, row_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb72-35">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb72-36">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb72-37">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: symbolic_cholesky3(A.indices, A.indptr, L_indptr, parent, nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb72-38"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb72-39"></span>
<span id="cb72-40">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>)</span>
<span id="cb72-41">parent, col_count, row_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> etree(A.indices, A.indptr)</span>
<span id="cb72-42">L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(A.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb72-43">L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum(col_count)</span>
<span id="cb72-44">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span>: symbolic_cholesky3(A.indices, A.indptr, L_indptr, parent, nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(L_indptr[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])),number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, repeat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb72-45"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>True
True
True
True</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 2500: [0.13, 0.13, 0.13, 0.14, 0.14]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 40000: [0.19, 0.19, 0.19, 0.19, 0.19]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 90000: [0.43, 0.32, 0.32, 0.33, 0.33]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 1000000: [11.91]</code></pre>
</div>
</div>
<p>You know what? I’ll take it. It’s not perfect; in particular, I would prefer a pure JAX solution. But everything I tried ran hard into the indirect memory access issue. The best I found used <code>jnp.lexsort</code>, but even that showed noticeable performance degradation relative to the scipy solution as <code>nnz</code> increased.</p>
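<p>For reference, the sort-based CSR-to-CSC conversion sketched in footnote 22 looks roughly like this in plain numpy (a sketch only: <code>np.lexsort</code> stands in for <code>jnp.lexsort</code>, and the function name is my own invention):</p>

```python
import numpy as np

def csr_to_csc(indices, indptr, x, n_col):
    """CSR -> CSC by expanding to COO and sorting on (col, row).
    A plain-numpy stand-in for the jnp.lexsort approach."""
    n_row = len(indptr) - 1
    rows = np.repeat(np.arange(n_row), np.diff(indptr))  # CSR -> COO row ids
    order = np.lexsort((rows, indices))  # primary key: column; tie-break: row
    csc_indptr = np.zeros(n_col + 1, dtype=int)
    csc_indptr[1:] = np.cumsum(np.bincount(indices, minlength=n_col))
    return rows[order], csc_indptr, x[order]

# A = [[1, 0, 2], [0, 3, 0]] stored in CSR
csc_indices, csc_indptr, csc_x = csr_to_csc(
    np.array([0, 2, 1]), np.array([0, 2, 3]), np.array([1.0, 2.0, 3.0]), 3
)
```

<p>This is the linear pass, sort, linear pass pattern from the footnote: the sort is the <code>nnz</code>-dependent part that caused the slowdown.</p>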
</section>
</section>
<section id="next-time-on-sparse-matrices-with-dan" class="level1">
<h1>Next time on Sparse Matrices with Dan</h1>
<p>So that’s where I’m going to leave it. I am off my flight and I’ve slept very well and now I’m going to be on holidays for a little while.</p>
<p>The next big thing to do is look at the numerical factorisation. We are going to run headlong into all of the problems we’ve hit today, so that should be fun. The reason I’m splitting it into a separate post<sup>28</sup> is that I want to actually test all of those things out properly.</p>
<p>So next time you can expect</p>
<ol type="1">
<li><p>Classes! Because frankly this code is getting far too messy, especially now that certain things need to be passed as static arguments. The only reason I’ve avoided it up to now is that I think it hides too much of the algorithm in boilerplate. But now the boilerplate is ruining my life and causing far too many dumb typos<sup>29</sup>.</p></li>
<li><p>Type hints! Because for a language where types aren’t explicit, they sure are important. Also because I’m going to class it up I might as well do it properly.</p></li>
<li><p>Some helper routines! I’m going to need a sparse-matrix scatter operation (aka copying <img src="https://latex.codecogs.com/png.latex?A"> into the sparsity pattern of <img src="https://latex.codecogs.com/png.latex?L">)! And I’m certainly going to need some reorderings<sup>30</sup>.</p></li>
<li><p>A battle royale between padded and non-padded methods!</p></li>
</ol>
<p>It should be a fun time!</p>
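<p>As a teaser for point 3, here is roughly what that scatter operation could look like in plain numpy. This is a sketch under my own assumptions (CSC inputs with sorted row indices within each column; the function name is invented), not necessarily the implementation I’ll end up with:</p>

```python
import numpy as np

def scatter_to_pattern(A_indices, A_indptr, A_x, L_indices, L_indptr):
    """Copy the lower triangle of a CSC matrix A into the (bigger) sparsity
    pattern of L, leaving explicit zeros wherever L has fill-in.
    Assumes row indices within each column are sorted."""
    n = len(A_indptr) - 1
    L_x = np.zeros(len(L_indices))
    for j in range(n):
        rows = L_indices[L_indptr[j]:L_indptr[j + 1]]
        for p in range(A_indptr[j], A_indptr[j + 1]):
            i = A_indices[p]
            if i >= j:  # only lower-triangular entries of A land in L
                L_x[L_indptr[j] + np.searchsorted(rows, i)] = A_x[p]
    return L_x

# A = [[4,1,0],[1,3,0],[0,0,2]] in CSC; L's pattern is the full lower triangle
L_x = scatter_to_pattern(
    np.array([0, 1, 0, 1, 2]), np.array([0, 2, 4, 5]),
    np.array([4.0, 1.0, 1.0, 3.0, 2.0]),
    np.array([0, 1, 2, 1, 2, 2]), np.array([0, 3, 5, 6]),
)
```

<p>The positions where <code>L_x</code> stays zero are exactly the fill-in entries the symbolic factorisation discovered.</p>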


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>If you’re wondering about the break between sparse matrix posts, I realised this pretty much immediately and just didn’t want to deal with it!↩︎</p></li>
<li id="fn2"><p>If a person who actually knows how the JAX autodiff works happens across this blog, I’m so sorry.↩︎</p></li>
<li id="fn3"><p>omg you guys. So many details↩︎</p></li>
<li id="fn4"><p>These are referred to as HLOs (Higher-level operations)↩︎</p></li>
<li id="fn5"><p>Instead of doing one pass of reverse-mode, you would need to do <img src="https://latex.codecogs.com/png.latex?d"> passes of forwards mode to get the gradient with respect to a d-dimensional parameter.↩︎</p></li>
<li id="fn6"><p>Unlike <code>jax.lax.while</code>, which is only forwards differentiable, <code>jax.lax.scan</code> is fully differentiable.↩︎</p></li>
<li id="fn7"><p>In general, if the function has state.↩︎</p></li>
<li id="fn8"><p>This is the version of the symbolic factorisation that is most appropriate for us, as we will be doing a lot of Cholesky factorisations with the same sparsity structure. If we rearrange the algorithm to the up-looking Cholesky decomposition, we only need the column counts and this is also called the symbolic factorisation. This is, incidentally, how Eigen’s sparse Cholesky works.↩︎</p></li>
<li id="fn9"><p>Actually it’s a forest↩︎</p></li>
<li id="fn10"><p>Because we are talking about a tree, each child node has at most one parent. If it doesn’t have a parent it’s the root of the tree. I remember a lecturer saying that it should be called “father and son” or “mother and daughter” because every child has 2 parents but only one mother or one father. The 2000s were a wild time.↩︎</p></li>
<li id="fn11"><p>These can also be computed in approximately <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(%5Ctext%7Bnnz%7D(A))"> time, which is much faster. But the algorithm is, frankly, pretty tricky and I’m not in the mood to program it up. This difference would be quite important if I wasn’t storing the full symbolic factorisation and was instead computing it every time, but in my context it is less clear that this is worth the effort.↩︎</p></li>
<li id="fn12"><p>Python notation! This is rows/cols 0 to <code>j-1</code>↩︎</p></li>
<li id="fn13"><p>Python, it turns out, does not have a <code>do while</code> construct because, apparently, everything is empty and life is meaningless.↩︎</p></li>
<li id="fn14"><p>The argument for JIT works by amortizing the compile time over several function evaluations. If I wanted to speed this algorithm up, I’d implement the more complex <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(%5Coperatorname%7Bnnz%7D(A))"> version.↩︎</p></li>
<li id="fn15"><p>Obviously it did not work the first time. A good way to debug JIT’d code is to use the python translations of the control flow literals. Why? Well for one thing there is an annoying tendency for JAX to fail silently when there is an out-of-bounds indexing error. Which happens, just for example, if you replace <code>node = A_indices[indptr]</code> with <code>node = A_indices[A_indptr[indptr]]</code> because you got a text message halfway through the line.↩︎</p></li>
<li id="fn16"><p>We will still use the left-looking algorithm for the numerical computation. The two algorithms are equivalent in exact arithmetic and, in particular, have identical sparsity structures.↩︎</p></li>
<li id="fn17"><p>I’m mixing 1-based indexing in the maths with 0-based in the code because I think we need more chaos in our lives.↩︎</p></li>
<li id="fn18"><p>Yes. I know. I’m swapping the meaning of <img src="https://latex.codecogs.com/png.latex?i"> and <img src="https://latex.codecogs.com/png.latex?j"> but you know that’s because in a symmetric matrix rows and columns are a bit similar. The upper half of column <img src="https://latex.codecogs.com/png.latex?j"> is the left half of row <img src="https://latex.codecogs.com/png.latex?j"> after all.↩︎</p></li>
<li id="fn19"><p>If <code>mark[node]==j</code> then I have already found <code>node</code> and all of its ancestors in my sweep of row <code>j</code>↩︎</p></li>
<li id="fn20"><p>This is because <code>L[j,node] != 0</code> by our logic.↩︎</p></li>
<li id="fn21"><p>I mean, I’m pretty sure it is. I’m writing this post in order, so I don’t know yet. But surely the compiler can’t reason about the possible values of <code>node</code>, which would be the only thing that would speed this up.↩︎</p></li>
<li id="fn22"><p>Convert from CSR to <code>(i, j, val)</code> (called COO, which has a convenient implementation in <code>jax.experimental.sparse</code>) to CSC. This involves a linear pass, a sort, and another linear pass. So it’s <img src="https://latex.codecogs.com/png.latex?n%20%5Clog%20n">-ish. Hire me fancy tech companies. I can count. Just don’t ask me to program quicksort.↩︎</p></li>
<li id="fn23"><p>Replace “subtle” with “fairly obvious once I realised how it’s converted to a <code>lax.scan</code>, but not at all obvious to me originally”.↩︎</p></li>
<li id="fn24"><p>Who demanded a footnote.↩︎</p></li>
<li id="fn25"><p>This also happens with the profile likelihood in non-Bayesian methods.↩︎</p></li>
<li id="fn26"><p>the <img src="https://latex.codecogs.com/png.latex?%5Cbeta">s↩︎</p></li>
<li id="fn27"><p>COO stands for <em>coordinate list</em> and it’s the least space-efficient of our options. It directly stores three length-<code>nnz</code> vectors <code>(row, col, value)</code>. It’s great for specifying matrices and it’s pretty easy to convert from this format to any of the others.↩︎</p></li>
<li id="fn28"><p>other than holiday↩︎</p></li>
<li id="fn29"><p><code>A_index</code> and <code>A.index</code> are different↩︎</p></li>
<li id="fn30"><p>I’m probably going to bind Eigen’s AMD decomposition. I’m certainly not writing it myself.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2022,
  author = {Simpson, Dan},
  title = {Sparse Matrices Part 7a: {Another} Shot at {JAX-ing} the
    {Cholesky} Decomposition},
  date = {2022-12-02},
  url = {https://dansblog.netlify.app/posts/2022-11-27-sparse7/sparse7.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2022" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2022. <span>“Sparse Matrices Part 7a: Another Shot at
JAX-Ing the Cholesky Decomposition.”</span> December 2, 2022. <a href="https://dansblog.netlify.app/posts/2022-11-27-sparse7/sparse7.html">https://dansblog.netlify.app/posts/2022-11-27-sparse7/sparse7.html</a>.
</div></div></section></div> ]]></description>
  <category>JAX</category>
  <category>Sparse matrices</category>
  <category>Autodiff</category>
  <guid>https://dansblog.netlify.app/posts/2022-11-27-sparse7/sparse7.html</guid>
  <pubDate>Thu, 01 Dec 2022 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2022-11-27-sparse7/south_pac.png" medium="image" type="image/png" height="81" width="144"/>
</item>
<item>
  <title>MCMC with the wrong acceptance probability</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2022-11-23-wrong-mcmc/wrong-mcmc.html</link>
  <description><![CDATA[ 





<p>Just the other day<sup>1</sup> I was chatting with a friend<sup>2</sup> about MCMC and he asked me a fundamental, but seldom asked, question: <em>What happens if my acceptance probability is a bit off?</em></p>
<p>This question comes up a bunch. In this context, they were switching from double to single precision<sup>3</sup> and were a little worried that some of their operations would be a bit more inexact than they were used to. Would this tank MCMC? Would everything still be fine?</p>
<section id="what-is-markov-chain-monte-carlo" class="level2">
<h2 class="anchored" data-anchor-id="what-is-markov-chain-monte-carlo">What is Markov chain Monte Carlo</h2>
<p>Markov chain Monte Carlo (MCMC) is, usually, guess-and-check for people who want to be fancy.</p>
<p>It is a class of algorithms that allow you to construct a<sup>4</sup> Markov chain that has a given <em>stationary distribution</em><sup>5</sup> <img src="https://latex.codecogs.com/png.latex?%5Cpi">. In Bayesian applications, we usually want to choose <img src="https://latex.codecogs.com/png.latex?%5Cpi%20=%20p(%5Ctheta%20%5Cmid%20y)">, but there are other applications of MCMC.</p>
<p>Most<sup>6</sup> MCMC algorithms live in the Metropolis-Hastings family of algorithms. These methods require only one component: a proposal distribution <img src="https://latex.codecogs.com/png.latex?q(%5Ctheta'%20%5Cmid%20%5Ctheta)">. Given basically any<sup>7</sup> proposal distribution, we can go from our current state <img src="https://latex.codecogs.com/png.latex?%5Ctheta_k"> to the new state <img src="https://latex.codecogs.com/png.latex?%5Ctheta_%7Bk+1%7D"> using the following three steps:</p>
<ol type="1">
<li><p>Propose a potential new state <img src="https://latex.codecogs.com/png.latex?%5Ctheta'%20%5Csim%20q(%5Ctheta'%20%5Cmid%20%5Ctheta_k)"></p></li>
<li><p>Sample a Bernoulli random variable <img src="https://latex.codecogs.com/png.latex?r_%7Bk+1%7D"> with <img src="https://latex.codecogs.com/png.latex?%0A%5CPr(r_%7Bk+1%7D%20=%201%20%5Cmid%20%5Ctheta_k)%20=%20%5Calpha_%7Bk+1%7D%20=%20%20%5Cmin%5Cleft%5C%7B1,%20%5Cfrac%7B%5Cpi(%5Ctheta')%7D%7B%5Cpi(%5Ctheta_k)%7D%5Cfrac%7Bq(%5Ctheta_k%20%5Cmid%20%5Ctheta')%7D%7Bq(%5Ctheta'%20%5Cmid%20%5Ctheta_k)%7D%5Cright%5C%7D%0A"></p></li>
<li><p>Set <img src="https://latex.codecogs.com/png.latex?%5Ctheta_%7Bk+1%7D"> according to the formula <img src="https://latex.codecogs.com/png.latex?%0A%5Ctheta_%7Bk+1%7D%20=%20%5Cbegin%7Bcases%7D%20%5Ctheta',%20&amp;%20r_%7Bk+1%7D=1%20%5C%5C%20%5Ctheta_k,%20&amp;r_%7Bk+1%7D%20=%200.%5Cend%7Bcases%7D%0A"></p></li>
</ol>
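<p>In code, those three steps look something like the following (a minimal numpy sketch with a symmetric Gaussian random-walk proposal, so the proposal ratio cancels; the function name and defaults are mine):</p>

```python
import numpy as np

def metropolis_hastings(log_pi, theta0, n_steps, step_size=1.0, seed=None):
    """Random-walk Metropolis: the Gaussian proposal is symmetric, so the
    q-ratio in the acceptance probability is 1 and alpha depends only on
    pi(theta') / pi(theta_k), computed on the log scale for stability."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    samples = np.empty((n_steps, theta.size))
    for k in range(n_steps):
        proposal = theta + step_size * rng.standard_normal(theta.size)  # step 1
        log_alpha = min(0.0, log_pi(proposal) - log_pi(theta))          # step 2
        if np.log(rng.uniform()) < log_alpha:                           # step 3
            theta = proposal
        samples[k] = theta
    return samples

# Target a standard normal: log pi(t) = -||t||^2 / 2 (up to a constant)
samples = metropolis_hastings(lambda t: -0.5 * float(t @ t), np.zeros(1), 5000, seed=0)
```

<p>Rejected proposals still record the current state, which is what makes the stationary distribution come out right.</p>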
<p>The acceptance probability<sup>8</sup> is chosen<sup>9</sup> to balance<sup>10</sup> out the proposal <img src="https://latex.codecogs.com/png.latex?q(%5Ccdot%20%5Cmid%20%5Ccdot)"> with the target distribution <img src="https://latex.codecogs.com/png.latex?%5Cpi">.</p>
<p>You can interpret the two ratios in the acceptance probability separately. The first one prefers proposals from high-density regions over proposals from low-density regions. The second ratio balances this by down-weighting proposed states that were <em>easy</em> to propose from the current location. When the proposal is symmetric, ie <img src="https://latex.codecogs.com/png.latex?q(%5Ctheta'%5Cmid%20%5Ctheta)=%20q(%5Ctheta%20%5Cmid%20%5Ctheta')">, the second ratio is always 1. However, in better algorithms like MALA<sup>11</sup>, the proposal is not symmetric. If we look at the MALA proposal <img src="https://latex.codecogs.com/png.latex?%0Aq(%5Ctheta'%5Cmid%20%5Ctheta)%20%5Csim%20N%5Cleft(%5Ctheta%20+%20%5Cfrac%7B1%7D%7B2%7D%5CSigma%5Cnabla%20%5Clog%20%5Cpi(%5Ctheta),%20%5CSigma%5Cright)%0A"> it’s pretty easy to see that we are biasing our samples towards the mode of the distribution. If we did not have the second ratio in the acceptance probability we would severely under-sample the tails of the distribution.</p>
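<p>Written out, a single MALA step looks something like this (a sketch assuming the simplest case <code>Sigma = eps * I</code>; the function name is mine). Note that the correction ratio now has to be computed explicitly because the proposal is asymmetric:</p>

```python
import numpy as np

def mala_step(theta, log_pi, grad_log_pi, eps, rng):
    """One MALA update with Sigma = eps * I. The proposal mean is nudged
    uphill, so q is asymmetric and its ratio no longer cancels in alpha."""
    def log_q(to, frm):  # log density (up to a constant) of proposing `to` from `frm`
        mean = frm + 0.5 * eps * grad_log_pi(frm)
        return -0.5 * np.sum((to - mean) ** 2) / eps
    prop = (theta + 0.5 * eps * grad_log_pi(theta)
            + np.sqrt(eps) * rng.standard_normal(theta.shape))
    log_alpha = (log_pi(prop) - log_pi(theta)
                 + log_q(theta, prop) - log_q(prop, theta))
    return prop if np.log(rng.uniform()) < min(0.0, log_alpha) else theta

# Sample a standard normal: log pi(t) = -||t||^2/2, gradient = -t
rng = np.random.default_rng(1)
theta = np.zeros(1)
draws = []
for _ in range(4000):
    theta = mala_step(theta, lambda t: -0.5 * float(t @ t), lambda t: -t, 0.5, rng)
    draws.append(theta[0])
```

<p>Dropping the <code>log_q</code> difference here is exactly the tail-under-sampling failure described above.</p>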
</section>
<section id="mcmc-with-approximate-acceptance-probabilities" class="level2">
<h2 class="anchored" data-anchor-id="mcmc-with-approximate-acceptance-probabilities">MCMC with approximate acceptance probabilities</h2>
<p>With this definition in hand, it’s now possible to re-cast the question my friend asked as:</p>
<blockquote class="blockquote">
<p>What happens to my MCMC algorithm if, instead of <img src="https://latex.codecogs.com/png.latex?%5Calpha_%7Bk+1%7D"> I accidentally compute <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20%5Calpha_%7Bk+1%7D"> and use that instead to simulate <img src="https://latex.codecogs.com/png.latex?r_%7Bk+1%7D">?</p>
</blockquote>
<p>So let’s go about answering that!</p>
</section>
<section id="a-bit-of-a-literature-review" class="level2">
<h2 class="anchored" data-anchor-id="a-bit-of-a-literature-review">A bit of a literature review</h2>
<p>Unsurprisingly, this type of question has popped up over and over again in the literature:</p>
<ul>
<li><p>This exact question was asked by Gareth Roberts and Jeff Rosenthal first<sup>12</sup> <a href="http://probability.ca/jeff/ftpdir/sens.pdf">with Peter Schwartz</a> and a second, more<sup>13</sup> <sup>14</sup> realistic, time <a href="http://probability.ca/jeff/ftpdir/gjl.pdf">with Laird Breyer</a>. They found that as long as the chain’s convergence is sufficiently nice<sup>15</sup> then the perturbed chain will converge nicely and have<sup>16</sup> a central limit theorem.</p></li>
<li><p>About 10 years ago, an absolute orgy<sup>17</sup> <sup>18</sup> of research happened around the question <em>What happens if the acceptance probability is random but unbiased?</em> These are called <em>exact approximate</em><sup>19</sup> or <em>pseudo-marginal</em> methods. They have had some success in situations<sup>20</sup> where the likelihood has a <em>parameter dependent</em> normalising constant that can’t be computed exactly, but can be estimated unbiasedly. The problem with this class of methods is that the extra noise tends to make the Markov chain perform pretty badly<sup>21</sup>. This limits their practical use to models where we really can’t do anything else<sup>22</sup>. That said, there is some interesting literature on random sub-sampling of data where it <a href="https://www.jmlr.org/papers/volume18/15-205/15-205.pdf">doesn’t really work</a> and where <a href="https://ses.library.usyd.edu.au/bitstream/handle/2123/16205/BAWP-2017-01.pdf">it does work</a>.</p></li>
<li><p>A third branch of literature is on truly approximate algorithms. These try to understand what happens if you’re just wrong with <img src="https://latex.codecogs.com/png.latex?%5Calpha_%7Bk+1%7D"> and you don’t do anything to correct it. There are a lot of papers on this, and I’m not going to do anything approaching a thorough review. I have work<sup>23</sup> <sup>24</sup> to do. So I will just list two older papers that were influential for me. The first was by <a href="https://arxiv.org/abs/1205.6857">Geoff Nicholls, Colin Fox, and Alexis Muir Watt</a>, which looks at what happens when you don’t correct your pseudo-marginal method correctly. It’s a really neat theory paper that is a great presentation<sup>25</sup> of the concepts. The second paper is by <a href="https://arxiv.org/abs/1205.6857">Pierre Alquier, Nial Friel, Richard Everitt, and Aidan Boland</a>, which looks at general approximate Markov chains. They show empirically that these methods work extremely well relative to pseudo-marginal methods for practical settings. There are also some nice results on perturbations of Markov chains in general, for instance <a href="https://arxiv.org/pdf/1503.04123.pdf">this paper</a> by Daniel Rudolf and Nikolaus Schweizer.</p></li>
</ul>
<section id="trying-to-understand-noisy-markov-chains" class="level3">
<h3 class="anchored" data-anchor-id="trying-to-understand-noisy-markov-chains">Trying to understand noisy Markov chains</h3>
<p>So how do I think about noisy Markov chains? Despite all appearances<sup>26</sup>, I am not really a theory person. So while I know that there’s a massive literature on the stability of Markov chains, it doesn’t really influence how I think about it.</p>
<p>Instead, I think about it in terms of that <a href="https://arxiv.org/abs/1205.6857">Nicholls, Fox, and Muir Watt paper</a>. Or, specifically, a talk I saw Colin give at some point that was really clear.</p>
<p>The important thing to recognise is that <em>it is not important how well you compute</em> <img src="https://latex.codecogs.com/png.latex?%5Calpha_%7Bk+1%7D">. What is important is whether you get the same outcome. Imagine we have two random variables <img src="https://latex.codecogs.com/png.latex?r_%7Bk+1%7D%20%5Csim%20%5Ctext%7BBernoulli%7D(%5Calpha_%7Bk+1%7D)"> and <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20r_%7Bk+1%7D%20%5Csim%20%5Ctext%7BBernoulli%7D(%5Ctilde%20%5Calpha_%7Bk+1%7D)">. If our realisation of <img src="https://latex.codecogs.com/png.latex?r_%7Bk+1%7D"> is the same as our realisation of <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20r_%7Bk+1%7D">, then we get the same <img src="https://latex.codecogs.com/png.latex?%5Ctheta_%7Bk+1%7D">. Or, to put it another way, when <img src="https://latex.codecogs.com/png.latex?r_%7Bk+1%7D%20=%20%5Ctilde%20r_%7Bk+1%7D">, no one can tell<sup>27</sup> that it’s an approximate Markov chain.</p>
<p>This means that one way to understand inexact MCMC is to think of the Markov chain <img src="https://latex.codecogs.com/png.latex?%0A(%5Ctilde%7B%5Ctheta%7D_k,%20s_k),%20%5Cqquad%20k=0,%201,%20%5Cldots,%20%5Cinfty,%0A"> where<sup>28</sup> <img src="https://latex.codecogs.com/png.latex?%0As_k%20=%20%5Cbegin%7Bcases%7D%200,%20%5Cquad%20&amp;%20r_%7Bk%7D%20=%20%5Ctilde%20r_k%20%5C%5C%0A1,%20&amp;r_k%20%5Cneq%20%5Ctilde%20r_k%5Cend%7Bcases%7D%0A"> indicates whether or not we made the wrong decision. It’s important to note that while <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20%5Ctheta_k"> is marginally a Markov chain, <img src="https://latex.codecogs.com/png.latex?s_k"> is not. You can actually think of <img src="https://latex.codecogs.com/png.latex?s_k"> as the observation of a hidden Markov model if you want to. I won’t stop you. Nothing will. There is no morality, there is no law. It is The Purge.</p>
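<p>A tiny simulation makes this coupling picture concrete: drive both Bernoulli draws with the same uniform, and the two decisions disagree exactly when that uniform lands between the two acceptance probabilities (the numbers here are arbitrary, just for illustration):</p>

```python
import numpy as np

# Under a common-uniform coupling, Pr(s_k = 1) = |alpha - alpha_tilde|.
rng = np.random.default_rng(0)
alpha, alpha_tilde = 0.7, 0.68  # exact and (slightly wrong) acceptance probs
u = rng.uniform(size=200_000)
r = u < alpha              # exact decision
r_tilde = u < alpha_tilde  # perturbed decision
s = r != r_tilde           # s_k = 1: this is where the chains part ways
disagree_rate = s.mean()   # close to |0.7 - 0.68| = 0.02
```

<p>So a small error in the acceptance probability translates into a small chance, at each step, that the approximate chain diverges from the imaginary exact one.</p>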
<p>Although we can never actually observe <img src="https://latex.codecogs.com/png.latex?s_k">, thinking about it is really useful. In particular, we note that until <img src="https://latex.codecogs.com/png.latex?s_k%20=1"> for the first time, the samples of <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20%5Ctheta_k"> are <em>identical</em> to a correct Metropolis-Hastings algorithm. After this point, the approximate chain and the (imaginary) exact chain will be different. But we can iterate this argument.</p>
<p>To do this, we can define the length <img src="https://latex.codecogs.com/png.latex?N_k"> of the Markov chain that would be the same as the exact MCMC algorithm started at <img src="https://latex.codecogs.com/png.latex?%5Ctheta_%7BN_%7Bk-1%7D%7D"> by <img src="https://latex.codecogs.com/png.latex?N_0=0"> and <img src="https://latex.codecogs.com/png.latex?%0AN_k%20=%20%5Cinf_%7Bi%20%3E%20N_%7Bk-1%7D%7D%5C%7Bi%20-%20N_%7Bk-1%7D:%20s_i%20=%201%5C%7D.%0A"></p>
<p>If we run our algorithm for <img src="https://latex.codecogs.com/png.latex?N"> steps, we can then think of the output as being the same as running <img src="https://latex.codecogs.com/png.latex?J%20=%20%5Csum_%7Bk=1%7D%5EN%20s_k"> Markov chains of different lengths. The <img src="https://latex.codecogs.com/png.latex?j">th chain starts at <img src="https://latex.codecogs.com/png.latex?%5Ctheta_%7BN_%7Bj-1%7D%7D"> and is length <img src="https://latex.codecogs.com/png.latex?N_%7Bj%7D-1">. It is worth remembering that these chains are not started from independent points. In particular, if <img src="https://latex.codecogs.com/png.latex?N_j"> is small, then the starting position of the <img src="https://latex.codecogs.com/png.latex?j">th and the <img src="https://latex.codecogs.com/png.latex?j+1">th chain will be heavily correlated.</p>
<p>To think about this, we need to understand what happens after <img src="https://latex.codecogs.com/png.latex?N_k"> steps of a Markov chain. We will write <img src="https://latex.codecogs.com/png.latex?%5Ctheta_k%20=%20P%5Ek%20%5Ctheta_0"> to denote <img src="https://latex.codecogs.com/png.latex?k"> steps of the exact algorithm.</p>
<p>The topic of convergence of Markov chains is a complex business, but we are going to assume that our exact Markov chain is<sup>29</sup> <em>geometrically ergodic</em>, which means that <img src="https://latex.codecogs.com/png.latex?%0A%5C%7CP%5Ek%20%5Ctheta_0%20-%20%5Cpi%5C%7C%20%5Cleq%20M(%5Ctheta_0)%20%5Crho%5E%7Bk%7D%0A"> for some function<sup>30</sup> <img src="https://latex.codecogs.com/png.latex?M(%5Ctheta_0)"> and <img src="https://latex.codecogs.com/png.latex?0%20%3C%20%5Crho%20%3C%201">.</p>
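<p>To make the geometric decay concrete, here is a toy illustration (mine, not from any of the papers above; the chain and all numbers are made up). For a finite-state chain the distance to stationarity decays geometrically, with rate given by the second-largest eigenvalue of the transition matrix:</p>

```python
import numpy as np

# Toy two-state chain (made-up numbers) illustrating geometric ergodicity:
# the total variation distance to stationarity decays like rho^k.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])            # transition matrix
pi = np.array([2 / 3, 1 / 3])         # stationary distribution: pi @ P == pi
rho = 0.7                             # second eigenvalue of P (= trace(P) - 1)

mu = np.array([1.0, 0.0])             # start deterministically in state 0
M = 0.5 * np.abs(mu - pi).sum()       # initial total variation distance
for k in range(1, 30):
    mu = mu @ P                       # marginal distribution after k steps
    tv = 0.5 * np.abs(mu - pi).sum()
    assert tv <= M * rho**k + 1e-12   # ||P^k theta_0 - pi|| <= M rho^k
```

<p>For this two-state example the bound is actually an equality, because the error vector lies along the second eigenvector of the chain. In general you only get the inequality, and <img src="https://latex.codecogs.com/png.latex?M(%5Ctheta_0)"> depends on where you start.</p>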
<p>Geometric ergodicity is a great condition because, among other things, it ensures that sample means from the Markov chain satisfy a central limit theorem. It’s also bloody impossible to prove. But usually indicators like <a href="https://arxiv.org/abs/1903.08008">R-hat</a> do a decent job at suggesting that there might be problems. Also if you are spending a lot of time rejecting proposals in certain parts of the space, there’s a solid chance that you’re not geometrically ergodic.</p>
<p>Now let’s assume that we are interested in computing <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D_%5Cpi(h(%5Ctheta))"> for some nice<sup>31</sup> function <img src="https://latex.codecogs.com/png.latex?h">. Then the nice thing about Markov chains is that, give or take<sup>32</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5Cleft%7C%5Cfrac%7B1%7D%7BN_j-N_%7Bj-1%7D%7D%5Csum_%7Bk=N_%7Bj-1%7D%7D%5E%7BN_j-1%7Dh(%5Ctheta_k)%20-%20%5Cmathbb%7BE%7D_%5Cpi(h(%5Ctheta))%5Cright%7C%20%5Cleq%20C%20%5Cfrac%7BM(%5Ctheta_%7BN_%7Bj-1%7D%7D)%7D%7BN_j-N_%7Bj-1%7D%7D%5Cfrac%7B1%20-%20%5Crho%5E%7BN_%7Bj%7D-N_%7Bj-1%7D%7D%7D%7B1-%20%5Crho%7D.%0A"> where <img src="https://latex.codecogs.com/png.latex?C"> might depend on <img src="https://latex.codecogs.com/png.latex?h"> if <img src="https://latex.codecogs.com/png.latex?h"> is unbounded.</p>
<p>This suggests that the error is bounded by, roughly, <img src="https://latex.codecogs.com/png.latex?%0A%5Cleft%7C%5Cfrac%7B1%7D%7BN%7D%5Csum_%7Bk=1%7D%5E%7BN%7Dh(%5Ctheta_k)%20-%20%5Cmathbb%7BE%7D_%5Cpi(h(%5Ctheta))%5Cright%7C%20%5Cleq%20%5Cfrac%7BC%7D%7BN%7D%20%5Csum_%7Bj%20=%201%7D%5EJ%20M(%5Ctheta_%7BN_%7Bj-1%7D%7D)%5Cfrac%7B1%20-%20%5Crho%5E%7BN_%7Bj%7D-N_%7Bj-1%7D%7D%7D%7B1-%20%5Crho%7D.%0A"></p>
<p>This suggests a few things:</p>
<ul>
<li><p>If <img src="https://latex.codecogs.com/png.latex?J"> is small relative to <img src="https://latex.codecogs.com/png.latex?N">, we are going to get <em>very</em> similar estimates to just running <img src="https://latex.codecogs.com/png.latex?J"> parallel Markov chains and combining them <em>without removing any warm up iterations</em>. In particular, if almost all <img src="https://latex.codecogs.com/png.latex?N_j"> are big, it will be <em>a lot</em> like combining <img src="https://latex.codecogs.com/png.latex?J"> warmed up <em>independent</em> chains.</p></li>
<li><p>Effective sample size and Monte Carlo standard error estimates will potentially be very wrong. This is because instead of computing them based on multiple dependent chains, we are pretending that all of our samples came from a single ergodic Markov chain. Is this a problem? I really don’t know. Again, if the <img src="https://latex.codecogs.com/png.latex?N_j">s are usually large, we will be fine.</p></li>
<li><p>Because <img src="https://latex.codecogs.com/png.latex?M(%5Ctheta)"> can be pretty large when <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> is large, we might have some problems. It’s easy to imagine cases where we get stuck out in a tail and we just fire off a lot of events when <img src="https://latex.codecogs.com/png.latex?%5Ctheta_%7BN_j%7D"> is really big. This will be a problem. But also, if we are stuck out in a tail, we are rightly fucked anyway and all of the MCMC diagnostics should be screaming at you. We can take heart that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D_%5Cpi(M(%5Ctheta))"> is usually finite<sup>33</sup> and not, you know, massive.</p></li>
</ul>
</section>
<section id="what-do-the-n_j-look-like" class="level3">
<h3 class="anchored" data-anchor-id="what-do-the-n_j-look-like">What do the <img src="https://latex.codecogs.com/png.latex?N_j"> look like?</h3>
<p>So the takeaway from the last section was that if the random variables <img src="https://latex.codecogs.com/png.latex?N_j"> are usually pretty big, then everything will work ok. Intuitively this makes sense. If the <img src="https://latex.codecogs.com/png.latex?N_j">s were always small, it would be very difficult to ever get close to any sort of stationary distribution.</p>
<p>The paper by <a href="https://arxiv.org/abs/1205.6857">Nicholls, Fox, and Muir Watt</a> talks about potential sizes for <img src="https://latex.codecogs.com/png.latex?N_j">. The general construction that they use is a <em>coupling</em>: a bivariate Markov chain <img src="https://latex.codecogs.com/png.latex?(%5Ctheta_k,%20%5Ctilde%20%5Ctheta_k)"> whose components start from the same position and are updated as follows:</p>
<ol type="1">
<li>Propose <img src="https://latex.codecogs.com/png.latex?%5Ctheta'%20%5Csim%20q(%5Ctheta'%20%5Cmid%20%5Ctilde%20%5Ctheta_%7Bk%7D)"></li>
<li>Generate a uniform random number <img src="https://latex.codecogs.com/png.latex?u_%7Bk+1%7D"></li>
<li>Update <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> as <img src="https://latex.codecogs.com/png.latex?%0A%5Ctheta_%7Bk+1%7D%20=%20%5Cbegin%7Bcases%7D%20%5Ctheta',%20%5Cqquad%20&amp;%20u_%7Bk+1%7D%20%5Cleq%20%5Calpha_%7Bk+1%7D%20%5C%5C%0A%5Ctheta_%7Bk%7D,%20&amp;%20u_%7Bk+1%7D%20%3E%20%5Calpha_%7Bk+1%7D.%5Cend%7Bcases%7D%0A"></li>
<li>Update <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20%5Ctheta"> as <img src="https://latex.codecogs.com/png.latex?%0A%5Ctilde%20%5Ctheta_%7Bk+1%7D%20=%20%5Cbegin%7Bcases%7D%20%5Ctheta',%20%5Cqquad%20&amp;%20u_%7Bk+1%7D%20%5Cleq%20%5Ctilde%20%5Calpha_%7Bk+1%7D%20%5C%5C%0A%5Ctilde%20%5Ctheta_%7Bk%7D,%20&amp;%20u_%7Bk+1%7D%20%3E%20%5Ctilde%20%5Calpha_%7Bk+1%7D.%5Cend%7Bcases%7D%0A"></li>
</ol>
<p>This Markov chain is coupled in three ways. The chains start at the same value <img src="https://latex.codecogs.com/png.latex?%5Ctheta_0%20=%20%5Ctilde%20%5Ctheta_0">, the proposed <img src="https://latex.codecogs.com/png.latex?%5Ctheta'"> is the same for both chains, and the randomness<sup>34</sup> used to do the accept/reject step is the same. Together, these things mean that <img src="https://latex.codecogs.com/png.latex?%5Ctheta_k%20=%20%5Ctilde%20%5Ctheta_k"> for all <img src="https://latex.codecogs.com/png.latex?k%20%3C%20N_1">.</p>
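<p>This coupling is easy to simulate. Here is a sketch (my own toy setup, not from the paper): a random walk Metropolis chain targeting a standard normal, where the approximate chain inflates the acceptance probability by a fictitious relative error <code>delta</code>. The two chains share proposals and uniforms, so they agree until the uniform first lands between the two acceptance probabilities:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def log_pi(theta):
    return -0.5 * theta**2           # toy target: a standard normal

def first_decoupling_time(n_steps=10_000, delta=0.01, sigma=1.0):
    """Couple an exact random walk Metropolis chain with an approximate one
    whose acceptance probability is inflated by a made-up relative error
    delta (a stand-in for, e.g., floating point error). Both chains share
    the proposal and the uniform, so they stay identical until the uniform
    lands between the two acceptance probabilities. Returns N_1."""
    theta = theta_tilde = 0.0
    for k in range(1, n_steps + 1):
        prop = theta_tilde + sigma * rng.normal()  # theta' ~ q(. | theta_tilde)
        u = rng.uniform()                          # shared randomness
        alpha = min(1.0, np.exp(log_pi(prop) - log_pi(theta_tilde)))
        alpha_tilde = min(1.0, (1.0 + delta) * alpha)
        if u <= alpha:
            theta = prop
        if u <= alpha_tilde:
            theta_tilde = prop
        if theta != theta_tilde:                   # s_k = 1: chains decouple
            return k
    return np.inf                                  # never decoupled

times = [first_decoupling_time() for _ in range(200)]
```

<p>In this toy setup a 1% relative error typically keeps the chains together for a few hundred steps; cranking <code>delta</code> up shortens the coupled stretch roughly in proportion.</p>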
<p>For this coupling construction, we can get the exact distribution of the <img src="https://latex.codecogs.com/png.latex?s_k">. To do this, we remember that the two chains will only make different decisions (or uncouple) if <img src="https://latex.codecogs.com/png.latex?u"> lands between the two acceptance probabilities. The probability of this happening is <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5CPr(s_k%20=%201)%20&amp;=%20%5CPr(%20u%20%5Cin%20%5B%5Cmin%5C%7B%20%5Calpha_%7Bk%7D,%20%5Ctilde%20%5Calpha_k%5C%7D,%20%5Cmax%5C%7B%20%5Calpha_%7Bk%7D,%20%5Ctilde%20%5Calpha_k%5C%7D%5D)%20%5C%5C%0A&amp;=%20%7C%5Calpha_k%20-%20%5Ctilde%20%5Calpha_k%7C.%0A%5Cend%7Balign*%7D"></p>
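<p>That probability is easy to sanity-check by Monte Carlo (the acceptance probabilities below are made up):</p>

```python
import numpy as np

# Monte Carlo check that a shared uniform u lands between two acceptance
# probabilities with probability |alpha - alpha_tilde|.
rng = np.random.default_rng(1)
alpha, alpha_tilde = 0.6, 0.55       # made-up acceptance probabilities
u = rng.uniform(size=1_000_000)
p_hat = ((u > min(alpha, alpha_tilde)) & (u <= max(alpha, alpha_tilde))).mean()
print(p_hat)                         # close to |0.6 - 0.55| = 0.05
```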
<p>I guess you could write down the distribution of the <img src="https://latex.codecogs.com/png.latex?N_j"> in terms of this. In particular, you get <img src="https://latex.codecogs.com/png.latex?%0A%5CPr(N_1%20=%20n)%20=%20%7C%5Calpha_n%20-%20%5Ctilde%20%5Calpha_n%7C%5Cprod_%7Bk=1%7D%5E%7Bn-1%7D%20(1-%20%7C%5Calpha_k%20-%20%5Ctilde%20%5Calpha_k%7C)%0A">, but honestly it would be an absolute nightmare.</p>
<p>When people get stuck in probability questions, the natural thing to do is to make the problem so abstract that you can make the answer up. In that spirit, let’s ask a slightly different question: what is the distribution of the <em>maximal</em> decoupling time between the exact and the approximate chain? This is the distribution of the longest possible coupling of the two chains over all<sup>35</sup> possible random sequences <img src="https://latex.codecogs.com/png.latex?(%5Ctheta_k,%20%5Ctilde%20%5Ctheta_k)"> such that the distribution of <img src="https://latex.codecogs.com/png.latex?(%5Ctheta_1,%20%5Ctheta_2,%20%5Cldots)"> is the same as our exact Markov chain and the distribution of <img src="https://latex.codecogs.com/png.latex?(%5Ctilde%5Ctheta_1,%5Ctilde%20%5Ctheta_2,%20%5Cldots)"> is the same as our approximate Markov chain.</p>
<p>This maximal value of <img src="https://latex.codecogs.com/png.latex?N_1"> is called the <a href="https://arxiv.org/abs/1608.01511"><em>maximal agreement coupling time</em></a> or, more whimsically, the <a href="https://arxiv.org/pdf/1702.03917.pdf">MEXIT time</a>. It turns out that getting the distribution of <img src="https://latex.codecogs.com/png.latex?N_1"> is … difficult, but we<sup>36</sup> can construct a random variable <img src="https://latex.codecogs.com/png.latex?%5Ctau"> that is independent of <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20%5Ctheta_k"> such that <img src="https://latex.codecogs.com/png.latex?%5Ctau%20%5Cleq%20N_1"> almost surely and <img src="https://latex.codecogs.com/png.latex?%0A%5CPr(%5Ctau%20=%20t%5Cmid%20%5Ctau%20%5Cgeq%20t)%20=%201%20-%20%5Coperatorname*%7Bess%5C,inf%7D_%7BB,%20%5Ctheta_%7B%3Ct%7D%7D%20%5Cleft%5C%7B%5Cfrac%7BP(%5Ctheta_t%20%5Cin%20B%20%5Cmid%20%5Ctheta_%7B%3Ct%7D)%7D%7B%5Ctilde%20P(%5Ctheta_t%20%5Cin%20B%20%5Cmid%20%5Ctheta_%7B%3Ct%7D)%7D%5Cright%5C%7D,%0A"> where <img src="https://latex.codecogs.com/png.latex?P(%5Ctheta_t%20%5Cmid%20%5Ctheta_%7B%3Ct%7D)"> is the transition distribution for the exact Markov<sup>37</sup> chain and <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20P(%5Ctheta_t%20%5Cmid%20%5Ctheta_%7B%3Ct%7D)"> is the transition distribution for the approximate Markov chain.</p>
<p>For a Metropolis-Hastings algorithm, the transition distribution has the form <img src="https://latex.codecogs.com/png.latex?%0AP(B,%20%5Ctheta)=%20%5Cbegin%7Bcases%7D%20%5Calpha(%5Ctheta)Q(B%20%5Cmid%20%5Ctheta),%5Cqquad%20&amp;%20%5Ctheta%20%5Cnot%20%5Cin%20B%20%5C%5C%0A%5Calpha(%5Ctheta)Q(B%5Cmid%20%5Ctheta)%20+%20(1-%5Calpha(%5Ctheta)),%20&amp;%5Ctheta%20%5Cin%20B%0A%5Cend%7Bcases%7D%0A"> where <img src="https://latex.codecogs.com/png.latex?Q(B%5Cmid%20%5Ctheta)"> is the probability associated with the proposal density <img src="https://latex.codecogs.com/png.latex?q(%5Ccdot%20%5Cmid%20%5Ctheta)"> and I have been very explicit about the dependence of the acceptance probability on <img src="https://latex.codecogs.com/png.latex?%5Ctheta">. (The <img src="https://latex.codecogs.com/png.latex?(1-%5Calpha(%5Ctheta))"> term takes into account the probability of starting at <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> and not accepting the proposed state.)</p>
<p>That definition of <img src="https://latex.codecogs.com/png.latex?%5Ctau"> looks pretty nasty, but it’s not too bad: in particular, the infimum only cares about whether <img src="https://latex.codecogs.com/png.latex?%5Ctheta_%7Bt-1%7D%5Cin%20B">. This means that the condition simplifies to <img src="https://latex.codecogs.com/png.latex?%0A%5CPr(%5Ctau%20=%20t%5Cmid%20%5Ctau%20%5Cgeq%20t)%20=%201%20-%20%5Cmin%5Cleft%5C%7B%5Coperatorname*%7Bess%5C,inf%7D_%7BB,%20%5Ctheta_%7Bt-1%7D%7D%20%5Cfrac%7B%5Calpha_t(%5Ctheta_%7Bt-1%7D)%20Q(B%20%5Cmid%20%5Ctheta_%7Bt-1%7D)%7D%7B%5Ctilde%5Calpha_t(%5Ctheta_%7Bt-1%7D)%20Q(B%20%5Cmid%20%5Ctheta_%7Bt-1%7D)%7D,%20%5Coperatorname*%7Bess%5C,inf%7D_%7BB,%20%5Ctheta_%7Bt-1%7D%7D%20%5Cfrac%7B%5Calpha_t(%5Ctheta_%7Bt-1%7D)%20Q(B%20%5Cmid%20%5Ctheta_%7Bt-1%7D)%20+%20(1-%5Calpha_t(%5Ctheta_%7Bt-1%7D))%7D%7B%5Ctilde%5Calpha_t(%5Ctheta_%7Bt-1%7D)%20Q(B%20%5Cmid%20%5Ctheta_%7Bt-1%7D)%20+%20(1-%20%5Ctilde%20%5Calpha_t(%5Ctheta_%7Bt-1%7D))%7D%5Cright%5C%7D.%0A"></p>
<p>This simplifies further if we assume that the proposal distribution <img src="https://latex.codecogs.com/png.latex?Q(%5Ccdot%20%5Cmid%20%5Ctheta_k)"> is absolutely continuous and has a strictly positive density. Then, it truly does not matter what <img src="https://latex.codecogs.com/png.latex?B"> is. For the first term, it just cancels, while the second term is monotone<sup>38</sup> in <img src="https://latex.codecogs.com/png.latex?Q(B%20%5Cmid%20%5Ctheta_%7Bt-1%7D)">, so we can take this term to be either zero or one and get<sup>39</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5CPr(%5Ctau%20=%20t%5Cmid%20%5Ctau%20%5Cgeq%20t)%20=%201%20-%20%5Cmin%5Cleft%5C%7B%5Coperatorname*%7Bess%5C,inf%7D_%7B%20%5Ctheta_%7Bt-1%7D%7D%20%5Cfrac%7B%5Calpha_t(%5Ctheta_%7Bt-1%7D)%20%7D%7B%5Ctilde%5Calpha_t(%5Ctheta_%7Bt-1%7D)%7D,%20%5Coperatorname*%7Bess%5C,inf%7D_%7B%5Ctheta_%7Bt-1%7D%7D%20%5Cfrac%7B1-%5Calpha_t(%5Ctheta_%7Bt-1%7D)%7D%7B%201-%20%5Ctilde%20%5Calpha_t(%5Ctheta_%7Bt-1%7D)%7D,1%5Cright%5C%7D.%0A"></p>
<p>This is, as the Greeks would say, not too bad.</p>
<p>If, for instance, we know the relative error <img src="https://latex.codecogs.com/png.latex?%0A%5Ctilde%5Calpha(%5Ctheta)%20=%20(1%20+%20%5Cdelta(%5Ctheta))%5Calpha(%5Ctheta),%0A"> then <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Calpha(%5Ctheta)%7D%7B%5Ctilde%20%5Calpha(%5Ctheta)%7D%20=%20%5Cfrac%7B1%7D%7B1%20+%20%5Cdelta(%5Ctheta)%7D,%0A"> and if we know<sup>40</sup> <img src="https://latex.codecogs.com/png.latex?%5Cdelta(%5Ctheta)%20%5Cleq%20%5Cbar%20%5Cdelta">, we get <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Calpha(%5Ctheta)%7D%7B%5Ctilde%20%5Calpha(%5Ctheta)%7D%20%5Cgeq%20%5Cfrac%7B1%7D%7B1%20+%20%5Cbar%5Cdelta%7D.%0A"> Similarly, if <img src="https://latex.codecogs.com/png.latex?%0A1-%5Ctilde%20%5Calpha(%5Ctheta)%20=%20(1-%5Calpha(%5Ctheta))(1+%5Cepsilon(%5Ctheta)),%0A"> and <img src="https://latex.codecogs.com/png.latex?%5Cepsilon(%5Ctheta)%20%5Cleq%20%5Cbar%20%5Cepsilon">, then we get <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B1-%5Calpha(%5Ctheta)%7D%7B1-%5Ctilde%20%5Calpha(%5Ctheta)%7D%20=%20%5Cfrac%7B1%7D%7B1+%5Cepsilon(%5Ctheta)%7D%20%5Cgeq%20%5Cfrac%7B1%7D%7B1+%5Cbar%5Cepsilon%7D.%0A"></p>
<p>The nice thing is that we can choose our upper bounds so that <img src="https://latex.codecogs.com/png.latex?%5Crho%20=%20(1+%20%5Cbar%20%5Cdelta)%5E%7B-1%7D%20=%20(1+%20%5Cbar%5Cepsilon)%5E%7B-1%7D"> and get the upper bound <img src="https://latex.codecogs.com/png.latex?%0A%5CPr(%5Ctau%20=%20t%5Cmid%20%5Ctau%20%5Cgeq%20t)%20%5Cleq%201%20-%20%5Crho.%0A"> It follows that <img src="https://latex.codecogs.com/png.latex?%0A%5CPr(%5Ctau%20=%20t)%20%5Cleq%20%5Crho%5E%7Bt-1%7D(1-%5Crho).%0A"></p>
<p>Now this is a bit nasty. It’s an upper bound on the probability of a lower bound on the maximal decoupling time. Probability, eh.</p>
<p>Probably the most useful thing we can get from this is an upper bound on <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(%5Ctau)">, which is<sup>41</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(%5Ctau)%20%5Cleq%20%5Cfrac%7B1%7D%7B1-%5Crho%7D%20=%201%20+%20%5Cbar%20%5Cdelta%5E%7B-1%7D.%0A"></p>
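<p>For concreteness, here is a quick numerical check (mine, with a made-up error bound): if decoupling happens with probability <img src="https://latex.codecogs.com/png.latex?1-%5Crho"> at each step, then <img src="https://latex.codecogs.com/png.latex?%5Ctau"> is Geometric, and its mean is <img src="https://latex.codecogs.com/png.latex?1%20+%20%5Cbar%20%5Cdelta%5E%7B-1%7D">:</p>

```python
import numpy as np

# Check of the Geometric mean: with per-step decoupling probability 1 - rho,
# rho = 1 / (1 + delta_bar), the mean of tau is 1/(1 - rho) = 1 + 1/delta_bar.
rng = np.random.default_rng(2)
delta_bar = 0.05                     # made-up bound on the relative error
rho = 1.0 / (1.0 + delta_bar)
tau = rng.geometric(p=1.0 - rho, size=1_000_000)
print(tau.mean())                    # close to 1 + 1/0.05 = 21
```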
<p>This confirms our intuition that if the relative error is large, we will have, on average, quite small <img src="https://latex.codecogs.com/png.latex?N_j">. It’s not quite enough to show the opposite (small floating point error begets big <img src="https://latex.codecogs.com/png.latex?N_j">), but that’s probably true as well.</p>
<p>And that is where we end this saga. There is definitely more that could be said, but I decided to spend exactly one day writing this post and that time is now over.</p>


</section>
</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Usually this is a lie, but it was actually a thing that happened last week↩︎</p></li>
<li id="fn2"><p>Don’t judge me (or my friends) based on this. I promise we also talk about other shit.↩︎</p></li>
<li id="fn3"><p>Hi GPUs!↩︎</p></li>
<li id="fn4"><p>usually reversible, although a lot of cool but not ready for prime time work is being done on non-reversible chains.↩︎</p></li>
<li id="fn5"><p>A stationary distribution, if it exists, is the distribution that is preserved by the Markov chain. If <img src="https://latex.codecogs.com/png.latex?%5Cpi"> is the stationary distribution and <img src="https://latex.codecogs.com/png.latex?x_1%20%5Csim%20%5Cpi">, then if we construct <img src="https://latex.codecogs.com/png.latex?x_2,%20x_3,%5Cldots"> by running the Markov chain then for every <img src="https://latex.codecogs.com/png.latex?k">, the marginal distribution is <img src="https://latex.codecogs.com/png.latex?x_k%20%5Csim%20%5Cpi">.↩︎</p></li>
<li id="fn6"><p>But critically not all! The dynamic HMC algorithm used in Stan, for instance, is not a Metropolis-Hastings algorithm. Instead of doing an accept/reject step it samples from the proposed trajectory. Betancourt’s <a href="https://arxiv.org/abs/1701.02434">long intro to Hamiltonian Monte Carlo</a> covers this very well.↩︎</p></li>
<li id="fn7"><p>The conditions for this to work are <em>very</em> light. But that’s because the definition of “working” only thinks about what happens after infinitely many steps. To get a practically useful Metropolis-Hastings algorithm, you’ve got to work very hard on choosing your proposal density.↩︎</p></li>
<li id="fn8"><p>sometimes called the Hastings correction↩︎</p></li>
<li id="fn9"><p>This is not the only choice that will work, but in some sense it is the most efficient one.↩︎</p></li>
<li id="fn10"><p>Technically, it is chosen by requiring that the Markov proposal <img src="https://latex.codecogs.com/png.latex?P(%5Ctheta,%5Ctheta')"> satisfies the detailed balance condition <img src="https://latex.codecogs.com/png.latex?%5Cpi(%5Ctheta)%20P(%5Ctheta,%5Ctheta')%20=%20%5Cpi(%5Ctheta')%20P(%5Ctheta',%20%5Ctheta)">, but everything about that equation is beyond the scope of this particular post.↩︎</p></li>
<li id="fn11"><p>Metropolis-adjusted Langevin Algorithm↩︎</p></li>
<li id="fn12"><p>Under the assumption that the total floating point error was bounded by a constant <img src="https://latex.codecogs.com/png.latex?%5Cdelta">↩︎</p></li>
<li id="fn13"><p>This time the assumption was that the rounding error for the acceptance probability at state <img src="https://latex.codecogs.com/png.latex?%5Ctheta_k"> was bounded by <img src="https://latex.codecogs.com/png.latex?%5Cdelta%20%5C%7C%5Ctheta_k%5C%7C">. This is a lot closer to how floating point arithmetic actually works. The trade off is that it requires a tighter condition on the drift function <img src="https://latex.codecogs.com/png.latex?V">.↩︎</p></li>
<li id="fn14"><p>IEEE floating point arithmetic represents a real number using <img src="https://latex.codecogs.com/png.latex?B"> bits. Typically <img src="https://latex.codecogs.com/png.latex?B%20=%2064"> (double precision) or <img src="https://latex.codecogs.com/png.latex?B%20=%2032"> (single precision). You can read a great intro to this on <a href="https://nhigham.com/2020/05/04/what-is-floating-point-arithmetic/">Nick Higham’s blog</a>. But in general, the <em>best</em> we can do is represent a real number <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> by a floating point number <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20%5Ctheta"> that satisfies <img src="https://latex.codecogs.com/png.latex?%0A%7C%5Ctheta%20-%20%5Ctilde%20%5Ctheta%7C%20%5Cleq%202%5E%7B-N+1%7D%7C%5Ctheta%7C,%0A"> where <img src="https://latex.codecogs.com/png.latex?N=24"> in single precision and <img src="https://latex.codecogs.com/png.latex?N=53"> in double precision (the number of significand bits). Of course, the acceptance probability is a non-linear combination of floating point numbers, so the actual error is going to be more complicated than that. I strongly recommend you read <a href="http://www.maths.manchester.ac.uk/~higham/asna/index.php">Nick Higham’s book</a> on the subject.↩︎</p></li>
<li id="fn15"><p><img src="https://latex.codecogs.com/png.latex?V">-geometrically ergodic with some light conditions on <img src="https://latex.codecogs.com/png.latex?V">↩︎</p></li>
<li id="fn16"><p>Geometric ergodicity implies the existence of a CLT! Which is nice, because all of our intuition about how to use the output from MCMC depends on a CLT.↩︎</p></li>
<li id="fn17"><p>Like all good orgies, this one was mostly populated by men↩︎</p></li>
<li id="fn18"><p>Yes, I know. My (limited) contribution to this literature was some small contributions to a paper <a href="https://www.jstor.org/stable/24780815">led by Anne-Marie Lyne</a>. But if years of compulsory catholicism taught me anything (other than “If you’re drinking with a nun or an aging homosexual, don’t try to keep up”) it’s that something does not have to be literally true to be morally true.↩︎</p></li>
<li id="fn19"><p>We have to slightly redefine the word “exact” to mean “targets the correct stationary distribution” for this name to make sense↩︎</p></li>
<li id="fn20"><p>Random graph models and point processes are two great examples↩︎</p></li>
<li id="fn21"><p>for instance, it gets stuck for long times at single values↩︎</p></li>
<li id="fn22"><p>the aforementioned point process and graph models↩︎</p></li>
<li id="fn23"><p>Playing God of War: Ragnarok↩︎</p></li>
<li id="fn24"><p>The first run of God of War Games were not my cup of tea, but the 2008 game, which is essentially a detailed simulation of what happens when a muscle bear is entrusted with walking an 11 year old up a hill, was really enjoyable. So far this is too.↩︎</p></li>
<li id="fn25"><p>Does it talk about involutions for no fucking reason? Of course it does. Read past that.↩︎</p></li>
<li id="fn26"><p>Yeah, like I have also read my blog. Think of it as being like social media. It is not a representation of me as a whole person. It’s actually biased towards stuff that I have either found or find difficult.↩︎</p></li>
<li id="fn27"><p>A friend of mine has a “No one knows I’m a transexual” t-shirt that she likes to wear to supermarkets.↩︎</p></li>
<li id="fn28"><p>Note that both <img src="https://latex.codecogs.com/png.latex?r_k"> and <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20r_k"> are computed using the <em>same</em> value <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20%5Ctheta_%7Bk-1%7D">.↩︎</p></li>
<li id="fn29"><p>The norm here is usually either the total variation norm or the <img src="https://latex.codecogs.com/png.latex?V">-norm. But truly it’s not important for the hand waving.↩︎</p></li>
<li id="fn30"><p>In most cases <img src="https://latex.codecogs.com/png.latex?M(%5Ctheta)%20%5Crightarrow%20%5Cinfty"> as <img src="https://latex.codecogs.com/png.latex?%5C%7C%5Ctheta%5C%7C%20%5Crightarrow%20%5Cinfty">.↩︎</p></li>
<li id="fn31"><p>Bounded and continuous always works. But everything is probably ok for unbounded functions as long as <img src="https://latex.codecogs.com/png.latex?h(%5Ctheta)"> has a pile of finite moments.↩︎</p></li>
<li id="fn32"><p>This is roughly true. I basically used the geometric ergodicity bound to bound <img src="https://latex.codecogs.com/png.latex?%0A%5Csum_%7Bk=N_%7Bj-1%7D%7D%5E%7BN_j-1%7D%20%5Cleft(h(%5Ctheta_k)%20-%20%5Cmathbb%7BE%7D_%5Cpi(h(%5Ctheta))%5Cright)%0A"> and summed it up. There are smarter things to do, but it’s close enough for government work. ↩︎</p></li>
<li id="fn33"><p>Sometimes, if you squint, this term will kinda, sorta start to look like <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D_%5Cpi(%5Cpi(%5Ctheta)%5E%7B-1/2%7D)">, which isn’t usually toooo big. But also, sometimes it looks totally different. Theory is wild.↩︎</p></li>
<li id="fn34"><p>If you’ve ever wondered how <code>rbinom(1,p)</code> works, there you are.↩︎</p></li>
<li id="fn35"><p>Think of this as the opposite of an adversarial example. We are trying to find the exact chain that is scared to leave the approximate chain behind. Which is either romantic or creepy, depending on finer details.↩︎</p></li>
<li id="fn36"><p>Well not me. <a href="https://arxiv.org/pdf/1608.01511.pdf">Florian Völlering</a> did it in his Theorem 1.4. I most certainly could not have done it.↩︎</p></li>
<li id="fn37"><p>Well the result does not need this to be a Markov chain!↩︎</p></li>
<li id="fn38"><p>it goes up if <img src="https://latex.codecogs.com/png.latex?%5Calpha%3E%5Ctilde%20%5Calpha"> otherwise it goes down↩︎</p></li>
<li id="fn39"><p>The 1 case can basically never happen except in the trivial case where both acceptance probabilities are the same. And if we thought that was going to happen we would’ve done something bloody else↩︎</p></li>
<li id="fn40"><p>The relative error being bounded does not stop the absolute error growing!↩︎</p></li>
<li id="fn41"><p>Look above and recognize the Geometric distribution↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2022,
  author = {Simpson, Dan},
  title = {MCMC with the Wrong Acceptance Probability},
  date = {2022-11-23},
  url = {https://dansblog.netlify.app/posts/2022-11-23-wrong-mcmc/wrong-mcmc.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2022" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2022. <span>“MCMC with the Wrong Acceptance
Probability.”</span> November 23, 2022. <a href="https://dansblog.netlify.app/posts/2022-11-23-wrong-mcmc/wrong-mcmc.html">https://dansblog.netlify.app/posts/2022-11-23-wrong-mcmc/wrong-mcmc.html</a>.
</div></div></section></div> ]]></description>
  <category>Fundamentals</category>
  <category>MCMC</category>
  <category>Bayes</category>
  <guid>https://dansblog.netlify.app/posts/2022-11-23-wrong-mcmc/wrong-mcmc.html</guid>
  <pubDate>Tue, 22 Nov 2022 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2022-11-23-wrong-mcmc/elvira.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>On that example of Robins and Ritov; or A sleeping dog in harbor is safe, but that’s not what sleeping dogs are for</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2022-11-12-robins-ritov/robins-ritov.html</link>
  <description><![CDATA[ 





<section id="sometimes-its-the-parable-of-the-barren-fig-tree.-sometimes-youre-just-pissed-at-a-shrub." class="level1">
<h1>Sometimes it’s the parable of the barren fig tree. Sometimes you’re just pissed at a shrub.</h1>
<p>Paradoxes and counterexamples live in statistics as our morality plays and our ghost stories. They serve as the creepy gas station attendants that populate the roads leading to the curséd woods; existing not to force change on the adventurer, but to signpost potential danger.<sup>1</sup></p>
<p>As a rule, we should also look askance at attempts to resolve these paradoxes and counterexamples. That is not what they are for. They are community resources, objects of our collective culture, monuments to thwarted desire.</p>
<p>But sometimes, driven by the endless thirst for content, it’s worth diving down into a counterexample and resolving it. This quixotic quest is not to somehow patch a hole, but to rather expand the hole until it can comfortably encase our wants, needs, and prayers.</p>
<p>To that end, let’s gather ’round the campfire and attend the tale of The Bayesian and the Ancillary Coin.</p>
<p>This example<sup>2</sup> was introduced by Robins and Ritov, and greatly popularised (and frequently reformulated) by Larry Wasserman<sup>3</sup>. It says<sup>4</sup> this:</p>
<blockquote class="blockquote">
<p>A committed subjective Bayesian (one who cleaves to the likelihood principle tighter than Rose clings to that door) will sometimes get a very wrong answer under some simple, but realistic, forms of randomization. Only a less committed Bayesian will be able to skirt the danger.</p>
</blockquote>
<p>So this is what we’re going to do now. First let’s introduce a version of the problem that does not trigger the counterexample. We then introduce the randomization scheme that leads to the error and talk about exactly how things go wrong. As someone who is particularly skeptical of any claims to purity<sup>5</sup>, the next job is going to be deconstructing this idea of a committed<sup>6</sup> subjective Bayesian. I will, perhaps unsurprisingly, argue that this is the only part of the Robins and Ritov (and Wasserman) conclusions that are somewhat questionable. In fact, a <em>true</em> committed subjective Bayesian<sup>7</sup> can solve the problem. It’s just a matter of looking at it through the correct lens.</p>
<section id="a-counterexample-always-proceedes-from-the-least-interesting-premise" class="level2">
<h2 class="anchored" data-anchor-id="a-counterexample-always-proceedes-from-the-least-interesting-premise">A counterexample always proceeds from the least interesting premise</h2>
<p>This example exists in a number of forms that each add important corners to the problem, but in the interest of simplicity, we will start with a simple situation where no problems occur.</p>
<p>Assume that there is a large, but fixed, finite number <img src="https://latex.codecogs.com/png.latex?J">, and <img src="https://latex.codecogs.com/png.latex?J"> unknown parameters <img src="https://latex.codecogs.com/png.latex?%5Cmu_j">, <img src="https://latex.codecogs.com/png.latex?j=1,%5Cldots,%20J">. The large number <img src="https://latex.codecogs.com/png.latex?J"> can be thought of as the number of strata in a population, while <img src="https://latex.codecogs.com/png.latex?%5Cmu_j"> are the means of the corresponding stratum. Now construct an experiment where you draw <img src="https://latex.codecogs.com/png.latex?%0Ay_i%20%5Cmid%20%5Cmu,x_i%20=%20j%20%5Csim%20N(%5Cmu_j,%201).%0A"> To close out the generative model, we assume that the covariates have a known distribution <img src="https://latex.codecogs.com/png.latex?x_i%20%5Csim%20%5Ctext%7BUnif%7D%5C%7B1,%5Cldots,%20J%5C%7D">.</p>
<p>A classical problem in mathematical statistics is to construct a <img src="https://latex.codecogs.com/png.latex?%5Csqrt%7Bn%7D">-consistent<sup>8</sup> estimator <img src="https://latex.codecogs.com/png.latex?%5Chat%5Cmu_n"> of the vector <img src="https://latex.codecogs.com/png.latex?%5Cmu">. But in the setting of this problem, this is quite difficult. The challenge is that if <img src="https://latex.codecogs.com/png.latex?J"> is a very large number, then we would need a gargantuan<sup>9</sup> number of observations (<img src="https://latex.codecogs.com/png.latex?n%20%5Cgg%20J">) in order to resolve all of the parameters properly.</p>
<p>But there is a saving grace! The <em>population</em><sup>10</sup> average <img src="https://latex.codecogs.com/png.latex?%0A%5Cmu%20=%20%5Cmathbb%7BE%7D(y)%20=%20%5Csum_%7Bj=1%7D%5EJ%20%5Cmu_j%20%5CPr(x%20=%20j)=%20%5Cfrac%7B1%7D%7BJ%7D%5Csum_%7Bj=1%7D%5EJ%20%5Cmu_j%0A"> can be estimated fairly easily. In fact, the sample mean (aka the most obvious estimator) <img src="https://latex.codecogs.com/png.latex?%5Cbar%7By%7D%20=%20n%5E%7B-1%7D%20%5Csum_%7Bi=1%7D%5En%20y_i"> is going to be <img src="https://latex.codecogs.com/png.latex?%5Csqrt%7Bn%7D">-consistent.</p>
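<p>A quick toy simulation (all the numbers here are made up) shows just how forgiving the population mean is: with a million strata and only ten thousand observations, most <img src="https://latex.codecogs.com/png.latex?%5Cmu_j"> are never seen, yet the sample mean still lands within a couple of standard errors of the truth.</p>

```python
import numpy as np

# Toy version of the stratified setup: many more strata than observations
# (n << J), but the sample mean still pins down the population mean.
rng = np.random.default_rng(3)
J = 1_000_000                        # number of strata
mu = rng.normal(size=J)              # the unknown stratum means mu_j
pop_mean = mu.mean()                 # the population average we care about

n = 10_000                           # n << J: most strata are never observed
x = rng.integers(J, size=n)          # covariates, uniform over strata (0-indexed)
y = rng.normal(loc=mu[x], scale=1.0) # y_i | x_i = j ~ N(mu_j, 1)
err = abs(y.mean() - pop_mean)       # error is O(1/sqrt(n)), here ~ sqrt(2/n)
```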
<p>Similarly, if we were to construct a Bayesian estimate of the population mean based on the prior <img src="https://latex.codecogs.com/png.latex?%5Cmu_j%20%5Cmid%20m%20%5Csim%20N(m,%201)"> and <img src="https://latex.codecogs.com/png.latex?m%20%5Csim%20N(0,%5Ctau%5E2)">, then the posterior estimate of the population mean is, for large enough<sup>11</sup> <img src="https://latex.codecogs.com/png.latex?n">, <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%20%5Cmu_%7B%5Ctext%7BBayes%7D,n%7D=%20%5Cmathbb%7BE%7D(%5Cmu%20%5Cmid%20y)%20%5Capprox%20%5Cfrac%7B1%7D%7Bn%20+%202/%5Ctau%7D%20%5Csum_%7Bi=1%7D%5En%20y_i.%0A"> This means that the<sup>12</sup> Bayesian resolution of this problem is roughly the same as the classical resolution. This is a nice thing. For very simple problems, these estimators should be fairly similar. It’s only when shit gets complicated that things become subtle.</p>
<p>This scenario, where a model is parameterized by an extremely high dimensional parameter <img src="https://latex.codecogs.com/png.latex?%5Cmu"> but the quantity of inferential interest is a low-dimensional summary of <img src="https://latex.codecogs.com/png.latex?%5Cmu">, is widely and deeply studied under the name of semi-parametric statistics.</p>
<p>Semi-parametric statistics is, unsurprisingly, harder than parametric statistics, but it is also quite a bit more challenging than non-parametric statistics. The reason is that if we want to guarantee a good estimate of a particular finite dimensional summary, it turns out that it’s not enough to generically get a “good” estimate of the high-dimensional parameter. In fact, getting a good estimate of the high-dimensional parameter is often not possible (see the example we just considered).</p>
<p>Instead, understanding semi-parametric models becomes the fine art of understanding what needs to be done well and what we can half arse. A description of this would take us <em>well</em> outside the scope of a mere blog post, but if you want to learn more about the topic, that’s what to google.</p>
</section>
<section id="robins-and-ritov-toss-an-ancillary-coin-and-let-slip-the-dogs-of-war" class="level2">
<h2 class="anchored" data-anchor-id="robins-and-ritov-toss-an-ancillary-coin-and-let-slip-the-dogs-of-war">Robins and Ritov toss an ancillary coin and let slip the dogs of war</h2>
<p>In order to destroy all that is right and good about the previous example, we only need to do one thing: randomize in a nefarious way. Robins and Ritov (actually, Wasserman who proposed the case with a finite <img src="https://latex.codecogs.com/png.latex?J">) add to their experiment <img src="https://latex.codecogs.com/png.latex?J"> biased coins <img src="https://latex.codecogs.com/png.latex?r_j"> with the property that <img src="https://latex.codecogs.com/png.latex?%0A%5CPr(r_j%20=%201%20%5Cmid%20X=j)%20=%20%5Cxi_j,%0A"> for some <em>known</em> <img src="https://latex.codecogs.com/png.latex?0%20%3C%20%5Cdelta%20%5Cleq%20%5Cxi_j%20%3C%201-%5Cdelta">, <img src="https://latex.codecogs.com/png.latex?j=1,%5Cldots,%20J"> and some <img src="https://latex.codecogs.com/png.latex?%5Cdelta%3E0">.</p>
<p>They then go through the data and add a column <img src="https://latex.codecogs.com/png.latex?r_i%20%5Csim%20%5Ctext%7BBernoulli%7D(%5Cxi_%7Bx_i%7D)">. The new data is now a three dimensional vector <img src="https://latex.codecogs.com/png.latex?(y_i,%20x_i,%20r_i)">. It’s important to this problem that the <img src="https://latex.codecogs.com/png.latex?%5Cxi_j"> are known and that we have the conditional independence structure <img src="https://latex.codecogs.com/png.latex?y%20%5Cperp%20r%20%5Cmid%20x">.</p>
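<p>A sketch of how you might simulate this augmented data in Python. The storage format ((y, x, r) triples with <code>None</code> standing in for <code>NA</code>) and all the specific numbers are my own illustrative choices, not anything from Robins and Ritov:</p>

```python
import random

# Generate (y, x, r) triples: x_i ~ Unif{1,...,J} (0-indexed here),
# r_i ~ Bernoulli(xi_{x_i}), and y_i recorded only when r_i = 1.
def make_data(n, mu, xi, rng):
    data = []
    J = len(mu)
    for _ in range(n):
        x = rng.randrange(J)
        r = 1 if rng.random() < xi[x] else 0
        y = rng.gauss(mu[x], 1.0) if r == 1 else None  # None plays the role of NA
        data.append((y, x, r))
    return data


rng = random.Random(2)
mu = [0.0, 2.0, -1.0]  # invented stratum means
xi = [0.9, 0.5, 0.1]   # invented (but "known") selection probabilities
data = make_data(1000, mu, xi, rng)
observed = [d for d in data if d[2] == 1]
print(len(observed), "of", len(data), "y values recorded")
```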
<p>Robins, Ritov, and Wasserman all ask the same question: Can we still estimate the population mean if we only observe samples from the <em>conditional</em> distribution <img src="https://latex.codecogs.com/png.latex?(y_i,%20x_i)%20%5Csim%20p(x,y%20%5Cmid%20r=1)">?</p>
<p>The answer turns out to be that there is a perfectly good estimator from classical survey statistics, but that a Bayesian estimator is a bit more challenging to find.</p>
<p>Before we get there, it’s worth noting that unlike the problem in the previous section, this problem is at least a little bit interesting. It’s a cartoon of a very common situation where there is covariate-dependent randomization in a clinical trial. Or, maybe even more cleanly, a cartoon of a simple probability survey.</p>
<p>A critical feature of this problem is that because the <img src="https://latex.codecogs.com/png.latex?%5Cxi_j"> are known and <img src="https://latex.codecogs.com/png.latex?p(x)"> is known, the joint likelihood factors as <img src="https://latex.codecogs.com/png.latex?%0Ap(y,x,r%20%5Cmid%20%5Cmu)%20=%20p(x)p(r%5Cmid%20x)%20p(y%20%5Cmid%20x,%20%5Cmu)%20=%20p(r%20,%20x)%20p(y%20%5Cmid%20x,%20%5Cmu),%0A"> so <img src="https://latex.codecogs.com/png.latex?r"> is ancillary<sup>13</sup> for <img src="https://latex.codecogs.com/png.latex?%5Cmu">.</p>
<p>The simplest classical estimator for <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(y)"> is the Horvitz-Thompson estimator <img src="https://latex.codecogs.com/png.latex?%0A%5Cbar%7By%7D_%5Ctext%7BHT%7D%20=%20%5Cfrac%7B1%7D%7BN%7D%20%5Csum_%7Bi=1%7D%5EN%20%5Cfrac%7Br_i%20y_i%7D%7B%5Cxi_%7Bx_i%7D%7D,%0A"> where <img src="https://latex.codecogs.com/png.latex?N"> is the total number of draws, including the ones where <img src="https://latex.codecogs.com/png.latex?y_i"> was not recorded. It’s easy to show that this is a <img src="https://latex.codecogs.com/png.latex?%5Csqrt%7Bn%7D">-consistent estimator. Better yet, it’s <em>uniform</em> over <img src="https://latex.codecogs.com/png.latex?%5Cmu"> in the sense that the convergence of the estimator isn’t affected (to leading order) by the specific <img src="https://latex.codecogs.com/png.latex?%5Cmu_j"> values. This uniformity is quite useful as it gives some hope of good finite-data behaviour.</p>
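<p>In code the estimator is a one-liner. This sketch assumes the data is stored as (y, x, r) triples with <code>y = None</code> whenever <code>r = 0</code> (an illustrative layout, nothing canonical), and that the known selection probabilities come along as a list:</p>

```python
# Horvitz-Thompson: (1/N) * sum over all N draws of r_i * y_i / xi_{x_i},
# where the r_i = 0 terms contribute zero.
def horvitz_thompson(data, xi):
    N = len(data)
    return sum(y / xi[x] for (y, x, r) in data if r == 1) / N


# Tiny worked example with two strata and xi = (0.5, 0.25):
data = [(4.0, 0, 1), (None, 0, 0), (2.0, 1, 1), (None, 1, 0)]
print(horvitz_thompson(data, [0.5, 0.25]))  # (4/0.5 + 2/0.25) / 4 = 4.0
```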
<p>So now that we know that the problem <em>can</em> be solved, let’s see if we can solve it in a Bayesian way. Robins and Ritov gave the following result.</p>
<blockquote class="blockquote">
<p>There is no uniformly consistent Bayesian estimator of the parameter <img src="https://latex.codecogs.com/png.latex?%5Cmu"> unless the prior depends on the <img src="https://latex.codecogs.com/png.latex?%5Cxi_j"> values.</p>
</blockquote>
<p>Robins and Ritov argue that a “committed subjective Bayesian” would, by the Likelihood Principle, never allow their prior to depend on the ancillary statistic <img src="https://latex.codecogs.com/png.latex?%5Cxi"> as the Likelihood Principle clearly states that inference should be independent of ancillary information.</p>
<p>There are, of course, ways to construct priors that depend on the sampling probabilities. Wasserman calls this “frequentist chasing”.</p>
<p>So let’s investigate this, by talking about what went wrong, how to fix it, and whether fixing it makes us bad Bayesians.</p>
</section>
</section>
<section id="the-likelihood-principle-and-the-death-of-nuance" class="level1">
<h1>The likelihood principle and the death of nuance</h1>
<p>So what is the likelihood principle and why is it being such a bastard to us poor liddle bayesians?</p>
<p>The likelihood principle says, roughly, that all of the information needed for parameter inference<sup>14</sup> should be contained in the likelihood function.</p>
<p>In particular, if we follow the likelihood principle, then if we have two likelihoods that are scalar multiples of each other, our estimates of the parameters should be the same.</p>
<p>Ok. Sure.</p>
<p>Why on earth do people care about the likelihood principle? I guess it’s because they aren’t happy with the fact that Bayesian methods actually work in practice and instead want to do some extremely boring philosophy-ish stuff to “prove” the superiority and purity of Bayesian methods. And, you know, all power<sup>15</sup> to them. Your kink is not my kink.</p>
<p>In this context, it means that because <img src="https://latex.codecogs.com/png.latex?r"> is ancillary to <img src="https://latex.codecogs.com/png.latex?y"> for estimating <img src="https://latex.codecogs.com/png.latex?%5Cmu"> we should avoid using the <img src="https://latex.codecogs.com/png.latex?r_i">s (and the <img src="https://latex.codecogs.com/png.latex?%5Cxi_j">s) to estimate <img src="https://latex.codecogs.com/png.latex?%5Cmu">. This is in direct opposition to what the Horvitz-Thompson estimator uses.</p>
<p>What happens if we follow this principle? We get a bad estimate.</p>
<p>It’s pretty easy to see that the posterior mean will, eventually, converge to the true value. All that has to happen is that you see enough observations in each category. So if you get enough data, you will eventually get a good estimate.</p>
<p>Unfortunately, when <img src="https://latex.codecogs.com/png.latex?J"> is large, this will potentially take a very very long<sup>16</sup> time.</p>
<p>Let’s go a bit deeper and see why this behaviour is not wrong, <em>per se</em>, it’s just Bayesian.</p>
<p>Bayesian inference produces a posterior distribution, which is conditional on an observed sample. This posterior distribution is an update to the prior that describes how compatible different parameter configurations are with the observed sample.</p>
<p>The thing is, our sample only sees a small sample of the values of <img src="https://latex.codecogs.com/png.latex?x">. This means that we are, essentially, estimating <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D_x%20(%5Cmathbb%7BE%7D(y%20%5Cmid%20x)%201_%7Bx%20%5Cin%20A_%7Br%7D%7D%20%5Cmid%20r),%0A"> where <img src="https://latex.codecogs.com/png.latex?A_r"> is the set of observed values of <img src="https://latex.codecogs.com/png.latex?x">, which depends on <img src="https://latex.codecogs.com/png.latex?r">. This target changes as we get more data and see more levels of <img src="https://latex.codecogs.com/png.latex?x"> and eventually coalesces towards the thing we are trying to compute.</p>
<p>But, and this is critical, we <em>cannot</em> say <em>anything</em> about <img src="https://latex.codecogs.com/png.latex?%5Cmu_j"> for <img src="https://latex.codecogs.com/png.latex?j%20%5Cnot%20%5Cin%20A_r"> unless we can assume that they are, in some sense, very strongly related. Unfortunately, the whole point of this example is that we are not allowed<sup>17</sup> to assume that!</p>
<p>In this extremely flexible model, it’s possible to have a sequence <img src="https://latex.codecogs.com/png.latex?%5Cxi_j"> that is highly correlated<sup>18</sup> with <img src="https://latex.codecogs.com/png.latex?%5Cmu_j">. If, for instance, <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7Bexpit%7D(%5Cmu_j)%20=%20%5Cxi_j"> were<sup>19</sup> equally spaced on <img src="https://latex.codecogs.com/png.latex?%5B%5Cdelta,%201-%5Cdelta%5D"> for some small <img src="https://latex.codecogs.com/png.latex?%5Cdelta%3E0">, you would have the situation where you are very likely to see the largest values of <img src="https://latex.codecogs.com/png.latex?y"> and quite unlikely to see any of the smaller values. This would gravely bias your sample mean upwards.</p>
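<p>This is easy to see in simulation. In the sketch below (every number invented), the selection probabilities are exactly expit of the stratum means, and the naive average of the recorded y values overshoots the population mean by a lot:</p>

```python
import math
import random
import statistics


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


# Adversarial setup: xi_j = expit(mu_j), so strata with big means are
# far more likely to have their y values recorded.
rng = random.Random(3)
J = 500
mu = [rng.gauss(0.0, 2.0) for _ in range(J)]
xi = [expit(m) for m in mu]
pop_mean = statistics.mean(mu)

observed = []
for _ in range(20_000):
    j = rng.randrange(J)                      # x_i ~ Unif{1,...,J}
    if rng.random() < xi[j]:                  # r_i ~ Bernoulli(xi_j)
        observed.append(rng.gauss(mu[j], 1.0))

naive = statistics.mean(observed)
print(naive - pop_mean)  # the naive mean sits well above the truth
```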
<p>This construction is similar to the one that Robins and Ritov use to prove that there is always a parameter value where the posterior mean converges<sup>20</sup> to the true mean at a rate no faster than <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D%5Cleft(%5Cfrac%7B(%5Clog%20%5Clog%20n)%5E2%7D%7B%5Clog%20n%7D%5Cright)">, which would require an exponentially large number of samples to do any sort of inference.</p>
<p>A reasonable criticism of this argument is that surely most problems will not have strong correlation between the sampling probabilities and the conditional means. In a follow-up paper, <a href="https://projecteuclid.org/journals/statistical-science/volume-29/issue-4/The-Bayesian-Analysis-of-Complex-High-Dimensional-Models--Can/10.1214/14-STS483.full">Ritov <em>et al.</em></a> argue that it’s not necessarily all that rare. For instance, if they are both realisations of independent GPs<sup>21</sup>, the empirical correlation between the two observed sequences can be far from zero! Less abstractly, it’s pretty easy to imagine something that is more popular with old people (who often answer their phones) than with young people (who don’t typically answer their phones). So this type of adversarial correlation certainly can happen in practice.</p>
</section>
<section id="can-we-save-bayes" class="level1">
<h1>Can we save Bayes?</h1>
<p>No.</p>
<p>Bayes does not need to be saved. She is doing exactly what she set out to do and is living her best life. Do not interfere<sup>22</sup>.</p>
<p>So let’s look at why we don’t need to fix things.</p>
<section id="a-simple-posterior-and-its-post-processing" class="level2">
<h2 class="anchored" data-anchor-id="a-simple-posterior-and-its-post-processing">A simple posterior and its post-processing</h2>
<p>Once again, recall the setting: we are observing the triple<sup>23</sup> <img src="https://latex.codecogs.com/png.latex?%0Az_i%20=%20(x_i,r_i,y_i)%20=%20(x_i,%20r_i,%20%5Ctexttt%7Br%5Bi%5D==1?%20y%5Bi%5D:%20NA%7D).%0A"> In particular, we can process this data to get some quantities:</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?N">: The total sample size</li>
<li><img src="https://latex.codecogs.com/png.latex?n=%20%5Csum_%7Bi=1%7D%5EN%20r_i">: The number of observed <img src="https://latex.codecogs.com/png.latex?y"></li>
<li><img src="https://latex.codecogs.com/png.latex?N_j%20=%20%5Csum_%7Bi=1%7D%5EN%201_%7Bx_i%20=%20j%7D">: The total number of times group <img src="https://latex.codecogs.com/png.latex?j"> was sampled</li>
<li><img src="https://latex.codecogs.com/png.latex?n_j%20=%20%5Csum_%7Bi=1%7D%5EN%20r_i1_%7Bx_i%20=%20j%7D">: The number of times an observation from group <img src="https://latex.codecogs.com/png.latex?j"> was recorded.</li>
</ul>
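<p>These summaries are one pass over the data. The sketch below assumes the illustrative (y, x, r) triple layout, with <code>y = None</code> when <code>r = 0</code> and groups indexed from 0; the helper name is mine:</p>

```python
from collections import Counter

# Compute N, n, and the per-group counts N_j and n_j from (y, x, r) triples.
def summarise(data, J):
    N = len(data)                                     # total sample size
    n = sum(r for (_, _, r) in data)                  # number of observed y
    N_counts = Counter(x for (_, x, _) in data)       # times group j was sampled
    n_counts = Counter(x for (_, x, r) in data if r)  # times y was recorded
    return N, n, [N_counts[j] for j in range(J)], [n_counts[j] for j in range(J)]


data = [(1.0, 0, 1), (None, 0, 0), (2.0, 1, 1), (3.0, 1, 1)]
print(summarise(data, 3))  # (4, 3, [2, 2, 0], [1, 2, 0])
```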
<p>Because of the structure of the problem, most observed values of <img src="https://latex.codecogs.com/png.latex?N_j"> and <img src="https://latex.codecogs.com/png.latex?n_j"> will be zero or one.</p>
<p>Nevertheless, we persist.</p>
<p>We now need priors on the <img src="https://latex.codecogs.com/png.latex?%5Cmu_j">. There are probably a tonne of options here, but I’m going to go with the simplest one, which is just to make them iid <img src="https://latex.codecogs.com/png.latex?N(0,%20%5Ctau%5E2)"> for some fixed and known value <img src="https://latex.codecogs.com/png.latex?%5Ctau">. We can then fit the resulting model and get the posterior for each <img src="https://latex.codecogs.com/png.latex?%5Cmu_j">. Note that because of the data sparsity, most of the posteriors will just be the same as the prior.</p>
<p>Then we can ask ourselves a much more Bayesian question: What would the average in our sample have been if we had recorded every <img src="https://latex.codecogs.com/png.latex?y_i">? Our best estimate of that quantity is <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B1%7D%7BN%7D%5Csum_%7Bj=1%7D%5EJ%20N_j%20%5Cmu_j%0A"></p>
<p>That’s all well and good. And, again, if I had small enough <img src="https://latex.codecogs.com/png.latex?J"> or large enough <img src="https://latex.codecogs.com/png.latex?N"> that I had a good estimate for all of the <img src="https://latex.codecogs.com/png.latex?%5Cmu_j">, this would be a good estimate. Moreover, for finite data this is likely to be a much better estimator than <img src="https://latex.codecogs.com/png.latex?J%5E%7B-1%7D%5Csum_%7Bj=1%7D%5EJ%20%5Cmu_j"> as it at least partially corrects for any potential imbalance in the covariate sampling.</p>
<p>It’s also worth noting here that there is nothing “Bayesian” about this. I am simply taking the knowledge I have from the sample I observed and processing the posterior to compute a quantity that I am interested in.</p>
<p>But, of course, that isn’t actually the quantity that I’m interested in. I’m interested in that quantity averaged over realisations of <img src="https://latex.codecogs.com/png.latex?r">. We can compute this if we can quantify the effect that <img src="https://latex.codecogs.com/png.latex?n_j"> has on <img src="https://latex.codecogs.com/png.latex?%5Cmu_j">.</p>
<p>We can do this pretty easily. Our priors are iid<sup>24</sup>, so this decouples into <img src="https://latex.codecogs.com/png.latex?J"> independent normal-normal models.</p>
<p>For any <img src="https://latex.codecogs.com/png.latex?j">, denote <img src="https://latex.codecogs.com/png.latex?y%5E%7B(j)%7D"> as the subset of <img src="https://latex.codecogs.com/png.latex?y"> that are in category <img src="https://latex.codecogs.com/png.latex?j">. We have that<sup>25</sup> <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Ap(%5Cmu_j%20%5Cmid%20y)%20&amp;%5Cpropto%20%5Cexp%5Cleft(-%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bi=1%7D%5E%7Bn_j%7D(y%5E%7B(j)%7D_i%20-%20%5Cmu_j)%5E2%20-%20%5Cfrac%7B1%7D%7B2%5Ctau%5E2%7D%5Cmu_j%5E2%5Cright)%5C%5C%0A&amp;%5Cpropto%20%5Cexp%5Cleft%5B-%5Cfrac%7B1%7D%7B2%7D%5Cleft(%5Cfrac%7B1%7D%7B%5Ctau%5E2%7D%20+%20n_j%5Cright)%5Cmu_j%5E2%20+%20%5Cmu_j%5Csum_%7Bi=1%7D%5E%7Bn_j%7Dy_i%5E%7B(j)%7D%5Cright%5D.%0A%5Cend%7Balign*%7D"></p>
<p>If we expand the density for a <img src="https://latex.codecogs.com/png.latex?%5Cmu_j%20%5Cmid%20y%20%5Csim%20N(m,v%5E2)"> we get <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Cmu_j%20%5Cmid%20y)%20%5Cpropto%20%5Cexp%5Cleft(-%5Cfrac%7B1%7D%7B2v%5E2%7D%5Cmu_j%5E2%20+%20%5Cfrac%7B1%7D%7Bv%5E2%7Dm%5Cmu_j%5Cright).%0A"> Matching terms in these two expressions we get that <img src="https://latex.codecogs.com/png.latex?%0Av_j%5E%5Ctext%7Bpost%7D%20=%20%5Coperatorname%7BVar%7D(%5Cmu_j%20%5Cmid%20y,%20n_j)%20=%20%20%5Cfrac%7B1%7D%7Bn_j%20+%20%5Ctau%5E%7B-2%7D%7D,%0A"> while the posterior mean is <img src="https://latex.codecogs.com/png.latex?%0Am_j%5E%5Ctext%7Bpost%7D%20=%20%5Cmathbb%7BE%7D(%5Cmu_j%20%5Cmid%20y,%20n_j)%20=%20%5Cfrac%7B1%7D%7Bn_j%20+%20%5Ctau%5E%7B-2%7D%7D%5Csum_%7Bi=1%7D%5E%7Bn_j%7Dy_i%5E%7B(j)%7D,%0A"> where I’ve suppressed the dependence on the sample <img src="https://latex.codecogs.com/png.latex?y"> in the <img src="https://latex.codecogs.com/png.latex?m_j"> and <img src="https://latex.codecogs.com/png.latex?v_j"> notation because, as a true<sup>26</sup> Bayesian, my sample is fixed and known. Hence <img src="https://latex.codecogs.com/png.latex?%0A%5Cmu_j%20%5Cmid%20y%20%5Csim%20N(m_j%5E%7B%5Ctext%7Bpost%7D%7D,%20v_j%5E%7B%5Ctext%7Bpost%7D%7D).%0A"></p>
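<p>The whole update is two lines of code per group (a minimal sketch; the function name is mine and it takes the recorded y values for a single group):</p>

```python
# Conjugate normal-normal update for one group: prior mu_j ~ N(0, tau^2),
# likelihood y_i ~ N(mu_j, 1). Returns (posterior mean, posterior variance).
def posterior(ys, tau):
    prec = len(ys) + tau ** -2  # n_j + tau^{-2}
    return sum(ys) / prec, 1.0 / prec


# With no recorded observations, the posterior is just the prior N(0, tau^2):
print(posterior([], 2.0))  # (0.0, 4.0)
```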
<p>Then I get the following estimator for the mean of the complete sample <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D%5Cleft(%5Cfrac%7B1%7D%7BN%7D%5Csum_%7Bj=1%7D%5EJN_j%5Cmu_j%20%5Cmid%20y%20%5Cright)=%20%5Cfrac%7B1%7D%7BN%7D%5Csum_%7Bj=1%7D%5EJN_jm_j%5E%5Ctext%7Bpost%7D.%0A"> We can also compute the posterior variance<sup>27</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5Coperatorname%7BVar%7D%5Cleft(%5Cfrac%7B1%7D%7BN%7D%5Csum_%7Bj=1%7D%5EJN_j%5Cmu_j%20%5Cmid%20y%20%5Cright)=%5Csum_%7Bj=1%7D%5EJ%5Cfrac%7BN_j%5E2%7D%7BN%5E2%7Dv_j%5E%5Ctext%7Bpost%7D.%0A"> Note that most of the groups won’t have a corresponding observation, so, recalling that <img src="https://latex.codecogs.com/png.latex?A_r"> is the set of <img src="https://latex.codecogs.com/png.latex?j">s that have been updated in the sample, we get <img src="https://latex.codecogs.com/png.latex?%0A%5Coperatorname%7BVar%7D%5Cleft(%5Cfrac%7B1%7D%7BN%7D%5Csum_%7Bj=1%7D%5EJN_j%5Cmu_j%20%5Cmid%20y%20%5Cright)=%5Csum_%7Bj%5Cin%20A_r%7D%5Cfrac%7BN_j%5E2%7D%7BN%5E2%7Dv_j%5E%5Ctext%7Bpost%7D%20+%20%5Ctau%5E2%5Csum_%7Bj%20%5Cnot%20%5Cin%20A_r%7D%5Cfrac%7BN_j%5E2%7D%7BN%5E2%7D,%0A"> where the term that multiplies <img src="https://latex.codecogs.com/png.latex?%5Ctau%5E2"> is less than 1.</p>
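<p>Putting the pieces together, the posterior mean and variance of the complete-sample average are just weighted sums of the per-group posteriors. A sketch (groups with no recorded observations keep their prior values, m = 0 and v = τ²):</p>

```python
# Posterior mean and variance of (1/N) * sum_j N_j * mu_j, given each group's
# posterior mean m_post[j] and posterior variance v_post[j].
def complete_sample_mean(N_j, m_post, v_post):
    N = sum(N_j)
    mean = sum(Nj * m for Nj, m in zip(N_j, m_post)) / N
    var = sum(Nj ** 2 * v for Nj, v in zip(N_j, v_post)) / N ** 2
    return mean, var


# Two groups: one updated by data, one still sitting at its prior (tau = 2).
print(complete_sample_mean([2, 2], [1.0, 0.0], [0.5, 4.0]))  # (0.5, 1.125)
```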
<p>So that’s all well and good, but that isn’t really the thing we were trying to estimate. We are actually interested in estimating the population mean, which we will get if we let <img src="https://latex.codecogs.com/png.latex?N%5Crightarrow%20%5Cinfty">.</p>
<p>So let’s see if we can do this without violating any of the universally agreed upon sacred strictures of Bayes.</p>
</section>
<section id="modelling-the-effect-of-the-ancillary-coin" class="level2">
<h2 class="anchored" data-anchor-id="modelling-the-effect-of-the-ancillary-coin">Modelling the effect of the ancillary coin</h2>
<p>Here’s the thing, though. We have computed our posterior distributions <img src="https://latex.codecogs.com/png.latex?p(%5Cmu_j%20%5Cmid%20y)"> and we can now use them as a generative model<sup>28</sup> for our data. We also have the composition of the complete data set (the <img src="https://latex.codecogs.com/png.latex?N_j">s) and full knowledge about how a new sample of the <img src="https://latex.codecogs.com/png.latex?n_j">s would come into our world.</p>
<p>We can put these things together! And that’s not in any way violating our Bayesian oaths! We are simply using our totally legally obtained posterior distribution to compute things. We are still true committed<sup>29</sup> subjective Bayesians.</p>
<p>So we are going to ask ourselves a simple question. Imagine, for a given <img src="https://latex.codecogs.com/png.latex?N_j">, we have <img src="https://latex.codecogs.com/png.latex?n_j%20%5Csim%20%5Ctext%7BBinom%7D(N_j,%20%5Cxi_j)"> iid samples<sup>30</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5Ctilde%7By%7D%5E%7B(j)%7D_i%20%5Csim%20N(m_j%5E%5Ctext%7Bpost%7D,%20v_j%5E%5Ctext%7Bpost%7D%20+%201).%0A"> What is the posterior mean <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(%5Cmu_j%20%5Cmid%20%5Ctilde%7By%7D%5E%7B(j)%7D,%20N_j)">? In fact, because this is random data drawn from a hypothetical sample, we can (and should<sup>31</sup>) ask questions about its distribution! To be brutally francis with you, I am too lazy to work out the variance of the posterior mean. So I’m just going to look at the mean of the posterior mean.</p>
<p>First things first, we need to look at the (average) posterior for <img src="https://latex.codecogs.com/png.latex?%5Cmu_j"> when <img src="https://latex.codecogs.com/png.latex?n_j%20=%20n">. The exact calculation we did before gives us <img src="https://latex.codecogs.com/png.latex?%0Am_j(n)%20=%20%5Cleft(1-%5Cfrac%7B1%7D%7B%5Ctau%5E2n%20+%201%7D%5Cright)%20m_j%5E%5Ctext%7Bpost%7D.%0A"> And, while I said I wasn’t going to focus on the variance, it’s easy enough to write down as <img src="https://latex.codecogs.com/png.latex?%0Av_j(n)%20=%20%5Cfrac%7B1%7D%7Bn%20+%20%5Ctau%5E%7B-2%7D%7D%20+%20%5Cleft(1%20-%20%5Cfrac%7B1%7D%7B%5Ctau%5E2n%20+%201%7D%5Cright)(1%20+%20v%5E%5Ctext%7Bpost%7D_j),%0A"> where the second term takes into account the variance due to the imputation.</p>
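<p>Those two expressions are easy to mistranscribe, so here they are as code (a direct sketch of the formulas above; the names are mine):</p>

```python
# Average posterior mean and variance for mu_j after n hypothetical draws,
# given the observed-data posterior summaries m_post and v_post.
def m_j(n, m_post, tau):
    return (1.0 - 1.0 / (tau ** 2 * n + 1.0)) * m_post


def v_j(n, v_post, tau):
    return 1.0 / (n + tau ** -2) + (1.0 - 1.0 / (tau ** 2 * n + 1.0)) * (1.0 + v_post)


# Sanity checks: no hypothetical data shrinks all the way back to the prior
# mean of 0, while lots of hypothetical data recovers m_post.
print(m_j(0, 3.0, 1.0))        # 0.0
print(m_j(10 ** 6, 3.0, 1.0))  # ~3.0
```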
<p>With this, we can estimate the sample mean for any number <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20N"> and any set of <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20N_j"> that sum to <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20N"> and any set of <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20n_j%20%5Csim%20%5Ctext%7BBinom%7D(%5Ctilde%20N_j,%20%5Cxi_j)"> as <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Cfrac%7B1%7D%7B%5Ctilde%20N%7D%5Csum_%7Bj=1%7D%5EJ%20%5Ctilde%20N_j%20m_j(%5Ctilde%20n_j)%20&amp;=%20%5Cfrac%7B1%7D%7B%5Ctilde%20N%7D%5Csum_%7Bj=1%7D%5EJ%20%5Cfrac%7B%5Ctilde%20N_j%7D%7B%5Ctilde%20n_j%7D%20%5Ctilde%20n_j%20m_j(%5Ctilde%20n_j)%20%5C%5C%0A&amp;=%20%5Cfrac%7B1%7D%7B%5Ctilde%20N%7D%5Csum_%7Bj=1%7D%5EJ%20%5Cfrac%7B1%7D%7B%5Cxi_j%7D%20%5Ctilde%20n_j%20m_j%5E%5Ctext%7Bpost%7D%20+%20o(1),%0A%5Cend%7Balign*%7D"> where in the last line I’ve used the fact that the empirical proportion converges to <img src="https://latex.codecogs.com/png.latex?%5Cxi_j"> and the posterior mean converges to <img src="https://latex.codecogs.com/png.latex?m_j%5E%5Ctext%7Bpost%7D">. The little-o<sup>32</sup> error term is as <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20N"> (and hence <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20N_j"> and <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20n_j">) goes to infinity.</p>
<p>To turn this into a practical estimate, we can plug in our values of <img src="https://latex.codecogs.com/png.latex?n_j"> and <img src="https://latex.codecogs.com/png.latex?N"> to get our Bayesian approximation to the population mean <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Chat%20%5Cmu%20&amp;=%20%5Cfrac%7B1%7D%7BN%7D%5Csum_%7Bj=1%7D%5EJ%20%5Cfrac%7Bn_j%7D%7B%5Cxi_j%7Dm_j%5E%7B%5Ctext%7Bpost%7D%7D%20%5C%5C%0A&amp;=%5Cfrac%7B1%7D%7BN%7D%20%5Csum_%7Bj%20%5Cin%20A_r%7D%20%5Cfrac%7Bn_j%7D%7B%5Cxi_j%7Dm_j%5E%5Ctext%7Bpost%7D%20%5C%5C%0A&amp;=%5Cfrac%7B1%7D%7BN%7D%5Csum_%7Bj=1%7D%5EJ%5Csum_%7Bi=1%7D%5E%7Bn_j%7D%20%5Cfrac%7B1%7D%7B%5Cxi_j%7D%5Cleft(1%20-%20%5Cfrac%7B%5Ctau%5E%7B-2%7D%7D%7Bn_j%7D%5Cright)y_i%5E%7B(j)%7D,%0A%5Cend%7Balign*%7D"> which is (up to the small term in brackets) the Horvitz-Thompson estimator!</p>
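<p>To see the “up to the small term in brackets” claim numerically, here’s a sketch computing both the derived estimate and the Horvitz-Thompson estimate on the same toy data (everything here is invented). With a diffuse prior (large τ) the two nearly coincide:</p>

```python
# Compare (1/N) sum_j (n_j / xi_j) m_post_j with Horvitz-Thompson, where
# group_ys[j] holds the recorded y values for group j.
def estimates(group_ys, xi, N, tau):
    bayes = ht = 0.0
    for ys, x in zip(group_ys, xi):
        n_j = len(ys)
        if n_j:
            m_post = sum(ys) / (n_j + tau ** -2)  # conjugate posterior mean
            bayes += (n_j / x) * m_post
            ht += sum(ys) / x                     # un-shrunk Horvitz-Thompson term
    return bayes / N, ht / N


group_ys = [[1.0, 2.0], [4.0], []]  # recorded y's per group (group 2 unseen)
xi = [0.5, 0.25, 0.1]
b, h = estimates(group_ys, xi, N=10, tau=10.0)
print(b, h)  # the Bayesian estimate is slightly shrunk towards zero
```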
</section>
<section id="is-it-bayesian" class="level2">
<h2 class="anchored" data-anchor-id="is-it-bayesian">Is it Bayesian?</h2>
<p>I stress, again, that there is nothing inherently non-Bayesian about this derivation. Except possibly the question that it is asking. What I did was compute the posterior distribution and then I took it seriously and used it to compute a quantity of interest.</p>
<p>The only oddity is that the quantity of interest (the population mean) has a slightly awkward link to the observed sample. Hence, I estimated something that had a more direct link to the population mean: the sample mean of the completely observed sample under different realisations of the randomisation <img src="https://latex.codecogs.com/png.latex?r_i">.</p>
<p>In order to estimate the sample mean under different realisations of the randomisation, I needed to use the posterior predictive distribution to impute these fictional samples. I then averaged over the imputed samples and sent the sample size to infinity to get an estimator<sup>33</sup>.</p>
<p>Or, to put it differently, I used Bayes to get a posterior estimate for new data <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Ctilde%20y,%20%5Ctilde%20r,%20%5Ctilde%20x)%20=%20%5Cint_%7B%5Cmathbb%7BR%7D%5EJ%7Dp(%5Ctilde%20y%20%5Cmid%20%5Ctilde%20x,%20%5Cmu)%5C,p(%5Cmu%20%5Cmid%20y)%5C,d%5Cmu%20%5C,%20p(%5Ctilde%20r%20%5Cmid%20%5Ctilde%20x)%20p(%5Ctilde%20x)%0A"> and then used this probabilistic model to estimate <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(%5Ctilde%20y)">. There was no reason to use Bayesian methods to do this. Non-Bayesian questions do not invite Bayesian answers.</p>
<p>Now, would I go to all of this effort in real life? Probably not. And in the applications that I’ve come across, I’ve never had to. I’ve done a bunch of MRP<sup>34</sup>, which is structurally quite similar to this problem except we can reasonably model the dependence structure between the <img src="https://latex.codecogs.com/png.latex?%5Cmu_j">s. <a href="https://arxiv.org/abs/1908.06716">This paper</a> I wrote with Alex Gao, Lauren Kennedy, and Andrew Gelman is an example of the type of modelling you can do.</p>
</section>
</section>
<section id="is-it-true-am-i-a-chaser" class="level1">
<h1>Is it true? Am I a chaser?</h1>
<p>Wasserman derides “frequentist chasing” Bayesians, making the point that if they want a frequentist guarantee so badly, why not just do it the easy way.</p>
<p>Now. Laz. Mate.</p>
<p>Let me tell you that a lot of my self esteem has been traditionally gathered from chasers, so I absolutely refuse to be party to the slander.</p>
<p>But more than that, let’s be clear. Bayes is a way to probabilistically describe data. That is not enough in and of itself to be useful. For it to be useful, we need to <em>do something</em> with that posterior distribution.</p>
<p>So really, let’s talk about what a <em>true committed subjective Bayesian</em> does about this. Firstly, I mean really. There is no such thing<sup>35</sup>. But leaving that aside, the closest I can get to a working definition is that a true committed subjective Bayesian is a person who understands that parameters are polite fictions that are used to describe the data. They are not, inherently, linked to any population quantity (for a true committed subjective Bayesian, such a thing does not exist).</p>
<p>The <em>only</em> way to link parameters in a Bayesian model to a population quantity of interest is to use some sort of extra-Bayesian<sup>36</sup> information.</p>
<p>For instance, in the first example (the one without the ancillary coin), I made that link in secret using assumptions about the sample. We all know that those types of assumptions are fraught and the reason that people spend so much time whispering DAG into the ears of their sleeping lovers.</p>
<p>For the ancillary coin example, we used the given information about the sampling mechanism as our extra information to link our posterior distribution to the population quantity of interest. None of this changes the <em>purity</em><sup>37</sup> of the Bayesian analysis. Or makes a non-Bayesian solution preferable. (Although, in this case, a non-Bayesian solution is a fuckload easier to come up with.)</p>
<p>Of course Wasserman (and I presume Robins and Ritov) know all of this. But it’s fun to write it all down.</p>
<p>Moreover, I think that the three lessons here are fairly transferable:</p>
<ol type="1">
<li>If you’re going to go to the trouble of computing a posterior, take it seriously. Use it to do things! You can even put it in as part of a probabilistic model.</li>
<li>If you’re going to make Bayes work for you, think in terms of observables (eg the mean of the complete sample) rather than parameters.</li>
<li>Appeals to purity are a bit of a waste of time.</li>
</ol>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Huge thanks to Sameer Deshpande for great comments!↩︎</p></li>
<li id="fn2"><p>I first came across this in a <a href="https://normaldeviate.wordpress.com/2012/10/11/the-robins-ritov-example-a-post-mortem/">series of posts</a> on Larry Wasserman’s now defunct but quite excellent blog.↩︎</p></li>
<li id="fn3"><p>It’s worth saying that these three people do fabulous statistics of the form that I don’t usually do. But that doesn’t make it less important to understand their contributions. You could say that while I am not a Lazbian, I think it’s important to know the theory.↩︎</p></li>
<li id="fn4"><p>I might have slightly reworded it.↩︎</p></li>
<li id="fn5"><p>Purity is needed in good olive oil and that’s it↩︎</p></li>
<li id="fn6"><p>A committed subjective Bayesian prefers Dutch baby to a Dutch book.↩︎</p></li>
<li id="fn7"><p>A true committed subjective Bayesian doesn’t wear anything under his kilt.↩︎</p></li>
<li id="fn8"><p>That is, an estimator where <img src="https://latex.codecogs.com/png.latex?%5Csqrt%7Bn%7D(%5Chat%20%5Cmu_n%20-%20%5Cmu)"> is bounded in probability. This, roughly, means that you can find a <img src="https://latex.codecogs.com/png.latex?C"> such that <img src="https://latex.codecogs.com/png.latex?%5Cmu%20%5Cin%20%5B%20%5Chat%20%5Cmu_n%20-%20C/%5Csqrt%7Bn%7D,%20%5Chat%20%5Cmu_n%20+%20C/%5Csqrt%7Bn%7D%5D"> with high probability.↩︎</p></li>
<li id="fn9"><p>The asymptotics say that we should count our data in multiples of <img src="https://latex.codecogs.com/png.latex?J">, so we’d need <img src="https://latex.codecogs.com/png.latex?n%20%3E%20100J"> to get even one decimal place of accuracy.↩︎</p></li>
<li id="fn10"><p>Remember <img src="https://latex.codecogs.com/png.latex?%5Cmu_j%20=%20%5Cmathbb%7BE%7D(y%20%5Cmid%20x=j)">.↩︎</p></li>
<li id="fn11"><p>Theorem 2 of <a href="https://argmin.lis.tu-berlin.de/papers/07-harmeling-tr.pdf">Harmeling and Toussaint</a>↩︎</p></li>
<li id="fn12"><p>a↩︎</p></li>
<li id="fn13"><p>If you’ve not come across it, <em>ancillary</em> is the term used for parts of the data that don’t influence parameter estimates. It’s the opposite of a sufficient statistic. One way to see that it’s ancillary for <em>any</em> model <img src="https://latex.codecogs.com/png.latex?p(y%5Cmid%20x,%20%5Ctheta)">, is to consider the log of the joint density <img src="https://latex.codecogs.com/png.latex?%0A%5Clog(p(x,y,r%20%5Cmid%20%5Ctheta))%20=%20%5Clog%20p(y%5Cmid%20x,%20%5Ctheta)%20+%20%5Clog%20p(r%20%5Cmid%20x)%20+%20%5Clog%20p(x)%0A">, where the last two terms are constant in <img src="https://latex.codecogs.com/png.latex?%5Ctheta">.↩︎</p></li>
<li id="fn14"><p>You need to be specific here. Obviously this would be false if you were trying to do a statistical prediction. Or if you were trying to make a decision. Those things necessarily depend on extra stuff!↩︎</p></li>
<li id="fn15"><p>This is a lie. Insisting on talking about this shit rather than actually making Bayes useful and using it in new and exciting ways to do things that are hard to do without Bayesian methods is a waste of time. Worse than that, when you start pretending your method of choice is the only possible thing that a sensible and principled person would use, you start to look like a bit of a dickhead. It also turns people off trying these very flexible and useful methods. So yeah. I maybe do care a little bit. ↩︎</p></li>
<li id="fn16"><p>The expected number of samples to see one draw where <img src="https://latex.codecogs.com/png.latex?x_i%20=j"> is <img src="https://latex.codecogs.com/png.latex?J">. The expected number of draws where <img src="https://latex.codecogs.com/png.latex?x_i%20=%20j"> that you need to actually observe the corresponding <img src="https://latex.codecogs.com/png.latex?y_i"> is <img src="https://latex.codecogs.com/png.latex?%5Cxi_j%5E%7B-1%7D">. This suggests it will potentially take <em>a lot</em> of draws to even have effectively one sample from each category, let alone the 20-100 you’d practically need to get some sort of reasonable estimate.↩︎</p></li>
<li id="fn17"><p>Robins and Ritov have always been open that if there is a true parametric model for the <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%20%5Cmid%20x%20=%20j)"> (or if that function is “very smooth” in some technical sense, eg a realisation of a smooth Gaussian process) then the Bayesian estimator that incorporates this information will do perfectly well. ↩︎</p></li>
<li id="fn18"><p>The RR example uses binary data, so there it’s the correlation between <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(y%20%5Cmid%20x=j)"> and <img src="https://latex.codecogs.com/png.latex?%5Cxi_j">, but the exact same argument works if <img src="https://latex.codecogs.com/png.latex?%5Cxi_j"> is correlated with something like <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7Bexpit%7D(%5Cmu_j)">. I went with the Gaussian version because at one point I thought I might end up having to derive posteriors and I’m all about simplicity.↩︎</p></li>
<li id="fn19"><p>expit is the inverse of the logit transform↩︎</p></li>
<li id="fn20"><p>Check <a href="https://cdn1.sph.harvard.edu/wp-content/uploads/sites/343/2013/03/coda.pdf">the paper</a> for the details as the situation is slightly different to the one I’m sketching out here, but there’s no real substantive difference.↩︎</p></li>
<li id="fn21"><p>Of course, if this were true we could use a GP prior for the <img src="https://latex.codecogs.com/png.latex?%5Cmu_j">s and we’d probably get a decent estimator anyway.↩︎</p></li>
<li id="fn22"><p>If you want to interfere, there are plenty of ways to build priors that incorporate the <img src="https://latex.codecogs.com/png.latex?%5Cxi_j"> information. The <a href="https://projecteuclid.org/journals/statistical-science/volume-29/issue-4/The-Bayesian-Analysis-of-Complex-High-Dimensional-Models--Can/10.1214/14-STS483.full">Ritov etc paper</a> has nice references to the various things that sprung up from this example. Are these useful beyond simply making sure the posterior mean of <img src="https://latex.codecogs.com/png.latex?%5Cmu"> estimates <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(y)">? Not really. They are priors designed to solve exactly one problem.↩︎</p></li>
<li id="fn23"><p>I’m using the C/C++ ternary operator. In R this would be parsed as <code>ifelse(r[i] == 1, y[i], NA)</code>. ↩︎</p></li>
<li id="fn24"><p>Not exchangeable—there are no shared parameters!↩︎</p></li>
<li id="fn25"><p>Remember that <img src="https://latex.codecogs.com/png.latex?y%20%5Cmid%20x%20=%20j%20%5Csim%20N(%5Cmu_j,%201)">. If we wanted a more flexible variance, we could obviously have one, but it makes no real difference to anything.↩︎</p></li>
<li id="fn26"><p>I promise I’m just rolling my eyes to see if I can see my brain.↩︎</p></li>
<li id="fn27"><p>Remember everything is independent!↩︎</p></li>
<li id="fn28"><p>This is the posterior predictive distribution!↩︎</p></li>
<li id="fn29"><p>A true committed subjective Bayesian knows that DP stands for Dirichlet Process. No matter the context.↩︎</p></li>
<li id="fn30"><p>The variance is <img src="https://latex.codecogs.com/png.latex?v_j%5E%5Ctext%7Bpost%7D%20+%201"> because this is the posterior predictive distribution.↩︎</p></li>
<li id="fn31"><p>Does this seem like a frequentist question? I guess. But really it’s a question we can always ask about the posterior. Should we? Well if you are trying to estimate a population quantity you sort of have to. Because there isn’t really a concept of a population parameter within a Bayesian framework (true committed subjective or otherwise).↩︎</p></li>
<li id="fn32"><p>Remember that this means that the error (which is a random variable) goes to 0 as <img src="https://latex.codecogs.com/png.latex?n%5Crightarrow%20%5Cinfty">. A more careful person could probably work out how fast it would happen.↩︎</p></li>
<li id="fn33"><p>I only computed the mean, so feel free to pretend that I’m minimizing a loss function↩︎</p></li>
<li id="fn34"><p>Multilevel regression with poststratification, a survey modelling technique↩︎</p></li>
<li id="fn35"><p>No true Scotsman etc↩︎</p></li>
<li id="fn36"><p>or meta-Bayesian in the event that we are doing things like building a Bayesian pseudo-model on the space of all considered models that just happens to give every model equal probability because Harold Fucking Jeffreys gave you an erection and you could either process that event like an adult or build a whole personality around it. And you chose the latter.↩︎</p></li>
<li id="fn37"><p>Can you tell that I hate this entire discussion?↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2022,
  author = {Simpson, Dan},
  title = {On That Example of {Robins} and {Ritov;} or {A} Sleeping Dog
    in Harbor Is Safe, but That’s Not What Sleeping Dogs Are For},
  date = {2022-11-15},
  url = {https://dansblog.netlify.app/posts/2022-11-12-robins-ritov/robins-ritov.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2022" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2022. <span>“On That Example of Robins and Ritov; or A
Sleeping Dog in Harbor Is Safe, but That’s Not What Sleeping Dogs Are
For.”</span> November 15, 2022. <a href="https://dansblog.netlify.app/posts/2022-11-12-robins-ritov/robins-ritov.html">https://dansblog.netlify.app/posts/2022-11-12-robins-ritov/robins-ritov.html</a>.
</div></div></section></div> ]]></description>
  <category>Fundamentals</category>
  <category>Survey sampling</category>
  <category>MRP</category>
  <category>Bayes</category>
  <guid>https://dansblog.netlify.app/posts/2022-11-12-robins-ritov/robins-ritov.html</guid>
  <pubDate>Mon, 14 Nov 2022 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2022-11-12-robins-ritov/misandrists.JPG" medium="image"/>
</item>
<item>
  <title>Priors for the parameters in a Gaussian process</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2022-09-07-priors5/priors5.html</link>
  <description><![CDATA[ 





<p>Long time readers will know that I bloody love a Gaussian process (GP). I wrote an <em>extremely detailed</em> post on the <a href="https://dansblog.netlify.app/posts/2021-11-03-yes-but-what-is-a-gaussian-process-or-once-twice-three-times-a-definition-or-a-descent-into-madness/yes-but-what-is-a-gaussian-process-or-once-twice-three-times-a-definition-or-a-descent-into-madness.html">various ways to define Gaussian processes</a>. And I did not do that because I just love inflicting Hilbert spaces on people. In fact, the only reason that I ever went beyond the standard operational definition of GPs that most people live their whole lives using is that I needed to.</p>
<p>Twice.</p>
<p>The first time was when I needed to understand approximation properties of a certain class of GPs. <a href="https://dansblog.netlify.app/posts/2021-11-24-getting-into-the-subspace/getting-into-the-subspace.html">I wrote a post about it</a>. It’s intense<sup>1</sup>.</p>
<p>The second time that I really needed to dive into their arcana and apocrypha<sup>2</sup> was when I foolishly asked the question <em>can we compute Penalised Complexity (PC) priors<sup>3</sup> <sup>4</sup> for Gaussian processes?</em>.</p>
<p>The answer was yes. But it’s a bit tricky.</p>
<p>So today I’m going to walk you through the ideas. There’s no real need to read the GP post before reading the first half of this one<sup>5</sup>, but it would be immensely useful to have at least glanced at the <a href="https://dansblog.netlify.app/posts/2022-08-29-priors4/priors4.html">post on PC priors</a>.</p>
<p>This post is <em>very</em> long, but that’s mostly because it tries to be reasonably self-contained. In particular, if you only care about the <a href="https://youtu.be/Z2HwloXqo_U?t=223">fat stuff</a>, you really only need to read the first part. After that there’s a long introduction to the theory of stationary Gaussian processes. All of this stuff is standard, but it’s hard to find collected in one place all of the things that I need to derive the PC prior. The third part actually derives the PC prior using many of the methods from the previous part.</p>
<section id="part-1-how-do-you-put-a-prior-on-parameters-of-a-gaussian-process" class="level2">
<h2 class="anchored" data-anchor-id="part-1-how-do-you-put-a-prior-on-parameters-of-a-gaussian-process">Part 1: How do you put a prior on parameters of a Gaussian process?</h2>
<p>We are in the situation where we have a model that looks something like this<sup>6</sup> <sup>7</sup> <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Ay_i%20%5Cmid%20%5Cbeta,%20u,%20%5Ctheta%20&amp;%5Csim%20p(y_i%20%5Cmid%20%5Cbeta,%20u,%20%5Cphi)%20%5C%5C%0Au(%5Ccdot)%20%5Cmid%20%5Ctheta%20&amp;%5Csim%20GP(0,%20c_%5Ctheta)%20%5C%5C%0A%5Cbeta,%20%5Cphi%20&amp;%5Csim%20p(%5Cbeta,%5Cphi),%0A%5Cend%7Balign*%7D"> where <img src="https://latex.codecogs.com/png.latex?c_%5Ctheta(%5Ccdot,%5Ccdot)"> is a covariance function with parameters <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> and we need to specify a joint prior on the GP parameters <img src="https://latex.codecogs.com/png.latex?%5Ctheta">.</p>
<p>The simplest case of this would be GP regression, but a key thing here is that, in general, the structure (or functional form) of the priors on <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> probably shouldn’t be too tightly tied to the specific likelihood. Why do I say that? Well the <em>scaling</em> of a GP should depend on information about the likelihood, but it’s less clear that anything else in the prior needs to know about the likelihood.</p>
<p>Now this view is predicated on us wanting to make an informative prior. In some very special cases, people with too much time on their hands have derived reference priors for specific models involving GPs. These priors care <em>deeply</em> about which likelihood you use. In fact, if you use them with a different model<sup>8</sup>, you may not end up with a proper<sup>9</sup> posterior. We will talk about those later.</p>
<p>To start, let’s look at the simplest way to build a PC prior. We will then talk about why this is not a good idea.</p>
<section id="a-first-crack-at-a-pc-prior" class="level3">
<h3 class="anchored" data-anchor-id="a-first-crack-at-a-pc-prior">A first crack at a PC prior</h3>
<p>As always, the best place to start is the simplest possible option. There’s always a hope<sup>10</sup> that we won’t need to pull out the big guns.</p>
<p>So what is the simplest solution? Well it’s to treat a GP as just a specific multivariate Gaussian distribution <img src="https://latex.codecogs.com/png.latex?%0Au%20%5Csim%20GP(0,%20%5Csigma%5E2R(%5Ctheta)),%0A"> where <img src="https://latex.codecogs.com/png.latex?R(%5Ctheta)"> is a correlation matrix.</p>
<p>The nice thing about a multivariate Gaussian is that we have a clean expression for its Kullback-Leibler divergence. Wikipedia tells us that for an <img src="https://latex.codecogs.com/png.latex?n">-dimensional multivariate Gaussian <img src="https://latex.codecogs.com/png.latex?%0A2%5Coperatorname%7BKL%7D(N(0,%20%5CSigma)%20%7C%7C%20N(0,%20%5CSigma_0))%20=%20%5Coperatorname%7Btr%7D%5Cleft(%5CSigma_0%5E%7B-1%7D%5CSigma%5Cright)%20+%20%5Clog%20%5Cdet%20%5CSigma_0%20-%20%5Clog%20%5Cdet%20%5CSigma-%20n.%0A"> To build a PC prior we need to consider a base model. That’s tricky in generality, but as we’ve assumed that the covariance matrix can be decomposed into the variance <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2"> and a correlation matrix <img src="https://latex.codecogs.com/png.latex?R(%5Ctheta)">, we can at least specify an easy base model for <img src="https://latex.codecogs.com/png.latex?%5Csigma">. As always, the simplest model is one with no GP in it, which corresponds to <img src="https://latex.codecogs.com/png.latex?%5Csigma_%5Ctext%7Bbase%7D%20=%200">. From here, we can follow the usual steps to specify the PC prior <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Csigma)%20=%20%5Clambda%20e%5E%7B-%5Clambda%20%5Csigma%7D,%0A"> where we choose <img src="https://latex.codecogs.com/png.latex?%5Clambda%20=%20-%5Clog(%5Calpha)/U"> for some upper bound <img src="https://latex.codecogs.com/png.latex?U%3E0"> and some tail probability <img src="https://latex.codecogs.com/png.latex?0%3C%5Calpha%3C1"> so that <img src="https://latex.codecogs.com/png.latex?%0A%5CPr(%5Csigma%20%3E%20U)%20=%20%5Calpha.%0A"> The specific choice of <img src="https://latex.codecogs.com/png.latex?U"> will depend on the context. For instance, if it’s logistic regression we probably want something like<sup>11</sup> <img src="https://latex.codecogs.com/png.latex?U=1">. 
If we have a GP on the log-mean of a Poisson distribution, then we probably want <img src="https://latex.codecogs.com/png.latex?U%20%3C%2021.5"> if we want the <em>mean</em> of the Poisson distribution to be less than the maximum integer<sup>12</sup> in R. In most data, you’re gonna want<sup>13</sup> <img src="https://latex.codecogs.com/png.latex?U%5Cll%205">. If the GP is on the mean of a normal distribution, the choice of <img src="https://latex.codecogs.com/png.latex?U"> will depend on the context and scaling of the data.</p>
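<p>The calibration itself is one line of algebra: solving <img src="https://latex.codecogs.com/png.latex?e%5E%7B-%5Clambda%20U%7D%20=%20%5Calpha"> for the exponential rate gives <img src="https://latex.codecogs.com/png.latex?%5Clambda%20=%20-%5Clog(%5Calpha)/U">. Here’s a minimal sketch in Python (the values <code>U = 1</code> and <code>alpha = 0.05</code> are a hypothetical logistic-regression-style calibration, not anything canonical):</p>

```python
import math

def pc_prior_rate(U: float, alpha: float) -> float:
    """Rate of the exponential PC prior p(sigma) = lam * exp(-lam * sigma),
    chosen so that Pr(sigma > U) = alpha."""
    assert U > 0 and 0 < alpha < 1
    return -math.log(alpha) / U

# Hypothetical calibration: Pr(sigma > 1) = 0.05.
lam = pc_prior_rate(U=1.0, alpha=0.05)

# The exponential tail probability above U recovers alpha by construction.
tail = math.exp(-lam * 1.0)
```

<p>Note that <code>alpha</code> only enters through a logarithm, so the prior is much more sensitive to the choice of <code>U</code> than to the tail probability.</p>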
<p>Without more assumptions about the form of the covariance function, it is impossible to choose a base model for the other parameters <img src="https://latex.codecogs.com/png.latex?%5Ctheta">.</p>
<p>That said, there is one special case that’s important: the case where <img src="https://latex.codecogs.com/png.latex?%5Ctheta%20=%20%5Cell"> is a single parameter controlling the intrinsic length scale, that is, the distance at which the correlation between two points <img src="https://latex.codecogs.com/png.latex?%5Cell"> units apart is approximately zero. The larger <img src="https://latex.codecogs.com/png.latex?%5Cell"> is, the more correlated observations of the GP are and, hence, the less wiggly its realisation is. On the other hand, as <img src="https://latex.codecogs.com/png.latex?%5Cell%20%5Crightarrow%200">, observations of the GP often behave like realisations of an iid Gaussian and the GP becomes<sup>14</sup> wilder and wilder.</p>
<p>This suggests that a good base model for the length-scale parameter would be <img src="https://latex.codecogs.com/png.latex?%5Cell_0%20=%20%5Cinfty">. We note that if both the base model and the alternative have the same value of <img src="https://latex.codecogs.com/png.latex?%5Csigma">, then it cancels out in the KL-divergence. Under this assumption, we get that <img src="https://latex.codecogs.com/png.latex?%0Ad(%5Cell%20%5Cmid%20%5Csigma)%20=%20%5Ctext%7B%60%60%7D%5Clim_%7B%5Cell_0%5Crightarrow%20%5Cinfty%7D%5Ctext%7B''%7D%20%5Csqrt%7B%5Coperatorname%7Btr%7D%5Cleft(R(%5Cell_0)%5E%7B-1%7DR(%5Cell)%5Cright)%20%20-%20%5Clog%20%5Cdet%20R(%5Cell)%20+%20%5Clog%20%5Cdet%20R(%5Cell_0)%20-%20n%7D,%0A"> where I’m being a bit cheeky putting that limit in, as we might need to do some singular model jiggery-pokery of the same type we needed to do <a href="https://dansblog.netlify.app/posts/2022-08-29-priors4/priors4.html#the-speed-of-a-battered-sav-proximity-to-the-base-model">for the standard deviation</a>. We will formalise this, I promise.</p>
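<p>If you want to poke at this distance numerically, here’s a minimal sketch. An exponential correlation matrix on a hypothetical one-dimensional grid stands in for a generic correlation family, and a large-but-finite length scale stands in for the limit in the base model:</p>

```python
import numpy as np

def kl2(Sigma, Sigma0):
    """2 * KL(N(0, Sigma) || N(0, Sigma0)) for zero-mean Gaussians:
    tr(Sigma0^{-1} Sigma) + log det Sigma0 - log det Sigma - n."""
    n = Sigma.shape[0]
    tr = np.trace(np.linalg.solve(Sigma0, Sigma))
    _, logdet0 = np.linalg.slogdet(Sigma0)
    _, logdet = np.linalg.slogdet(Sigma)
    return tr + logdet0 - logdet - n

# Exponential correlation R(ell)_{ij} = exp(-|s_i - s_j| / ell)
# on a hypothetical 1-d grid of 50 points.
s = np.linspace(0.0, 1.0, 50)
D = np.abs(s[:, None] - s[None, :])
R = lambda ell: np.exp(-D / ell)

ell0 = 1e3  # large-but-finite stand-in for the base model ell_0 -> infinity
d = lambda ell: np.sqrt(kl2(R(ell), R(ell0)))
# d(ell) grows as ell shrinks: rougher models are further from the base model.
```

<p>This is only a sketch of the formula above, not a practical prior: as the text goes on to argue, both the grid-dependence and the cost of these linear algebra operations are part of the problem.</p>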
<p>Because the model gets more complex as the length scale decreases, we want our prior to control the smallest value <img src="https://latex.codecogs.com/png.latex?%5Cell"> can take. This suggests we want to choose <img src="https://latex.codecogs.com/png.latex?%5Clambda"> to ensure <img src="https://latex.codecogs.com/png.latex?%0A%5CPr(%5Cell%20%3C%20L)%20=%20%5Calpha.%0A"> How do we choose the lower bound <img src="https://latex.codecogs.com/png.latex?L">? One idea is that our prior should have very little probability of the length scale being smaller than the length-scale of the data. So we can choose <img src="https://latex.codecogs.com/png.latex?L"> to be the smallest distance between observations (if the data is regularly spaced) or as a low quantile of the distribution of distances between nearest neighbours.</p>
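<p>As a sketch of that last recipe (with hypothetical observation locations; in one dimension nearest-neighbour distances are just adjacent gaps in the sorted locations, while in higher dimensions you’d use something like a k-d tree):</p>

```python
import numpy as np

rng = np.random.default_rng(42)
s = np.sort(rng.uniform(0, 10, size=200))  # hypothetical observation locations

# Nearest-neighbour distance for each point: min of the gap to the left
# and the gap to the right (endpoints only have one neighbour).
gaps = np.diff(s)
nn_dist = np.minimum(np.append(gaps, np.inf), np.insert(gaps, 0, np.inf))

# Take L as, say, the 5% quantile of the nearest-neighbour distances;
# the PC prior then puts only probability alpha on length scales below L.
L = np.quantile(nn_dist, 0.05)
```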
<p>All of this will specify a PC prior for a Gaussian process. So let’s now discuss why that prior is a bit shit.</p>
</section>
<section id="whats-bad-about-this" class="level3">
<h3 class="anchored" data-anchor-id="whats-bad-about-this">What’s bad about this?</h3>
<p>The prior on the standard deviation is fine.</p>
<p>The prior on the length scale is more of an issue. There are a couple of bad things about this prior. The first one might seem innocuous at first glance. We decided to treat the GP as a multivariate Gaussian with covariance matrix <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2%20R(%5Ctheta)">. This is not a neutral choice. In order to do it, we need to <em>commit</em> to a certain set of observation locations<sup>15</sup>. Why? The matrix <img src="https://latex.codecogs.com/png.latex?R(%5Ctheta)"> depends entirely on the observation locations and if we use this matrix to define the prior we are tied to those locations.</p>
<p>This means that if we change the amount of data in the model we will need to change the prior. This is going to play havoc<sup>16</sup> with any sort of cross-validation! It’s worth saying that the other two sources of information (the minimum length scale and the upper bound on <img src="https://latex.codecogs.com/png.latex?%5Csigma">) are not nearly as sensitive to small changes in the data. This information is, in some sense, fundamental to the problem at hand and, therefore, much more stable ground to build your prior upon.</p>
<p>There’s another problem, of course: this prior is expensive to compute. The KL divergence involves computing <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7Btr%7D(R(%5Cell_0)%5E%7B-1%7DR(%5Cell))"> which costs as much as another log-density evaluation for the Gaussian process (which is to say it’s very expensive).</p>
<p>So this prior is going to be <em>deeply</em> inconvenient if we have varying amounts of data (through cross-validation or sequential data gathering). It’s also going to be wildly more computationally expensive than you expect a one-dimensional prior to be.</p>
<p>All in all, it seems a bit shit.</p>
</section>
<section id="the-matérn-covariance-function" class="level3">
<h3 class="anchored" data-anchor-id="the-matérn-covariance-function">The Matérn covariance function</h3>
<p>It won’t be possible to derive a prior for a general Gaussian process, so we are going to need to make some simplifying assumptions. The assumption that we are going to make is that the covariance comes from the Whittle-Matérn<sup>17</sup> <sup>18</sup> class <img src="https://latex.codecogs.com/png.latex?%0Ac(s,%20s')%20=%20%5Csigma%5E2%20%5Cfrac%7B2%5E%7B1-%5Cnu%7D%7D%7B%5CGamma(%5Cnu)%7D%5Cleft(%5Csqrt%7B8%5Cnu%7D%5Cfrac%7B%5C%7Cs-s'%5C%7C%7D%7B%5Cell%7D%5Cright)%5E%5Cnu%20K_%5Cnu%5Cleft(%5Csqrt%7B8%5Cnu%7D%5Cfrac%7B%5C%7Cs-s'%5C%7C%7D%7B%5Cell%7D%5Cright),%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Cnu"> is the <em>smoothness</em> parameter, <img src="https://latex.codecogs.com/png.latex?%5Cell"> is the <em>length-scale</em> parameter, <img src="https://latex.codecogs.com/png.latex?%5Csigma"> is the <em>marginal standard deviation</em>, and <img src="https://latex.codecogs.com/png.latex?%0AK_%5Cnu(x)%20=%20%5Cint_0%5E%5Cinfty%20e%5E%7B-x%5Ccosh%20t%7D%5Ccosh(%5Cnu%20t)%5C,dt%0A"> is the modified Bessel<sup>19</sup> function of the second kind.</p>
<p>This class of covariance function is extremely important in practice. It interpolates between two of the most common covariance functions:</p>
<ul>
<li>when <img src="https://latex.codecogs.com/png.latex?%5Cnu%20=%201/2">, it corresponds to the exponential covariance function,</li>
<li>when <img src="https://latex.codecogs.com/png.latex?%5Cnu%20=%20%5Cinfty">, it corresponds to the squared exponential covariance.</li>
</ul>
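<p>If you want to check the <img src="https://latex.codecogs.com/png.latex?%5Cnu%20=%201/2"> case numerically, here’s a minimal sketch of this parameterisation using scipy’s modified Bessel function <code>kv</code>. Because <img src="https://latex.codecogs.com/png.latex?K_%7B1/2%7D(x)%20=%20%5Csqrt%7B%5Cpi/(2x)%7De%5E%7B-x%7D">, the covariance collapses to <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2%20e%5E%7B-2%5C%7Cs-s'%5C%7C/%5Cell%7D"> at <img src="https://latex.codecogs.com/png.latex?%5Cnu%20=%201/2">:</p>

```python
import numpy as np
from scipy.special import gamma, kv

def matern(dist, ell, sigma=1.0, nu=0.5):
    """Matern covariance in the sqrt(8 * nu) / ell parameterisation above,
    so ell is roughly the distance at which the correlation becomes small."""
    dist = np.asarray(dist, dtype=float)
    out = np.full(dist.shape, float(sigma) ** 2)  # the limit at dist = 0
    nz = dist > 0
    x = np.sqrt(8 * nu) * dist[nz] / ell
    out[nz] = sigma**2 * (2 ** (1 - nu) / gamma(nu)) * x**nu * kv(nu, x)
    return out

# At nu = 1/2 this is the exponential covariance sigma^2 * exp(-2 * dist / ell).
dists = np.linspace(0.0, 3.0, 7)
c_half = matern(dists, ell=1.0, nu=0.5)
```

<p>The <img src="https://latex.codecogs.com/png.latex?%5Csqrt%7B8%5Cnu%7D"> factor is exactly what makes <img src="https://latex.codecogs.com/png.latex?%5Cell"> interpretable as the distance at which the correlation is approximately zero, across different values of <img src="https://latex.codecogs.com/png.latex?%5Cnu">.</p>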
<p>There are years of experience suggesting that Matérn covariance functions with finite <img src="https://latex.codecogs.com/png.latex?%5Cnu"> will often perform better than the squared exponential covariance.</p>
<p>Common practice is to fix<sup>20</sup> the value of <img src="https://latex.codecogs.com/png.latex?%5Cnu">. There are a few reasons for this. One of the most compelling practical reasons is that we can’t easily evaluate its derivative, which rules out most modern optimisation and MCMC algorithms. It’s also <em>very</em> difficult to think about how you would set a prior on it. The techniques in this post will not help, and as far as I’ve ever been able to tell, nothing else will either. Finally, you could expect there to be <em>horrible</em> confounding between <img src="https://latex.codecogs.com/png.latex?%5Cnu">, <img src="https://latex.codecogs.com/png.latex?%5Cell">, and <img src="https://latex.codecogs.com/png.latex?%5Csigma">, which will make inference very hard (both numerically and morally).</p>
<p>It turns out that even with <img src="https://latex.codecogs.com/png.latex?%5Cnu"> fixed, we will run into a few problems. But to understand those, we are going to need to know a bit more about how inferring parameters in a Gaussian processes actually works.</p>
<p>Just for future warning, I will occasionally refer to a GP with a Matérn covariance function as a “Matérn field”<sup>21</sup>.</p>
</section>
<section id="asymptotic-i-barely-know-her" class="level3">
<h3 class="anchored" data-anchor-id="asymptotic-i-barely-know-her">Asymptotic? I barely know her!</h3>
<p>Let’s take a brief detour into classical inference for a moment and ask ourselves <em>when can we recover the parameters of a Gaussian process</em>? For most models we run into in statistics, the answer to that question is <em>when we get enough data</em>. But for Gaussian processes, the story is more complex.</p>
<p>First of all, there is the very real question of what we mean by getting more data. When our observations are iid, this is so easy that when asked how she got more data, Kylie just said she <a href="https://www.youtube.com/watch?v=jDKPvy-ZXC8">“did it again”</a>.</p>
<p>But this is more complex once the data has dependence. For instance, in a multilevel model you could have the number of groups staying fixed while the number of observations in each group goes to infinity, you could have the number of observations in each group staying fixed while the number of groups goes to infinity, or you could have both<sup>22</sup> going to infinity.</p>
<p>For Gaussian processes it also gets quite complicated. Here is a non-exhaustive list of options:</p>
<ul>
<li>You observe the same realisation of the GP at an increasing number of points that eventually cover the <em>whole of</em> <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed"> (this is called the <em>increasing domain</em> or <em>outfill</em> regime); or</li>
<li>You observe the same realisation of the GP at an increasing number of points <em>that stay within a fixed domain</em> (this is called the <em>fixed domain</em> or <em>infill</em> regime); or</li>
<li>You observe multiple realisations of the same GP at a finite number of points that stay in the same location (this does not have a name, in space-time it’s sometimes called <em>monitoring data</em>); or</li>
<li>You observe multiple realisations of the same GP at a (possibly different) finite number of points that can be in different locations for different realisations; or</li>
<li>You observe realisations of a process that evolves in space <em>and</em> time (not really a different regime so much as a different problem).</li>
</ul>
<p>One of the truly unsettling things about Gaussian processes is that the ability to estimate the parameters depends on which of these regimes you choose!</p>
<p>Of course, we all know that asymptotic regimes are just polite fantasies that statisticians concoct in order to self-soothe. They are not reflections on reality. They serve approximately the same purpose<sup>23</sup> as watching a chain of Law and Order episodes.</p>
<p>The point of thinking about what happens when we get more data is to use it as a loose approximation of what happens with the data you have. So the real question is <em>which regime is the most realistic for my data</em>?</p>
<p>One way you can approach this question is to ask yourself what you would do if you had the budget to get more data. My work has mostly been in spatial statistics, in which case the answer is <em>usually</em><sup>24</sup> that you would sample more points in the same area. This suggests that fixed-domain asymptotics is a good fit for my needs. I’d expect that in most GP regression cases, we’re not expecting<sup>25</sup> that further observations would be on new parts of the covariate space, which would suggest fixed-domain asymptotics are useful there too.</p>
<p>This, it turns out, is awkward.</p>
</section>
<section id="when-is-a-parameter-not-consistently-estimatable-an-aside-that-will-almost-immediately-become-relevant" class="level3">
<h3 class="anchored" data-anchor-id="when-is-a-parameter-not-consistently-estimatable-an-aside-that-will-almost-immediately-become-relevant">When is a parameter not consistently estimable: an aside that will almost immediately become relevant</h3>
<p>The problem with a GP with the Matérn covariance function on a fixed domain is that it’s not possible<sup>26</sup> to estimate all of its parameters at the same time. This isn’t the case for the other asymptotic regimes, but you’ve got to dance with who you came to the dance with.</p>
<p>To make this more concrete, we need to think about a Gaussian process as a realisation of a function rather than as a vector of observations. Why? Because under fixed-domain asymptotics we are seeing values of the function closer and closer together until we essentially see the entire function on that domain.</p>
<p>Of course, this is why I wrote <a href="https://dansblog.netlify.app/posts/2021-11-03-yes-but-what-is-a-gaussian-process-or-once-twice-three-times-a-definition-or-a-descent-into-madness/yes-but-what-is-a-gaussian-process-or-once-twice-three-times-a-definition-or-a-descent-into-madness.html">a long and technical blog post</a> on understanding Gaussian processes as random functions. But don’t worry. You don’t need to have read that part.</p>
<p>The key thing is that because a GP is a function, we need to think of its probability of being in a set <img src="https://latex.codecogs.com/png.latex?A"> of functions. There will be a set of functions <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7Bsupp%7D(u)">, which we call the <em>support</em> of <img src="https://latex.codecogs.com/png.latex?u(%5Ccdot)">, that is, the smallest set such that <img src="https://latex.codecogs.com/png.latex?%0A%5CPr(u(%5Ccdot)%20%5Cin%20%5Coperatorname%7Bsupp%7D(u))%20=%201.%0A"> Every GP has an associated support and, while you probably don’t think much about it, GPs are <em>obsessed</em> with their supports. They love them. They hug them. They share them with their friends. They keep them from their enemies. And they are one of the key things that we need to think about in order to understand why it’s hard to estimate parameters in a Matérn covariance function.</p>
<p>There is a key theorem that is unique<sup>27</sup> to Gaussian processes. It’s usually phrased in terms of <em>Gaussian measures</em>, which are just the probability associated with a GP. For example, if <img src="https://latex.codecogs.com/png.latex?u_1(%5Ccdot)"> is a GP then <img src="https://latex.codecogs.com/png.latex?%0A%5Cmu_1(A)%20=%20%5CPr(u_1(%5Ccdot)%20%5Cin%20A)%0A"> is the corresponding Gaussian measure. We can express the support of <img src="https://latex.codecogs.com/png.latex?u(%5Ccdot)"> as the smallest set of functions such that <img src="https://latex.codecogs.com/png.latex?%5Cmu(A)=1">.</p>
<div id="thm-singular-equiv" class="theorem">
<p><span class="theorem-title"><strong>Theorem 1 (Feldman-Hájek theorem)</strong></span> Two Gaussian measures <img src="https://latex.codecogs.com/png.latex?%5Cmu_1"> and <img src="https://latex.codecogs.com/png.latex?%5Cmu_2"> with corresponding GPs <img src="https://latex.codecogs.com/png.latex?u_1(%5Ccdot)"> and <img src="https://latex.codecogs.com/png.latex?u_2(%5Ccdot)"> on a locally convex space<sup>28</sup> either satisfy, for every<sup>29</sup> set <img src="https://latex.codecogs.com/png.latex?A">,<br>
<img src="https://latex.codecogs.com/png.latex?%0A%5Cmu_2(A)%20%3E%200%20%5CRightarrow%20%5Cmu_1(A)%20%3E%200%20%5Ctext%7B%20and%20%7D%20%5Cmu_1(A)%20%3E%200%20%5CRightarrow%20%5Cmu_2(A)%20%3E%200,%0A"> in which case we say that <img src="https://latex.codecogs.com/png.latex?%5Cmu_1"> and <img src="https://latex.codecogs.com/png.latex?%5Cmu_2"> are <em>equivalent</em><sup>30</sup> (confusingly<sup>31</sup> written <img src="https://latex.codecogs.com/png.latex?%5Cmu_1%20%5Cequiv%20%5Cmu_2">) and <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7Bsupp%7D(u_1)%20=%20%5Coperatorname%7Bsupp%7D(u_2)">, <strong>or</strong> <img src="https://latex.codecogs.com/png.latex?%0A%5Cmu_2(A)%20%3E%200%20%5CRightarrow%20%5Cmu_1(A)%20=%200%20%5Ctext%7B%20and%20%7D%20%5Cmu_1(A)%20%3E%200%20%5CRightarrow%20%5Cmu_2(A)%20=%200,%0A"> in which case we say <img src="https://latex.codecogs.com/png.latex?%5Cmu_1"> and <img src="https://latex.codecogs.com/png.latex?%5Cmu_2"> are <em>singular</em> (written <img src="https://latex.codecogs.com/png.latex?%5Cmu_1%20%5Cperp%20%5Cmu_2">) and <img src="https://latex.codecogs.com/png.latex?u_1(%5Ccdot)"> and <img src="https://latex.codecogs.com/png.latex?u_2(%5Ccdot)"> have disjoint supports.</p>
</div>
<p>Later on in the post, we will see some precise conditions for when two Gaussian measures are equivalent, but for now it’s worth saying that it is a <em>very</em> delicate property. In fact, if <img src="https://latex.codecogs.com/png.latex?u_2(%5Ccdot)%20=%20%5Calpha%20u_1(%5Ccdot)"> for any <img src="https://latex.codecogs.com/png.latex?%7C%5Calpha%7C%5Cneq%201">, then<sup>32</sup> <img src="https://latex.codecogs.com/png.latex?%5Cmu_1%20%5Cperp%20%5Cmu_2">!</p>
<p>This seems like it will cause problems. And it can<sup>33</sup>. But it’s <em>fabulous</em> for inference.</p>
<p>To see this, we can use one of the implications of singularity: <img src="https://latex.codecogs.com/png.latex?%5Cmu_1%20%5Cperp%20%5Cmu_2"> if and only if <img src="https://latex.codecogs.com/png.latex?%0A%5Coperatorname%7BKL%7D(u_1(%5Ccdot)%20%7C%7C%20u_2(%5Ccdot))%20=%20%5Cinfty,%0A"> where the Kullback-Leibler divergence can be interpreted as the expectation of the log-likelihood ratio of <img src="https://latex.codecogs.com/png.latex?u_1"> vs <img src="https://latex.codecogs.com/png.latex?u_2"> under <img src="https://latex.codecogs.com/png.latex?u_1">. Hence, if <img src="https://latex.codecogs.com/png.latex?u_1(%5Ccdot)"> and <img src="https://latex.codecogs.com/png.latex?u_2(%5Ccdot)"> are singular, we can (on average) choose the correct one using a likelihood ratio test. This means that we will be able to correctly recover the true<sup>34</sup> parameter.</p>
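<p>To make this concrete, take the scaled pair from above, <img src="https://latex.codecogs.com/png.latex?u_2(%5Ccdot)%20=%20%5Calpha%20u_1(%5Ccdot)">. The KL divergence between any <img src="https://latex.codecogs.com/png.latex?n">-point restriction of the two processes has a closed form that grows linearly in <img src="https://latex.codecogs.com/png.latex?n">, so it blows up as we observe the process more densely. A quick sketch (plain Python, mine rather than the post’s; the formula is just the standard Gaussian KL with <img src="https://latex.codecogs.com/png.latex?%5CSigma_2%20=%20%5Calpha%5E2%20%5CSigma_1">):</p>

```python
import math

def kl_scaled_gaussian(alpha: float, n: int) -> float:
    """KL( N(0, Sigma) || N(0, alpha^2 Sigma) ) for any n x n covariance Sigma.

    Substituting Sigma_2 = alpha^2 Sigma_1 into the Gaussian KL formula
    0.5 * (tr(Sigma_2^{-1} Sigma_1) - n + log det Sigma_2 - log det Sigma_1)
    collapses everything to a function of alpha and n alone.
    """
    return 0.5 * n * (1.0 / alpha**2 - 1.0 + 2.0 * math.log(alpha))

# alpha = 1 is the same measure, so the divergence is zero for every n;
# for any other alpha > 0 it increases without bound as n grows, which is
# the finite-dimensional shadow of the singularity above.
print(kl_scaled_gaussian(1.0, 100))
print(kl_scaled_gaussian(2.0, 10))
print(kl_scaled_gaussian(2.0, 1000))
```

<p>Observing the process at more and more points therefore gives a likelihood ratio test that separates the two scalings perfectly in the limit.</p>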
<p>It turns out the opposite is also true.</p>
<div id="thm-strong-neg" class="theorem">
<p><span class="theorem-title"><strong>Theorem 2</strong></span> If <img src="https://latex.codecogs.com/png.latex?%5Cmu_%5Ctheta">, <img src="https://latex.codecogs.com/png.latex?%5Ctheta%20%5Cin%20%5CTheta"> is a family of Gaussian measures corresponding to the GPs <img src="https://latex.codecogs.com/png.latex?u_%5Ctheta(%5Ccdot)"> and <img src="https://latex.codecogs.com/png.latex?%5Cmu_%5Ctheta%20%5Cequiv%20%5Cmu_%7B%5Ctheta'%7D"> for all values of <img src="https://latex.codecogs.com/png.latex?%5Ctheta,%20%5Ctheta'%20%5Cin%20%5CTheta">, then there is <em>no</em> sequence of estimators <img src="https://latex.codecogs.com/png.latex?%5Chat%20%5Ctheta_n"> such that, for all <img src="https://latex.codecogs.com/png.latex?%5Ctheta_0%20%5Cin%20%5CTheta"> <img src="https://latex.codecogs.com/png.latex?%0A%7B%5CPr%7D_%7B%5Ctheta_0%7D(%5Chat%20%5Ctheta_n%20%5Crightarrow%20%5Ctheta_0)%20=%201,%0A"> where <img src="https://latex.codecogs.com/png.latex?%7B%5CPr%7D_%7B%5Ctheta_0%7D(%5Ccdot)"> is the probability under data drawn with true parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta_0">. That is, there is no estimator <img src="https://latex.codecogs.com/png.latex?%5Chat%20%5Ctheta_n"> that is (strongly) consistent for all <img src="https://latex.codecogs.com/png.latex?%5Ctheta%20%5Cin%20%5CTheta">.</p>
</div>
<details>
<summary>
Click for a surprise (the proof. shit i spoiled the surprise)
</summary>
<div class="proof">
<p><span class="proof-title"><em>Proof</em>. </span>We are going to do this by contradiction. So assume that there is a sequence such that <img src="https://latex.codecogs.com/png.latex?%0A%5CPr%7B_%7B%5Ctheta_0%7D%7D(%5Chat%20%5Ctheta_n%20%5Crightarrow%20%5Ctheta_0)%20=%201.%0A"> For some <img src="https://latex.codecogs.com/png.latex?%5Cepsilon%20%3E0">, let <img src="https://latex.codecogs.com/png.latex?A_n%20=%20%5C%7B%5C%7C%5Chat%5Ctheta_n%20-%20%5Ctheta_0%5C%7C%3E%5Cepsilon%5C%7D">. Then we can re-state our almost sure convergence as <img src="https://latex.codecogs.com/png.latex?%0A%5CPr%7B_%7B%5Ctheta_0%7D%7D%5Cleft(%5Climsup_%7Bn%5Crightarrow%20%5Cinfty%7DA_n%5Cright)%20=%200,%0A"> where the limit superior is defined<sup>35</sup> as <img src="https://latex.codecogs.com/png.latex?%0A%5Climsup_%7Bn%5Crightarrow%20%5Cinfty%7DA_n%20=%20%5Cbigcap_%7Bn=1%7D%5E%5Cinfty%20%5Cleft(%5Cbigcup_%7Bm=n%7D%5E%5Cinfty%20A_n%5Cright).%0A"></p>
<p>For any <img src="https://latex.codecogs.com/png.latex?%5Ctheta'%20%5Cneq%20%5Ctheta_0"> with <img src="https://latex.codecogs.com/png.latex?%5Cmu_%7B%5Ctheta'%7D%20%5Cequiv%20%5Cmu_%7B%5Ctheta_0%7D">, the definition of equivalent measures tells us that <img src="https://latex.codecogs.com/png.latex?%0A%5CPr%7B_%7B%5Ctheta'%7D%7D%5Cleft(%5Climsup_%7Bn%5Crightarrow%20%5Cinfty%7DA_n%5Cright)%20=%200%0A"> and therefore <img src="https://latex.codecogs.com/png.latex?%0A%5CPr%7B_%7B%5Ctheta'%7D%7D%5Cleft(%5Chat%20%5Ctheta_n%20%5Crightarrow%20%5Ctheta_0%5Cright)%20=%201.%0A"> The problem with this is that the data is generated using <img src="https://latex.codecogs.com/png.latex?u_%7B%5Ctheta'%7D">, but the estimator converges to <img src="https://latex.codecogs.com/png.latex?%5Ctheta_0"> instead of <img src="https://latex.codecogs.com/png.latex?%5Ctheta'">. Hence, the estimator isn’t uniformly (strongly) consistent.</p>
</div>
</details>
<p>This seems bad but, you know, it’s a pretty strong version of convergence. And sometimes our brothers and sisters in Christ who are more theoretically minded like to give themselves a treat and consider weaker forms of convergence. It turns out that that’s a disaster too.</p>
<div id="thm-weak-neg" class="theorem">
<p><span class="theorem-title"><strong>Theorem 3</strong></span> If <img src="https://latex.codecogs.com/png.latex?%5Cmu_%5Ctheta">, <img src="https://latex.codecogs.com/png.latex?%5Ctheta%20%5Cin%20%5CTheta"> is a family of Gaussian measures corresponding to the GPs <img src="https://latex.codecogs.com/png.latex?u_%5Ctheta(%5Ccdot)"> and <img src="https://latex.codecogs.com/png.latex?%5Cmu_%5Ctheta%20%5Cequiv%20%5Cmu_%7B%5Ctheta'%7D"> for all values of <img src="https://latex.codecogs.com/png.latex?%5Ctheta,%20%5Ctheta'%20%5Cin%20%5CTheta">, then there is <em>no</em> sequence of estimators <img src="https://latex.codecogs.com/png.latex?%5Chat%20%5Ctheta_n"> such that, for all <img src="https://latex.codecogs.com/png.latex?%5Ctheta_0%20%5Cin%20%5CTheta"> and all <img src="https://latex.codecogs.com/png.latex?%5Cepsilon%20%3E%200"> <img src="https://latex.codecogs.com/png.latex?%0A%5Clim_%7Bn%5Crightarrow%20%5Cinfty%7D%7B%5CPr%7D_%7B%5Ctheta_0%7D(%5C%7C%5Chat%20%5Ctheta_n%20-%20%5Ctheta_0%5C%7C%20%3E%20%5Cepsilon)%20=%200.%0A"> That is there is no estimator <img src="https://latex.codecogs.com/png.latex?%5Chat%20%5Ctheta_n"> that is (weakly) consistent for all <img src="https://latex.codecogs.com/png.latex?%5Ctheta%20%5Cin%20%5CTheta">.</p>
</div>
<p>If you can’t tell the difference between these two theorems that’s ok. You probably weren’t trying to sublimate some childhood trauma and all of your sexual energy into maths just so you didn’t have to deal with the fact that you might be gay and you were pretty sure that wasn’t an option and anyway it’s not like it’s <em>that</em> important. Like whatever, you don’t need physical or emotional intimacy. You’ve got a pile of books on measure theory next to your bed. You are living your best life. Anyway. It makes almost no practical difference. BUT I WILL PROVE IT ANYWAY.</p>
<details>
<summary>
Once more, into the proof.
</summary>
<div class="proof">
<p><span class="proof-title"><em>Proof</em>. </span>This proof is based on a kinda advanced fact, which involves every mathematician’s favourite question: what happens along a sub-sequence?</p>
<div class="callout callout-style-default callout-note no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Probability Fact!
</div>
</div>
<div class="callout-body-container callout-body">
<p>If <img src="https://latex.codecogs.com/png.latex?%5Chat%20%5Ctheta_n"> converges to <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> in probability, then there exists an infinite sub-sequence <img src="https://latex.codecogs.com/png.latex?%5Chat%20%5Ctheta_%7Bn_k%7D">, where <img src="https://latex.codecogs.com/png.latex?n_k%20%5Crightarrow%20%5Cinfty"> as <img src="https://latex.codecogs.com/png.latex?k%20%5Crightarrow%20%5Cinfty">, such that <img src="https://latex.codecogs.com/png.latex?%5Chat%20%5Ctheta_%7Bn_k%7D"> converges to <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> with probability one (or almost surely).</p>
</div>
</div>
<p>This basically says that the two modes of convergence are quite similar except convergence in probability is relaxed enough to have some<sup>36</sup> values that aren’t doing so good at the whole converging thing.</p>
<p>With this in hand, let us build a contradiction. Assume that <img src="https://latex.codecogs.com/png.latex?%5Chat%20%5Ctheta_n"> is weakly consistent for all <img src="https://latex.codecogs.com/png.latex?%5Ctheta%20%5Cin%20%5CTheta">. Then, if we generate data under <img src="https://latex.codecogs.com/png.latex?%5Cmu_%7B%5Ctheta_0%7D">, then we get that, along a sub-sequence <img src="https://latex.codecogs.com/png.latex?n_k"> <img src="https://latex.codecogs.com/png.latex?%0A%5CPr%7B_%7B%5Ctheta_0%7D%7D(%5Chat%20%5Ctheta_%7Bn_k%7D%20%5Crightarrow%20%5Ctheta_0)%20=1.%0A"></p>
<p>Now, if <img src="https://latex.codecogs.com/png.latex?%5Chat%20%5Ctheta_n"> is weakly consistent for all <img src="https://latex.codecogs.com/png.latex?%5Ctheta">, then so is <img src="https://latex.codecogs.com/png.latex?%5Chat%20%5Ctheta_%7Bn_k%7D">. Then, by our assumption, for every <img src="https://latex.codecogs.com/png.latex?%5Ctheta'%20%5Cin%20%5CTheta"> and every <img src="https://latex.codecogs.com/png.latex?%5Cepsilon%3E0"> <img src="https://latex.codecogs.com/png.latex?%0A%5Clim_%7Bk%20%5Crightarrow%20%5Cinfty%7D%20%5CPr%7B_%7B%5Ctheta'%7D%7D%5Cleft(%5C%7C%5Chat%20%5Ctheta_%7Bn_k%7D%20-%20%5Ctheta'%5C%7C%20%3E%20%5Cepsilon%5Cright)%20=%200.%0A"></p>
<p>Our probability fact tells us that there is a <em>further</em> infinite sub-sub-sequence <img src="https://latex.codecogs.com/png.latex?n_%7Bk_%5Cell%7D"> such that <img src="https://latex.codecogs.com/png.latex?%0A%5CPr%7B_%7B%5Ctheta'%7D%7D%5Cleft(%5Chat%20%5Ctheta_%7Bn_%7Bk_%5Cell%7D%7D%20%5Crightarrow%20%5Ctheta'%5Cright)%20=%201.%0A"> But Theorem&nbsp;2 tells us that <img src="https://latex.codecogs.com/png.latex?%5Chat%20%5Ctheta_%7Bn_k%7D"> (and hence <img src="https://latex.codecogs.com/png.latex?%5Ctheta_%7Bn_%7Bk_l%7D%7D">) satisfies <img src="https://latex.codecogs.com/png.latex?%0A%5CPr%7B_%7B%5Ctheta'%7D%7D%5Cleft(%5Chat%20%5Ctheta_%7Bn_%7Bk_%5Cell%7D%7D%20%5Crightarrow%20%5Ctheta_0%5Cright)%20=%201.%0A"> This is a contradiction unless <img src="https://latex.codecogs.com/png.latex?%5Ctheta'=%20%5Ctheta_0">, which proves the assertion.</p>
</div>
</details>
</section>
<section id="matérn-fields-under-fixed-domain-asymptotics-the-love-that-dares-not-speak-its-name" class="level3">
<h3 class="anchored" data-anchor-id="matérn-fields-under-fixed-domain-asymptotics-the-love-that-dares-not-speak-its-name">Matérn fields under fixed domain asymptotics: the love that dares not speak its name</h3>
<p>All of that lead-up immediately becomes extremely relevant once we learn one thing about Gaussian processes with Matérn covariance functions.</p>
<div id="thm-matern-sing" class="theorem">
<p><span class="theorem-title"><strong>Theorem 4</strong></span> Let <img src="https://latex.codecogs.com/png.latex?%5Cmu_%7B%5Cnu,%20%5Csigma,%20%5Cell%7D"> be the Gaussian measure corresponding to the GP with Matérn covariance function with parameters <img src="https://latex.codecogs.com/png.latex?(%5Cnu,%20%5Csigma,%20%5Cell)">, let <img src="https://latex.codecogs.com/png.latex?D"> be any bounded domain in <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed">, and let <img src="https://latex.codecogs.com/png.latex?d%20%5Cleq%203">. Then, restricted to <img src="https://latex.codecogs.com/png.latex?D">, <img src="https://latex.codecogs.com/png.latex?%0A%5Cmu_%7B%5Cnu,%5Csigma_1,%20%5Cell_1%7D%20%5Cequiv%20%5Cmu_%7B%5Cnu,%20%5Csigma_2,%20%5Cell_2%7D%0A"> if and only if <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Csigma_1%5E2%7D%7B%5Cell_1%5E%7B2%5Cnu%7D%7D%20=%20%5Cfrac%7B%5Csigma_2%5E2%7D%7B%5Cell_2%5E%7B2%5Cnu%7D%7D.%0A"></p>
</div>
<p>I’ll go through the proof of this later, but the techniques require a lot of warm up, so let’s just deal with the consequences for now.</p>
<p>Basically, Theorem&nbsp;4 says that we can’t consistently estimate the range and the marginal standard deviation for a one-, two-, or three-dimensional Gaussian process. <a href="https://www.stat.purdue.edu/~zhanghao/Paper/JASA2004.pdf">Hao Zhang noted this</a>, and also that it remains true<sup>37</sup> when dealing with non-Gaussian data.</p>
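<p>The quantity in Theorem&nbsp;4 that the data <em>can</em> identify, <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2/%5Cell%5E%7B2%5Cnu%7D">, is often called the microergodic parameter. A minimal sketch (plain Python, mine rather than the post’s) of checking the equivalence condition for a few parameter pairs:</p>

```python
def microergodic(sigma: float, ell: float, nu: float) -> float:
    """sigma^2 / ell^(2 nu): per Theorem 4, two Matern measures with the
    same nu on a bounded domain in d <= 3 dimensions are equivalent if and
    only if this quantity matches."""
    return sigma**2 / ell**(2.0 * nu)

nu = 0.5  # exponential covariance, matching the simulations below

a = microergodic(1.0, 0.2, nu)       # the "true" parameters used later
b = microergodic(2.0**0.5, 0.4, nu)  # doubled range, variance scaled to match
c = microergodic(1.0, 0.4, nu)       # doubled range only

# a and b agree (~5), so those two measures are equivalent: no amount of
# data on a fixed domain can tell them apart. c differs, so that measure
# is singular with respect to the other two, and is detectable.
print(a, b, c)
```

<p>This is exactly the ridge the likelihood plots below trace out: with <img src="https://latex.codecogs.com/png.latex?%5Cnu%20=%201/2">, sweeping along <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2/%5Cell%20=%20c"> holds the microergodic parameter fixed.</p>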
<p>The good news, I guess, is that in more than four<sup>38</sup> dimensions the measures are always singular.</p>
<p>Now, I don’t give one single solitary shit about the existence of consistent estimators. I am doing Bayesian things and this post is supposed to be about setting prior distributions. But it is important. Let’s take a look at some simulations.</p>
<p>First up, let’s look at what happens in 2D when we directly (ie with no noise) observe a zero-mean GP with exponential covariance function (<img src="https://latex.codecogs.com/png.latex?%5Cnu%20=%201/2">) at points in the unit square. In this case, the log-likelihood is, up to an additive constant, <img src="https://latex.codecogs.com/png.latex?%0A%5Clog%20p(y%20%5Cmid%20%5Ctheta)%20=%20-%5Cfrac%7B1%7D%7B2%7D%5Clog%20%7C%5CSigma(%5Ctheta)%7C%20-%20%5Cfrac%7B1%7D%7B2%7Dy%5ET%5CSigma(%5Ctheta)%5E%7B-1%7Dy.%0A"></p>
<p>The R code is not pretty but I’m trying to be relatively efficient with my Cholesky factors.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">24601</span>)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-3">cov_fun <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> \(h,sigma, ell) sigma<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>h<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>ell)</span>
<span id="cb1-4"></span>
<span id="cb1-5">log_lik <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(sigma, ell, y, h) {</span>
<span id="cb1-6">  V <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cov_fun</span>(h, sigma, ell)</span>
<span id="cb1-7">  R <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chol</span>(V)</span>
<span id="cb1-8">  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">diag</span>(R))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">backsolve</span>(R, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">backsolve</span>(R, y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">transpose =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)))</span>
<span id="cb1-9">}</span></code></pre></div>
</div>
<p>We can now simulate 500 data points on the unit square, compute their distances, and simulate from the GP.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb2-1">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">500</span></span>
<span id="cb2-2">dat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s1 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runif</span>(n), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s2 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runif</span>(n), </span>
<span id="cb2-3">              <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dist_mat =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dist</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cbind</span>(s1,s2))),</span>
<span id="cb2-4">              <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> MASS<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mvrnorm</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mu=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,n), </span>
<span id="cb2-5">                      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Sigma =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cov_fun</span>(dist_mat, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>)))</span></code></pre></div>
</div>
<p>With all of this in hand, let’s look at the likelihood surface along<sup>39</sup> the line <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Csigma%5E2%7D%7B%5Cell%7D%20=%20c%0A"> for various values of <img src="https://latex.codecogs.com/png.latex?c">. I’m using some <code>purrr</code> trickery<sup>40</sup> here to deal with the fact that sometimes the Cholesky factorisation will throw an error.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb3-1">m <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span></span>
<span id="cb3-2">f_direct <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">partial</span>(log_lik, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">h =</span> dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>dist_mat)</span>
<span id="cb3-3"></span>
<span id="cb3-4">pars <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> \(c) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ell =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length.out =</span> m),</span>
<span id="cb3-5">                    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sigma =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(c <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> ell), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">c =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(c, m))</span>
<span id="cb3-6"></span>
<span id="cb3-7"> ll <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_df</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>,pars) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">contour =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(c), </span>
<span id="cb3-9">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ll =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map2_dbl</span>(sigma, ell, </span>
<span id="cb3-10">                       <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">possibly</span>(f_direct, </span>
<span id="cb3-11">                                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">otherwise =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA_real_</span>)))</span>
<span id="cb3-12"></span>
<span id="cb3-13"></span>
<span id="cb3-14">ll <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(ell, ll, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> contour)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb3-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_brewer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">palette =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Set1"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>()</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-07-priors5/priors5_files/figure-html/unnamed-chunk-3-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>We can see the same thing in 2D (albeit at a lower resolution for computational reasons). I’m also not computing a bunch of values that I know will just be massively negative.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb4-1">f_trim <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> \(sigma, ell) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(sigma<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>ell <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> sigma<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>ell,</span>
<span id="cb4-2">                               <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA_real_</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f_direct</span>(sigma, ell))</span>
<span id="cb4-3">m <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span></span>
<span id="cb4-4">surf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expand_grid</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ell =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length.out =</span> m),</span>
<span id="cb4-5">                    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sigma =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length.out =</span> m)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ll =</span>  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map2_dbl</span>(sigma, ell, </span>
<span id="cb4-7">                       <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">possibly</span>(f_trim, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">otherwise =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA_real_</span>)))</span>
<span id="cb4-8"></span>
<span id="cb4-9">surf <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(ll <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(ell, sigma, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> ll)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb4-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_raster</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_fill_viridis_c</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>()</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-07-priors5/priors5_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Clearly there is a ridge in the likelihood surface, which suggests that our posterior is going to be driven by the prior along that ridge.</p>
<p>For completeness, let’s run the same experiment again when we have some known observation noise, that is <img src="https://latex.codecogs.com/png.latex?y_i%20%5Csim%20N(u(s_i),%201)">. In this case, the log-likelihood is <img src="https://latex.codecogs.com/png.latex?%0A%5Clog%20p(y%5Cmid%20%5Csigma,%20%5Cell)%20=%20-%5Cfrac%7B1%7D%7B2%7D%20%5Clog%20%5Cdet(%5CSigma(%5Ctheta)%20+%20I)%20-%20%5Cfrac%7B1%7D%7B2%7Dy%5E%7BT%7D(%5CSigma(%5Ctheta)%20+%20I)%5E%7B-1%7Dy.%0A"></p>
<p>Let us do the exact same thing again!</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb5-1">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">500</span></span>
<span id="cb5-2">dat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s1 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runif</span>(n), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s2 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runif</span>(n), </span>
<span id="cb5-3">              <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dist_mat =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dist</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cbind</span>(s1,s2))),</span>
<span id="cb5-4">              <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mu =</span> MASS<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mvrnorm</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mu=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,n), </span>
<span id="cb5-5">                      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Sigma =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cov_fun</span>(dist_mat, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>)),</span>
<span id="cb5-6">              <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, mu, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb5-7"></span>
<span id="cb5-8">log_lik <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(sigma, ell, y, h) {</span>
<span id="cb5-9">  V <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cov_fun</span>(h, sigma, ell)</span>
<span id="cb5-10">  R <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chol</span>(V <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">diag</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dim</span>(V)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]))</span>
<span id="cb5-11">  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">diag</span>(R))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">backsolve</span>(R, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">backsolve</span>(R, y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">transpose =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)))</span>
<span id="cb5-12">}</span>
<span id="cb5-13"></span>
<span id="cb5-14">m <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span></span>
<span id="cb5-15">f <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">partial</span>(log_lik, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">h =</span> dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>dist_mat)</span>
<span id="cb5-16"></span>
<span id="cb5-17">pars <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> \(c) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ell =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length.out =</span> m),</span>
<span id="cb5-18">                    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sigma =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(c <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> ell), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">c =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(c, m))</span>
<span id="cb5-19"></span>
<span id="cb5-20"> ll <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_df</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length.out =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>),pars) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">contour =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(c), </span>
<span id="cb5-22">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ll =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map2_dbl</span>(sigma, ell, </span>
<span id="cb5-23">                       <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">possibly</span>(f, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">otherwise =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA_real_</span>)))</span>
<span id="cb5-24"></span>
<span id="cb5-25"></span>
<span id="cb5-26">ll <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(ell, ll, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> contour)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb5-27">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">show.legend =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-28">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#scale_color_brewer(palette = "Set1") +</span></span>
<span id="cb5-29">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>()</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-07-priors5/priors5_files/figure-html/unnamed-chunk-5-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb6-1">f_trim <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> \(sigma, ell) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(sigma<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>ell <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> sigma<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>ell,</span>
<span id="cb6-2">                               <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA_real_</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(sigma, ell))</span>
<span id="cb6-3">m <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span></span>
<span id="cb6-4">surf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expand_grid</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ell =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length.out =</span> m),</span>
<span id="cb6-5">                    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sigma =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length.out =</span> m)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ll =</span>  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map2_dbl</span>(sigma, ell, </span>
<span id="cb6-7">                       <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">possibly</span>(f_trim, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">otherwise =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA_real_</span>)))</span>
<span id="cb6-8"></span>
<span id="cb6-9">surf <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(ll <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">360</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(ell, sigma, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> ll)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb6-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_raster</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_fill_viridis_c</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>()</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-07-priors5/priors5_files/figure-html/unnamed-chunk-6-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Once again, we can see that there is going to be a ridge in the likelihood surface! It’s a bit less disastrous this time, but it’s not excellent even with 500 observations (which is a decent number on a unit square). The weird structure of the likelihood is still going to lead to a long, non-elliptical shape in your posterior that your computational engine (and the person interpreting the results) are going to have to come to terms with. In particular, if you only look at the posterior marginal distributions for <img src="https://latex.codecogs.com/png.latex?%5Csigma"> and <img src="https://latex.codecogs.com/png.latex?%5Cell"> you may miss the fact that <img src="https://latex.codecogs.com/png.latex?%5Csigma%20%5Cell%5E%7B%5Cnu%7D"> is quite well estimated by the data even though the marginals for both <img src="https://latex.codecogs.com/png.latex?%5Csigma"> and <img src="https://latex.codecogs.com/png.latex?%5Cell"> are very wide.</p>
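<p>That last claim can be checked directly. Here is a quick sketch (Python rather than the post’s R, and entirely my own construction, so treat it as illustrative): simulate from the exponential-covariance model used above, then compare how much the log-likelihood moves <em>along</em> the ridge where sigma^2 / ell is held fixed (the combination the <code>pars</code> helper above holds constant) versus <em>across</em> it.</p>

```python
# Illustrative sketch (my own, not the post's code): with an exponential
# covariance, sigma^2 / ell is the well-identified combination, so the
# log-likelihood should be nearly flat along sigma^2 = c * ell and steep across it.
import numpy as np

rng = np.random.default_rng(1)
n = 200
s = rng.uniform(size=(n, 2))  # random sites on the unit square
h = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)  # distance matrix

def log_lik(sigma, ell, y):
    # Gaussian log-likelihood (up to an additive constant) via a Cholesky factor
    R = np.linalg.cholesky(sigma**2 * np.exp(-h / ell) + 1e-8 * np.eye(n))
    alpha = np.linalg.solve(R, y)
    return -np.log(np.diag(R)).sum() - 0.5 * alpha @ alpha

# simulate y from sigma = 1, ell = 0.2, matching the experiment above
L = np.linalg.cholesky(np.exp(-h / 0.2) + 1e-8 * np.eye(n))
y = L @ rng.standard_normal(n)

c = 1.0 / 0.2  # the "true" value of sigma^2 / ell
along = [log_lik(np.sqrt(c * ell), ell, y) for ell in np.linspace(0.1, 1.0, 10)]
across = [log_lik(t, 0.2, y) for t in np.linspace(0.5, 2.0, 10)]

print(np.ptp(along), np.ptp(across))  # spread along vs across the ridge
```

<p>If you run this, the spread along the ridge should be far smaller than the spread across it, which is the whole problem: the data pin down sigma^2 / ell without pinning down either parameter on its own.</p>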
<p>This ridge in the likelihood is going to translate somewhat into a ridge in the prior. We will see below that how much of that ridge we see is going to be very dependent on how we specify the prior. The entire purpose of the PC prior is to meaningfully resolve this ridge using sensible prior information.</p>
<p>But before we get to the (improved) PC prior, it’s worthwhile to survey some other priors that have been proposed in the literature.</p>
</section>
<section id="so-the-prior-is-important-then-what-do-other-people-do" class="level3">
<h3 class="anchored" data-anchor-id="so-the-prior-is-important-then-what-do-other-people-do">So the prior is important then! What do other people do?</h3>
<p>That ridge in the likelihood surface does not go away in low dimensions, which essentially means that our inference along that ridge is going to be driven by the prior.</p>
<p>Possibly the worst choice you could make in this situation is trying to make a minimally informative prior. Of course, that’s what <a href="https://www.google.com/search?client=safari&amp;rls=en&amp;q=Objective+Bayesian+Analysis+of+Spatially+Correlated+Data%2C&amp;ie=UTF-8&amp;oe=UTF-8">somebody did when they made a reference prior for the problem</a>. In fact, it was the first paper<sup>41</sup> to look rigorously at prior distributions on the parameters of GPs. It’s just unfortunate that it’s quite shit. It has still been cited quite a lot, and there have been some technical advances in the theory of reference priors since, but if you use it you just find yourself mapping out that damn ridge.</p>
<p>On top of being, structurally, a bad choice, the reference prior has a few other downsides:</p>
<ul>
<li>It is very computationally intensive and quite complex. Not unlike the bad version of the PC prior!</li>
<li>It requires <em>strong</em> assumptions about the likelihood. The first version assumed that there was no observation noise. Later papers allowed there to be observation noise. But only if it’s Gaussian.</li>
<li>It is derived under the asymptotic regime where an infinite sequence of different independent realisations of the GP are observed at the same finite set of points. This is not the most useful regime for GPs.</li>
</ul>
<p>All in all, it’s a bit of a casserole.</p>
<p>From the other end, there’s a very interesting contribution from <a href="https://arxiv.org/pdf/0908.3556.pdf">Aad van der Vaart and Harry van Zanten</a>, who wrote a very lovely theoretical paper looking at which priors on <img src="https://latex.codecogs.com/png.latex?%5Cell"> could result in theoretically optimal contraction rates for the posterior of <img src="https://latex.codecogs.com/png.latex?u(%5Ccdot)">. They argued that <img src="https://latex.codecogs.com/png.latex?%5Cell%5E%7B-d%7D"> should have a Gamma distribution. Within the Matérn class, their results are only valid for the squared exponential covariance function.</p>
<p>One of the stranger things that I have never fully understood is that the argument I’m going to make below ends up with a gamma distribution on <img src="https://latex.codecogs.com/png.latex?%5Cell%5E%7B-d/2%7D">, which is somewhat different to van der Vaart and van Zanten. If I were forced to bullshit some justification, I’d probably say something about the Matérn process depending only on the distance between observations making the <img src="https://latex.codecogs.com/png.latex?d">-sphere the natural geometry (the volume of which scales like <img src="https://latex.codecogs.com/png.latex?%5Cell%5E%7B-d/2%7D">) rather than the <img src="https://latex.codecogs.com/png.latex?d">-cube (the volume of which scales like <img src="https://latex.codecogs.com/png.latex?%5Cell%5E%7B-d%7D">). But that would be total bullshit. I simply have no idea. Their proposal comes via the time-honoured tradition of “constant chasing” in some fairly tricky proofs, so I have absolutely no intuition for it.</p>
<p>We also found in other contexts that using the KL divergence rather than its square root tended to perform worse. So I’m kinda happy with our scaling and, really, their paper doesn’t cover the covariance functions I’m considering in this post.</p>
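<p>If it helps to see what the two scalings do to the length scale itself, here is a purely illustrative sketch (the unit shape and rate are arbitrary choices of mine; none of these values come from either paper): draw a gamma variable X and transform it both ways.</p>

```python
# Purely illustrative (arbitrary unit-shape, unit-rate gamma): compare the
# prior on ell implied by ell^{-d} ~ Gamma (van der Vaart & van Zanten)
# with the one implied by ell^{-d/2} ~ Gamma (the scaling argued for here).
import random

random.seed(42)
d = 2
draws = 100_000
x = [random.gammavariate(1.0, 1.0) for _ in range(draws)]  # Exp(1) draws

ell_vdv = sorted(xi ** (-1.0 / d) for xi in x)   # ell such that ell^{-d} ~ Gamma
ell_half = sorted(xi ** (-2.0 / d) for xi in x)  # ell such that ell^{-d/2} ~ Gamma

median_vdv = ell_vdv[draws // 2]
median_half = ell_half[draws // 2]
print(median_vdv, median_half)
```

<p>In two dimensions the second rule is just 1/X, which puts noticeably more mass on large length scales; that heavier right tail is at least consistent with shrinking towards the <img src="https://latex.codecogs.com/png.latex?%5Cell%20=%20%5Cinfty"> base model.</p>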
<p>Neither<sup>42</sup> of these papers considers that ridge in the likelihood surface.</p>
<p>This lack of consideration—as well as the success of PC priors in everything else we tried them on—was a big part of our push to make a useful version of a PC prior for Gaussian processes.</p>
</section>
<section id="rescuing-the-pc-prior-on-ell-or-what-i-recommend-you-do" class="level3">
<h3 class="anchored" data-anchor-id="rescuing-the-pc-prior-on-ell-or-what-i-recommend-you-do">Rescuing the PC prior on <img src="https://latex.codecogs.com/png.latex?%5Cell">; or What I recommend you do</h3>
<p>It has been a long journey, but we are finally where I wanted us to be. So let’s talk about how to fix the PC prior. In particular, I’m going to go through how to derive a prior on the length scale <img src="https://latex.codecogs.com/png.latex?%5Cell"> that has a simple form.</p>
<p>In order to solve this problem, we are going to do three things in the rest of this post:</p>
<ol type="1">
<li>Restrict our attention to stationary<sup>43</sup> GPs.</li>
<li>Restrict our attention to the Matérn class of covariance functions.</li>
<li>Greatly increase our mathematical<sup>44</sup> sophistication.</li>
</ol>
<p>But before we do that, I’m going to walk you through the punchline.</p>
<p>This work was originally done with the magnificent <a href="https://www.ntnu.edu/employees/fuglstad">Geir-Arne Fuglstad</a>, the glorious <a href="https://www.maths.ed.ac.uk/~flindgre/">Finn Lindgren</a>, and the resplendent <a href="https://www.kaust.edu.sa/en/study/faculty/haavard-rue">Håvard Rue</a>. If you want to read the original paper, <a href="https://arxiv.org/abs/1503.00256">the preprint is here</a><sup>45</sup>.</p>
<p>The PC prior is derived using the base model <img src="https://latex.codecogs.com/png.latex?%5Cell%20=%20%5Cinfty">, which might seem like a slightly weird choice. The intuition behind it is that if there is strong dependence between far away points, the realisations of <img src="https://latex.codecogs.com/png.latex?u(%5Ccdot)"> cannot be too wiggly. In some contexts, people talk about <img src="https://latex.codecogs.com/png.latex?%5Cell"> as a <em>“smoothness”</em><sup>46</sup> parameter because realisations with large <img src="https://latex.codecogs.com/png.latex?%5Cell"> “look”<sup>47</sup> smoother than realisations with small <img src="https://latex.codecogs.com/png.latex?%5Cell">.</p>
<p>Another way to see the same thing is to note that a Matérn field approaches a<sup>48</sup> smoothing spline prior, in which case <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E%7B-2%7D"> plays the role of the “smoothing parameter” of the spline. In that case, the natural base model of <img src="https://latex.codecogs.com/png.latex?%5Csigma=0"> interacts with the base model of <img src="https://latex.codecogs.com/png.latex?%5Cell%20=%20%5Cinfty"> to shrink towards an increasingly flat surface centred on zero.</p>
<p>We still need to choose a quantity of interest in order to encode some explicit information in the prior. In this case, I’m going to use the idea that for any data set, we only have information up to a certain spatial resolution. In that case, we don’t want to put prior mass on the length scale being less than that resolution. Why? Well, any inference about <img src="https://latex.codecogs.com/png.latex?%5Cell"> at a smaller scale than the data resolution is going to be driven entirely by unverifiable model assumptions. And that feels a bit awkward. This suggests that we choose a minimum<sup>49</sup> length scale <img src="https://latex.codecogs.com/png.latex?L"> and choose the scaling parameter in the PC prior so that <img src="https://latex.codecogs.com/png.latex?%0A%5CPr(%5Cell%20%3C%20L)%20%3C%20%5Calpha_%5Cell.%0A"></p>
<p>Under these assumptions, the PC prior for the length scale in a <img src="https://latex.codecogs.com/png.latex?d">-dimensional space is<sup>50</sup> a Fréchet distribution<sup>51</sup> with shape parameter <img src="https://latex.codecogs.com/png.latex?d/2"> and scale parameter <img src="https://latex.codecogs.com/png.latex?%5Clambda_%5Cell%5E%7B2/d%7D">. That is, <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Cell)%20=%20%5Cfrac%7Bd%5Clambda_%5Cell%7D%7B2%7D%20%5Cell%5E%7B-(d/2+1)%7De%5E%7B-%5Clambda_%7B%5Cell%7D%5Cell%5E%7B-d/2%7D%7D,%0A"> where we choose <img src="https://latex.codecogs.com/png.latex?%5Clambda_%5Cell%20=%20-%5Clog(%5Calpha_%5Cell)L%5E%7Bd/2%7D"> to ensure that <img src="https://latex.codecogs.com/png.latex?%0A%5CPr(%5Cell%20%3C%20L)%20=%20e%5E%7B-%5Clambda%20L%5E%7B-d/2%7D%7D%20%3C%20%5Calpha_%5Cell.%0A"></p>
<p>In two dimensions, this is an inverse gamma prior, which gives rigorous justification to a commonly used prior in spatial statistics.</p>
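<p>Here is a quick sanity check of that calibration (a Python sketch of mine, with arbitrary example values for d, L, and alpha): with lambda_ell = -log(alpha_ell) L^{d/2}, the closed-form Fréchet CDF gives Pr(ell &lt; L) = alpha_ell exactly, and numerically integrating the density above over (0, L] agrees.</p>

```python
# Check the calibration of the PC prior on the length scale described above.
# d, L, and alpha are arbitrary example values.
import math

d, L, alpha = 2, 0.1, 0.05
lam = -math.log(alpha) * L ** (d / 2)  # the scaling parameter lambda_ell

def pc_density(ell):
    # the Frechet density stated in the post
    return (d * lam / 2) * ell ** (-(d / 2 + 1)) * math.exp(-lam * ell ** (-d / 2))

# closed-form Frechet CDF evaluated at L
cdf_at_L = math.exp(-lam * L ** (-d / 2))
print(cdf_at_L)  # equals alpha by construction

# trapezoidal integration of the density over (0, L]
m = 200_000
xs = [L * (i + 1) / m for i in range(m)]
vals = [pc_density(x) for x in xs]
integral = (L / m) * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
print(integral)
```

<p>The integral matches the CDF, so the stated density, CDF, and choice of lambda_ell are mutually consistent.</p>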
</section>
<section id="comparing-it-with-the-reference-prior" class="level3">
<h3 class="anchored" data-anchor-id="comparing-it-with-the-reference-prior">Comparing it with the reference prior</h3>
<p>Ok, so let’s actually see how much of a difference using a weakly informative prior makes relative to using the reference prior.</p>
<p>In the interest of computational speed, I’m going to use the simplest possible model setup, <img src="https://latex.codecogs.com/png.latex?%0Ay%20%5Cmid%20%5Csigma,%5Cell%20%5Csim%20N(0,%20%5Csigma%5E2%20R(%5Cell)),%0A"> and I’m only going to use 25 observations.</p>
<p>In this case, the reference prior<sup>52</sup> is <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Cell,%20%5Csigma)%20=%20%5Csigma%5E%7B-1%7D%5Cleft(%5Coperatorname%7Btr%7D%5Cleft%5B%5Cleft(%5Cfrac%7B%5Cpartial%20R%7D%7B%5Cpartial%20%5Cell%7DR%5E%7B-1%7D%5Cright)%5E2%5Cright%5D%20-%20%5Cfrac%7B1%7D%7Bn%7D%5Coperatorname%7Btr%7D%5Cleft(%5Cfrac%7B%5Cpartial%20R%7D%7B%5Cpartial%20%5Cell%7DR%5E%7B-1%7D%5Cright)%5E2%5Cright)%5E%7B1/2%7D.%0A"></p>
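<p>To make that formula concrete, here is a small numerical sketch (my own Python, not part of the original analysis) that evaluates it, up to proportionality, for the exponential correlation function used in this post.</p>

```python
# Evaluate the reference-prior factor
#   p(ell, sigma) ∝ sigma^{-1} * sqrt( tr(W^2) - tr(W)^2 / n ),  W = (dR/d ell) R^{-1},
# for the exponential correlation R_ij = exp(-|s_i - s_j| / ell).
import numpy as np

rng = np.random.default_rng(0)
n = 25  # as in the post's experiment
s = rng.uniform(size=(n, 2))
h = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)

def reference_density(ell, sigma):
    R = np.exp(-h / ell)
    dR = (h / ell**2) * R            # elementwise d/d ell of exp(-h / ell)
    W = dR @ np.linalg.inv(R)
    t2 = np.trace(W @ W)
    t1 = np.trace(W)
    return (1.0 / sigma) * np.sqrt(t2 - t1**2 / n)

print(reference_density(0.2, 1.0))
```

<p>Every evaluation needs a matrix inverse and two traces, which is a big part of why the reference prior is so computationally unpleasant compared to a closed-form density.</p>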
<p>Even with this limited setup, it took a lot of work to make Stan sample this posterior. You’ll notice that I did a ridge-aware reparameterisation. I also had to run twice as much warm up as I ordinarily would.</p>
<p>The Stan code is under the fold.</p>
<div class="cell" data-output.var="fake">
<details class="code-fold">
<summary>Show the Stan code!</summary>
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode numberSource stan number-lines code-with-copy"><code class="sourceCode stan"><span id="cb7-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">functions</span> {</span>
<span id="cb7-2">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">matrix</span> cov(<span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> N, <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">matrix</span> s,  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">real</span> ell) {</span>
<span id="cb7-3">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">matrix</span>[N,N] R;</span>
<span id="cb7-4">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">row_vector</span>[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] s1, s2;</span>
<span id="cb7-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (i <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:N) {</span>
<span id="cb7-6">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (j <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:N){</span>
<span id="cb7-7">        s1 = s[i, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>];</span>
<span id="cb7-8">        s2 = s[j, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>];</span>
<span id="cb7-9">        R[i,j] = exp(-sqrt(dot_self(s1-s2))/ell);</span>
<span id="cb7-10">      }</span>
<span id="cb7-11">    }</span>
<span id="cb7-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span> * (R + R');</span>
<span id="cb7-13">  }</span>
<span id="cb7-14">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">matrix</span> cov_diff(<span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> N, <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">matrix</span> s,  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">real</span> ell) {</span>
<span id="cb7-15">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// dR/d ell, elementwise: |s_i - s_j| * exp(-|s_i - s_j| / ell) / ell^2</span></span>
<span id="cb7-16">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">matrix</span>[N,N] R;</span>
<span id="cb7-17">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">row_vector</span>[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] s1, s2;</span>
<span id="cb7-18">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (i <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:N) {</span>
<span id="cb7-19">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (j <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:N){</span>
<span id="cb7-20">        s1 = s[i, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>];</span>
<span id="cb7-21">        s2 = s[j, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>];</span>
<span id="cb7-22">        R[i,j] =  sqrt(dot_self(s1-s2)) * exp(-sqrt(dot_self(s1-s2))/ell) / ell^<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> ;</span>
<span id="cb7-23">      }</span>
<span id="cb7-24">    }</span>
<span id="cb7-25">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span> * (R + R');</span>
<span id="cb7-26">  }</span>
<span id="cb7-27"></span>
<span id="cb7-28">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">real</span> log_prior(<span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> N, <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">matrix</span> s, <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">real</span> sigma2, <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">real</span> ell) {</span>
<span id="cb7-29">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">matrix</span>[N,N] R = cov(N, s,  ell);</span>
<span id="cb7-30">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">matrix</span>[N,N] W = (cov_diff(N, s, ell)) / R;</span>
<span id="cb7-31">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span> * log(trace(W * W) - (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span> / (N)) * (trace(W))^<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) - log(sigma2);</span>
<span id="cb7-32">  }</span>
<span id="cb7-33">}</span>
<span id="cb7-34"></span>
<span id="cb7-35"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">data</span> {</span>
<span id="cb7-36">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span>&lt;<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lower</span>=<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>&gt; N;</span>
<span id="cb7-37">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">vector</span>[N] y;</span>
<span id="cb7-38">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">matrix</span>[N,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] s;</span>
<span id="cb7-39">}</span>
<span id="cb7-40"></span>
<span id="cb7-41"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">parameters</span> {</span>
<span id="cb7-42">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">real</span>&lt;<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lower</span>=<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>&gt; sigma2;</span>
<span id="cb7-43">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">real</span>&lt;<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lower</span>=<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>&gt; ell;</span>
<span id="cb7-44">}</span>
<span id="cb7-45"></span>
<span id="cb7-46"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">model</span> {</span>
<span id="cb7-47">  {</span>
<span id="cb7-48">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">matrix</span>[N,N] R = cov(N, s, ell);</span>
<span id="cb7-49">    <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">target +=</span> multi_normal_lpdf(y | rep_vector(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>, N), sigma2 * R);</span>
<span id="cb7-50">  }</span>
<span id="cb7-51">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">target +=</span> log_prior(N,  s, sigma2, ell);</span>
<span id="cb7-52">}</span>
<span id="cb7-53"></span>
<span id="cb7-54"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">generated quantities</span> {</span>
<span id="cb7-55">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">real</span> sigma = sqrt(sigma2);</span>
<span id="cb7-56">}</span></code></pre></div>
</details>
</div>
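<p>If you'd rather see the reference prior without the syntax-highlighting noise, here is a minimal numpy sketch of the same computation (a stand-in for illustration, not the Stan code itself): the exponential covariance, its derivative in <code>ell</code>, and the log reference prior <code>0.5 * log(tr(W^2) - tr(W)^2 / N) - log(sigma2)</code> with <code>W = (dR/d ell) R^{-1}</code>.</p>

```python
import numpy as np

def exp_cov(s, ell):
    # R[i, j] = exp(-||s_i - s_j|| / ell)
    d = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
    return np.exp(-d / ell)

def cov_diff(s, ell):
    # dR/d ell, elementwise: ||s_i - s_j|| * exp(-||s_i - s_j|| / ell) / ell**2
    d = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
    return d * np.exp(-d / ell) / ell**2

def log_reference_prior(s, sigma2, ell):
    n = s.shape[0]
    R = exp_cov(s, ell)
    # W = (dR/d ell) @ inv(R), via a solve rather than an explicit inverse;
    # R is symmetric, so solve(R, A.T).T == A @ inv(R)
    W = np.linalg.solve(R, cov_diff(s, ell).T).T
    tW = np.trace(W)
    return 0.5 * np.log(np.trace(W @ W) - tW**2 / n) - np.log(sigma2)
```

<p>Note that <code>sigma2</code> enters only through the <code>-log(sigma2)</code> term, exactly as in the Stan <code>log_prior</code> above.</p>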
<p>By comparison, the code for the PC prior is fairly simple.</p>
<div class="cell" data-output.var="fake">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode numberSource stan number-lines code-with-copy"><code class="sourceCode stan"><span id="cb8-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">functions</span> {</span>
<span id="cb8-2">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">matrix</span> cov(<span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> N, <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">matrix</span> s, <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">real</span> sigma, <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">real</span> ell) {</span>
<span id="cb8-3">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">matrix</span>[N,N] R;</span>
<span id="cb8-4">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">row_vector</span>[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] s1, s2;</span>
<span id="cb8-5">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">real</span> sigma2 = sigma * sigma;</span>
<span id="cb8-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (i <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:N) {</span>
<span id="cb8-7">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (j <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:N){</span>
<span id="cb8-8">        s1 = s[i, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>];</span>
<span id="cb8-9">        s2 = s[j, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>];</span>
<span id="cb8-10">        R[i,j] = sigma2 * exp(-sqrt(dot_self(s1-s2))/ell);</span>
<span id="cb8-11">      }</span>
<span id="cb8-12">    }</span>
<span id="cb8-13">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span> * (R + R');</span>
<span id="cb8-14">  }</span>
<span id="cb8-15">}</span>
<span id="cb8-16"></span>
<span id="cb8-17"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">data</span> {</span>
<span id="cb8-18">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span>&lt;<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lower</span>=<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>&gt; N;</span>
<span id="cb8-19">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">vector</span>[N] y;</span>
<span id="cb8-20">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">matrix</span>[N,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] s;</span>
<span id="cb8-21">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">real</span>&lt;<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lower</span> = <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>&gt; lambda_ell;</span>
<span id="cb8-22">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">real</span>&lt;<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lower</span> = <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>&gt; lambda_sigma;</span>
<span id="cb8-23">}</span>
<span id="cb8-24"></span>
<span id="cb8-25"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">parameters</span> {</span>
<span id="cb8-26">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">real</span>&lt;<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lower</span>=<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>&gt; sigma;</span>
<span id="cb8-27">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">real</span>&lt;<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lower</span>=<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>&gt; ell;</span>
<span id="cb8-28">}</span>
<span id="cb8-29"></span>
<span id="cb8-30"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">model</span> {</span>
<span id="cb8-31">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">matrix</span>[N,N] R = cov(N, s, sigma, ell);</span>
<span id="cb8-32">  y ~ multi_normal(rep_vector(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>, N), R);</span>
<span id="cb8-33">  sigma ~ exponential(lambda_sigma);</span>
<span id="cb8-34">  ell ~ frechet(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, lambda_ell); <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Only in 2D</span></span>
<span id="cb8-35">}</span>
<span id="cb8-36"></span>
<span id="cb8-37"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// generated quantities {</span></span>
<span id="cb8-38"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//   real check = 0.0; // should be the same as lp__</span></span>
<span id="cb8-39"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//   { // I don't want to print R!</span></span>
<span id="cb8-40"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//     matrix[N,N] R = cov(N, s, sigma, ell);</span></span>
<span id="cb8-41"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//     check -= 0.5* dot_product(y,(R\ y)) + 0.5 * log_determinant(R);</span></span>
<span id="cb8-42"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//     check += log(sigma) - lambda_sigma * sigma;</span></span>
<span id="cb8-43"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//     check += log(ell) - 2.0 * log(ell) - lambda_ell / ell;</span></span>
<span id="cb8-44"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//   }</span></span>
<span id="cb8-45"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// }</span></span></code></pre></div>
</div>
<p>This is <em>a lot</em> easier than the code for the reference prior.</p>
<p>Let’s compare the results on some simulated data. Here I’m choosing <img src="https://latex.codecogs.com/png.latex?%5Calpha_%5Cell%20=%20%5Calpha_%5Csigma%20=%200.05">, <img src="https://latex.codecogs.com/png.latex?L_%5Cell%20=%200.05">, and <img src="https://latex.codecogs.com/png.latex?U_%5Csigma%20=%205">.</p>
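<p>The rate parameters follow from the tail conditions. Under an Exponential(<code>lambda_sigma</code>) prior, <code>P(sigma &gt; U_sigma) = exp(-lambda_sigma * U_sigma)</code>, so <code>lambda_sigma = -log(alpha_sigma) / U_sigma</code> hits the target tail probability exactly; <code>lambda_ell = -log(alpha_ell) * sqrt(L_ell)</code> is the analogous rate fed to the Fréchet prior on <code>ell</code>. A quick sanity check of the numbers (a sketch, mirroring the <code>stan_dat</code> values below):</p>

```python
import math

alpha_sigma, U_sigma = 0.05, 5.0
lambda_sigma = -math.log(alpha_sigma) / U_sigma   # matches stan_dat below

# Exponential tail probability: P(sigma > U) = exp(-lambda * U)
assert abs(math.exp(-lambda_sigma * U_sigma) - alpha_sigma) < 1e-12

alpha_ell, L_ell = 0.05, 0.05
lambda_ell = -math.log(alpha_ell) * math.sqrt(L_ell)  # matches stan_dat below
```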
<div class="cell">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(cmdstanr)</span>
<span id="cb9-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(posterior)</span>
<span id="cb9-3">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25</span></span>
<span id="cb9-4"></span>
<span id="cb9-5">dat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s1 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runif</span>(n), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s2 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runif</span>(n), </span>
<span id="cb9-6">              <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dist_mat =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dist</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cbind</span>(s1,s2))),</span>
<span id="cb9-7">              <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> MASS<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mvrnorm</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mu=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,n), </span>
<span id="cb9-8">                                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Sigma =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cov_fun</span>(dist_mat, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>)))</span>
<span id="cb9-9"></span>
<span id="cb9-10">stan_dat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>y,</span>
<span id="cb9-11">                 <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cbind</span>(dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>s1,dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>s2),</span>
<span id="cb9-12">                 <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">N =</span> n,</span>
<span id="cb9-13">                 <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lambda_ell =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>),</span>
<span id="cb9-14">                 <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lambda_sigma =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb9-15"></span>
<span id="cb9-16">mod_ref <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cmdstan_model</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gp_ref_no_mean.stan"</span>)</span>
<span id="cb9-17">mod_pc <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cmdstan_model</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gp_pc_no_mean.stan"</span>)</span></code></pre></div>
</div>
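<p>The simulation step is the standard one: draw locations uniformly on the unit square, build the exponential covariance at the true values <code>sigma2 = 1</code> and <code>ell = 0.2</code>, and sample one multivariate normal vector. A numpy version of what <code>cov_fun</code> and <code>MASS::mvrnorm</code> are doing here (assumed equivalents, not the post's own helpers):</p>

```python
import numpy as np

rng = np.random.default_rng(30127)
n = 25
s = rng.uniform(size=(n, 2))                        # locations on the unit square
d = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
Sigma = 1.0 * np.exp(-d / 0.2)                      # sigma2 = 1, ell = 0.2
L = np.linalg.cholesky(Sigma + 1e-10 * np.eye(n))   # tiny jitter for stability
y = L @ rng.standard_normal(n)                      # one draw from N(0, Sigma)
```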
<p>First off, let’s look at the parameter estimates from the reference prior.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb10-1">fit_ref <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mod_ref<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> stan_dat, </span>
<span id="cb10-2">                          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">seed =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30127</span>, </span>
<span id="cb10-3">                          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">parallel_chains =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, </span>
<span id="cb10-4">                          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">iter_warmup =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2000</span>,</span>
<span id="cb10-5">                          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">iter_sampling =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2000</span>,</span>
<span id="cb10-6">                          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">refresh =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) </span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Running MCMC with 4 parallel chains...

Chain 1 finished in 41.6 seconds.
Chain 2 finished in 43.4 seconds.
Chain 4 finished in 44.8 seconds.
Chain 3 finished in 47.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 44.2 seconds.
Total execution time: 47.2 seconds.</code></pre>
</div>
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb12-1">fit_ref<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code> variable   mean median     sd  mad     q5    q95 rhat ess_bulk ess_tail
   lp__   -30.95 -30.57   1.24 0.89 -33.46 -29.79 1.00     1397      896
   sigma2  32.56   1.28 823.19 0.58   0.69   7.19 1.00      979      562
   ell      9.04   0.26 240.39 0.16   0.11   1.88 1.00      927      542
   sigma    1.67   1.13   5.46 0.27   0.83   2.68 1.00      979      562</code></pre>
</div>
</div>
<p>It also took a bloody long time.</p>
<p>Now let’s check in with the PC prior.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb14-1">fit_pc <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mod_pc<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> stan_dat, </span>
<span id="cb14-2">                          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">seed =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30127</span>, </span>
<span id="cb14-3">                          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">parallel_chains =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,</span>
<span id="cb14-4">                          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">iter_sampling =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2000</span>,</span>
<span id="cb14-5">                          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">refresh =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) </span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Running MCMC with 4 parallel chains...

Chain 1 finished in 4.9 seconds.
Chain 4 finished in 5.1 seconds.
Chain 3 finished in 5.4 seconds.
Chain 2 finished in 5.5 seconds.

All 4 chains finished successfully.
Mean chain execution time: 5.2 seconds.
Total execution time: 5.6 seconds.</code></pre>
</div>
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb16-1">fit_pc<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code> variable   mean median   sd  mad     q5   q95 rhat ess_bulk ess_tail
    lp__  -10.36 -10.05 1.02 0.76 -12.42 -9.36 1.00     2160     3228
    sigma   1.52   1.36 0.60 0.41   0.92  2.72 1.00     1424     1853
    ell     0.67   0.45 0.72 0.27   0.19  1.89 1.00     1338     1694</code></pre>
</div>
</div>
<p>You’ll notice two things there: the sampling was much healthier (higher effective sample sizes, and none of the wild outlying draws that inflated the means under the reference prior) and it was <em>much</em> faster.</p>
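<p>To put a number on that: taking the worst bulk ESS from each summary and the total execution times, the PC prior delivers roughly an order of magnitude more effective draws per second. (A back-of-the-envelope calculation from the printed output above, not a formal benchmark.)</p>

```python
# Figures read off the run summaries above: worst bulk ESS and total wall time
ref_ess, ref_time = 927, 47.2    # reference prior: ell had the lowest bulk ESS
pc_ess, pc_time = 1338, 5.6      # PC prior: likewise ell

ref_eff = ref_ess / ref_time     # effective draws per second, reference prior
pc_eff = pc_ess / pc_time        # effective draws per second, PC prior
speedup = pc_eff / ref_eff       # roughly 12x
```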
<p>Finally, let’s look at some plots, starting with 2D density plots of the two posteriors.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb18-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(cowplot)</span>
<span id="cb18-2">samps_ref <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> fit_ref<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">draws</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"draws_df"</span>)</span>
<span id="cb18-3">samps_pc <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> fit_pc<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">draws</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"draws_df"</span>)</span>
<span id="cb18-4"></span>
<span id="cb18-5">p1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> samps_ref <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span>  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(ell, sigma)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb18-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_hex</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb18-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_fill_viridis_c</span>()</span>
<span id="cb18-8">p2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> samps_pc <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span>  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(ell, sigma)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb18-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_hex</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb18-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_fill_viridis_c</span>()</span>
<span id="cb18-11"></span>
<span id="cb18-12"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_grid</span>(p1,p2)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-07-priors5/priors5_files/figure-html/unnamed-chunk-12-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>It would be interesting to look at how different the densities for <img src="https://latex.codecogs.com/png.latex?%5Cell"> are.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb19-1">samps_pc <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(ell)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_density</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_density</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(samps_ref<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ell), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlim</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-07-priors5/priors5_files/figure-html/unnamed-chunk-13-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>As expected, the PC prior (black) pulls the posterior towards the base model (<img src="https://latex.codecogs.com/png.latex?%5Cell%20=%20%5Cinfty">), but what is interesting to me is that the posterior for the reference prior (red) has so much mass near zero. <a href="https://www.youtube.com/watch?v=_U-7L1tmBAo">That’s the one thing we didn’t want to happen</a>.</p>
<p>We can look closer at this by looking at the posterior for <img src="https://latex.codecogs.com/png.latex?%5Ckappa%20=%202%5Cell%5E%7B-1%7D">.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb20-1">p3 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> samps_ref <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb20-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">kappa =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>ell) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb20-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(kappa, sigma)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_hex</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_fill_viridis_c</span>()</span>
<span id="cb20-6">p4 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> samps_pc <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span>  </span>
<span id="cb20-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">kappa =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>ell) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb20-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(kappa, sigma)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_hex</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_fill_viridis_c</span>()</span>
<span id="cb20-11"></span>
<span id="cb20-12"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_grid</span>(p3, p4)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-07-priors5/priors5_files/figure-html/unnamed-chunk-14-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>To be brutally francis with you all, I’m not sure how much I trust that Stan posterior, so I’m going to look at the posterior along the ridge.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb21-1">log_prior <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(sigma, ell) {</span>
<span id="cb21-2">  V <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cov_fun</span>(dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>dist_mat, sigma, ell)</span>
<span id="cb21-3">  dV <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> (V <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> dat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>dist_mat)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>ell<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb21-4">  U <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">solve</span>(V, dV))</span>
<span id="cb21-5">  lprior <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">diag</span>(U <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> U)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">diag</span>(U))<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>n) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(sigma)</span>
<span id="cb21-6">}</span>
<span id="cb21-7"></span>
<span id="cb21-8">log_posterior <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> \(sigma, ell) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log_prior</span>(sigma, ell) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f_direct</span>(sigma, ell)</span>
<span id="cb21-9"></span>
<span id="cb21-10">m <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">500</span></span>
<span id="cb21-11">pars <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> \(c) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ell =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.001</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length.out =</span> m),</span>
<span id="cb21-12">                    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sigma =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(c <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> ell), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">c =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(c, m))</span>
<span id="cb21-13"></span>
<span id="cb21-14">lpost <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_df</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.001</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length.out =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>),pars) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb21-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tau =</span> c, </span>
<span id="cb21-16">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">log_posterior =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map2_dbl</span>(sigma, ell, </span>
<span id="cb21-17">                       <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">possibly</span>(log_posterior, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">otherwise =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA_real_</span>)))</span>
<span id="cb21-18"></span>
<span id="cb21-19"></span>
<span id="cb21-20">lpost <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb21-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(log_posterior <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb21-22">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(ell, log_posterior, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> tau, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">group =</span> tau)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb21-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb21-24">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#scale_color_brewer(palette = "Set1") +</span></span>
<span id="cb21-25">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>() </span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-07-priors5/priors5_files/figure-html/unnamed-chunk-15-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>We can compare this with the likelihood surface.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb22-1">llik <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_df</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.001</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length.out =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>),pars) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb22-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tau =</span> c, </span>
<span id="cb22-3">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">log_likelihood =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map2_dbl</span>(sigma, ell, </span>
<span id="cb22-4">                       <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">possibly</span>(f_direct, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">otherwise =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA_real_</span>)))</span>
<span id="cb22-5"></span>
<span id="cb22-6">lprior <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_df</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.001</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length.out =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>),pars) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb22-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tau =</span> c, </span>
<span id="cb22-8">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">log_prior =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map2_dbl</span>(sigma, ell, </span>
<span id="cb22-9">                       <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">possibly</span>(log_prior, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">otherwise =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA_real_</span>)))</span>
<span id="cb22-10"></span>
<span id="cb22-11">p1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> llik <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb22-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(log_likelihood <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb22-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(ell, log_likelihood, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> tau, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">group =</span> tau)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb22-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-15">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#scale_color_brewer(palette = "Set1") +</span></span>
<span id="cb22-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>() </span>
<span id="cb22-17"></span>
<span id="cb22-18">p2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> lprior <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb22-19">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(log_prior <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb22-20">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(ell, log_prior, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> tau, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">group =</span> tau)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb22-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-22">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#scale_color_brewer(palette = "Set1") +</span></span>
<span id="cb22-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>() </span>
<span id="cb22-24"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_grid</span>(p1, p2)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-07-priors5/priors5_files/figure-html/unnamed-chunk-16-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>You can see here that the prior is putting <em>a lot</em> of weight at zero relative to the likelihood surface, which is relatively flat.</p>
<p>It’s also important to notice that the ridge isn’t as flat with <img src="https://latex.codecogs.com/png.latex?n=25"> as it is with <img src="https://latex.codecogs.com/png.latex?n=500">. It would be very interesting to repeat this with larger values of <img src="https://latex.codecogs.com/png.latex?n">, but frankly I do not have the time.</p>
</section>
<section id="moving-beyond-the-matérn" class="level3">
<h3 class="anchored" data-anchor-id="moving-beyond-the-matérn">Moving beyond the Matérn</h3>
<p>There is <em>a lot</em> more to say on this topic. But honestly this blog post is already enormous (you are a bit over halfway if you choose to read the technical guff). So I’m just going to summarise some of the things that I think are important here.</p>
<p>Firstly, the rigorous construction of the PC prior only makes sense when <img src="https://latex.codecogs.com/png.latex?d%20%5Cleq%203">. This is a bit annoying, but it is what it is. I would argue that this construction is still fairly reasonable in moderate dimensions. (In high dimensions I think we need more research.)</p>
<p>There are two ways to see that. Firstly, if you look at the derivation of the distance, it involves an infinite sum that only converges when <img src="https://latex.codecogs.com/png.latex?d%20%3C%204">. But mathematically, if we can show<sup>53</sup> that the partial sums can be bounded independently of <img src="https://latex.codecogs.com/png.latex?%5Cell">, then we can just send another thing to infinity when we send the domain size and the base model length scale there.</p>
<p>A different way to see this is to note that the PC prior distance is <img src="https://latex.codecogs.com/png.latex?d(%5Cell)%20=%20%5Cell%5E%7B-d/2%7D">. This is proportional to the inverse square root of the volume of the <img src="https://latex.codecogs.com/png.latex?d">-sphere<sup>54</sup> of radius <img src="https://latex.codecogs.com/png.latex?%5Cell">. This doesn’t seem like a massively useful observation, but just wait.</p>
<p>What if we ask ourselves “what is the average variance of <img src="https://latex.codecogs.com/png.latex?u(s)"> over a ball of radius <img src="https://latex.codecogs.com/png.latex?r">?”. If we write <img src="https://latex.codecogs.com/png.latex?c_%7B%5Cell,%5Csigma%7D(h)"> as the Matérn covariance function, then<sup>55</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5Coperatorname%7BVar%7D%5Cleft(%5Cfrac%7B1%7D%7B%5Coperatorname%7BVol%7D(%5Cmathbb%7BB%7D_d(r))%7D%5Cint_%7B%5Cmathbb%7BB%7D_d(r)%7Du(s)%5C,ds%5Cright)%20=%20%5Cfrac%7B1%7D%7B%5Coperatorname%7BVol%7D(%5Cmathbb%7BB%7D_d(r))%7D%20%5Cint_0%5E%5Cinfty%20%5Ctilde%7Bc%7D_%7B%5Cell,%20%5Csigma%7D(t)%20t%5E%7Bd-1%7D%5C,dt,%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20c_%7B%5Cell,%20%5Csigma%7D(t)%20=%20c_%7B%5Cell,%20%5Csigma%7D(h)"> for all <img src="https://latex.codecogs.com/png.latex?%5C%7Ch%5C%7C%20=%20t">. If we remember that <img src="https://latex.codecogs.com/png.latex?c_%7B%5Cell,%20%5Csigma%7D(s)%20=%20c_%7B1,%20%5Csigma%7D(%5Cell%20s)">, then we can write this as <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B1%7D%7B%5Coperatorname%7BVol%7D(%5Cmathbb%7BB%7D_d(r))%7D%20%5Cint_0%5E%5Cinfty%20%5Ctilde%7Bc%7D_%7B1,%20%5Csigma%7D(%5Cell%20t)%20t%5E%7Bd-1%7D%5C,dt%20=%20%5Cfrac%7B%5Cell%5E%7B-d%7D%7D%7B%5Coperatorname%7BVol%7D(%5Cmathbb%7BB%7D_d(r))%7D%20%5Cint_0%5E%5Cinfty%20%5Ctilde%7Bc%7D_%7B1,%20%5Csigma%7D(v)%20v%5E%7Bd-1%7D%5C,dv.%0A"> Hence the PC prior on <img src="https://latex.codecogs.com/png.latex?%5Cell"> is penalising the change in average standard deviation over a ball relative to the unit length scale. With this interpretation, the base model is, once again, zero standard deviation. This reasoning carries over to the length scale parameter in <em>any</em><sup>56</sup> Gaussian process.</p>
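<p>To make this concrete, here is a small R sketch of the prior density this distance induces. This is an illustration under stated assumptions, not package code: it follows the standard PC recipe of an exponential prior with rate <code>lambda</code> on the distance scale, pushed back to <code>ell</code>, and the helper names <code>pc_prior_ell</code> and <code>lambda_for</code> are mine.</p>

```r
# Hypothetical helpers sketching the PC prior on ell induced by putting an
# Exp(lambda) prior on the distance d(ell) = ell^(-d/2).
pc_prior_ell <- function(ell, d, lambda) {
  (d / 2) * lambda * ell^(-d / 2 - 1) * exp(-lambda * ell^(-d / 2))
}

# Calibrate lambda from a tail statement P(ell < ell0) = alpha, using
# P(ell < ell0) = P(d(ell) > ell0^(-d/2)) = exp(-lambda * ell0^(-d/2)).
lambda_for <- function(ell0, alpha, d) -log(alpha) * ell0^(d / 2)

# Sanity checks in d = 2: the density integrates to 1 and hits the calibration.
lam <- lambda_for(ell0 = 0.1, alpha = 0.05, d = 2)
integrate(pc_prior_ell, 0, Inf, d = 2, lambda = lam)$value  # approximately 1
integrate(pc_prior_ell, 0, 0.1, d = 2, lambda = lam)$value  # approximately 0.05
```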
<p>This post only covers the simplest version of Matérn GPs. One simple extension is to construct a non-stationary GP by replacing the Euclidean distance with the distance on a manifold with volume element <img src="https://latex.codecogs.com/png.latex?R(s)%5C,ds">. This might seem like a weird and abstract thing to do, but it’s an intrinsic specification of the popular deformation method due to <a href="https://www.jstor.org/stable/2290458">Sampson and Guttorp</a>. <a href="https://arxiv.org/abs/1503.00256">Our paper</a> covers the prior specification in this case.</p>
<p>The other common case that I’ve not considered here is the extension where there is a different length scale<sup>57</sup> in each dimension. In this case, we could compute a PC prior independently for each dimension (so <img src="https://latex.codecogs.com/png.latex?d=1"> for each prior). To be completely honest with you, I worry a little bit about that choice in high dimensions<sup>58</sup> (products of independent priors being notoriously weird), but I don’t have a better suggestion.</p>
</section>
<section id="whats-in-the-rest-of-the-post" class="level3">
<h3 class="anchored" data-anchor-id="whats-in-the-rest-of-the-post">What’s in the rest of the post?</h3>
<p>So you might have noticed that even though the previous section is a “conclusion” section, there is quite a bit more blog to go. I shan’t lie: this whole thing up to this point is a tl;dr that got wildly out of control.</p>
<p>The rest of the post is the details.</p>
<p>There are two parts. The first part covers enough<sup>59</sup> of the theory of stationary GPs to allow us to understand the second part, which actually derives the PC prior.</p>
<p>It’s going to get a bit hairy and I’m going to assume you’ve at least skimmed through the first 2 definitions in my <a href="https://dansblog.netlify.app/posts/2021-11-03-yes-but-what-is-a-gaussian-process-or-once-twice-three-times-a-definition-or-a-descent-into-madness/yes-but-what-is-a-gaussian-process-or-once-twice-three-times-a-definition-or-a-descent-into-madness.html">previous post defining GPs</a>.</p>
<p>I fully expect that most people will want to stop reading here. But you shouldn’t. Because if I had to suffer, you all have to suffer.</p>
</section>
</section>
<section id="part-2-an-invitation-to-the-theory-of-stationary-gaussian-processes" class="level2">
<h2 class="anchored" data-anchor-id="part-2-an-invitation-to-the-theory-of-stationary-gaussian-processes">Part 2: An invitation to the theory of Stationary Gaussian processes</h2>
<p>Gaussian processes with the Matérn covariance function are an excellent example of stationary<sup>60</sup> Gaussian processes, which are characterised<sup>61</sup> <sup>62</sup> by having covariance functions of the form <img src="https://latex.codecogs.com/png.latex?%0Ac(s,%20s')%20=%20c(s-%20s'),%0A"> where I am abusing notation and using <img src="https://latex.codecogs.com/png.latex?c"> for both the two parameter and one parameter functions. This assumption means that the correlation structure does not depend on where you are in space, only on the separation between points.</p>
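<p>As a minimal illustration of what the one-argument notation buys us, here is the simplest member of the Matérn family (the exponential covariance, <img src="https://latex.codecogs.com/png.latex?%5Cnu%20=%201/2">) in R. The name <code>c_exp</code> is a throwaway; the point is only that shifting both inputs by the same amount leaves the covariance unchanged.</p>

```r
# Exponential covariance (Matérn with nu = 1/2): a function of s - s' only.
c_exp <- function(s1, s2, sigma = 1, ell = 0.5) {
  sigma^2 * exp(-abs(s1 - s2) / ell)
}

c_exp(0.1, 0.7)          # depends only on the separation 0.6
c_exp(0.1 + 5, 0.7 + 5)  # identical: stationarity in action
```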
<p>The assumption of stationarity massively simplifies GPs. Firstly, the stationarity assumption greatly reduces the number of parameters you need to describe a GP as we don’t need to worry about location-specific parameters. Secondly, it increases the statistical power of the data. If two subsets of the domain are more than <img src="https://latex.codecogs.com/png.latex?2%5Cell"> apart, they are essentially independent replicates of the GP with the same parameters. This means that if the locations <img src="https://latex.codecogs.com/png.latex?s"> vary across a large enough area (relative to the natural length scale), we get multiple effective replicates<sup>63</sup> from the same realisation of the process.</p>
<p>In practice, stationarity<sup>64</sup> is often a <em>good enough</em> assumption when the mean has been modelled carefully, <a href="https://arxiv.org/abs/1409.0743">especially given the limitations of the data</a>. That said, priors on non-stationary processes can be set using the PC prior methodology by using a stationary process as the base model. The <a href="https://arxiv.org/abs/1503.00256">supplementary material</a> of our paper gives a simple, but useful, example of this.</p>
<section id="stationary-covariance-functions-and-bochners-theorem" class="level3">
<h3 class="anchored" data-anchor-id="stationary-covariance-functions-and-bochners-theorem">Stationary covariance functions and Bochner’s theorem</h3>
<p>The restriction to stationary processes is <em>extremely</em> powerful. It opens us up to using Fourier analysis as a potent tool for understanding GPs. We are going to need this to construct our KL divergence, and so with some trepidation, let’s dive into the Moonee Ponds of spectral representations.</p>
<p>The first thing that we need to do is remember what a <em>Fourier transform</em> is. A Fourier transform of a square integrable function <img src="https://latex.codecogs.com/png.latex?%5Cphi(s)"> is<sup>65</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%20%5Cphi(%5Comega)%20=%20%5Cmathcal%7BF%7D(%5Cphi)(%5Comega)%20=%5Cfrac%7B1%7D%7B(2%5Cpi)%5Ed%7D%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20e%5E%7B-i%5Comega%5ETs%7D%5Cphi(s)%20%5C,ds.%0A"></p>
<p>If you have bad memories<sup>66</sup> of desperately trying to compute Fourier integrals in undergrad, I promise you that we are not doing that today. We are simply affirming their right to exist (and my right to look them up in a table).</p>
<p>The reason I care about Fourier<sup>67</sup> transforms is that if I have a non-negative measure<sup>68</sup> <img src="https://latex.codecogs.com/png.latex?%5Cnu">, I can define a function <img src="https://latex.codecogs.com/png.latex?%0Ac(h)%20=%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7De%5E%7Bi%5Comega%5ETh%7D%5C,d%5Cnu(%5Comega).%0A"> If measures freak you out, you can—with some loss of generality—assume that there is a function <img src="https://latex.codecogs.com/png.latex?f(%5Comega)%5Cgeq%200"> such that <img src="https://latex.codecogs.com/png.latex?%0Ac(h)%20=%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7De%5E%7Bi%5Comega%5ETh%7Df(%5Comega)%5C,d%5Comega.%0A"> We are going to call <img src="https://latex.codecogs.com/png.latex?%5Cnu"> the spectral measure and the corresponding <img src="https://latex.codecogs.com/png.latex?f">, if it exists, is called the spectral density.</p>
<p>I put it to you that, defined this way, <img src="https://latex.codecogs.com/png.latex?c(s,s')%20=%20c(s%20-%20s')"> is a (complex) positive definite function.</p>
<p>Recall<sup>69</sup> that a function is positive definite if, for every <img src="https://latex.codecogs.com/png.latex?k%3E0">, every <img src="https://latex.codecogs.com/png.latex?s_1,%20%5Cldots,%20s_k%20%5Cin%20%5Cmathbb%7BR%7D%5Ed">, and every <img src="https://latex.codecogs.com/png.latex?a_1,%20%5Cldots,%20a_k%20%5Cin%20%5Cmathbb%7BC%7D"> <img src="https://latex.codecogs.com/png.latex?%0A%5Csum_%7Bi%20=%201%7D%5Ek%5Csum_%7Bj=1%7D%5Ek%20a_i%5Cbar%7Ba%7D_j%20c(s_i,%20s_j)%20%5Cgeq%200,%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Cbar%20a"> is the complex conjugate of <img src="https://latex.codecogs.com/png.latex?a">.</p>
<p>Using our assumption about <img src="https://latex.codecogs.com/png.latex?c(%5Ccdot)"> we can write the left hand side as <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Csum_%7Bi%20=%201%7D%5Ek%5Csum_%7Bj=1%7D%5Ek%20a_i%5Cbar%7Ba%7D_j%20c(s_i,%20s_j)%20&amp;=%20%5Csum_%7Bi%20=%201%7D%5Ek%5Csum_%7Bj=1%7D%5Ek%20a_i%5Cbar%7Ba%7D_j%20c(s_i-%20s_j)%20%5C%5C%0A&amp;=%5Csum_%7Bi%20=%201%7D%5Ek%5Csum_%7Bj=1%7D%5Ek%20a_i%5Cbar%7Ba%7D_j%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20e%5E%7Bi%5Comega%5ET(s_i-s_j)%7D%5C,d%5Cnu(%5Comega)%20%5C%5C%0A&amp;=%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%5Csum_%7Bi%20=%201%7D%5Ek%5Csum_%7Bj=1%7D%5Ek%20a_i%5Cbar%7Ba%7D_j%20e%5E%7Bi%5Comega%5ET(s_i-s_j)%7D%5C,d%5Cnu(%5Comega)%20%5C%5C%0A&amp;=%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%5Cleft(%5Csum_%7Bi%20=%201%7D%5Ek%20a_i%20e%5E%7Bi%5Comega%5ETs_i%7D%5Cright)%5Cleft(%5Csum_%7Bj%20=%201%7D%5Ek%20%5Cbar%7Ba_j%7D%20e%5E%7B-i%5Comega%5ETs_j%7D%5Cright)%20%5C,d%5Cnu(%5Comega)%5C%5C%0A&amp;=%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%5Cleft(%5Csum_%7Bi%20=%201%7D%5Ek%20a_i%20e%5E%7Bi%5Comega%5ETs_i%7D%5Cright)%5Coverline%7B%5Cleft(%5Csum_%7Bj%20=%201%7D%5Ek%20a_j%20e%5E%7Bi%5Comega%5ETs_j%7D%5Cright)%7D%20%5C,d%5Cnu(%5Comega)%20%5C%5C%0A&amp;=%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%5Cleft%7C%5Csum_%7Bi%20=%201%7D%5Ek%20a_i%20e%5E%7Bi%5Comega%5ETs_i%7D%5Cright%7C%5E2%5C,d%5Cnu(%5Comega)%20%5Cgeq%200,%0A%5Cend%7Balign*%7D"> where <img src="https://latex.codecogs.com/png.latex?%7Ca%7C%5E2%20=%20a%5Cbar%7Ba%7D">.</p>
<p>We have shown that if <img src="https://latex.codecogs.com/png.latex?c(s,s')%20=%20c(s-s')%20=%20%5Cint%20e%5E%7Bi%5Comega%5ET(s-s')%7D%5C,d%20%5Cnu(%5Comega)"> , then it is a valid covariance function. This is also true, although much harder to prove, in the other direction and the result is known as Bochner’s theorem.</p>
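<p>To make this concrete, here is a small numerical sketch (my own illustration; the spectral density and evaluation points are arbitrary choices): build c(h) from an assumed spectral density and confirm that the resulting Gram matrix is positive semi-definite.</p>

```python
import numpy as np

# Sketch: take f to be the N(0, 1) density (an assumed example spectral
# density) and compute c(h) = int exp(i omega h) f(omega) d omega by a
# Riemann sum. For this f the integral is the N(0, 1) characteristic
# function, so we should recover the squared-exponential exp(-h^2 / 2).
step = 1e-3
omega = np.arange(-12.0, 12.0, step)
f = np.exp(-omega**2 / 2) / np.sqrt(2 * np.pi)

def c(h):
    # f is symmetric about zero, so the imaginary part vanishes.
    return np.sum(np.exp(1j * omega * h) * f).real * step

assert abs(c(0.0) - 1.0) < 1e-6           # c(0) = total spectral mass
assert abs(c(1.0) - np.exp(-0.5)) < 1e-6  # matches exp(-h^2 / 2)

# The Gram matrix C_ij = c(s_i - s_j) is positive semi-definite, exactly
# as the quadratic-form calculation above shows.
rng = np.random.default_rng(1)
s = rng.uniform(0, 5, size=20)
C = np.array([[c(si - sj) for sj in s] for si in s])
assert np.linalg.eigvalsh(C).min() > -1e-8
```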
<div id="thm-bochner" class="theorem">
<p><span class="theorem-title"><strong>Theorem 5 (Bochner’s theorem)</strong></span> A function <img src="https://latex.codecogs.com/png.latex?c(%5Ccdot)"> is positive definite, ie for every <img src="https://latex.codecogs.com/png.latex?k%3E0">, every <img src="https://latex.codecogs.com/png.latex?s_1,%20%5Cldots,%20s_k%20%5Cin%20%5Cmathbb%7BR%7D%5Ed">, and every <img src="https://latex.codecogs.com/png.latex?a_1,%20%5Cldots,%20a_k%20%5Cin%20%5Cmathbb%7BC%7D"> <img src="https://latex.codecogs.com/png.latex?%0A%5Csum_%7Bi%20=%201%7D%5Ek%5Csum_%7Bj=1%7D%5Ek%20a_i%5Cbar%7Ba%7D_j%20c(s_i-%20s_j)%20%5Cgeq%200,%0A"> if and only if there is a non-negative finite measure <img src="https://latex.codecogs.com/png.latex?%5Cnu"> such that <img src="https://latex.codecogs.com/png.latex?%0Ac(h)%20=%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20e%5E%7Bi%5Comega%5ETh%7D%5C,d%5Cnu(%5Comega).%0A"></p>
</div>
<p>Just as a covariance function<sup>70</sup> is enough to completely specify a zero-mean Gaussian process, a spectral measure is enough to completely specify a zero mean <em>stationary</em> Gaussian process.</p>
<p>Our lives are mathematically much easier when <img src="https://latex.codecogs.com/png.latex?%5Cnu"> represents a density <img src="https://latex.codecogs.com/png.latex?f(%5Comega)"> that satisfies <img src="https://latex.codecogs.com/png.latex?%0A%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%5Cphi(%5Comega)%5C,d%5Cnu(%5Comega)%20=%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%5Cphi(%5Comega)f(%5Comega)%5C,d%5Comega.%0A"> This function, when it exists, is precisely the Fourier transform of <img src="https://latex.codecogs.com/png.latex?c(h)">. Unfortunately, this will not exist<sup>71</sup> for all possible positive definite functions. But as we drift further and further down this post, we will begin to assume that we’re only dealing with cases where <img src="https://latex.codecogs.com/png.latex?f"> exists.</p>
<p>The case of particular interest to us is the Matérn covariance function. The parameterisation used above is really lovely, but for mathematical convenience, we are going to set<sup>72</sup> <img src="https://latex.codecogs.com/png.latex?%5Ckappa%20=%20%5Csqrt%7B8%5Cnu%7D%5Cell%5E%7B-1%7D">. With this parameterisation, the Matérn covariance function has<sup>73</sup> Fourier transform <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Af(%5Comega)%20&amp;=%20%5Cfrac%7B%5CGamma(%5Cnu+d/2)%5Ckappa%5E%7B2%5Cnu%7D%5Csigma%5E2%7D%7B%5Cpi%5E%7Bd/2%7D%5CGamma(%5Cnu)%7D%5Cfrac%7B1%7D%7B(%5Ckappa%5E2%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%7B%5Cnu+d/2%7D%7D%5C%5C%0A&amp;=%20C_%5Ctext%7BMat%C3%A9rn%7D(%5Cnu,d)%5Ccdot%5Ckappa%5E%7B2%5Cnu%7D%5Csigma%5E2%20%5Cfrac%7B1%7D%7B(%5Ckappa%5E2%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%7B%5Cnu+d/2%7D%7D,%0A%5Cend%7Balign*%7D"> where <img src="https://latex.codecogs.com/png.latex?C_%5Ctext%7BMat%C3%A9rn%7D(%5Cnu,d)"> is defined implicitly above and is a constant (as we are keeping <img src="https://latex.codecogs.com/png.latex?%5Cnu"> fixed).</p>
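<p>As a sanity check on the spectral density (a sketch I added; the values of κ and σ are arbitrary), consider the simplest case ν = 1/2, d = 1: the Matérn covariance reduces to the exponential covariance σ²exp(−κ|h|), whose spectral density in the Fourier convention above is κσ²/(π(κ² + ω²)). It should integrate to the marginal variance σ² and transform back to the covariance.</p>

```python
import numpy as np

# nu = 1/2, d = 1: Matérn covariance c(h) = sigma^2 * exp(-kappa * |h|),
# spectral density f(omega) = kappa * sigma^2 / (pi * (kappa^2 + omega^2)).
# kappa and sigma are arbitrary example values.
kappa, sigma = 2.0, 1.5
step = 0.01
omega = np.arange(-2000.0, 2000.0, step)
f = kappa * sigma**2 / (np.pi * (kappa**2 + omega**2))

# The spectral density integrates to the marginal variance c(0) = sigma^2
# (up to truncation of the heavy 1/omega^2 tails) ...
assert abs(np.sum(f) * step - sigma**2) < 1e-2

# ... and transforming back recovers the exponential covariance at h = 1.
c1 = np.sum(np.exp(1j * omega * 1.0) * f).real * step
assert abs(c1 - sigma**2 * np.exp(-kappa)) < 1e-2
```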
</section>
<section id="spectral-representations-and-the-simplest-of-the-many-many-versions-of-a-stochastic-integral" class="level3">
<h3 class="anchored" data-anchor-id="spectral-representations-and-the-simplest-of-the-many-many-versions-of-a-stochastic-integral">Spectral representations (and the simplest of the many many versions of a stochastic integral)</h3>
<p>To build this spectral representation, we need a tiny bit of machinery. Specifically, we need the concept of a Gaussian <img src="https://latex.codecogs.com/png.latex?%5Cnu">-noise and its corresponding integral.</p>
<div id="def-nu-noise" class="theorem definition">
<p><span class="theorem-title"><strong>Definition 1 (Complex <img src="https://latex.codecogs.com/png.latex?%5Cnu">-noise)</strong></span> A (complex) <img src="https://latex.codecogs.com/png.latex?%5Cnu">-noise<sup>74</sup> is a random measure<sup>75</sup> <img src="https://latex.codecogs.com/png.latex?Z_%5Cnu(%5Ccdot)"> such that every<sup>76</sup> disjoint<sup>77</sup> pair of sets <img src="https://latex.codecogs.com/png.latex?A,%20B"> satisfies the following properties:</p>
<ol type="1">
<li><img src="https://latex.codecogs.com/png.latex?Z_%5Cnu(A)"> has mean zero and variance <img src="https://latex.codecogs.com/png.latex?%5Cnu(A)">,</li>
<li>If <img src="https://latex.codecogs.com/png.latex?A"> and <img src="https://latex.codecogs.com/png.latex?B"> are disjoint then <img src="https://latex.codecogs.com/png.latex?Z_%5Cnu(A%5Ccup%20B)%20=%20Z_%5Cnu(A)%20+%20Z_%5Cnu(B)"></li>
<li>If <img src="https://latex.codecogs.com/png.latex?A"> and <img src="https://latex.codecogs.com/png.latex?B"> are disjoint then <img src="https://latex.codecogs.com/png.latex?Z_%5Cnu(A)"> and <img src="https://latex.codecogs.com/png.latex?Z_%5Cnu(B)"> are uncorrelated<sup>78</sup>, ie <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Z_%5Cnu(A)%20%5Coverline%7BZ_%5Cnu(B)%7D)%20=%200">.</li>
</ol>
</div>
<p>This definition might not seem like much, but imagine simple<sup>79</sup> piecewise constant functions <img src="https://latex.codecogs.com/png.latex?%0Af(%5Comega)%20=%20%5Csum_%7Bi=1%7D%5E%7Bn%7D%20f_i%201_%7BA_i%7D(%5Comega),%5Cquad%20g(%5Comega)%20=%20%20%5Csum_%7Bi=1%7D%5E%7Bn%7D%20g_i%201_%7BA_i%7D(%5Comega)%0A"> where <img src="https://latex.codecogs.com/png.latex?f_i,%20g_i%5Cin%20%5Cmathbb%7BC%7D"> and the sets <img src="https://latex.codecogs.com/png.latex?A_i"> are pairwise disjoint and <img src="https://latex.codecogs.com/png.latex?%5Cbigcup_%7Bi=1%7D%5En%20A_i%20%20=%20%5Cmathbb%7BR%7D%5Ed">. Then we can define an integral with respect to the <img src="https://latex.codecogs.com/png.latex?%5Cnu">-noise as <img src="https://latex.codecogs.com/png.latex?%0A%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20f(%5Comega)%5C,dZ_%5Cnu(%5Comega)%20=%20%5Csum_%7Bi=1%7D%5En%20f_i%20Z_%5Cnu(A_i),%0A"> which has mean <img src="https://latex.codecogs.com/png.latex?0"> and variance <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D%5Cleft(%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20f(%5Comega)%5C,dZ_%5Cnu(%5Comega)%5Cright)%5E2%20=%20%5Csum_%7Bi=1%7D%5En%20f_i%5E2%20%5Cnu(A_i)%20=%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7Df(%5Comega)%5E2%5C,d%5Cnu(%5Comega),%0A"> where the first equality comes from noting that <img src="https://latex.codecogs.com/png.latex?%5Cint_%7BA_i%7D%20%5C,dZ_v(%5Comega)"> and <img src="https://latex.codecogs.com/png.latex?%5Cint_%7BA_j%7D%20%5C,%20dZ_v(%5Comega)"> are uncorrelated and the last equality comes from the definition of an integral of a piecewise constant function.</p>
<p>Moreover, we get the covariance <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Cmathbb%7BE%7D%5Cleft(%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20f(%5Comega)%5C,dZ_%5Cnu(%5Comega)%5Coverline%7B%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20g(%5Comega)%5C,dZ_%5Cnu(%5Comega)%7D%5Cright)%20&amp;=%20%5Csum_%7Bi=1%7D%5En%20%5Csum_%7Bj=1%7D%5En%20f_i%20g_j%20%5Cnu(A_i%20%5Ccap%20A_j)%20%5C%5C%0A&amp;=%20%5Csum_%7Bi=1%7D%5En%20f_i%5Coverline%7Bg%7D_i%20%5Cnu(A_i)%20%5C%5C%0A&amp;=%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7Df(%5Comega)%5Coverline%7Bg(%5Comega)%7D%5C,d%5Cnu(%5Comega).%0A%5Cend%7Balign*%7D"></p>
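<p>A quick Monte Carlo sketch of that calculation (real-valued case; the partition masses and coefficients are made-up values): simulate the noise on each cell and check the variance identity empirically.</p>

```python
import numpy as np

# Z(A_i) ~ N(0, nu(A_i)) independent on a partition A_1, ..., A_4, and
# int f dZ = sum_i f_i Z(A_i) should have mean 0 and variance
# sum_i f_i^2 nu(A_i). The nu(A_i) and f_i are arbitrary made-up values.
rng = np.random.default_rng(42)
nu = np.array([0.5, 1.0, 2.0, 0.25])
f_vals = np.array([1.0, -2.0, 0.5, 3.0])

Z = rng.normal(scale=np.sqrt(nu), size=(200_000, 4))  # noise on each cell
integral = Z @ f_vals

assert abs(np.mean(integral)) < 0.05
assert abs(np.var(integral) - np.sum(f_vals**2 * nu)) < 0.1
```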
<p>A nice thing is that while these piecewise constant functions are quite simple, we can approximate <em>any</em><sup>80</sup> function arbitrarily well by a simple function. This is the same fact we use to build ourselves ordinary<sup>81</sup> integrals.</p>
<p>In particular, the brave and the bold among you might just say “we can take limits here and <em>define</em>” an integral with respect to the <img src="https://latex.codecogs.com/png.latex?%5Cnu">-noise this way. And, indeed, that works. You get that, for any <img src="https://latex.codecogs.com/png.latex?f%5Cin%20L%5E2(%5Cnu)">,</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D%5Cleft(%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20f(%5Comega)%5C,d%20Z_%5Cnu(%5Comega)%5Cright)%20=%200%0A"> and, for any <img src="https://latex.codecogs.com/png.latex?f,g%20%5Cin%20L%5E2(%5Cnu)">, <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D%5Cleft(%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20f(%5Comega)%5C,d%20Z_%5Cnu(%5Comega)%5Coverline%7B%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20g(%5Comega)%5C,d%20Z_%5Cnu(%5Comega)%7D%5Cright)%20=%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20f(%5Comega)%5Coverline%7Bg(%5Comega)%7D%5C,d%20%5Cnu(%5Comega).%0A"></p>
<p>If we define <img src="https://latex.codecogs.com/png.latex?%0Au(s)%20=%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7De%5E%7Bi%5Comega%5ETs%7D%5C,dZ_%5Cnu(%5Comega),%0A"> then it follows immediately that <img src="https://latex.codecogs.com/png.latex?u(s)"> is mean zero and has covariance function <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(u(s)%5Coverline%7Bu(s')%7D)%20=%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7De%5E%7Bi%5Comega%5ET(s%20-%20s')%7D%5C,%20d%5Cnu(%5Comega)%20=%20c(s-s').%0A"> That is, <img src="https://latex.codecogs.com/png.latex?%5Cnu"> is the spectral measure associated with the covariance function.</p>
<p>Combining this with Bochner’s theorem, we have just proved<sup>82</sup> the spectral representation theorem for general<sup>83</sup> (weakly) stationary<sup>84</sup> random fields<sup>85</sup>.</p>
<div id="thm-spectral-rep" class="theorem">
<p><span class="theorem-title"><strong>Theorem 6 (Spectral representation theorem)</strong></span> If <img src="https://latex.codecogs.com/png.latex?%5Cnu"> is a finite, non-negative measure on <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed"> and <img src="https://latex.codecogs.com/png.latex?Z_%5Cnu"> is a complex <img src="https://latex.codecogs.com/png.latex?%5Cnu">-noise, then the complex-valued process <img src="https://latex.codecogs.com/png.latex?%0Au(s)%20=%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7De%5E%7Bi%5Comega%5ETs%7D%5C,dZ_%5Cnu(%5Comega)%0A"> has mean zero and covariance <img src="https://latex.codecogs.com/png.latex?%0Ac(s,s')%20=%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7De%5E%7Bi%5Comega%5ET(s-s')%7D%5C,d%5Cnu(%5Comega)%0A"> and is therefore weakly stationary. If <img src="https://latex.codecogs.com/png.latex?Z_%5Cnu(A)%20%5Csim%20N(0,%20%5Cnu(A))"> then <img src="https://latex.codecogs.com/png.latex?u(s)"> is a Gaussian process.</p>
<p>Furthermore, every mean-square continuous mean zero stationary Gaussian process with covariance function <img src="https://latex.codecogs.com/png.latex?c(s,s')=%20c(s-s')"> and corresponding spectral measure <img src="https://latex.codecogs.com/png.latex?%5Cnu"> has an associated <img src="https://latex.codecogs.com/png.latex?%5Cnu">-noise <img src="https://latex.codecogs.com/png.latex?Z_%5Cnu(%5Ccdot)"> such that <img src="https://latex.codecogs.com/png.latex?%0Au(s)%20=%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7De%5E%7Bi%5Comega%5ETs%7D%5C,dZ_%5Cnu(%5Comega)%0A"> holds in the mean-square sense for all <img src="https://latex.codecogs.com/png.latex?s%20%5Cin%20%5Cmathbb%7BR%7D%5Ed">.</p>
<p><img src="https://latex.codecogs.com/png.latex?Z_%5Cnu(%5Ccdot)"> is called the <em>spectral process</em> <sup>86</sup> associated with <img src="https://latex.codecogs.com/png.latex?u(%5Ccdot)">. When it exists, the density of <img src="https://latex.codecogs.com/png.latex?%5Cnu">, denoted by <img src="https://latex.codecogs.com/png.latex?f(%5Comega)">, is called the <em>spectral density</em> or the <em>power spectrum</em>.</p>
</div>
<p>All throughout here I used complex numbers and complex Gaussian processes because, believe it or not, it makes things easier. But you will be pleased to know that <img src="https://latex.codecogs.com/png.latex?u(%5Ccdot)"> will be real-valued as long as the spectral density <img src="https://latex.codecogs.com/png.latex?f(%5Comega)"> is symmetric around the origin. And it always is.</p>
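<p>The spectral representation is also a perfectly serviceable recipe for <em>simulating</em> a stationary GP. A minimal sketch (my own, with the N(0, 1) density as the assumed spectral density, so the target covariance is exp(−h²/2)): discretise the positive frequencies, draw independent cosine and sine amplitudes, and check the empirical covariance.</p>

```python
import numpy as np

# Real form of u(s) = int exp(i omega s) dZ(omega): independent cosine and
# sine amplitudes with variance 2 * f(omega) * d_omega, the factor 2 folding
# in the symmetric negative frequencies. Target covariance: exp(-h^2 / 2).
rng = np.random.default_rng(0)
d_omega = 0.05
omega = np.arange(d_omega / 2, 12.0, d_omega)  # positive frequencies only
f = np.exp(-omega**2 / 2) / np.sqrt(2 * np.pi)

n_rep = 20_000
scale = np.sqrt(2 * f * d_omega)
xi = rng.normal(size=(n_rep, omega.size)) * scale
eta = rng.normal(size=(n_rep, omega.size)) * scale

def u(s):
    return xi @ np.cos(omega * s) + eta @ np.sin(omega * s)

u0, u1 = u(0.0), u(1.0)
assert abs(np.mean(u0 * u0) - 1.0) < 0.05            # c(0) = 1
assert abs(np.mean(u0 * u1) - np.exp(-0.5)) < 0.05   # c(1) = exp(-1/2)
```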
</section>
<section id="the-cameron-martin-space-of-a-stationary-gaussian-process" class="level3">
<h3 class="anchored" data-anchor-id="the-cameron-martin-space-of-a-stationary-gaussian-process">The Cameron-Martin<sup>87</sup> space of a stationary Gaussian process</h3>
<p>One particular advantage of stationary processes is that we get a straightforward characterization of the Cameron-Martin space inner product. Recall that the Cameron-Martin space (or reproducing kernel Hilbert space) associated with a Gaussian process is the<sup>88</sup> space of all functions of the form <img src="https://latex.codecogs.com/png.latex?%0Ah(s)%20=%20%5Csum_%7Bk=1%7D%5EK%20c_k%20c(s,%20s_k),%0A"> where <img src="https://latex.codecogs.com/png.latex?K"> is finite, <img src="https://latex.codecogs.com/png.latex?c_k"> are real, and <img src="https://latex.codecogs.com/png.latex?s_k"> are distinct points in <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed">. This is the space that the posterior mean for GP regression lives in.</p>
<p>The inner product associated with this space can be written in terms of the spectral density <img src="https://latex.codecogs.com/png.latex?f"> as<sup>89</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5Clangle%20h,%20h'%5Crangle%20=%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20%5Chat%20h(%5Comega)%20%5Coverline%7B%5Chat%20%7Bh'%7D(%5Comega)%7D%20%5Cfrac%7B1%7D%7Bf(%5Comega)%7D%5C,d%5Comega.%0A"> In particular, for a Matérn Gaussian process, the corresponding squared norm is <img src="https://latex.codecogs.com/png.latex?%0A%5C%7C%20h%5C%7C_%7BH_u%7D%5E2%20=%20%5Cfrac%7B1%7D%7BC_%5Ctext%7BMat%C3%A9rn%7D%5Ckappa%5E%7B2%5Cnu%7D%5Csigma%5E2%7D%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%7C%5Chat%20h(%5Comega)%7C%5E2%20(%5Ckappa%5E2%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%7B%5Cnu+d/2%7D%5C,d%5Comega.%0A"> For those of you familiar with function spaces, this is equivalent to the norm on <img src="https://latex.codecogs.com/png.latex?H%5E%7B%5Cnu+d/2%7D(%5Cmathbb%7BR%7D%5Ed)">. One way to interpret this is that the <em>set</em> of functions in the Cameron-Martin space for a Matérn GP only depends on <img src="https://latex.codecogs.com/png.latex?%5Cnu">, while the norm and inner product (and hence the posterior mean and all that stuff) depend on <img src="https://latex.codecogs.com/png.latex?%5Cnu">, <img src="https://latex.codecogs.com/png.latex?%5Ckappa">, and <img src="https://latex.codecogs.com/png.latex?%5Csigma">. This observation is going to be important.</p>
</section>
<section id="another-look-at-equivalence-and-singularity" class="level3">
<h3 class="anchored" data-anchor-id="another-look-at-equivalence-and-singularity">Another look at equivalence and singularity</h3>
<p>It would’ve been a bit of an odd choice to spend all this time talking about spectral representations and never using them. So in this section, I’m going to cover the reason for the season: singularity or absolute continuity of Gaussian measures.</p>
<p>The Feldman-Hájek theorem, as quoted, holds on quite general spaces of functions. However, if we are willing to restrict ourselves to a separable<sup>90</sup> Hilbert<sup>91</sup> space, there is a much more refined version of the theorem that we can use.</p>
<div id="thm-continuity2" class="theorem">
<p><span class="theorem-title"><strong>Theorem 7 (Feldman-Hájek theorem (Taylor’s<sup>92</sup> version))</strong></span> Two Gaussian measures <img src="https://latex.codecogs.com/png.latex?%5Cmu_1"> (mean <img src="https://latex.codecogs.com/png.latex?m_1">, covariance operator<sup>93</sup> <img src="https://latex.codecogs.com/png.latex?C_1">) and <img src="https://latex.codecogs.com/png.latex?%5Cmu_2"> (mean <img src="https://latex.codecogs.com/png.latex?m_2">, covariance operator <img src="https://latex.codecogs.com/png.latex?C_2">) on a <em>separable Hilbert space</em> <img src="https://latex.codecogs.com/png.latex?X"> are equivalent (mutually absolutely continuous) <em>if and only if</em></p>
<ol type="1">
<li><p>The Cameron-Martin spaces associated with <img src="https://latex.codecogs.com/png.latex?%5Cmu_1"> and <img src="https://latex.codecogs.com/png.latex?%5Cmu_2"> are the same (considered as sets of functions; they usually will not have the same inner products),</p></li>
<li><p><img src="https://latex.codecogs.com/png.latex?m_1%20-%20m_2"> is in the<sup>94</sup> Cameron-Martin space, and</p></li>
<li><p>The operator <img src="https://latex.codecogs.com/png.latex?T%20=%20C_1%5E%7B-1/2%7DC_2C_1%5E%7B-1/2%7D%20-%20I"> is a Hilbert-Schmidt operator, that is, it has a countable set of eigenvalues <img src="https://latex.codecogs.com/png.latex?%5Cdelta_k"> and corresponding eigenfunctions <img src="https://latex.codecogs.com/png.latex?%5Cphi_k"> that satisfy <img src="https://latex.codecogs.com/png.latex?%5Cdelta_k%20%3E%20-1"> and <img src="https://latex.codecogs.com/png.latex?%0A%5Csum_%7Bk=1%7D%5E%7B%5Cinfty%7D%5Cdelta_k%5E2%20%3C%20%5Cinfty.%0A"></p></li>
</ol>
<p>When these three conditions are fulfilled, the Radon-Nikodym derivative is <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7Bd%5Cmu_1%7D%7Bd%5Cmu_2%7D%20=%20%5Cexp%5Cleft(-%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bk=1%7D%5E%5Cinfty%20%5Cleft(%5Cfrac%7B%5Cdelta_k%7D%7B1%20+%20%5Cdelta_k%7D%5Ceta_k%5E2%20-%20%5Clog(1+%5Cdelta_k)%5Cright)%5Cright),%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Ceta_k"> is a sequence of N(0,1) random variables<sup>95</sup> <sup>96</sup> (under <img src="https://latex.codecogs.com/png.latex?%5Cmu_1">).</p>
<p>Otherwise, the two measures are singular.</p>
</div>
<p>This version of Feldman-Hájek is considerably more useful than its previous incarnation. The first condition basically says that the posterior means from the two priors will have the same smoothness and is rarely a problem. Typically the second condition is fulfilled in practice (for example, we always set the mean to zero).</p>
<p>The third condition is where all of the action is. This is, roughly speaking, a condition that says that <img src="https://latex.codecogs.com/png.latex?C_1"> and <img src="https://latex.codecogs.com/png.latex?C_2"> aren’t toooooo different. To understand this, we need to look a little at what the <img src="https://latex.codecogs.com/png.latex?%5Cdelta_k"> values actually are. It turns out to actually be easier to ask about <img src="https://latex.codecogs.com/png.latex?1+%20%5Cdelta_k">, which are the eigenvalues of <img src="https://latex.codecogs.com/png.latex?C_1%5E%7B-1/2%7DC_2%20C_1%5E%7B-1/2%7D">. In that case, we are trying to find the orthonormal system of functions <img src="https://latex.codecogs.com/png.latex?%5Cphi_k%5Cin%20X"> such that <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0AC_1%5E%7B-1/2%7DC_2%20C_1%5E%7B-1/2%7D%5Cphi_k%20&amp;=%20(1+%5Cdelta_k)%20%5Cphi_k%20%5C%5C%0AC_1%5E%7B-1/2%7DC_2%20%5Cpsi_k%20&amp;=%20(1+%5Cdelta_k)%20C_1%5E%7B1/2%7D%5Cpsi_k%20%5C%5C%0AC_2%5Cpsi_k%20&amp;=(1+%5Cdelta_k)%20C_1%5Cpsi_k,%0A%5Cend%7Balign*%7D"> where <img src="https://latex.codecogs.com/png.latex?%5Cpsi_k%20=%20C_1%5E%7B-1/2%7D%5Cphi_k">.</p>
<p>Hence, we can roughly interpret the <img src="https://latex.codecogs.com/png.latex?%5Cdelta_k"> as the eigenvalues of <img src="https://latex.codecogs.com/png.latex?%0AC_1%5E%7B-1%7DC_2%20-%20I.%0A"> The Hilbert-Schmidt condition is then requiring that <img src="https://latex.codecogs.com/png.latex?C_1%5E%7B-1%7DC_2"> is not infinitely far from the identity mapping.</p>
<p>A particularly nice version of this theorem occurs when <img src="https://latex.codecogs.com/png.latex?C_1"> and <img src="https://latex.codecogs.com/png.latex?C_2"> have the <em>same</em> eigenvectors. This is a fairly restrictive assumption, but we are going to end up using it later, so it’s worth specialising. In that case, assuming <img src="https://latex.codecogs.com/png.latex?C_j"> has eigenvalues <img src="https://latex.codecogs.com/png.latex?%5Clambda_k%5E%7B(j)%7D"> and corresponding <img src="https://latex.codecogs.com/png.latex?L%5E2">-orthogonal eigenfunctions <img src="https://latex.codecogs.com/png.latex?%5Cphi_k(%5Ccdot)">, we can write<sup>97</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5BC_jh%5D(s)%20=%20%5Csum_%7Bk=1%7D%5E%5Cinfty%20%5Clambda_k%5E%7B(j)%7D%20%5Clangle%5Cphi_k,%20h%5Crangle%20%5Cphi_k(s).%0A"> Using the orthogonality of the eigenfunctions, we can show<sup>98</sup> that <img src="https://latex.codecogs.com/png.latex?%0A%5BC_j%5E%7B%5Cbeta%7Dh%5D(s)=%5Csum_%7Bk=1%7D%5E%5Cinfty%20(%5Clambda_k%5E%7B(j)%7D)%5E%5Cbeta%20%5Clangle%5Cphi_k,%20h%5Crangle%20%5Cphi_k(s).%0A"></p>
<p>With a bit of effort, we can see that <img src="https://latex.codecogs.com/png.latex?%0A(C_1%5E%7B-1/2%7DC_2C_1%5E%7B-1/2%7D%20-%20I)h%20=%20%5Csum_%7Bk=1%7D%5E%5Cinfty%20%5Cfrac%7B%5Clambda_k%5E%7B(2)%7D%20-%20%5Clambda_k%5E%7B(1)%7D%7D%7B%5Clambda_k%5E%7B(1)%7D%7D%20%5Clangle%5Cphi_k,%20h%5Crangle%20%5Cphi_k%0A"> and so <img src="https://latex.codecogs.com/png.latex?%0A%5Cdelta_k%20=%20%5Cfrac%7B%5Clambda_k%5E%7B(2)%7D%20-%20%5Clambda_k%5E%7B(1)%7D%7D%7B%5Clambda_k%5E%7B(1)%7D%7D.%0A"> From that, we get<sup>99</sup> the KL divergence <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Coperatorname%7BKL%7D(%5Cmu_1%20%7C%7C%20%5Cmu_2)%20&amp;=%20%5Cmathbb%7BE%7D_%7B%5Cmu_1%7D%5Clog%5Cleft(%5Cfrac%7Bd%5Cmu_1%7D%7Bd%5Cmu_2%7D%5Cright)%20%5C%5C%0A&amp;=-%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bk=1%7D%5E%5Cinfty%20%5Cleft(%5Cfrac%7B%5Cdelta_k%7D%7B1%20+%20%5Cdelta_k%7D%20-%20%5Clog(1+%5Cdelta_k)%5Cright)%20%5C%5C%0A&amp;=%20%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bk=1%7D%5E%5Cinfty%20%5Cleft%5B%5Cfrac%7B%5Clambda_k%5E%7B(1)%7D%7D%7B%5Clambda_k%5E%7B(2)%7D%7D%20-1+%20%5Clog%5Cleft(%5Cfrac%7B%5Clambda_k%5E%7B(2)%7D%7D%7B%5Clambda_k%5E%7B(1)%7D%7D%5Cright)%5Cright%5D.%0A%5Cend%7Balign*%7D"></p>
<p>Possibly unsurprisingly, this is simply the sum of the one-dimensional divergences <img src="https://latex.codecogs.com/png.latex?%0A%5Csum_%7Bk=1%7D%5E%5Cinfty%5Coperatorname%7BKL%7D(N(0,%5Clambda_k%5E%7B(1)%7D)%20%7C%7C%20N(0,%5Clambda_k%5E%7B(2)%7D)).%0A"> It’s fun to convince yourself that <img src="https://latex.codecogs.com/png.latex?%5Csum_%7Bk=1%7D%5E%5Cinfty%20%5Cdelta_k%5E2%20%3C%20%5Cinfty"> is sufficient to ensure the sum converges.</p>
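<p>For a finite collection of eigenvalues, it is easy to check this against the usual matrix formula for the KL divergence between mean-zero Gaussians. A sketch with made-up diagonal covariances (which trivially share eigenvectors):</p>

```python
import numpy as np

# Eigenvalue form of KL(N(0, C1) || N(0, C2)) for simultaneously
# diagonalisable covariances vs the standard matrix formula
#   (tr(C2^{-1} C1) - n + log det C2 - log det C1) / 2.
# The eigenvalues are arbitrary example values.
lam1 = np.array([1.0, 0.5, 0.25, 0.125])
lam2 = np.array([0.9, 0.6, 0.25, 0.1])

# Sum of one-dimensional divergences KL(N(0, lam1_k) || N(0, lam2_k)).
kl_eigen = 0.5 * np.sum(lam1 / lam2 - 1 + np.log(lam2 / lam1))

# Matrix form, with diagonal covariance matrices.
C1, C2 = np.diag(lam1), np.diag(lam2)
kl_matrix = 0.5 * (
    np.trace(np.linalg.solve(C2, C1))
    - len(lam1)
    + np.log(np.linalg.det(C2) / np.linalg.det(C1))
)
assert abs(kl_eigen - kl_matrix) < 1e-12
assert kl_eigen >= 0  # KL divergences are non-negative
```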
</section>
<section id="a-convenient-suffient-condition-for-absolute-continuity-which-turns-out-to-be-necessary-for-matérn-fields" class="level3">
<h3 class="anchored" data-anchor-id="a-convenient-suffient-condition-for-absolute-continuity-which-turns-out-to-be-necessary-for-matérn-fields">A convenient sufficient condition for absolute continuity, which turns out to be necessary for Matérn fields</h3>
<p>Ok. So I lied. I suggested that we’d use all of that spectral stuff in the last section. And we didn’t! Because I’m dastardly. But this time I promise we will!</p>
<p>It turns out that even with our fancy version of Feldman-Hájek, it can be difficult<sup>100</sup> to work out whether two Gaussian processes are singular or equivalent. One of the big challenges is that the eigenvalues and eigenfunctions depend on the domain <img src="https://latex.codecogs.com/png.latex?D"> and so we would, in principle, have to check this quite complex condition for every single domain.</p>
<p>Thankfully, there is an easy-to-parse sufficient condition that shows when two GPs are equivalent on <em>every</em> bounded domain. The condition is stated in terms of the spectral densities.</p>
<div id="thm-sufficient" class="theorem">
<p><span class="theorem-title"><strong>Theorem 8 (Sufficient condition for equivalence (Thm 4 of <a href="https://www.google.com/search?client=safari&amp;rls=en&amp;q=on+absolute+continuity+of+measures+with+application+to+homogenous+gaussian+fields&amp;ie=UTF-8&amp;oe=UTF-8">Skorokhod and Yadrenko</a>))</strong></span> Let <img src="https://latex.codecogs.com/png.latex?u_1(%5Ccdot)"> and <img src="https://latex.codecogs.com/png.latex?u_2(%5Ccdot)"> be mean-zero Gaussian processes with spectral densities <img src="https://latex.codecogs.com/png.latex?f_j(%5Comega)">, <img src="https://latex.codecogs.com/png.latex?j=1,2">. Assume that <img src="https://latex.codecogs.com/png.latex?f_1(%5Comega)%5C%7C%5Comega%5C%7C%5E%5Calpha"> is bounded away from zero and infinity for some<sup>101</sup> <img src="https://latex.codecogs.com/png.latex?%5Calpha%3E0"> and <img src="https://latex.codecogs.com/png.latex?%0A%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%5Cleft(%5Cfrac%7Bf_2(%5Comega)%20-%20f_1(%5Comega)%7D%7Bf_1(%5Comega)%7D%5Cright)%5E2%5C,d%5Comega%20%3C%20%5Cinfty.%0A"> Then the joint distributions of <img src="https://latex.codecogs.com/png.latex?%5C%7Bu_1(s):%20s%20%5Cin%20D%5C%7D"> and <img src="https://latex.codecogs.com/png.latex?%5C%7Bu_2(s):%20s%20%5Cin%20D%5C%7D"> are equivalent measures for every bounded region <img src="https://latex.codecogs.com/png.latex?D">.</p>
</div>
<p>The <a href="https://pages.stat.wisc.edu/~wahba/stat860public/pdf1/skorokhod.yadrenko.1973.pdf">proof</a> of this is pretty nifty. Essentially it constructs the operator <img src="https://latex.codecogs.com/png.latex?T+I"> in a sneaky<sup>102</sup> way and then bounds its trace on a rectangle containing <img src="https://latex.codecogs.com/png.latex?D">. That upper bound is finite precisely when the above integral is finite.</p>
<p>Now that we have a relatively simple condition for equivalence, let’s look at Matérn fields. In particular, we will assume <img src="https://latex.codecogs.com/png.latex?u_j(%5Ccdot)">, <img src="https://latex.codecogs.com/png.latex?j=1,2"> are two Matérn GPs with the same smoothness parameter <img src="https://latex.codecogs.com/png.latex?%5Cnu"> and other parameters<sup>103</sup> <img src="https://latex.codecogs.com/png.latex?(%5Ckappa_j,%20%5Csigma_j)">. The integral in the condition above becomes <img src="https://latex.codecogs.com/png.latex?%0A%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%5Cleft(%5Cfrac%7Bf_2(%5Comega)%20-%20f_1(%5Comega)%7D%7Bf_1(%5Comega)%7D%5Cright)%5E2%5C,d%5Comega%20%20=%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%5Cleft(%5Cfrac%7B%5Ckappa_2%5E%7B2%5Cnu%7D%5Csigma_2%5E2(%5Ckappa_2%5E2%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%7B-%5Cnu%20-%20d/2%7D%20%7D%7B%5Ckappa_1%5E%7B2%5Cnu%7D%5Csigma_1%5E2(%5Ckappa_1%5E2%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%7B-%5Cnu%20-%20d/2%7D%7D-1%5Cright)%5E2%5C,d%5Comega.%0A"> We can save ourselves some trouble by considering two cases separately.</p>
<p><strong>Case 1:</strong> <img src="https://latex.codecogs.com/png.latex?%5Ckappa_1%5E%7B2%5Cnu%7D%5Csigma_1%5E2%20=%20%5Ckappa_2%5E%7B2%5Cnu%7D%5Csigma_2%5E2">.</p>
<p>In this case, we can make the change to spherical coordinates via the substitution <img src="https://latex.codecogs.com/png.latex?r%20=%20%5C%7C%5Comega%5C%7C"> and, again to save my poor fingers, let’s set <img src="https://latex.codecogs.com/png.latex?%5Calpha%20=%20%5Cnu%20+%20d/2">. The condition becomes <img src="https://latex.codecogs.com/png.latex?%0A%5Cint_0%5E%5Cinfty%5Cleft%5B%5Cleft(%5Cfrac%7B%5Ckappa_1%5E2%20+%20r%5E2%20%7D%7B%5Ckappa_2%5E2%20+%20r%5E2%7D%5Cright)%5E%7B%5Calpha%7D-1%5Cright%5D%5E2r%5E%7Bd-1%7D%5C,dr%20%3C%20%5Cinfty.%0A"> To check that this integral is finite, first note that, near <img src="https://latex.codecogs.com/png.latex?r=0">, the integrand is<sup>104</sup> <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(%7Br%5E%7Bd-1%7D%7D)">, so there is no problem there. Near <img src="https://latex.codecogs.com/png.latex?r%20=%20%5Cinfty"> (aka the other place bad stuff can happen), the integrand is <img src="https://latex.codecogs.com/png.latex?%0A%5Calpha%5E2(%5Ckappa_1%5E2%20-%20%5Ckappa_2%5E2)%5E2%20r%5E%7Bd-5%7D%20+%20%5Cmathcal%7BO%7D(r%5E%7Bd-7%7D).%0A"> This is integrable for large <img src="https://latex.codecogs.com/png.latex?r"> whenever<sup>105</sup> <img src="https://latex.codecogs.com/png.latex?d%20%5Cleq%203">. Hence, the two fields are equivalent whenever <img src="https://latex.codecogs.com/png.latex?d%5Cleq%203"> and <img src="https://latex.codecogs.com/png.latex?%5Ckappa_1%5E%7B2%5Cnu%7D%5Csigma_1%5E2%20=%20%5Ckappa_2%5E%7B2%5Cnu%7D%5Csigma_2%5E2">. It is harder, but possible, to show that the fields are singular when <img src="https://latex.codecogs.com/png.latex?d%3E4">. The case with <img src="https://latex.codecogs.com/png.latex?d=4"> is boring and nobody cares.</p>
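<p>One can expand the integrand near r = ∞ and check numerically (with arbitrary parameter values) that the leading term really is α²(κ₁² − κ₂²)² r^(d−5): the ratio of the integrand to this asymptote should tend to one.</p>

```python
import numpy as np

# g(r) = (((k1^2 + r^2) / (k2^2 + r^2))**alpha - 1)**2 * r**(d - 1)
# should behave like alpha^2 * (k1^2 - k2^2)**2 * r**(d - 5) as r -> infinity,
# which is integrable at infinity exactly when d <= 3.
# k1, k2, alpha and d are arbitrary example values.
k1, k2, alpha, d = 1.0, 2.0, 1.5, 2

def integrand(r):
    return (((k1**2 + r**2) / (k2**2 + r**2)) ** alpha - 1) ** 2 * r ** (d - 1)

lead = alpha**2 * (k1**2 - k2**2) ** 2
for r in [100.0, 1000.0]:
    ratio = integrand(r) / (lead * r ** (d - 5))
    assert abs(ratio - 1) < 0.01  # asymptote kicks in quickly
```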
<p><strong>Case 2: </strong> <img src="https://latex.codecogs.com/png.latex?%5Ckappa_1%5E%7B2%5Cnu%7D%5Csigma_1%5E2%20%5Cneq%20%5Ckappa_2%5E%7B2%5Cnu%7D%5Csigma_2%5E2">.</p>
<p>Let’s define <img src="https://latex.codecogs.com/png.latex?%5Csigma_3%20=%20%5Csigma_2(%5Ckappa_2/%5Ckappa_1)%5E%5Cnu">. Then it’s clear that <img src="https://latex.codecogs.com/png.latex?%5Ckappa_1%5E%7B2%5Cnu%7D%5Csigma_3%5E2%20=%20%5Ckappa_2%5E%7B2%5Cnu%7D%5Csigma_2%5E2"> and therefore the Matérn field <img src="https://latex.codecogs.com/png.latex?u_3"> with parameters <img src="https://latex.codecogs.com/png.latex?(%5Ckappa_1,%20%5Csigma_3,%20%5Cnu)"> is equivalent to <img src="https://latex.codecogs.com/png.latex?u_2(%5Ccdot)">.</p>
<p>We will now show that <img src="https://latex.codecogs.com/png.latex?u_1"> and <img src="https://latex.codecogs.com/png.latex?u_3"> are singular, which implies that <img src="https://latex.codecogs.com/png.latex?u_1"> and <img src="https://latex.codecogs.com/png.latex?u_2"> are singular. To do this, we just need to note that, as <img src="https://latex.codecogs.com/png.latex?u_1"> and <img src="https://latex.codecogs.com/png.latex?u_3"> have the <em>same</em> value of <img src="https://latex.codecogs.com/png.latex?%5Ckappa">, <img src="https://latex.codecogs.com/png.latex?%0Au_3(s)%20=%20%5Cfrac%7B%5Csigma_3%7D%7B%5Csigma_1%7Du_1(s).%0A"> We know, from the previous blog post, that <img src="https://latex.codecogs.com/png.latex?u_3"> and <img src="https://latex.codecogs.com/png.latex?u_1"> will be singular unless <img src="https://latex.codecogs.com/png.latex?%5Csigma_1%20=%20%5Csigma_3">, but this only happens when <img src="https://latex.codecogs.com/png.latex?%5Ckappa_1%5E%7B2%5Cnu%7D%5Csigma_1%5E2%20=%20%5Ckappa_2%5E%7B2%5Cnu%7D%5Csigma_2%5E2">, which is not true by assumption.</p>
<p>Hence we have proved the first part of the following Theorem due, in this form, to Zhang<sup>106</sup> (2004) and Anderes<sup>107</sup> (2010).</p>
<div id="thm-matern-equiv" class="theorem">
<p><span class="theorem-title"><strong>Theorem 9 (Thm 2 of <a href="https://www.stat.purdue.edu/~zhanghao/Paper/JASA2004.pdf">Zhang (2004)</a>)</strong></span> Two Gaussian processes on <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed">, <img src="https://latex.codecogs.com/png.latex?d%5Cleq%203">, with Matérn covariance functions with parameters <img src="https://latex.codecogs.com/png.latex?(%5Cell_j,%20%5Csigma_j,%20%5Cnu)">, <img src="https://latex.codecogs.com/png.latex?j=1,2"> induce equivalent Gaussian measures if and only if <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Csigma_1%5E2%7D%7B%5Cell_1%5E%7B2%5Cnu%7D%7D%20=%20%5Cfrac%7B%5Csigma_2%5E2%7D%7B%5Cell_2%5E%7B2%5Cnu%7D%7D.%0A"> When <img src="https://latex.codecogs.com/png.latex?d%20%3E%204">, the measures are always singular (<a href="https://projecteuclid.org/journals/annals-of-statistics/volume-38/issue-2/On-the-consistent-separation-of-scale-and-variance-for-Gaussian/10.1214/09-AOS725.full">Anderes, 2010</a>).</p>
</div>
</section>
</section>
<section id="part-3-deriving-the-pc-prior" class="level2">
<h2 class="anchored" data-anchor-id="part-3-deriving-the-pc-prior">Part 3: Deriving the PC prior</h2>
<p>With all of that in hand, we are finally (finally!) in a position to show that, in 3 or fewer dimensions, the PC prior distance is <img src="https://latex.codecogs.com/png.latex?d(%5Ckappa)%20=%20%5Ckappa%5E%7Bd/2%7D">. After this, we can put everything together! Hooray!</p>
<section id="approximating-the-kullback-leibler-divergence-for-a-matérn-random-field" class="level3">
<h3 class="anchored" data-anchor-id="approximating-the-kullback-leibler-divergence-for-a-matérn-random-field">Approximating the Kullback-Leibler divergence for a Matérn random field</h3>
<p>Now, you can find a proof of this in the appendix of our JASA paper, but to be honest it’s quite informal. And although you can sneak any old shite into JASA, this is a blog goddammit and a blog has integrity. So let’s do a significantly more rigorous version of the argument.</p>
<p>To do this, we will need to find the KL divergence between <img src="https://latex.codecogs.com/png.latex?u_1">, with parameters <img src="https://latex.codecogs.com/png.latex?(%5Ckappa,%20%5Ctau%20%5Ckappa%5E%7B-%5Cnu%7D,%20%5Cnu)">, and a base model <img src="https://latex.codecogs.com/png.latex?u_0"> with parameters <img src="https://latex.codecogs.com/png.latex?(%5Ckappa_0,%20%5Ctau%20%5Ckappa_0%5E%7B-%5Cnu%7D,%20%5Cnu)">, where <img src="https://latex.codecogs.com/png.latex?%5Ckappa_0"> is some fixed, small number and <img src="https://latex.codecogs.com/png.latex?%5Ctau%20%3E0"> is fixed. We will actually be interested in the behaviour of the KL divergence as <img src="https://latex.codecogs.com/png.latex?%5Ckappa_0"> goes to zero. Why? Because <img src="https://latex.codecogs.com/png.latex?%5Ckappa_0%20=%200"> is our base model.</p>
<p>The specific choice of standard deviation in both models ensures that <img src="https://latex.codecogs.com/png.latex?%5Ckappa%5E%7B2%5Cnu%7D%5Csigma%5E2%20=%20%5Ckappa_0%5E%7B2%5Cnu%7D%5Csigma_0%5E2"> and so the KL divergence is finite.</p>
<p>In order to approximate the KL divergence, we are going to find a basis that simultaneously diagonalises both processes. In the paper, we simply declared that we could do this. And, morally, we can. But, as I said, a blog holds itself to a higher standard than mere morality. Here we strive for meaningless rigour.</p>
<p>To that end, we are going to spend a moment thinking about how this can be done in a way that isn’t intrinsically tied to a given domain <img src="https://latex.codecogs.com/png.latex?D">. There may well be a lot of different ways to do this, but the most obvious one is to notice that if <img src="https://latex.codecogs.com/png.latex?u(%5Ccdot)"> is <em>periodic</em> on the cube <img src="https://latex.codecogs.com/png.latex?%5B-L,L%5D%5Ed"> for some <img src="https://latex.codecogs.com/png.latex?L%20%5Cgg%200">, then it can be considered as a GP on a <img src="https://latex.codecogs.com/png.latex?d">-dimensional torus. If <img src="https://latex.codecogs.com/png.latex?L"> is large enough that <img src="https://latex.codecogs.com/png.latex?D%20%5Csubset%20%5B-L,L%5D%5Ed">, then we might be able to focus on our cube and forget all about the specific domain <img src="https://latex.codecogs.com/png.latex?D">.</p>
<p>A nice thing about periodic GPs is that we actually know their Karhunen-Loève<sup>108</sup> representation. In particular, if <img src="https://latex.codecogs.com/png.latex?c_p(%5Ccdot)"> is a stationary covariance function on a torus, then we<sup>109</sup> know that its eigenfunctions are <img src="https://latex.codecogs.com/png.latex?%0A%5Cphi_k(s)%20=%20e%5E%7B-%5Cfrac%7B2%5Cpi%20i%7D%7BL%7D%20k%5ETh%7D,%20%5Cquad%20k%20%5Cin%20%5Cmathbb%7BZ%7D%5Ed%0A"> and its eigenvalues are <img src="https://latex.codecogs.com/png.latex?%0A%5Clambda_k%20=%20%5Cint_%7B%5Cmathbb%7BT%7D%5Ed%7D%20e%5E%7B-%5Cfrac%7B2%5Cpi%20i%7D%7BL%7D%20k%5ETh%7D%20c_p(h)%5C,dh.%0A"> This gives<sup>110</sup> <img src="https://latex.codecogs.com/png.latex?%0Ac_p(h)%20=%20%5Cleft(%5Cfrac%7B2%5Cpi%7D%7BL%7D%5Cright)%5Ed%20%5Csum_%7Bk%20%5Cin%20%5Cmathbb%7BZ%7D%5Ed%7D%5Clambda_k%20%20e%5E%7B-%5Cfrac%7B2%5Cpi%20i%7D%7BL%7D%20k%5ETh%7D.%0A"></p>
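<p>We can poke at this claim in a discrete setting (a sketch of mine; the covariance and grid size are arbitrary choices, not anything from the papers). On a discretised circle, the covariance matrix of a stationary periodic covariance is circulant, so the discrete Fourier modes should be exact eigenvectors, with eigenvalues given by the DFT of a single row.</p>

```python
import cmath
import math

N = 64            # grid points on the (unit) circle
ell = 0.1

def c_per(h):
    # A periodic stationary covariance on [0, 1): squared-exponential in the
    # chordal distance, which is positive definite on the circle.
    return math.exp(-math.sin(math.pi * h)**2 / (2 * ell**2))

row = [c_per(n / N) for n in range(N)]  # first row of the circulant covariance matrix

for k in (0, 1, 5):
    # Fourier mode v_k and its claimed eigenvalue: the DFT of the row.
    v = [cmath.exp(-2j * math.pi * k * n / N) for n in range(N)]
    lam = sum(row[n] * cmath.exp(2j * math.pi * k * n / N) for n in range(N))
    # Apply the covariance matrix C[n, m] = row[(n - m) % N] to v.
    Cv = [sum(row[(n - m) % N] * v[m] for m in range(N)) for n in range(N)]
    err = max(abs(Cv[n] - lam * v[n]) for n in range(N))
    assert err < 1e-9
```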
<p>Now we have some work to do. Firstly, our process is not periodic<sup>111</sup> on <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed">. That’s a bit of a barrier. Secondly, even if it were, we don’t actually know what <img src="https://latex.codecogs.com/png.latex?%5Clambda_k"> is going to be. This is probably<sup>112</sup> an issue.</p>
<p>So let’s make this sucker periodic. The trick is to note that, at long enough distances, <img src="https://latex.codecogs.com/png.latex?u(s)"> and <img src="https://latex.codecogs.com/png.latex?u(s')"> are almost uncorrelated. In particular, if <img src="https://latex.codecogs.com/png.latex?%5C%7Cs%20-%20s'%5C%7C%20%5Cgg%20%5Cell">, then <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7BCov%7D(u(s),%20u(s'))%20%5Capprox%200">. This means that if we are interested in <img src="https://latex.codecogs.com/png.latex?u(%5Ccdot)"> on a fixed domain <img src="https://latex.codecogs.com/png.latex?D">, then we can replace it with a GP <img src="https://latex.codecogs.com/png.latex?u_p(s)"> whose covariance function <img src="https://latex.codecogs.com/png.latex?c_p(%5Ccdot)"> is the periodic extension of <img src="https://latex.codecogs.com/png.latex?c(h)"> from <img src="https://latex.codecogs.com/png.latex?%5B-L,L%5D%5Ed"> to <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed"> (aka we just repeat it!).</p>
<p>This repetition won’t be noticed on <img src="https://latex.codecogs.com/png.latex?D"> as long as <img src="https://latex.codecogs.com/png.latex?L"> is big enough. But we can run into a small<sup>113</sup> problem. This procedure can lead to a covariance function <img src="https://latex.codecogs.com/png.latex?c_p(%5Ccdot)"> that is <em>not</em> positive definite. Big problem. Huge.</p>
<p>It turns out that one way to fix this is to use a smooth cutoff function <img src="https://latex.codecogs.com/png.latex?%5Cdelta(h)"> that is 1 on <img src="https://latex.codecogs.com/png.latex?%5B-L,L%5D%5Ed"> and 0 outside of <img src="https://latex.codecogs.com/png.latex?%5B-%5Cgamma,%5Cgamma%5D%5Ed">, where <img src="https://latex.codecogs.com/png.latex?L%3E0"> is big enough so that <img src="https://latex.codecogs.com/png.latex?D%20%5Csubset%20%5B-L,%20L%5D%5Ed"> and <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20%3E%20L">. We can then build the periodic extension of a stationary covariance function <img src="https://latex.codecogs.com/png.latex?c(%5Ccdot)"> as <img src="https://latex.codecogs.com/png.latex?%0Ac_p(h)%20=%20%5Csum_%7Bk%20%5Cin%20%5Cmathbb%7BZ%7D%5Ed%7Dc(h%20+%202Lk)%5Cdelta(h%20+%202%20Lk).%0A"> It’s important<sup>114</sup> to note that this is not the same thing as simply repeating the covariance function in a periodic manner. Near the boundaries (but outside of the domain) there will be some reach-around contamination. 
<a href="https://arxiv.org/abs/1603.05559">Bachmayr, Cohen, and Migliorati</a> show that this <em>does not work</em> for general stationary covariance functions, but does work under the additional condition that <img src="https://latex.codecogs.com/png.latex?%5Cgamma"> is big enough and there exist some <img src="https://latex.codecogs.com/png.latex?s%20%5Cgeq%20r%20%3E%20d/2"> and <img src="https://latex.codecogs.com/png.latex?0%20%3C%20%5Cunderline%7BC%7D%20%5Cleq%20%5Coverline%7BC%7D%20%3C%20%5Cinfty"> such that <img src="https://latex.codecogs.com/png.latex?%0A%5Cunderline%7BC%7D(1%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%7B-s%7D%20%5Cleq%20f(%5Comega)%5Cleq%20%5Coverline%7BC%7D(1%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%7B-r%7D.%0A"> This condition obviously holds for the Matérn covariance function, and <a href="https://arxiv.org/abs/1905.13522">Bachmayr, Graham, Nguyen, and Scheichl</a><sup>115</sup> showed that taking <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20%3E%20A(d,%20%5Cnu)%5Cell">, for some explicit function <img src="https://latex.codecogs.com/png.latex?A"> that only depends on <img src="https://latex.codecogs.com/png.latex?d"> and <img src="https://latex.codecogs.com/png.latex?%5Cnu">, is sufficient to make this work.</p>
<p>The nice thing about this procedure is that <img src="https://latex.codecogs.com/png.latex?c_p(s-s')%20=%20c(s-s')"> as long as <img src="https://latex.codecogs.com/png.latex?s,%20s'%20%5Cin%20D">, which means that our inference is going to be <em>identical</em> on our sample as it would be with the non-periodic covariance function! Splendid!</p>
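<p>Here’s a 1-d sketch of that construction (all of the numbers below are arbitrary choices of mine). It builds the tapered periodisation with a smooth cosine cutoff and checks the restriction property on a small domain. It deliberately checks nothing about positive definiteness, which is the genuinely hard part that Bachmayr and friends dealt with.</p>

```python
import math

L, gamma = 2.0, 3.0     # need gamma > L and D inside [-L, L]
ell = 0.3

def c(h):
    # Exponential (Matern nu = 1/2) covariance
    return math.exp(-abs(h) / ell)

def delta(h):
    # Smooth cutoff: 1 on [-L, L], 0 outside [-gamma, gamma], cosine taper between
    a = abs(h)
    if a <= L:
        return 1.0
    if a >= gamma:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * (a - L) / (gamma - L)))

def c_p(h):
    # c_p(h) = sum_k c(h + 2 L k) delta(h + 2 L k); far-away copies are killed
    # by delta, so a small range of k suffices.
    return sum(c(h + 2 * L * k) * delta(h + 2 * L * k) for k in range(-3, 4))

# With D = [0, 0.5] the lags s - s' live in [-0.5, 0.5]. Since 2L - gamma = 1
# exceeds the diameter of D, every k != 0 term is fully cut off and
# c_p(s - s') = c(s - s') exactly, as claimed.
err = max(abs(c_p(i / 100 - 0.5) - c(i / 100 - 0.5)) for i in range(101))
assert err < 1e-12
```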
<p>Now that we have made a valid periodic extension (and hence we know what the eigenfunctions are), we need to work out what the corresponding eigenvalues are.</p>
<p>We know that <img src="https://latex.codecogs.com/png.latex?%0A%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20e%5E%7B-%5Cfrac%7B%5Cpi%20i%7D%7BL%7Dk%5ETh%7Dc(h)%5C,dh%20=%20f%5Cleft(%5Cfrac%7B%5Cpi%7D%7BL%7Dk%5Cright).%0A"> But it is not clear what will happen when we take the Fourier transform of <img src="https://latex.codecogs.com/png.latex?c_p(%5Ccdot)">.</p>
<p>Thankfully, the convolution theorem is here to help us and we know that, if <img src="https://latex.codecogs.com/png.latex?%5Ctheta(s)%20=%201%20-%20%5Cdelta(s)">, then <img src="https://latex.codecogs.com/png.latex?%0A%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20e%5E%7B-%5Cfrac%7B%5Cpi%20i%7D%7BL%7Dk%5ETh%7D(c(h)%20-%20c_p(h))%5C,dh%20=%20(%5Chat%7B%5Ctheta%7D*f)%5Cleft(%5Cfrac%7B%5Cpi%7D%7BL%7Dk%5Cright),%0A"> where <img src="https://latex.codecogs.com/png.latex?*"> is the convolution operator.</p>
<p>In the perfect world, <img src="https://latex.codecogs.com/png.latex?(%5Chat%7B%5Ctheta%7D*f)(%5Comega)"> would be very close to zero, so we can just replace the Fourier transform of <img src="https://latex.codecogs.com/png.latex?c_p"> with the Fourier transform of <img src="https://latex.codecogs.com/png.latex?c">. And thank god we live in a perfect world.</p>
<p>The specifics here are a bit tedious<sup>116</sup>, but you can show that <img src="https://latex.codecogs.com/png.latex?(%5Chat%7B%5Ctheta%7D*f)(%5Comega)%20%5Crightarrow%200"> as <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20%5Crightarrow%20%5Cinfty">. For Matérn fields, Bachmayr et al. performed some heroic calculations to show that the difference is exponentially small as <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20%5Crightarrow%20%5Cinfty"> and that, as long as <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20%3E%20A(d,%20%5Cnu)%20%5Cell">, everything is positive definite and lovely.</p>
<p>So after a bunch of effort and a bit of a literature dive, we have finally got a simultaneous eigenbasis and we can write our KL divergence as <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Coperatorname%7BKL%7D(u_1%20%7C%7C%20u_0)%20&amp;=%20%5Cfrac%7B1%7D%7B2%7D%20%5Csum_%7B%5Comega%20%5Cin%20%5Cfrac%7B2%5Cpi%7D%7BL%7D%5Cmathbb%7BZ%7D%7D%5Cleft%5B%5Cfrac%7Bf_1(%5Comega)%7D%7Bf_0(%5Comega)%7D%20-%201%20-%20%5Clog%20%5Cleft(%5Cfrac%7Bf_1(%5Comega)%7D%7Bf_0(%5Comega)%7D%5Cright)%5Cright%5D%20%5C%5C%0A&amp;=%20%5Cfrac%7B1%7D%7B2%7D%20%5Csum_%7B%5Comega%20%5Cin%20%5Cfrac%7B2%5Cpi%7D%7BL%7D%5Cmathbb%7BZ%7D%7D%5Cleft%5B%5Cfrac%7B(%5Ckappa_0%5E2%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%5Calpha%7D%7B(%5Ckappa%5E2%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%5Calpha%7D%20-%201%20-%20%5Clog%20%5Cleft(%5Cfrac%7B(%5Ckappa_0%5E2%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%5Calpha%7D%7B(%5Ckappa%5E2%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%5Calpha%7D%20%5Cright)%5Cright%5D.%0A%5Cend%7Balign*%7D"> We can write this as <img src="https://latex.codecogs.com/png.latex?%0A%5Coperatorname%7BKL%7D(u_1%20%7C%7C%20u_0)%20=%5Cfrac%7B1%7D%7B2%7D%20%5Cleft(%5Cfrac%7BL%20%5Ckappa%7D%7B2%5Cpi%7D%5Cright)%5Ed%20%5Csum_%7B%5Comega%20%5Cin%20%5Cfrac%7B2%5Cpi%7D%7BL%7D%5Cmathbb%7BZ%7D%7D%5Cleft(%5Cleft%5B%5Cfrac%7B(%5Ckappa_0%5E2%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%5Calpha%7D%7B(%5Ckappa%5E2%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%5Calpha%7D%20-%201%20-%20%5Clog%20%5Cleft(%5Cfrac%7B(%5Ckappa_0%5E2%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%5Calpha%7D%7B(%5Ckappa%5E2%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%5Calpha%7D%20%5Cright)+%5Cmathcal%7BO%7D(e%5E%7B-C%5Cgamma%7D)%5Cright%5D%5Cleft(%5Cfrac%7B2%5Cpi%7D%7BL%20%5Ckappa%7D%5Cright)%5Ed%5Cright)%20,%0A"> for some constant <img src="https://latex.codecogs.com/png.latex?C"> that you can actually work out but I really don’t need to. 
The important thing is that the error is exponentially small in <img src="https://latex.codecogs.com/png.latex?%5Cgamma">, which is very large and spiraling rapidly out towards infinity.</p>
<p>Then, noticing that the sum is just a trapezium rule approximation to a <img src="https://latex.codecogs.com/png.latex?d">-dimensional integral, we get, as <img src="https://latex.codecogs.com/png.latex?%5Ckappa_0%20%5Crightarrow%200"> (and hence <img src="https://latex.codecogs.com/png.latex?L,%20%5Cgamma%5Crightarrow%20%5Cinfty">), <img src="https://latex.codecogs.com/png.latex?%0A%5Coperatorname%7BKL%7D(u_1%20%7C%7C%20u_0)%20=%20%5Cfrac%7B1%7D%7B2%7D%20%5Cleft(%5Cfrac%7BL%20%5Ckappa%7D%7B2%5Cpi%7D%5Cright)%5Ed%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%5Cleft%5B%5Cfrac%7B((%5Ckappa_0/%5Ckappa)%5E2%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%5Calpha%7D%7B(1%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%5Calpha%7D%20-%201%20-%20%5Clog%20%5Cleft(%5Cfrac%7B((%5Ckappa_0/%5Ckappa)%5E2%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%5Calpha%7D%7B(1%20+%20%5C%7C%5Comega%5C%7C%5E2)%5E%5Calpha%7D%20%5Cright)%5Cright%5D%20+%20%5Cmathcal%7BO%7D(1).%0A"> The integral converges whenever <img src="https://latex.codecogs.com/png.latex?d%20%5Cleq%203">.</p>
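<p>The trapezium-rule step is easy to check numerically in one dimension (again a sketch, with parameter values I picked out of a hat): the spectral sum over the frequency grid should be close to <em>L</em>/2π times the corresponding integral.</p>

```python
import math

kappa0, kappa, alpha = 0.5, 2.0, 1.0

def g(w):
    # Per-frequency KL contribution: x - 1 - log(x), with x the spectral ratio
    ratio = ((kappa0**2 + w**2) / (kappa**2 + w**2))**alpha
    return ratio - 1.0 - math.log(ratio)

L = 200.0
W = 100.0                               # truncation radius; g decays like w^-4
K = int(W * L / (2 * math.pi))
spectral_sum = sum(g(2 * math.pi * k / L) for k in range(-K, K + 1))

n = 200_000                             # fine Riemann sum for the integral on [-W, W]
h = 2 * W / n
integral = sum(g(-W + i * h) for i in range(n + 1)) * h

approx = (L / (2 * math.pi)) * integral
assert abs(spectral_sum - approx) / approx < 1e-3
```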
<p>This suggests that we can re-scale the distance by absorbing the <img src="https://latex.codecogs.com/png.latex?(L/(2%5Cpi))%5Ed"> into the constant in the PC prior, and get <img src="https://latex.codecogs.com/png.latex?%0Ad(%5Ckappa)%20=%20%5Ckappa%5E%7Bd/2%7D.%0A"></p>
<p>This distance does not depend on the specific domain <img src="https://latex.codecogs.com/png.latex?D"> (or the observation locations), which is an improvement over the PC prior I derived in the introduction. Instead, it only assumes that <img src="https://latex.codecogs.com/png.latex?D"> is bounded, which isn’t really a big restriction in practice.</p>
</section>
<section id="the-pc-prior-for-sigma-ell" class="level3">
<h3 class="anchored" data-anchor-id="the-pc-prior-for-sigma-ell">The PC prior for <img src="https://latex.codecogs.com/png.latex?(%5Csigma,%20%5Cell)"></h3>
<p>With all of this in hand, we can now construct the PC prior. Instead of working directly with <img src="https://latex.codecogs.com/png.latex?(%5Csigma,%20%5Cell)">, we will instead derive the prior for the estimable parameter <img src="https://latex.codecogs.com/png.latex?%5Ctau%20=%20%5Ckappa%5E%5Cnu%20%5Csigma">, and the non-estimable parameter <img src="https://latex.codecogs.com/png.latex?%5Ckappa">.</p>
<p>We know that <img src="https://latex.codecogs.com/png.latex?%5Ctau%5E2"> multiplies the covariance function of <img src="https://latex.codecogs.com/png.latex?u(%5Ccdot)">, so it makes sense to treat <img src="https://latex.codecogs.com/png.latex?%5Ctau"> like a standard deviation parameter. In this case, the PC prior is <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Ctau%20%5Cmid%20%5Ckappa)%20=%20%5Clambda_%5Ctau(%5Ckappa)e%5E%7B-%5Clambda_%5Ctau(%5Ckappa)%20%5Ctau%7D.%0A"> The canny among you will have noticed that I have made the rate parameter for <img src="https://latex.codecogs.com/png.latex?%5Ctau"> depend on <img src="https://latex.codecogs.com/png.latex?%5Ckappa">. I have done this because the quantity of interest that we want our prior to control is the marginal standard deviation <img src="https://latex.codecogs.com/png.latex?%5Csigma%20=%20%5Ckappa%5E%7B-%5Cnu%7D%5Ctau">, which is a function of <img src="https://latex.codecogs.com/png.latex?%5Ckappa">. If we want to ensure <img src="https://latex.codecogs.com/png.latex?%5CPr(%5Csigma%20%3E%20U_%5Csigma)%20=%20%5Calpha_%5Csigma">, we need <img src="https://latex.codecogs.com/png.latex?%0A%5Clambda_%5Ctau(%5Ckappa)%20=%20-%5Ckappa%5E%7B-%5Cnu%7D%5Cfrac%7B%5Clog%20%5Calpha_%5Csigma%7D%7BU_%5Csigma%7D.%0A"></p>
<p>We can now derive the PC prior for <img src="https://latex.codecogs.com/png.latex?%5Ckappa">. Combining the distance that we just spent all that effort calculating with an exponential prior on <img src="https://latex.codecogs.com/png.latex?%5Ckappa%5E%7Bd/2%7D"> leads<sup>117</sup> to the prior <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Ckappa)%20=%20%5Cfrac%7Bd%7D%7B2%7D%5Clambda_%5Cell%20%5Ckappa%5E%7Bd/2-1%7De%5E%7B-%5Clambda_%5Cell%20%5Ckappa%5E%7Bd/2%7D%7D.%0A"> Note that in this case, <img src="https://latex.codecogs.com/png.latex?%5Clambda_%5Cell"> does not depend on any other parameters: this is because <img src="https://latex.codecogs.com/png.latex?%5Cell%20=%20%5Csqrt%7B8%5Cnu%7D%5Ckappa%5E%7B-1%7D"> is our identifiable parameter. If we require <img src="https://latex.codecogs.com/png.latex?%5CPr(%5Cell%20%3C%20L_%5Cell)%20=%20%5Calpha_%5Cell">, we get <img src="https://latex.codecogs.com/png.latex?%0A%5Clambda_%5Cell%20=%20-%5Cleft(%5Cfrac%7BL_%5Cell%7D%7B%5Csqrt%7B8%5Cnu%7D%7D%5Cright)%5E%7Bd/2%7D%20%5Clog%20%5Calpha_%5Cell.%0A"></p>
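<p>A quick Monte Carlo check that this calibration does what it says (the parameter values below are arbitrary): put an exponential prior with the stated rate on <code>kappa**(d/2)</code>, map to the length scale, and the tail probability lands on <code>alpha_ell</code>.</p>

```python
import math
import random

random.seed(1)
d, nu = 2, 1.0
L_ell, alpha_ell = 0.1, 0.05

# lambda_ell = (L_ell / sqrt(8 nu))^(d/2) * |log(alpha_ell)|, as in the text
lam = (L_ell / math.sqrt(8 * nu))**(d / 2) * abs(math.log(alpha_ell))

n = 200_000
hits = 0
for _ in range(n):
    e = random.expovariate(lam)         # e = kappa^(d/2) is exponential
    kappa = e**(2 / d)
    ell = math.sqrt(8 * nu) / kappa     # ell = sqrt(8 nu) / kappa
    hits += ell < L_ell

# Pr(ell < L_ell) should be alpha_ell = 0.05, up to Monte Carlo error
assert abs(hits / n - alpha_ell) < 0.01
```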
<p>Hence the joint PC prior on <img src="https://latex.codecogs.com/png.latex?(%5Ckappa,%20%5Ctau)">, which is emphatically <em>not</em> the product of two independent priors, is <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Ckappa,%20%5Ctau)%20=%20%5Cfrac%7Bd%7D%7B2U_%5Csigma%7D%5Clog%20(%5Calpha_%5Cell)%5Clog(%5Calpha_%5Csigma)%5Cleft(%5Cfrac%7BL_%5Cell%7D%7B%5Csqrt%7B8%5Cnu%7D%7D%5Cright)%5E%7Bd/2%7D%20%5Ckappa%5E%7Bd/2%20-%20%5Cnu%20-%201%7D%5Cexp%5Cleft%5B-%5Cleft(%5Cfrac%7BL_%5Cell%7D%7B%5Csqrt%7B8%5Cnu%7D%7D%5Cright)%5E%7Bd/2%7D%7C%20%5Clog%20(%5Calpha_%5Cell)%7C%20%5Ckappa%5E%7Bd/2%7D%20-%5Cfrac%7B%7C%5Clog%20%5Calpha_%5Csigma%7C%7D%7BU_%5Csigma%7D%20%5Ctau%5Ckappa%5E%7B-%5Cnu%7D%5Cright%5D.%0A"></p>
<p>Great gowns, beautiful gowns.</p>
<p>Of course, we don’t want the prior on some weird parameterisation (even though we needed that parameterisation to derive it). We want it on the original parameterisation. And here is where some magic happens! When we transform this prior to <img src="https://latex.codecogs.com/png.latex?(%5Cell,%20%5Csigma)">-space it magically<sup>118</sup> becomes the product of two independent priors! In particular, the PC prior that encodes <img src="https://latex.codecogs.com/png.latex?%5CPr(%5Cell%20%3C%20L_%5Cell)%20=%20%5Calpha_%5Cell"> and <img src="https://latex.codecogs.com/png.latex?%5CPr(%5Csigma%20%3E%20U_%5Csigma)%20=%20%5Calpha_%5Csigma"> is <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Cell,%20%5Csigma)%20=%20%5Cleft%5B%5Cfrac%7Bd%7D%7B2%7D%7C%5Clog(%5Calpha_%5Cell)%7CL_%5Cell%5E%7Bd/2%7D%20%5Cell%5E%7B-d/2-1%7D%5Cexp%5Cleft(-%7C%5Clog(%5Calpha_%5Cell)%7CL_%5Cell%5E%7Bd/2%7D%20%5Cell%5E%7B-d/2%7D%5Cright)%5Cright%5D%20%5Ctimes%20%5Cleft%5B%5Cfrac%7B%7C%5Clog(%5Calpha_%5Csigma)%7C%7D%7BU_%5Csigma%7D%5Cexp%5Cleft(-%5Cfrac%7B%7C%5Clog(%5Calpha_%5Csigma)%7C%7D%7BU_%5Csigma%7D%5Csigma%5Cright)%5Cright%5D.%0A"></p>
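<p>And one last Monte Carlo sanity check on the factorised form (arbitrary parameter values again). The <code>ell</code> marginal has CDF <code>exp(-|log(alpha_ell)| * (L_ell/ell)**(d/2))</code> and the <code>sigma</code> marginal is exponential, so both can be sampled by inverting their CDFs, and the two probability statements should hold up to Monte Carlo error.</p>

```python
import math
import random

random.seed(2)
d = 2
L_ell, alpha_ell = 0.1, 0.05
U_sigma, alpha_sigma = 3.0, 0.05

c_ell = abs(math.log(alpha_ell)) * L_ell**(d / 2)
rate_sigma = abs(math.log(alpha_sigma)) / U_sigma

n = 200_000
ell_hits = sigma_hits = 0
for _ in range(n):
    # Invert F(t) = exp(-c_ell * t^(-d/2)) to sample the length scale
    u = random.random()
    ell = (c_ell / (-math.log(u)))**(2 / d)
    sigma = random.expovariate(rate_sigma)
    ell_hits += ell < L_ell
    sigma_hits += sigma > U_sigma

assert abs(ell_hits / n - alpha_ell) < 0.01      # Pr(ell < L_ell) = alpha_ell
assert abs(sigma_hits / n - alpha_sigma) < 0.01  # Pr(sigma > U_sigma) = alpha_sigma
```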
<p>It. Is. Finished.</p>


</section>
</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>The most common feedback was “I hung in for as long as I could”.↩︎</p></li>
<li id="fn2"><p>If you don’t think we’re gonna get our Maccabees on you’re dreamin’. Hell, I might have to post Enoch-ussy on main.↩︎</p></li>
<li id="fn3"><p><a href="https://projecteuclid.org/journals/statistical-science/volume-32/issue-1/Penalising-Model-Component-Complexity--A-Principled-Practical-Approach-to/10.1214/16-STS576.full">Penalised Complexity priors</a> (or PC priors) are my favourite thing. If you’re unfamiliar with them, I strongly recommend you read the <a href="https://dansblog.netlify.app/posts/2022-08-29-priors4/priors4.html">previous post</a> on PC priors to get a good grip on what they are, but essentially they’re a way to construct principled, weakly informative prior distributions. The key tool for PC priors is the Kullback-Leibler divergence between a model with parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> and a fixed base model with parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta_0">. Computing the KL divergence between two GPs is, as we will see, a challenge.↩︎</p></li>
<li id="fn4"><p>Fun fact: when we were starting to work on PC priors we were calling them PCP priors, but then I remembered that one episode of CSI where some cheerleaders took PCP and ate their friend and we all agreed that that wasn’t the vibe we were going for.↩︎</p></li>
<li id="fn5"><p>you might just need to trust me at some points↩︎</p></li>
<li id="fn6"><p>It could be easily more complex with multilevel component, multiple GPs, time series components etc etc. But the simplest example is a GP regression.↩︎</p></li>
<li id="fn7"><p>The GP has mean zero for the same reason we usually centre our covariates: it lets the intercept model the overall mean.↩︎</p></li>
<li id="fn8"><p>Not just the likelihood but also everything else in the model↩︎</p></li>
<li id="fn9"><p>A challenge with reference priors is that they are often improper (aka they don’t integrate to 1). This causes some conceptual difficulties, but there is a whole theory of Bayes that’s mostly fine with this as long as the resulting posterior integrates to one. But this is by no means guaranteed and is typically only checked in very specific cases. Jim Berger, one of the bigger proponents of reference priors, used to bring his wife to conference poster sessions. When she got bored, she would simply find a grad student and ask them if they’d checked if the posterior was proper. Sometimes you need to make your own fun.↩︎</p></li>
<li id="fn10"><p>Hope has no place in statistics.↩︎</p></li>
<li id="fn11"><p>Remember that any number on the logit scale outside of <img src="https://latex.codecogs.com/png.latex?%5B-3,3%5D"> might as well be the same number↩︎</p></li>
<li id="fn12"><p><code>log(.Machine$integer.max) = 21.48756</code>↩︎</p></li>
<li id="fn13"><p><img src="https://latex.codecogs.com/png.latex?e%5E5%20%5Capprox%20148">, so 70% of the prior mass is less than that. 90% of the prior mass is less than <img src="https://latex.codecogs.com/png.latex?e%5E%7B10%7D%20%5Capprox%2022026"> and 99% is less than <img src="https://latex.codecogs.com/png.latex?10%5E%7B13%7D">. This is still a weak prior.↩︎</p></li>
<li id="fn14"><p>Conceptually. The mathematics of what happens as <img src="https://latex.codecogs.com/png.latex?%5Cell%20%5Crightarrow%200"> aren’t really worth focusing on.↩︎</p></li>
<li id="fn15"><p>Or, you know, linear functionals↩︎</p></li>
<li id="fn16"><p>You can find Bayesians who say that they don’t care if cross validation works or not. You can find Bayesians who will say just about anything.↩︎</p></li>
<li id="fn17"><p>There are lots of parameterisations, but they’re all easy to move between. Compared to wikipedia, we use the <img src="https://latex.codecogs.com/png.latex?%5Csqrt%7B8%7D"> scaling rather than the <img src="https://latex.codecogs.com/png.latex?%5Csqrt%7B2%7D"> scaling.↩︎</p></li>
<li id="fn18"><p>Everything in this post can be easily generalised to having different length scales on each dimension.↩︎</p></li>
<li id="fn19"><p>If you’ve not run into these before, <img src="https://latex.codecogs.com/png.latex?x%5E%7B%5Cnu%7DK_%5Cnu(x)"> is <a href="https://functions.wolfram.com/Bessel-TypeFunctions/BesselK/06/01/04/01/03/">finite at zero</a> and decreases monotonically in an exponential-ish fashion as <img src="https://latex.codecogs.com/png.latex?x%5Crightarrow%20%5Cinfty">.↩︎</p></li>
<li id="fn20"><p>Possibly trying several values and either selecting the best or stacking all of the models↩︎</p></li>
<li id="fn21"><p>Field because by rights GPs with multidimensional parameter spaces should be called <em>Gaussian Fields</em> but we can’t have nice things so whatever. Live your lives.↩︎</p></li>
<li id="fn22"><p>At which point you need to ask yourself if one goes there faster. It’s chaos.↩︎</p></li>
<li id="fn23"><p>Asymptotics as copaganda.↩︎</p></li>
<li id="fn24"><p>I mean, if you can repeat experiments that’s obviously amazing, but there are lots of situations where that is either not possible or not the greatest use of resources. There’s an interesting sub-field of statistical earth sciences that focuses on working out the value of getting new types of observations in spatial data. This particular variant of the value of information problem throws up some fun corners.↩︎</p></li>
<li id="fn25"><p>or hoping↩︎</p></li>
<li id="fn26"><p>in 3 or fewer dimensions↩︎</p></li>
<li id="fn27"><p>I have not fact checked this↩︎</p></li>
<li id="fn28"><p>Basically everything you care about. Feel free to google the technical definition. But any space with a metric is locally convex. Lots of things that aren’t metric spaces are too.↩︎</p></li>
<li id="fn29"><p>measurable↩︎</p></li>
<li id="fn30"><p>This will seem a bit weird if it’s the first time you’ve seen the concept. In finite dimensions (aka most of statistics) <em>every</em> Gaussian is equivalent to every other Gaussian. In fact, it’s equivalent to every other continuous distribution with non-zero density on the whole of <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed">. But shit gets weird when you’re dealing with functions and we just need to take a hit of the video head cleaner and breathe until we get used to it.↩︎</p></li>
<li id="fn31"><p>These measures <em>are not the same</em>. They just happen to be non-zero on the same sets.↩︎</p></li>
<li id="fn32"><p>This was proven in the monster GP blog post.↩︎</p></li>
<li id="fn33"><p>eg, computationally where Metropolis-Hastings acceptance probabilities have an annoying tendency to go to zero unless you are extraordinarily careful.↩︎</p></li>
<li id="fn34"><p>if it exists↩︎</p></li>
<li id="fn35"><p>This can be interpreted as the event that <img src="https://latex.codecogs.com/png.latex?%7C%5Chat%5Ctheta_n%20-%20%5Ctheta_0%7C%20%3E%20%5Cepsilon"> occurs infinitely many times for every epsilon. If this event occurs with any probability, it would strongly suggest that the estimator is not bloody converging.↩︎</p></li>
<li id="fn36"><p>or even many↩︎</p></li>
<li id="fn37"><p>Technically, a recent paper in JRSSSB said that if you add an iid Gaussian process you will get identifiability, but that’s maybe not the most realistic asymptotic approximation.↩︎</p></li>
<li id="fn38"><p>The fourth dimension is where mathematicians go to die↩︎</p></li>
<li id="fn39"><p>It’s computationally pretty expensive to plot the whole likelihood surface, so I’m just doing it along lines↩︎</p></li>
<li id="fn40"><p><code>partial</code> freezes a few parameter values, and <code>possibly</code> replaces any calls that return an error with an NA↩︎</p></li>
<li id="fn41"><p>That I could find↩︎</p></li>
<li id="fn42"><p>To be fair to van der Vaart and van Zanten their particular problem doesn’t necessarily have a ridge!↩︎</p></li>
<li id="fn43"><p>Saddle up for some spectral theory.↩︎</p></li>
<li id="fn44"><p>I’m terribly sorry.↩︎</p></li>
<li id="fn45"><p>I’m moderately sure that the preprint is pretty similar to the published version but I am not going to check.↩︎</p></li>
<li id="fn46"><p>Can’t stress enough that this is smoothness in a qualitative sense rather than in the more technical “how differentiable is it?” sense.↩︎</p></li>
<li id="fn47"><p>Truly going wild with the scare quotes. Always a sign of excellent writing.↩︎</p></li>
<li id="fn48"><p>For the usual smoothing spline with the square of the Laplacian, you need <img src="https://latex.codecogs.com/png.latex?%5Cnu%20=%202%20-%20d/2">. Other values of <img src="https://latex.codecogs.com/png.latex?%5Cnu"> still give you splines, just with different differentiability assumptions.↩︎</p></li>
<li id="fn49"><p>If your data is uniformly spaced, you can use the minimum. Otherwise, I suggest a low quantile of the distribution of distances. Or just a bit of nous.↩︎</p></li>
<li id="fn50"><p>The second half of this post is devoted to proving this. And it is <em>long</em>.↩︎</p></li>
<li id="fn51"><p>With this parameterisation it’s sometimes known as a Type-II Gumbel distribution. Because why not.↩︎</p></li>
<li id="fn52"><p>And <em>only</em> in this case! The reference prior changes a lot when there is a non-zero mean, when there are other covariates, when there is observation noise, etc etc. It really is quite a wobbly construction.↩︎</p></li>
<li id="fn53"><p>Readers, I have not bothered to show.↩︎</p></li>
<li id="fn54"><p>Part of why I’m reluctant to claim this is a good idea in particularly high dimensions is that volume in high dimensions is frankly a bit gross.↩︎</p></li>
<li id="fn55"><p>I, for one, love a sneaky transformation to spherical coordinates.↩︎</p></li>
<li id="fn56"><p>So why do all the technical shit to derive the PC prior when this option is just sitting there? Fuck you, that’s why.↩︎</p></li>
<li id="fn57"><p>This is sometimes called “automatic relevance determination” because words don’t have meaning anymore. Regardless, it’s a pretty sensible idea when you have a lot of covariates that can be quite different.↩︎</p></li>
<li id="fn58"><p>It is possible that a horseshoe-type prior on <img src="https://latex.codecogs.com/png.latex?%5Clog(%5Cell_j)"> would serve better, but there are going to be some issues as that will shrink the geometric mean of the length scales towards 1.↩︎</p></li>
<li id="fn59"><p>Part of the motivation for writing this was to actually have enough of the GP theory needed to think about these priors in a single place.↩︎</p></li>
<li id="fn60"><p>In fact, it’s isotropic, which is a stricter condition on most spaces. But there’s no real reason to specialise to isotropic processes so we simply won’t.↩︎</p></li>
<li id="fn61"><p>We are assuming that the mean is zero, but absent that assumption, we need to assume that the mean is constant.↩︎</p></li>
<li id="fn62"><p>For non-Gaussian processes, this property is known as <em>second-order</em> stationarity. For GPs this corresponds to strong stationarity, which is a property of the distribution rather than of the covariance function.↩︎</p></li>
<li id="fn63"><p>If you’ve been exposed to the concept of ergodicity of random fields you may be eligible for compensation.↩︎</p></li>
<li id="fn64"><p>Possibly with different length scales in different directions or some other form of anisotropy↩︎</p></li>
<li id="fn65"><p>This normalisation is to make my life easier.↩︎</p></li>
<li id="fn66"><p>Let’s not lie, I just jumped straight to complex numbers. Some of you are having flashbacks.↩︎</p></li>
<li id="fn67"><p>Fourier-Stieltjes↩︎</p></li>
<li id="fn68"><p>countably additive set function. Like a probability measure, but it doesn’t have to total to one↩︎</p></li>
<li id="fn69"><p>and complexify↩︎</p></li>
<li id="fn70"><p>or a Cameron-Martin space↩︎</p></li>
<li id="fn71"><p>That is, this measure bullshit isn’t just me pretending to be smart. It’s necessary.↩︎</p></li>
<li id="fn72"><p>Feeling annoyed by a reparameterisation this late in the blog post? Well tough. I’ve got to type this shit out and if I had to track all of those <img src="https://latex.codecogs.com/png.latex?%5Csqrt%7B8%5Cnu%7D">s I would simply curl up and die.↩︎</p></li>
<li id="fn73"><p>In my whole damn life I have never successfully got the constant correct, so maybe check that yourself. But truly it does not matter. All that matters for the purposes of this post is the density as a function of <img src="https://latex.codecogs.com/png.latex?(%5Comega,%20%5Csigma,%5Ckappa)">.↩︎</p></li>
<li id="fn74"><p>This is not restricted to being Gaussian, but for all intents and porpoises it is.↩︎</p></li>
<li id="fn75"><p>Countably additive set function taking values in <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BC%7D">↩︎</p></li>
<li id="fn76"><p><img src="https://latex.codecogs.com/png.latex?%5Cnu">-measurable↩︎</p></li>
<li id="fn77"><p><img src="https://latex.codecogs.com/png.latex?A%20%5Ccap%20B%20=%20%5Cemptyset">↩︎</p></li>
<li id="fn78"><p>If <img src="https://latex.codecogs.com/png.latex?Z_%5Cnu(A)"> is also Gaussian then this is the same as them being independent↩︎</p></li>
<li id="fn79"><p>This is the technical term for this type of function because mathematicians weren’t hugged enough as children.↩︎</p></li>
<li id="fn80"><p>for a particular value of “any”↩︎</p></li>
<li id="fn81"><p>for a particular value of “ordinary”↩︎</p></li>
<li id="fn82"><p>Well enough for a statistician anyway. You can look up the details but if you desperately need to formalise it, you build an isomorphism between <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7Bspan%7D%5C%7Bu(s),%20s%20%5Cin%20%5Cmathbb%7BR%7D%5Ed%5C%7D"> and <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7Bspan%7D%5C%7Be%5E%7Bi%5Comega%5ETs%7D,%20s%20%5Cin%20%5Cmathbb%7BR%7D%5Ed%5C%7D"> and use that to construct <img src="https://latex.codecogs.com/png.latex?W">. It’s not <em>wildly</em> difficult but it’s also not actually interesting except for mathturbatory reasons.↩︎</p></li>
<li id="fn83"><p>Non-Gaussian!↩︎</p></li>
<li id="fn84"><p>On more spaces, the same construction still works. Just use whatever Fourier transform you have available.↩︎</p></li>
<li id="fn85"><p>or stochastic processes↩︎</p></li>
<li id="fn86"><p>Yes, it’s a stochastic process over some <img src="https://latex.codecogs.com/png.latex?%5Csigma">-algebra of sets in my definition. <em>Sometimes</em> people use <img src="https://latex.codecogs.com/png.latex?%0A%5Ctilde%7BZ%7D_%5Cnu(s)%20=%20Z_%5Cnu((-%5Cinfty,%20s_1%5D%5Ctimes%5Ccdots%20%5Ctimes%20(-%5Cinfty,%20s_d%5D)%0A"> as the spectral process and interpret the integrals as Lebesgue-Stieltjes integrals. All power to them! So cute! It makes literally no difference and truly I do not think it makes anything easier. By the time you’re like “you know what, I reckon Stieltjes integrals are the way to go” you’ve left “easier” a few miles back. You’ve still got to come up with an appropriate concept of an integral.↩︎</p></li>
<li id="fn87"><p>Also known as the Reproducing Kernel Hilbert Space even though it doesn’t actually have to be one. This is the space of all means. See the previous GP blog.↩︎</p></li>
<li id="fn88"><p>closure of the↩︎</p></li>
<li id="fn89"><p>In <a href="https://dansblog.netlify.app/posts/2021-11-03-yes-but-what-is-a-gaussian-process-or-once-twice-three-times-a-definition-or-a-descent-into-madness/yes-but-what-is-a-gaussian-process-or-once-twice-three-times-a-definition-or-a-descent-into-madness.html">the previous post</a>, I wrote this in terms of the inverse of the covariance operator. For a stationary operator, the covariance operator is (by the convolution theorem) <img src="https://latex.codecogs.com/png.latex?%0ACh(s)%20=%20%5Cint_%7B%5Cmathbb%7BR%7D%7De%5E%7Bi%5Comega%20s%7D%5Chat%7Bh%7D(%5Comega)%20f(%5Comega)%5C,d%5Comega%0A"> and it should be pretty easy to convince yourself that <img src="https://latex.codecogs.com/png.latex?%0AC%5E%7B-1%7Dh(s)%20=%20%5Cint_%7B%5Cmathbb%7BR%7D%7De%5E%7Bi%5Comega%20s%7D%5Chat%7Bh%7D(%5Comega)%20%5Cfrac%7B1%7D%7Bf(%5Comega)%7D%5C,d%5Comega.%0A">↩︎</p></li>
<li id="fn90"><p>ie one where we can represent functions using a Fourier series rather than a Fourier transform↩︎</p></li>
<li id="fn91"><p>ie one with an inner product↩︎</p></li>
<li id="fn92"><p>Bogachev’s Gaussian Measures book, Corollary 6.4.11 with some interpretation work to make it slightly more human-readable. I also added the minus sign he missed in the density.↩︎</p></li>
<li id="fn93"><p>Recall that this is the integral operator <img src="https://latex.codecogs.com/png.latex?C_1%20f%20=%20%5Cint_D%20c_1(x,x')f(x')%5C,d%20x'">.↩︎</p></li>
<li id="fn94"><p>Because of condition 1, if it’s in one of them, it’s in the other too!↩︎</p></li>
<li id="fn95"><p>Technically, they are an orthonormal basis in the closure of <img src="https://latex.codecogs.com/png.latex?%5C%7B%5Cell%20-%5Cmu(%5Cell)%20:%20%5Cell%20%5Cin%20X%5E*%20%5C%7D"> under the <img src="https://latex.codecogs.com/png.latex?R_%7Bu_1%7D"> norm, but let’s just be friendly to ourselves and pretend <img src="https://latex.codecogs.com/png.latex?u_j"> have zero mean so these spaces are the same. The theorem is very explicit about what they are. If <img src="https://latex.codecogs.com/png.latex?%5Cphi_k"> are the (<img src="https://latex.codecogs.com/png.latex?X">-orthonormal) eigenfunctions corresponding to <img src="https://latex.codecogs.com/png.latex?%5Cdelta_k">, then <img src="https://latex.codecogs.com/png.latex?%0A%5Ceta_k%20=%20%5Cint_%7B%5Cmathbb%7BR%7D%5Ed%7D%20C_1%5E%7B1/2%7D%5Cphi_k(s)%5C,dW_1(s),%0A"> where <img src="https://latex.codecogs.com/png.latex?W_1(s)"> is the spectral process associated with <img src="https://latex.codecogs.com/png.latex?u_1">. Give or take, this is the same thing I said in the main text.↩︎</p></li>
<li id="fn96"><p>After reading all of that, let me tell you that it simply does not matter even a little bit.↩︎</p></li>
<li id="fn97"><p>Yes - this is Mercer’s theorem again. The only difference is that we are assuming that the eigenfunctions are the same for each <img src="https://latex.codecogs.com/png.latex?j"> so they don’t need an index.↩︎</p></li>
<li id="fn98"><p><img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0AC_j%5E%5Cbeta%5BC_j%5E%7B-%5Cbeta%7Dh%5D%20&amp;=%20%5Csum_%7Bm=1%7D%5E%5Cinfty%20(%5Clambda_m%5E%7B(j)%7D)%5E%5Cbeta%20%5Cleft%5Clangle%5Cphi_m,%20%5Csum_%7Bk=1%7D%5E%5Cinfty%20(%5Clambda_k%5E%7B(j)%7D)%5E%7B-%5Cbeta%7D%20%5Clangle%5Cphi_k,%20h%5Crangle%20%5Cphi_k%5Cright%5Crangle%20%5Cphi_m%20%5C%5C%0A&amp;=%20%5Csum_%7Bm=1%7D%5E%5Cinfty%20(%5Clambda_m%5E%7B(j)%7D)%5E%5Cbeta%5Csum_%7Bk=1%7D%5E%5Cinfty%20(%5Clambda_k%5E%7B(j)%7D)%5E%7B-%5Cbeta%7D%20%5Clangle%5Cphi_k,%20h%5Crangle%20%5Cleft%5Clangle%5Cphi_m,%20%20%20%5Cphi_k%5Cright%5Crangle%20%5Cphi_m%20%5C%5C%0A&amp;=%5Csum_%7Bm=1%7D%5E%5Cinfty%20(%5Clambda_m%5E%7B(j)%7D)%5E%5Cbeta%20(%5Clambda_m%5E%7B(j)%7D)%5E%7B-%5Cbeta%7D%20%5Clangle%5Cphi_m,%20h%5Crangle%20%5Cphi_m%20%5C%5C%0A&amp;=%20h%0A%5Cend%7Balign*%7D">↩︎</p></li>
<li id="fn99"><p>You simply cannot make me care enough to prove that we can swap summation and expectation. Of course we bloody can. Also <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D_%7B%5Cmu_1%7D%20%5Ceta_k%5E2%20=%201">.↩︎</p></li>
<li id="fn100"><p>But not impossible. <a href="https://arxiv.org/abs/2005.08904">Kristin Kirchner and David Bolin</a> have done some very nice work on this recently.↩︎</p></li>
<li id="fn101"><p>This is a stronger condition than the one in the paper, but it’s a) readily verifiable and b) domain independent.↩︎</p></li>
<li id="fn102"><p>This is legitimately quite hard to parse. You’ve got to back-transform their orthogonal basis <img src="https://latex.codecogs.com/png.latex?g_k"> to an orthogonal basis on <img src="https://latex.codecogs.com/png.latex?L%5E2(D)">, which is where those inverse square roots come from!↩︎</p></li>
<li id="fn103"><p>Remember <img src="https://latex.codecogs.com/png.latex?%5Ckappa%20=%20%5Csqrt%7B8%5Cnu%7D%5Cell%5E%7B-1%7D"> because Daddy hates typing.↩︎</p></li>
<li id="fn104"><p>Through the magical power of WolframAlpha or, you know, my own ability to do simple Taylor expansions.↩︎</p></li>
<li id="fn105"><p><img src="https://latex.codecogs.com/png.latex?d-5%3C-1">↩︎</p></li>
<li id="fn106"><p><img src="https://latex.codecogs.com/png.latex?d%5Cleq%203">↩︎</p></li>
<li id="fn107"><p><img src="https://latex.codecogs.com/png.latex?d%3E4">↩︎</p></li>
<li id="fn108"><p>The other KL. The spicy, secret KL. KL after dark. What Loève but a second-hand Karhunen?↩︎</p></li>
<li id="fn109"><p>This is particularly bold use of the inclusive voice here. You may or may not know. Nevertheless it is true.↩︎</p></li>
<li id="fn110"><p>Specifically, this kinda funky set of normalisation choices that statisticians love to make gives↩︎</p></li>
<li id="fn111"><p>If you think a bit about it, a periodic function on <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed"> can be thought of as a process on a torus by joining the appropriate edges together!↩︎</p></li>
<li id="fn112"><p>We will see that this is not an issue, but you better bloody believe that our JASA paper just breezed the fuck past these considerations. Proof by citations that didn’t actually say what we needed them to say but were close enough for government work. Again, this is one of those situations where the thing we are doing is obviously valid, but the specifics (which are unimportant for our situation because we are going to send <img src="https://latex.codecogs.com/png.latex?%5Ckappa_0%5Crightarrow%200"> and <img src="https://latex.codecogs.com/png.latex?L%20%5Crightarrow%20%5Cinfty"> in a way that’s <em>much</em> faster than <img src="https://latex.codecogs.com/png.latex?%5Ckappa_0%5E%7B-1%7D">) are tedious and, I cannot stress this enough, completely unimportant in this context. But it’s a fucking blog and a blog has a type of fucking integrity that the Journal of the American Fucking Statistical Association does not even almost claim to have. I’ve had some red wine.↩︎</p></li>
<li id="fn113"><p>big↩︎</p></li>
<li id="fn114"><p>I cannot stress enough that we’re not bloody implementing this scheme, so it’s not even slightly important. Scan on, McDuff.↩︎</p></li>
<li id="fn115"><p>Fun fact. I worked in the same department as authors 2 and 4 for a while and they are both very lovely.↩︎</p></li>
<li id="fn116"><p>Check out either of the Bachmayr <em>et al.</em> papers if you’re interested.↩︎</p></li>
<li id="fn117"><p>Thanks Mr Jacobian!↩︎</p></li>
<li id="fn118"><p>I feel like I’ve typed enough, if you want to see the Jacobian read the appendices of the paper.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2022,
  author = {Simpson, Dan},
  title = {Priors for the Parameters in a {Gaussian} Process},
  date = {2022-09-27},
  url = {https://dansblog.netlify.app/posts/2022-09-07-priors5/priors5.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2022" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2022. <span>“Priors for the Parameters in a Gaussian
Process.”</span> September 27, 2022. <a href="https://dansblog.netlify.app/posts/2022-09-07-priors5/priors5.html">https://dansblog.netlify.app/posts/2022-09-07-priors5/priors5.html</a>.
</div></div></section></div> ]]></description>
  <category>Prior distributions</category>
  <category>Gaussian Processes</category>
  <category>PC priors</category>
  <guid>https://dansblog.netlify.app/posts/2022-09-07-priors5/priors5.html</guid>
  <pubDate>Mon, 26 Sep 2022 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2022-09-07-priors5/chair.JPG" medium="image"/>
</item>
<item>
  <title>A first look at multilevel regression; or Everybody’s got something to hide except me and my macaques</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/everybodys-got-something-to-hide-except-me-and-my-monkey.html</link>
  <description><![CDATA[ 





<p><a href="https://www.elizablissmoreau.com">Eliza</a> knows a little something about monkeys. This will become relevant in a moment.</p>
<p>In about 2016, <a href="https://www.cell.com/current-biology/fulltext/S0960-9822(16)30460-2">Almeling <em>et al.</em></a> published a paper that suggested aged Barbary macaques maintained interest in members of their own species while losing interest in novel non-social stimuli (eg toys or puzzles with food inside).</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/fx1.jpeg" class="img-fluid figure-img"></p>
<figcaption>I’d never come across the concept of a Graphical Abstract before, but here is the one for this paper. Graphic design is my passion. <a href="https://www.cell.com/current-biology/fulltext/S0960-9822(16)30460-2">Source</a></figcaption>
</figure>
</div>
<p>This is where Eliza—who knows a little something about monkeys—comes into frame: this did not gel with her experiences at all.</p>
<p>So Eliza (and <a href="https://scholar.google.com.au/citations?hl=en&amp;user=yg9U_okAAAAJ">Mark</a><sup>1</sup> <sup>2</sup>, who also knows a little something about monkeys) decided to look into it.</p>
<section id="what-are-the-stakes-according-to-the-papers-not-according-to-me-who-knows-exactly-nothing-about-this-type-of-work" class="level2">
<h2 class="anchored" data-anchor-id="what-are-the-stakes-according-to-the-papers-not-according-to-me-who-knows-exactly-nothing-about-this-type-of-work">What are the stakes? (According to the papers, not according to me, who knows exactly nothing<sup>3</sup> about this type of work)</h2>
<p>A big motivation for studying macaques and other non-human primates is that they’re good models of humans. This means that if there was solid evidence of macaques becoming less interested in novel stimuli as they age (while maintaining interest in other monkeys), this could suggest an evolutionary reason for this (commonly observed) behaviour in humans.</p>
<p>So if this result is true, it could help us understand the psychology of humans as they age (and in particular, the learned vs evolved trade-off they are making).</p>
</section>
<section id="so-what-did-eliza-and-mark-do" class="level2">
<h2 class="anchored" data-anchor-id="so-what-did-eliza-and-mark-do">So what did Eliza and Mark do?</h2>
<p>There are a few things you can do when confronted with a result that contradicts your experience: you can complain about it on the Internet, you can mobilize a direct replication effort, or you can conduct your own experiments. Eliza and Mark opted for the third option, designing a <em>conceptual replication</em>.</p>
<p>Direct replications tell you more about the specific experiment that was conducted, but not necessarily more about the phenomenon under investigation. In a study involving aged monkeys<sup>4</sup>, it’s difficult to imagine how a direct replication could take place.</p>
<p>On the other hand, a conceptual replication has a lot more flexibility. It allows you to probe the question in a more targeted manner, appropriate for incremental science. In this case, Eliza and Mark opted to study only the claim that the monkeys lose interest in novel stimuli as they age (<a href="https://royalsocietypublishing.org/doi/full/10.1098/rsos.182237">paper here</a>). They did not look into the social claim. They also used a slightly different species of macaque (<em>M. mulatta</em> rather than <em>M. sylvanus</em>). This is reasonable insofar as the goal is to understand macaques as a model for human behaviour.</p>
</section>
<section id="what-does-the-data-look-like" class="level2">
<h2 class="anchored" data-anchor-id="what-does-the-data-look-like">What does the data look like?</h2>
<p>The experiment used 243<sup>5</sup> monkeys aged between 4 and 30 and gave them a novel puzzle task (opening a fancy tube with food in it). The puzzle was fitted with an activity tracker. Each monkey had two tries at the puzzle, one on each of two days, with access for around<sup>6</sup> 20 minutes each time.</p>
<p>In order to match the original study’s analysis, Eliza and Mark divided the first two minutes into 15-second intervals and counted the number of intervals where the monkey interacted with the puzzle. They also measured the same thing over 20 minutes in order to see if there was a difference between short-term curiosity and more sustained exploration.</p>
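<p>As a minimal sketch of this binning (the activity counts below are invented for illustration, not taken from the real data), an interval counts as “active” if the tracker recorded any movement at all:</p>

```r
# Hypothetical activity-tracker counts, one value per 15-second interval.
# The first two minutes give 8 such intervals.
activity <- c(0, 12, 5, 0, 0, 3, 0, 9)

# An interval is "active" if any movement was recorded in it.
active_bins <- sum(activity > 0)
active_bins  # 4 of the 8 intervals were active
```

This is the same <code>sum(Activity &gt; 0)</code> trick that shows up in the data-wrangling pipeline below.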
<p>For each monkey, we have the following information:</p>
<ul>
<li>Monkey ID</li>
<li>Age (4-30)</li>
<li>Day (one or two)</li>
<li>Number of active intervals in the first two minutes (0-8)</li>
<li>Number of active intervals in the first twenty minutes (0-80)</li>
</ul>
<p>The data and their analysis are freely<sup>7</sup> available <a href="https://datadryad.org/stash/dataset/doi:10.5061/dryad.1bj133v">here</a>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2">acti_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"activity_data.csv"</span>) </span>
<span id="cb1-3">activity_2mins <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> acti_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(obs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(subj_id, Day) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">total=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(Activity), </span>
<span id="cb1-6">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">active_bins =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(Activity <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>), </span>
<span id="cb1-7">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">age =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(age)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">monkey =</span> subj_id, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">day =</span> Day) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>()</span>
<span id="cb1-10"></span>
<span id="cb1-11">activity_20minms80 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> acti_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(obs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">81</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(subj_id, Day) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">total=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(Activity), </span>
<span id="cb1-14">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">active_bins =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(Activity <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>), </span>
<span id="cb1-15">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">age =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(age)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">monkey =</span> subj_id, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">day =</span> Day) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>()</span>
<span id="cb1-18"></span>
<span id="cb1-19"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glimpse</span>(activity_20minms80)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Rows: 485
Columns: 5
$ monkey      &lt;dbl&gt; 0, 0, 88, 88, 636, 636, 760, 760, 1257, 1257, 1607, 1607, …
$ day         &lt;dbl&gt; 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2…
$ total       &lt;dbl&gt; 9881, 6356, 15833, 4988, 572, 308, 1097, 2916, 4884, 2366,…
$ active_bins &lt;int&gt; 42, 34, 43, 19, 10, 4, 12, 23, 50, 33, 9, 11, 13, 7, 30, 3…
$ age         &lt;dbl&gt; 29, 29, 29, 29, 28, 28, 30, 30, 27, 27, 27, 27, 27, 27, 26…</code></pre>
</div>
</div>
</section>
<section id="ok-mary-how-are-we-going-to-analyze-this-data" class="level2">
<h2 class="anchored" data-anchor-id="ok-mary-how-are-we-going-to-analyze-this-data">Ok Mary, how are we going to analyse this data?</h2>
<p>Eliza and Mark’s monkey data is an example of a fairly common type of experimental data, where the same subject is measured multiple times. It is useful to break the covariates down into three types: <em>grouping variables</em>, <em>group-level covariates</em>, and <em>individual-level covariates</em>.</p>
<p><em>Grouping variables</em> indicate what <em>group</em> each observation is in. We will see a lot of different ways of defining groups as we go on, but a core idea is that observations within a group should be conceptually more similar to each other than observations in different groups. For Eliza and Mark, their grouping variable is <code>monkey</code>. This encodes the idea that different monkeys might have very different levels of curiosity, but the same monkey across two different days would probably have fairly similar levels of curiosity.</p>
<p><em>Group-level covariates</em> are covariates that describe a feature of the <em>group</em> rather than the observation. In this example, <code>age</code> is a group-level covariate, because the monkeys are the same age at each observation.</p>
<p><em>Individual-level covariates</em> are covariates that describe a feature that is specific to an observation. (The nomenclature here can be a bit confusing: the “individual” refers to individual observations, not to individual monkeys. All good naming conventions go to shit eventually.) The individual-level covariate is experiment day. This can be a bit harder to see than the other designations, but it’s a little clearer if you think of it as an indicator of whether this is the first time the monkey has seen the task or the second time. Viewed this way, it is very clearly a measurement of a property of an observation rather than of a group.</p>
<p>Eliza and Mark’s monkey data is an example of a fairly general type of experimental data where subjects (our groups) are given the same task under different experimental conditions (described through individual-level covariates). As we will see, it’s not uncommon to have much more complex group definitions (that involve several grouping covariates) and larger sets of both group-level and individual-level covariates.</p>
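<p>To make the three covariate types concrete, here is a minimal sketch with an invented toy data frame (the data and the <code>levels_within_group</code> helper are illustrations, not part of Eliza and Mark’s analysis). A group-level covariate never varies within a group; an individual-level covariate does:</p>

```r
# Invented toy data in the same shape as the monkey data:
# `monkey` is the grouping variable, `age` is group-level,
# and `day` is individual-level.
toy <- data.frame(
  monkey      = rep(c("A", "B", "C"), each = 2),
  age         = rep(c(12, 7, 25), each = 2),  # constant within each monkey
  day         = rep(c(1, 2), times = 3),      # varies within each monkey
  active_bins = c(5, 3, 7, 6, 2, 1)
)

# Number of distinct values a covariate takes within each group
levels_within_group <- function(x, g) tapply(x, g, function(v) length(unique(v)))

levels_within_group(toy$age, toy$monkey)  # all 1: age is group-level
levels_within_group(toy$day, toy$monkey)  # all 2: day is individual-level
```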
<p>So how do we fit a model to this data?</p>
<section id="there-are-just-too-many-monkeys-or-why-cant-we-just-analyse-this-with-regression" class="level3">
<h3 class="anchored" data-anchor-id="there-are-just-too-many-monkeys-or-why-cant-we-just-analyse-this-with-regression">There are just too many monkeys; or Why can’t we just analyse this with regression?</h3>
<p>The temptation with this sort of data is to fit a linear regression to it as a first model. This treats the grouping, group-level, and individual-level covariates all in the same way. Let’s suck it and see.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(broom)</span>
<span id="cb3-2">fit_lm <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(active_bins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> age<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(day) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(monkey), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> activity_2mins)</span>
<span id="cb3-3"></span>
<span id="cb3-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidy</span>(fit_lm) </span></code></pre></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"-6.688440e+00","3":"5.38400465","4":"-1.242280e+00","5":"0.215345864"},{"1":"age","2":"4.234598e-01","3":"0.21287645","4":"1.989228e+00","5":"0.047811367"},{"1":"factor(day)2","2":"1.879672e-03","3":"0.39658478","4":"4.739646e-03","5":"0.996222261"},{"1":"factor(monkey)88","2":"1.000000e+00","3":"1.70008898","4":"5.882045e-01","5":"0.556948129"},{"1":"factor(monkey)636","2":"-3.062500e+00","3":"1.60442379","4":"-1.908785e+00","5":"0.057482707"},{"1":"factor(monkey)760","2":"-2.937500e+00","3":"1.81569582","4":"-1.617837e+00","5":"0.107011195"},{"1":"factor(monkey)1257","2":"3.750000e-01","3":"1.53243949","4":"2.447079e-01","5":"0.806891726"},{"1":"factor(monkey)1607","2":"3.750000e-01","3":"1.53243949","4":"2.447079e-01","5":"0.806891726"},{"1":"factor(monkey)1632","2":"-2.125000e+00","3":"1.53243949","4":"-1.386678e+00","5":"0.166826659"},{"1":"factor(monkey)1860","2":"8.125000e-01","3":"1.48757785","4":"5.461899e-01","5":"0.585442774"},{"1":"factor(monkey)1869","2":"2.812500e+00","3":"1.48757785","4":"1.890657e+00","5":"0.059874728"},{"1":"factor(monkey)2191","2":"1.312500e+00","3":"1.48757785","4":"8.823068e-01","5":"0.378493776"},{"1":"factor(monkey)2637","2":"-2.500000e-01","3":"1.47232024","4":"-1.698000e-01","5":"0.865310455"},{"1":"factor(monkey)2747","2":"3.750000e+00","3":"1.47232024","4":"2.547000e+00","5":"0.011490788"},{"1":"factor(monkey)2833","2":"2.500000e-01","3":"1.47232024","4":"1.698000e-01","5":"0.865310455"},{"1":"factor(monkey)2912","2":"2.250000e+00","3":"1.47232024","4":"1.528200e+00","5":"0.127779712"},{"1":"factor(monkey)3536","2":"-3.125000e-01","3":"1.48757785","4":"-2
.100730e-01","5":"0.833788896"},{"1":"factor(monkey)3545","2":"-8.125000e-01","3":"1.48757785","4":"-5.461899e-01","5":"0.585442774"},{"1":"factor(monkey)3696","2":"-3.812500e+00","3":"1.48757785","4":"-2.562891e+00","5":"0.010991577"},{"1":"factor(monkey)4009","2":"1.125000e+00","3":"1.53243949","4":"7.341236e-01","5":"0.463590006"},{"1":"factor(monkey)4392","2":"1.250000e-01","3":"1.53243949","4":"8.156929e-02","5":"0.935057207"},{"1":"factor(monkey)4624","2":"-4.375000e-01","3":"1.60442379","4":"-2.726836e-01","5":"0.785330940"},{"1":"factor(monkey)4686","2":"1.062500e+00","3":"1.60442379","4":"6.622315e-01","5":"0.508458252"},{"1":"factor(monkey)4776","2":"3.562500e+00","3":"1.60442379","4":"2.220423e+00","5":"0.027324613"},{"1":"factor(monkey)4795","2":"1.562500e+00","3":"1.60442379","4":"9.738699e-01","5":"0.331101681"},{"1":"factor(monkey)4886","2":"5.625000e-01","3":"1.60442379","4":"3.505932e-01","5":"0.726201115"},{"1":"factor(monkey)4964","2":"3.062500e+00","3":"1.60442379","4":"1.908785e+00","5":"0.057482707"},{"1":"factor(monkey)5252","2":"4.500000e+00","3":"1.70008898","4":"2.646920e+00","5":"0.008660496"},{"1":"factor(monkey)5332","2":"-5.000000e-01","3":"1.70008898","4":"-2.941023e-01","5":"0.768933965"},{"1":"factor(monkey)5388","2":"3.000000e+00","3":"1.70008898","4":"1.764614e+00","5":"0.078900697"},{"1":"factor(monkey)5413","2":"5.500000e+00","3":"1.70008898","4":"3.235125e+00","5":"0.001386994"},{"1":"factor(monkey)5453","2":"2.000000e+00","3":"1.70008898","4":"1.176409e+00","5":"0.240596983"},{"1":"factor(monkey)5498","2":"-5.000000e-01","3":"1.70008898","4":"-2.941023e-01","5":"0.768933965"},{"1":"factor(monkey)5604","2":"-2.680485e-14","3":"1.70008898","4":"-1.576674e-14","5":"1.000000000"},{"1":"factor(monkey)5607","2":"5.000000e-01","3":"1.70008898","4":"2.941023e-01","5":"0.768933965"},{"1":"factor(monkey)5646","2":"-1.747008e-14","3":"1.70008898","4":"-1.027598e-14","5":"1.000000000"},{"1":"factor(monkey)5774","2":"2.000000e+00","3":"1.70
008898","4":"1.176409e+00","5":"0.240596983"},{"1":"factor(monkey)5895","2":"2.937500e+00","3":"1.81569582","4":"1.617837e+00","5":"0.107011195"},{"1":"factor(monkey)5967","2":"-6.250000e-02","3":"1.81569582","4":"-3.442207e-02","5":"0.972569200"},{"1":"factor(monkey)5990","2":"1.937500e+00","3":"1.81569582","4":"1.067084e+00","5":"0.287006110"},{"1":"factor(monkey)6008","2":"-6.250000e-02","3":"1.81569582","4":"-3.442207e-02","5":"0.972569200"},{"1":"factor(monkey)6098","2":"1.937500e+00","3":"1.81569582","4":"1.067084e+00","5":"0.287006110"},{"1":"factor(monkey)6159","2":"3.437500e+00","3":"1.81569582","4":"1.893214e+00","5":"0.059532471"},{"1":"factor(monkey)6258","2":"2.937500e+00","3":"1.81569582","4":"1.617837e+00","5":"0.107011195"},{"1":"factor(monkey)6264","2":"2.937500e+00","3":"1.81569582","4":"1.617837e+00","5":"0.107011195"},{"1":"factor(monkey)6287","2":"4.375000e-01","3":"1.81569582","4":"2.409545e-01","5":"0.809796132"},{"1":"factor(monkey)6505","2":"2.375000e+00","3":"1.94769661","4":"1.219389e+00","5":"0.223893579"},{"1":"factor(monkey)6512","2":"3.875000e+00","3":"1.94769661","4":"1.989530e+00","5":"0.047777867"},{"1":"factor(monkey)6516","2":"3.375000e+00","3":"1.94769661","4":"1.732816e+00","5":"0.084412906"},{"1":"factor(monkey)6807","2":"2.375000e+00","3":"1.94769661","4":"1.219389e+00","5":"0.223893579"},{"1":"factor(monkey)6877","2":"1.375000e+00","3":"1.94769661","4":"7.059621e-01","5":"0.480896409"},{"1":"factor(monkey)6930","2":"8.750000e-01","3":"1.94769661","4":"4.492486e-01","5":"0.653657739"},{"1":"factor(monkey)7261","2":"8.125000e-01","3":"2.09299182","4":"3.882003e-01","5":"0.698211947"},{"1":"factor(monkey)7289","2":"5.812500e+00","3":"2.09299182","4":"2.777125e+00","5":"0.005917223"},{"1":"factor(monkey)7307","2":"5.312500e+00","3":"2.09299182","4":"2.538233e+00","5":"0.011774807"},{"1":"factor(monkey)7321","2":"3.312500e+00","3":"2.09299182","4":"1.582663e+00","5":"0.114815239"},{"1":"factor(monkey)7333","2":"4.312500e+00","3":"
2.09299182","4":"2.060448e+00","5":"0.040433840"},{"1":"factor(monkey)7451","2":"2.312500e+00","3":"2.09299182","4":"1.104878e+00","5":"0.270319009"},{"1":"factor(monkey)7588","2":"-6.250000e-02","3":"1.81569582","4":"-3.442207e-02","5":"0.972569200"},{"1":"factor(monkey)7598","2":"3.750000e-01","3":"1.94769661","4":"1.925351e-01","5":"0.847485879"},{"1":"factor(monkey)7600","2":"-1.625000e+00","3":"1.94769661","4":"-8.343189e-01","5":"0.404930992"},{"1":"factor(monkey)7623","2":"2.812500e+00","3":"2.09299182","4":"1.343770e+00","5":"0.180291735"},{"1":"factor(monkey)7707","2":"1.312500e+00","3":"2.09299182","4":"6.270928e-01","5":"0.531194536"},{"1":"factor(monkey)7721","2":"1.812500e+00","3":"2.09299182","4":"8.659852e-01","5":"0.387363127"},{"1":"factor(monkey)7828","2":"1.812500e+00","3":"2.09299182","4":"8.659852e-01","5":"0.387363127"},{"1":"factor(monkey)7942","2":"4.375000e+00","3":"1.94769661","4":"2.246243e+00","5":"0.025598621"},{"1":"factor(monkey)7992","2":"5.250000e+00","3":"2.24900632","4":"2.334364e+00","5":"0.020402519"},{"1":"factor(monkey)8053","2":"4.750000e+00","3":"2.24900632","4":"2.112044e+00","5":"0.035716332"},{"1":"factor(monkey)8094","2":"2.750000e+00","3":"2.24900632","4":"1.222762e+00","5":"0.222618881"},{"1":"factor(monkey)8103","2":"2.250000e+00","3":"2.24900632","4":"1.000442e+00","5":"0.318104340"},{"1":"factor(monkey)8179","2":"4.250000e+00","3":"2.24900632","4":"1.889723e+00","5":"0.060000176"},{"1":"factor(monkey)8183","2":"-2.500000e-01","3":"2.24900632","4":"-1.111602e-01","5":"0.911582214"},{"1":"factor(monkey)8338","2":"1.750000e+00","3":"2.24900632","4":"7.781214e-01","5":"0.437263858"},{"1":"factor(monkey)8340","2":"4.750000e+00","3":"2.24900632","4":"2.112044e+00","5":"0.035716332"},{"1":"factor(monkey)8343","2":"4.250000e+00","3":"2.24900632","4":"1.889723e+00","5":"0.060000176"},{"1":"factor(monkey)8371","2":"5.750000e+00","3":"2.24900632","4":"2.556685e+00","5":"0.011184196"},{"1":"factor(monkey)8544","2":"2.750000e+00"
,"3":"2.24900632","4":"1.222762e+00","5":"0.222618881"},{"1":"factor(monkey)8611","2":"3.250000e+00","3":"2.24900632","4":"1.445083e+00","5":"0.149739064"},{"1":"factor(monkey)8729","2":"3.312500e+00","3":"2.09299182","4":"1.582663e+00","5":"0.114815239"},{"1":"factor(monkey)8834","2":"2.187500e+00","3":"2.41366237","4":"9.062991e-01","5":"0.365686559"},{"1":"factor(monkey)8873","2":"6.187500e+00","3":"2.41366237","4":"2.563532e+00","5":"0.010971865"},{"1":"factor(monkey)8956","2":"4.687500e+00","3":"2.41366237","4":"1.942069e+00","5":"0.053298767"},{"1":"factor(monkey)8963","2":"4.187500e+00","3":"2.41366237","4":"1.734915e+00","5":"0.084039563"},{"1":"factor(monkey)9009","2":"2.687500e+00","3":"2.41366237","4":"1.113453e+00","5":"0.266627767"},{"1":"factor(monkey)9014","2":"3.187500e+00","3":"2.41366237","4":"1.320607e+00","5":"0.187890272"},{"1":"factor(monkey)9023","2":"2.187500e+00","3":"2.41366237","4":"9.062991e-01","5":"0.365686559"},{"1":"factor(monkey)9117","2":"1.187500e+00","3":"2.41366237","4":"4.919909e-01","5":"0.623175457"},{"1":"factor(monkey)9344","2":"4.687500e+00","3":"2.41366237","4":"1.942069e+00","5":"0.053298767"},{"1":"factor(monkey)9355","2":"6.187500e+00","3":"2.41366237","4":"2.563532e+00","5":"0.010971865"},{"1":"factor(monkey)9411","2":"3.187500e+00","3":"2.41366237","4":"1.320607e+00","5":"0.187890272"},{"1":"factor(monkey)9416","2":"6.187500e+00","3":"2.41366237","4":"2.563532e+00","5":"0.010971865"},{"1":"factor(monkey)9542","2":"8.125000e-01","3":"1.48757785","4":"5.461899e-01","5":"0.585442774"},{"1":"factor(monkey)9598","2":"7.625000e+00","3":"2.58530938","4":"2.949357e+00","5":"0.003499076"},{"1":"factor(monkey)9609","2":"2.125000e+00","3":"2.58530938","4":"8.219519e-01","5":"0.411920078"},{"1":"factor(monkey)9615","2":"2.625000e+00","3":"2.58530938","4":"1.015352e+00","5":"0.310960381"},{"1":"factor(monkey)9662","2":"4.625000e+00","3":"2.58530938","4":"1.788954e+00","5":"0.074883259"},{"1":"factor(monkey)9680","2":"6.250000e-01"
,"3":"2.58530938","4":"2.417506e-01","5":"0.809179879"},{"1":"factor(monkey)9683","2":"5.125000e+00","3":"2.58530938","4":"1.982355e+00","5":"0.048580117"},{"1":"factor(monkey)9771","2":"6.250000e-01","3":"2.58530938","4":"2.417506e-01","5":"0.809179879"},{"1":"factor(monkey)9847","2":"3.125000e+00","3":"2.58530938","4":"1.208753e+00","5":"0.227947352"},{"1":"factor(monkey)9926","2":"6.625000e+00","3":"2.58530938","4":"2.562556e+00","5":"0.011001901"},{"1":"factor(monkey)9940","2":"3.625000e+00","3":"2.58530938","4":"1.402153e+00","5":"0.162161553"},{"1":"factor(monkey)9986","2":"3.625000e+00","3":"2.58530938","4":"1.402153e+00","5":"0.162161553"},{"1":"factor(monkey)10069","2":"-6.250000e-02","3":"1.81569582","4":"-3.442207e-02","5":"0.972569200"},{"1":"factor(monkey)10084","2":"2.437500e+00","3":"1.81569582","4":"1.342461e+00","5":"0.180715137"},{"1":"factor(monkey)10290","2":"4.625000e+00","3":"2.58530938","4":"1.788954e+00","5":"0.074883259"},{"1":"factor(monkey)10399","2":"-3.750000e-01","3":"1.53243949","4":"-2.447079e-01","5":"0.806891726"},{"1":"factor(monkey)10646","2":"6.062500e+00","3":"2.76264459","4":"2.194455e+00","5":"0.029161550"},{"1":"factor(monkey)10790","2":"6.562500e+00","3":"2.76264459","4":"2.375441e+00","5":"0.018314388"},{"1":"factor(monkey)10826","2":"3.062500e+00","3":"2.76264459","4":"1.108539e+00","5":"0.268738626"},{"1":"factor(monkey)10850","2":"3.562500e+00","3":"2.76264459","4":"1.289525e+00","5":"0.198456854"},{"1":"factor(monkey)10866","2":"2.562500e+00","3":"2.76264459","4":"9.275533e-01","5":"0.354571211"},{"1":"factor(monkey)11252","2":"8.562500e+00","3":"2.76264459","4":"3.099385e+00","5":"0.002170583"},{"1":"factor(monkey)11440","2":"5.500000e+00","3":"2.94464048","4":"1.867800e+00","5":"0.063008689"},{"1":"factor(monkey)11446","2":"8.000000e+00","3":"2.94464048","4":"2.716800e+00","5":"0.007071614"},{"1":"factor(monkey)11475","2":"3.000000e+00","3":"2.94464048","4":"1.018800e+00","5":"0.309323781"},{"1":"factor(monkey)11483",
"2":"5.500000e+00","3":"2.94464048","4":"1.867800e+00","5":"0.063008689"},{"1":"factor(monkey)11484","2":"6.500000e+00","3":"2.94464048","4":"2.207400e+00","5":"0.028232894"},{"1":"factor(monkey)11542","2":"3.500000e+00","3":"2.94464048","4":"1.188600e+00","5":"0.235771839"},{"1":"factor(monkey)11717","2":"4.000000e+00","3":"2.94464048","4":"1.358400e+00","5":"0.175612222"},{"1":"factor(monkey)11754","2":"4.000000e+00","3":"2.94464048","4":"1.358400e+00","5":"0.175612222"},{"1":"factor(monkey)11781","2":"6.000000e+00","3":"2.94464048","4":"2.037600e+00","5":"0.042686931"},{"1":"factor(monkey)11799","2":"7.000000e+00","3":"2.94464048","4":"2.377200e+00","5":"0.018229337"},{"1":"factor(monkey)11895","2":"2.500000e+00","3":"2.94464048","4":"8.490001e-01","5":"0.396727282"},{"1":"factor(monkey)11916","2":"5.000000e+00","3":"2.94464048","4":"1.698000e+00","5":"0.090803840"},{"1":"factor(monkey)12017","2":"5.500000e+00","3":"2.94464048","4":"1.867800e+00","5":"0.063008689"},{"1":"factor(monkey)12164","2":"5.437500e+00","3":"3.13048431","4":"1.736952e+00","5":"0.083678713"},{"1":"factor(monkey)12298","2":"8.937500e+00","3":"3.13048431","4":"2.854990e+00","5":"0.004680388"},{"1":"factor(monkey)12355","2":"7.937500e+00","3":"3.13048431","4":"2.535550e+00","5":"0.011862943"},{"1":"factor(monkey)12368","2":"3.437500e+00","3":"3.13048431","4":"1.098073e+00","5":"0.273273063"},{"1":"factor(monkey)12381","2":"4.937500e+00","3":"3.13048431","4":"1.577232e+00","5":"0.116059306"},{"1":"factor(monkey)12505","2":"6.437500e+00","3":"3.13048431","4":"2.056391e+00","5":"0.040826302"},{"1":"factor(monkey)12520","2":"4.937500e+00","3":"3.13048431","4":"1.577232e+00","5":"0.116059306"},{"1":"factor(monkey)12532","2":"6.437500e+00","3":"3.13048431","4":"2.056391e+00","5":"0.040826302"},{"1":"factor(monkey)12630","2":"5.437500e+00","3":"3.13048431","4":"1.736952e+00","5":"0.083678713"},{"1":"factor(monkey)12631","2":"7.437500e+00","3":"3.13048431","4":"2.375830e+00","5":"0.018295539"},{"1":"f
actor(monkey)12749","2":"6.937500e+00","3":"3.13048431","4":"2.216111e+00","5":"0.027622537"},{"1":"factor(monkey)12906","2":"2.437500e+00","3":"3.13048431","4":"7.786335e-01","5":"0.436962634"},{"1":"factor(monkey)12947","2":"5.375000e+00","3":"3.31952984","4":"1.619205e+00","5":"0.106716412"},{"1":"factor(monkey)12958","2":"7.375000e+00","3":"3.31952984","4":"2.221700e+00","5":"0.027236943"},{"1":"factor(monkey)13121","2":"4.875000e+00","3":"3.31952984","4":"1.468581e+00","5":"0.143255789"},{"1":"factor(monkey)13129","2":"5.375000e+00","3":"3.31952984","4":"1.619205e+00","5":"0.106716412"},{"1":"factor(monkey)13131","2":"4.875000e+00","3":"3.31952984","4":"1.468581e+00","5":"0.143255789"},{"1":"factor(monkey)13260","2":"5.875000e+00","3":"3.31952984","4":"1.769829e+00","5":"0.078025353"},{"1":"factor(monkey)13279","2":"5.375000e+00","3":"3.31952984","4":"1.619205e+00","5":"0.106716412"},{"1":"factor(monkey)13312","2":"4.375000e+00","3":"3.31952984","4":"1.317958e+00","5":"0.188774375"},{"1":"factor(monkey)13442","2":"3.375000e+00","3":"3.31952984","4":"1.016710e+00","5":"0.310315126"},{"1":"factor(monkey)13473","2":"6.375000e+00","3":"3.31952984","4":"1.920453e+00","5":"0.055985798"},{"1":"factor(monkey)13578","2":"4.875000e+00","3":"3.31952984","4":"1.468581e+00","5":"0.143255789"},{"1":"factor(monkey)13590","2":"6.875000e+00","3":"3.31952984","4":"2.071076e+00","5":"0.039420756"},{"1":"factor(monkey)13790","2":"5.812500e+00","3":"3.51125999","4":"1.655389e+00","5":"0.099152667"},{"1":"factor(monkey)13825","2":"4.812500e+00","3":"3.51125999","4":"1.370591e+00","5":"0.171783107"},{"1":"factor(monkey)13883","2":"6.312500e+00","3":"3.51125999","4":"1.797788e+00","5":"0.073467506"},{"1":"factor(monkey)13922","2":"7.812500e+00","3":"3.51125999","4":"2.224985e+00","5":"0.027012536"},{"1":"factor(monkey)14043","2":"7.312500e+00","3":"3.51125999","4":"2.082586e+00","5":"0.038348277"},{"1":"factor(monkey)14066","2":"4.812500e+00","3":"3.51125999","4":"1.370591e+00","5":"0
.171783107"},{"1":"factor(monkey)14077","2":"6.812500e+00","3":"3.51125999","4":"1.940187e+00","5":"0.053528408"},{"1":"factor(monkey)14137","2":"4.312500e+00","3":"3.51125999","4":"1.228192e+00","5":"0.220578143"},{"1":"factor(monkey)14165","2":"6.312500e+00","3":"3.51125999","4":"1.797788e+00","5":"0.073467506"},{"1":"factor(monkey)14177","2":"4.812500e+00","3":"3.51125999","4":"1.370591e+00","5":"0.171783107"},{"1":"factor(monkey)14307","2":"6.812500e+00","3":"3.51125999","4":"1.940187e+00","5":"0.053528408"},{"1":"factor(monkey)14323","2":"8.312500e+00","3":"3.51125999","4":"2.367384e+00","5":"0.018708484"},{"1":"factor(monkey)14351","2":"7.312500e+00","3":"3.51125999","4":"2.082586e+00","5":"0.038348277"},{"1":"factor(monkey)14651","2":"8.250000e+00","3":"3.70525802","4":"2.226566e+00","5":"0.026905107"},{"1":"factor(monkey)14666","2":"5.250000e+00","3":"3.70525802","4":"1.416905e+00","5":"0.157807347"},{"1":"factor(monkey)14699","2":"8.750000e+00","3":"3.70525802","4":"2.361509e+00","5":"0.019000512"},{"1":"factor(monkey)14823","2":"1.025000e+01","3":"3.70525802","4":"2.766339e+00","5":"0.006110166"},{"1":"factor(monkey)14826","2":"3.750000e+00","3":"3.70525802","4":"1.012075e+00","5":"0.312521306"},{"1":"factor(monkey)14902","2":"7.250000e+00","3":"3.70525802","4":"1.956679e+00","5":"0.051544843"},{"1":"factor(monkey)14919","2":"7.250000e+00","3":"3.70525802","4":"1.956679e+00","5":"0.051544843"},{"1":"factor(monkey)14985","2":"7.750000e+00","3":"3.70525802","4":"2.091622e+00","5":"0.037523810"},{"1":"factor(monkey)15041","2":"7.250000e+00","3":"3.70525802","4":"1.956679e+00","5":"0.051544843"},{"1":"factor(monkey)15088","2":"4.750000e+00","3":"3.70525802","4":"1.281962e+00","5":"0.201092964"},{"1":"factor(monkey)15120","2":"4.750000e+00","3":"3.70525802","4":"1.281962e+00","5":"0.201092964"},{"1":"factor(monkey)15218","2":"5.750000e+00","3":"3.70525802","4":"1.551849e+00","5":"0.122016071"},{"1":"factor(monkey)15530","2":"7.687500e+00","3":"3.90118562","4":"
1.970555e+00","5":"0.049924231"},{"1":"factor(monkey)15642","2":"7.187500e+00","3":"3.90118562","4":"1.842389e+00","5":"0.066651841"},{"1":"factor(monkey)15732","2":"6.687500e+00","3":"3.90118562","4":"1.714222e+00","5":"0.087778891"},{"1":"factor(monkey)15820","2":"5.687500e+00","3":"3.90118562","4":"1.457890e+00","5":"0.146178124"},{"1":"factor(monkey)15909","2":"4.687500e+00","3":"3.90118562","4":"1.201558e+00","5":"0.230719242"},{"1":"factor(monkey)15926","2":"7.187500e+00","3":"3.90118562","4":"1.842389e+00","5":"0.066651841"},{"1":"factor(monkey)16002","2":"9.187500e+00","3":"3.90118562","4":"2.355053e+00","5":"0.019326031"},{"1":"factor(monkey)16090","2":"7.687500e+00","3":"3.90118562","4":"1.970555e+00","5":"0.049924231"},{"1":"factor(monkey)16097","2":"5.187500e+00","3":"3.90118562","4":"1.329724e+00","5":"0.184871653"},{"1":"factor(monkey)16169","2":"9.187500e+00","3":"3.90118562","4":"2.355053e+00","5":"0.019326031"},{"1":"factor(monkey)16236","2":"5.187500e+00","3":"3.90118562","4":"1.329724e+00","5":"0.184871653"},{"1":"factor(monkey)16258","2":"6.187500e+00","3":"3.90118562","4":"1.586056e+00","5":"0.114043194"},{"1":"factor(monkey)16447","2":"1.162500e+01","3":"4.09876609","4":"2.836219e+00","5":"0.004954837"},{"1":"factor(monkey)16553","2":"7.625000e+00","3":"4.09876609","4":"1.860316e+00","5":"0.064064047"},{"1":"factor(monkey)16556","2":"8.625000e+00","3":"4.09876609","4":"2.104292e+00","5":"0.036393453"},{"1":"factor(monkey)16598","2":"5.625000e+00","3":"4.09876609","4":"1.372364e+00","5":"0.171231281"},{"1":"factor(monkey)16670","2":"5.625000e+00","3":"4.09876609","4":"1.372364e+00","5":"0.171231281"},{"1":"factor(monkey)16736","2":"9.125000e+00","3":"4.09876609","4":"2.226280e+00","5":"0.026924503"},{"1":"factor(monkey)16807","2":"5.125000e+00","3":"4.09876609","4":"1.250376e+00","5":"0.212379846"},{"1":"factor(monkey)16972","2":"9.125000e+00","3":"4.09876609","4":"2.226280e+00","5":"0.026924503"},{"1":"factor(monkey)17037","2":"8.125000e+00","3
":"4.09876609","4":"1.982304e+00","5":"0.048585829"},{"1":"factor(monkey)17047","2":"1.012500e+01","3":"4.09876609","4":"2.470256e+00","5":"0.014197944"},{"1":"factor(monkey)17069","2":"7.625000e+00","3":"4.09876609","4":"1.860316e+00","5":"0.064064047"},{"1":"factor(monkey)17103","2":"7.125000e+00","3":"4.09876609","4":"1.738328e+00","5":"0.083435526"},{"1":"factor(monkey)17285","2":"7.625000e+00","3":"4.09876609","4":"1.860316e+00","5":"0.064064047"},{"1":"factor(monkey)17516","2":"7.062500e+00","3":"4.29777147","4":"1.643294e+00","5":"0.101631615"},{"1":"factor(monkey)17718","2":"5.562500e+00","3":"4.29777147","4":"1.294275e+00","5":"0.196814278"},{"1":"factor(monkey)17738","2":"7.562500e+00","3":"4.29777147","4":"1.759633e+00","5":"0.079744124"},{"1":"factor(monkey)17794","2":"5.062500e+00","3":"4.29777147","4":"1.177936e+00","5":"0.239988783"},{"1":"factor(monkey)17825","2":"7.062500e+00","3":"4.29777147","4":"1.643294e+00","5":"0.101631615"},{"1":"factor(monkey)17927","2":"5.062500e+00","3":"4.29777147","4":"1.177936e+00","5":"0.239988783"},{"1":"factor(monkey)18038","2":"7.562500e+00","3":"4.29777147","4":"1.759633e+00","5":"0.079744124"},{"1":"factor(monkey)18067","2":"5.562500e+00","3":"4.29777147","4":"1.294275e+00","5":"0.196814278"},{"1":"factor(monkey)18069","2":"1.056250e+01","3":"4.29777147","4":"2.457669e+00","5":"0.014692320"},{"1":"factor(monkey)18106","2":"8.562500e+00","3":"4.29777147","4":"1.992312e+00","5":"0.047469830"},{"1":"factor(monkey)18153","2":"9.062500e+00","3":"4.29777147","4":"2.108651e+00","5":"0.036011350"},{"1":"factor(monkey)18230","2":"9.000000e+00","3":"4.49801264","4":"2.000884e+00","5":"0.046531245"},{"1":"factor(monkey)18232","2":"1.000000e+01","3":"4.49801264","4":"2.223204e+00","5":"0.027133995"},{"1":"factor(monkey)18364","2":"1.000000e+01","3":"4.49801264","4":"2.223204e+00","5":"0.027133995"},{"1":"factor(monkey)18407","2":"1.150000e+01","3":"4.49801264","4":"2.556685e+00","5":"0.011184196"},{"1":"factor(monkey)18450","
2":"9.000000e+00","3":"4.49801264","4":"2.000884e+00","5":"0.046531245"},{"1":"factor(monkey)18489","2":"5.571141e+00","3":"4.65807995","4":"1.196017e+00","5":"0.232870369"},{"1":"factor(monkey)18520","2":"1.100000e+01","3":"4.49801264","4":"2.445524e+00","5":"0.015183749"},{"1":"factor(monkey)18569","2":"8.500000e+00","3":"4.49801264","4":"1.889723e+00","5":"0.060000176"},{"1":"factor(monkey)18652","2":"1.100000e+01","3":"4.49801264","4":"2.445524e+00","5":"0.015183749"},{"1":"factor(monkey)18653","2":"9.000000e+00","3":"4.49801264","4":"2.000884e+00","5":"0.046531245"},{"1":"factor(monkey)18873","2":"9.000000e+00","3":"4.49801264","4":"2.000884e+00","5":"0.046531245"},{"1":"factor(monkey)18947","2":"1.100000e+01","3":"4.49801264","4":"2.445524e+00","5":"0.015183749"},{"1":"factor(monkey)19178","2":"5.937500e+00","3":"4.69933163","4":"1.263478e+00","5":"0.207643559"},{"1":"factor(monkey)19220","2":"9.937500e+00","3":"4.69933163","4":"2.114662e+00","5":"0.035490064"},{"1":"factor(monkey)19239","2":"8.937500e+00","3":"4.69933163","4":"1.901866e+00","5":"0.058386063"},{"1":"factor(monkey)22020","2":"3.312500e+00","3":"2.09299182","4":"1.582663e+00","5":"0.114815239"},{"1":"factor(monkey)22021","2":"2.812500e+00","3":"2.09299182","4":"1.343770e+00","5":"0.180291735"},{"1":"factor(monkey)22023","2":"6.250000e-02","3":"1.60442379","4":"3.895480e-02","5":"0.968958814"},{"1":"factor(monkey)22024","2":"7.500000e-01","3":"2.24900632","4":"3.334806e-01","5":"0.739062688"},{"1":"factor(monkey)22025","2":"-1.437500e+00","3":"1.60442379","4":"-8.959603e-01","5":"0.371171733"},{"1":"factor(monkey)22047","2":"1.875000e+00","3":"1.94769661","4":"9.626756e-01","5":"0.336679273"},{"1":"factor(monkey)22048","2":"3.750000e-01","3":"1.94769661","4":"1.925351e-01","5":"0.847485879"},{"1":"factor(monkey)22049","2":"2.875000e+00","3":"1.94769661","4":"1.476103e+00","5":"0.141227161"},{"1":"factor(monkey)22050","2":"-1.000000e+00","3":"1.70008898","4":"-5.882045e-01","5":"0.556948129"},{"1"
:"factor(monkey)22052","2":"1.562500e+00","3":"1.60442379","4":"9.738699e-01","5":"0.331101681"},{"1":"factor(monkey)22053","2":"2.625000e+00","3":"1.53243949","4":"1.712955e+00","5":"0.088012235"},{"1":"factor(monkey)22054","2":"1.875000e+00","3":"1.94769661","4":"9.626756e-01","5":"0.336679273"},{"1":"factor(monkey)22055","2":"3.625000e+00","3":"1.53243949","4":"2.365509e+00","5":"0.018801226"},{"1":"factor(monkey)22056","2":"2.937500e+00","3":"1.81569582","4":"1.617837e+00","5":"0.107011195"},{"1":"factor(monkey)22057","2":"2.000000e+00","3":"1.70008898","4":"1.176409e+00","5":"0.240596983"},{"1":"factor(monkey)22058","2":"2.500000e-01","3":"1.47232024","4":"1.698000e-01","5":"0.865310455"},{"1":"factor(monkey)22060","2":"-1.250000e-01","3":"1.94769661","4":"-6.417837e-02","5":"0.948881619"},{"1":"factor(monkey)22062","2":"2.500000e+00","3":"1.70008898","4":"1.470511e+00","5":"0.142733141"},{"1":"factor(monkey)22064","2":"NA","3":"NA","4":"NA","5":"NA"},{"1":"age:factor(day)2","2":"2.808043e-02","3":"0.02493246","4":"1.126260e+00","5":"0.261180467"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>So the first thing you will notice is that that is <em>a lot</em> of regression coefficients! There are 243 monkeys and 2 days, but only 485 observations. This isn’t enough data to reliably estimate all of these parameters. (Look at the standard errors for the monkey-related coefficients. They are huge!)</p>
<p>So what are we to do?</p>
<p>The problem is the monkeys. If we use <code>monkey</code> as a factor variable, we only have (at most) two observations of each factor level. That is simply not enough observations per monkey to estimate a different intercept for each one!</p>
<p>This type of model is often described as having <em>no pooling</em>, which indicates that there is no explicit dependence between the intercepts for each group (<code>monkey</code>). (There is some dependence between groups due to the group-level covariate <code>age</code>.)</p>
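<p>For reference, a no-pooling model like the one summarised in the table above can be produced by adding <code>monkey</code> as a factor to the regression formula. This is a sketch: the exact call isn’t shown here, and <code>fit_lm_no_pool</code> is a made-up name; it assumes the <code>activity_2mins</code> data frame from earlier in the post.</p>

```r
# Sketch of the no-pooling fit whose coefficient table appears above.
# Assumes the activity_2mins data frame from earlier in the post;
# fit_lm_no_pool is a name chosen here, not one from the post.
fit_lm_no_pool <- lm(
  active_bins ~ age * factor(day) + factor(monkey),
  data = activity_2mins
)
broom::tidy(fit_lm_no_pool)  # one intercept shift per monkey
```

<p>With 243 monkeys this produces a couple of hundred coefficients, which is exactly the problem described above.</p>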
</section>
<section id="if-we-ignore-the-monkeys-will-they-go-away-or-another-attempt-at-regression" class="level3">
<h3 class="anchored" data-anchor-id="if-we-ignore-the-monkeys-will-they-go-away-or-another-attempt-at-regression">If we ignore the monkeys, will they go away? or Another attempt at regression</h3>
<p>Our first attempt at a regression model didn’t work particularly well, but that doesn’t mean we should give up<sup>8</sup>. A second option is to assume that there is, fundamentally, no difference between monkeys. If all monkeys of the same age have similar amounts of interest in new puzzles, this would be a reasonable assumption. The best-case scenario is that not accounting for differences between individual monkeys still leads to approximately normal residuals, albeit probably with a larger residual variance.</p>
<p>This type of modelling assumption is called <em>complete pooling</em> as it pools the information between groups by treating them all as the same.</p>
<p>Let’s see what happens in this case!</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb4-1">fit_lm_pool <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(active_bins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> age<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(day), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> activity_2mins)</span>
<span id="cb4-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(fit_lm_pool)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = active_bins ~ age * factor(day), data = activity_2mins)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5249 -1.5532  0.1415  1.6731  4.1884 

Coefficients:
                 Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept)      3.789718   0.344466  11.002   &lt;2e-16 ***
age              0.003126   0.021696   0.144    0.885    
factor(day)2     0.056112   0.488818   0.115    0.909    
age:factor(day)2 0.025170   0.030759   0.818    0.414    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.103 on 481 degrees of freedom
Multiple R-squared:  0.01365,   Adjusted R-squared:  0.0075 
F-statistic: 2.219 on 3 and 481 DF,  p-value: 0.0851</code></pre>
</div>
</div>
<p>On the up side, the regression runs and doesn’t have too many parameters!</p>
<p>The brave and the bold might even try to interpret the coefficients and say something like <em>there doesn’t seem to be a strong effect of age</em>. But there’s real danger in trying to interpret regression coefficients in the presence of a potential confounder (in this case, the monkey ID). And it’s particularly bad form to do this without ever looking at any sort of regression diagnostics. Linear regression is not a magic eight ball.</p>
<p>Let’s look at the diagnostic plots.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(broom)</span>
<span id="cb6-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">augment</span>(fit_lm_pool) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb6-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> .fitted, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> active_bins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> .fitted)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_smooth</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">se =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_classic</span>()</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/everybodys-got-something-to-hide-except-me-and-my-monkey_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">augment</span>(fit_lm_pool) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample =</span> .std.resid)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_qq</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">slope =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">intercept =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dashed"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb7-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_classic</span>()</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/everybodys-got-something-to-hide-except-me-and-my-monkey_files/figure-html/unnamed-chunk-4-2.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>There are certainly some patterns in those residuals (and some suggestion that the errors need a heavier tail for this model to make sense).</p>
</section>
</section>
<section id="what-is-between-no-pooling-and-complete-pooling-multilevel-models-thats-what" class="level2">
<h2 class="anchored" data-anchor-id="what-is-between-no-pooling-and-complete-pooling-multilevel-models-thats-what">What is between no pooling and complete pooling? Multilevel models, that’s what</h2>
<p>We are in a Goldilocks situation: no pooling results in a model that has too many independent parameters for the amount of data that we’ve got, while complete pooling has too few parameters to correctly account for the differences between the monkeys. So what is our perfectly tempered porridge<sup>9</sup>?</p>
<p>The answer is to assume that each monkey has its own intercept, but that its intercept can only be <em>so far</em> from the overall intercept (the one we would’ve gotten from complete pooling). There are a bunch of ways to realize this concept, but the classical method is to use a normal distribution.</p>
<p>In particular, if the <img src="https://latex.codecogs.com/png.latex?j">th monkey has observations <img src="https://latex.codecogs.com/png.latex?y_%7Bij%7D">, <img src="https://latex.codecogs.com/png.latex?i=1,2">, then we can write our model as <img src="https://latex.codecogs.com/png.latex?%0Ay_%7Bij%7D%20%20%5Csim%20N(%5Cmu_j%20+%20%5Cbeta_%5Ctext%7Bage%7D%5C,%20%5Ctext%7Bage%7D_j%20+%20%5Cbeta_%5Ctext%7Bday%7D%5C,%20%5Ctext%7Bday%7D_%7Bij%7D%20+%20%5Cbeta_%5Ctext%7Bage,day%7D%5C,%20%5Ctext%7B%5Bage*day%5D%7D_%7Bij%7D,%20%5Csigma%5E2).%0A"></p>
<p>The effects of age and day and the data standard deviation (<img src="https://latex.codecogs.com/png.latex?%5Csigma">) are just like they’d be in an ordinary linear regression model. Our modification comes in how we treat the <img src="https://latex.codecogs.com/png.latex?%5Cmu_j">.</p>
<p>In a classical linear regression model, we would fit the <img src="https://latex.codecogs.com/png.latex?%5Cmu_j">s independently, perhaps with some weakly informative prior distribution. But we’ve already discussed that that won’t work.</p>
<p>Instead we will make the <img src="https://latex.codecogs.com/png.latex?%5Cmu_j"> <em>exchangeable</em> rather than independent. Exchangeability relaxes the independence assumption to instead encode that we have no idea which of the intercepts will do what. That is, if we switch around the labels of our intercepts the prior should not change. There is a long and storied history of exchangeable models in statistics, but the short version that is more than sufficient for our purposes is that they usually<sup>10</sup> take the form <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Cmu_j%20%5Cmid%20%5Ctau%20%5Cstackrel%7B%5Ctext%7Biid%7D%7D%7B%5Csim%7D%20&amp;p(%5Cmu_j%20%5Cmid%20%5Ctau),%20%5Cqquad%20j%20=%201,%5Cldots,%20J%20%5C%5C%0A%5Ctau%20%5Csim%20&amp;%20p(%5Ctau).%0A%5Cend%7Balign*%7D"></p>
<p>In a regression context, we typically assume that <img src="https://latex.codecogs.com/png.latex?%0A%5Cmu_j%20%5Cmid%20%5Ctau%20%5Csim%20N(%5Cmu,%20%5Ctau%5E2)%0A"> for some <img src="https://latex.codecogs.com/png.latex?%5Cmu"> and <img src="https://latex.codecogs.com/png.latex?%5Ctau"> that will need their own priors.</p>
<p>We can explore this difference mathematically. The regression model, which assumes independence of the <img src="https://latex.codecogs.com/png.latex?%5Cmu_j">, uses <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Cmu_1,%20%5Cldots,%20%5Cmu_J)%20=%20%5Cprod_%7Bj=1%7D%5EJ%20N(%5Cmu,%20%5Ctau_%5Ctext%7Bfixed%7D%5E2)%0A"> as the joint prior on <img src="https://latex.codecogs.com/png.latex?%5Cmu_1,%5Cldots,%5Cmu_J">. On the other hand, the exchangeable model, which forms the basis of multilevel models, assumes the joint prior <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Cmu_1,%20%5Cldots,%20%5Cmu_J)%20=%20%5Cint_0%5E%5Cinfty%20%5Cleft(%5Cprod_%7Bj=1%7D%5EJ%20N(%5Cmu,%20%5Ctau%5E2)%5Cright)p(%5Ctau)%5C,d%5Ctau,%0A"> for some prior <img src="https://latex.codecogs.com/png.latex?p(%5Ctau)"> on <img src="https://latex.codecogs.com/png.latex?%5Ctau">.</p>
<p>This might not seem like much of a change, but it can be quite profound. In both cases, the prior is saying that each <img src="https://latex.codecogs.com/png.latex?%5Cmu_j"> is, with high probability, at most <img src="https://latex.codecogs.com/png.latex?3%5Ctau"> away from the overall mean <img src="https://latex.codecogs.com/png.latex?%5Cmu">. The difference is that the classical least squares formulation uses a fixed value of <img src="https://latex.codecogs.com/png.latex?%5Ctau"> that needs to be specified by the modeller, while the exchangeable model lets <img src="https://latex.codecogs.com/png.latex?%5Ctau"> adapt to the data.</p>
<p>This data adaptation is really nifty! It means that if the groups have similar means, they can borrow information from the other groups (via the narrowing of <img src="https://latex.codecogs.com/png.latex?%5Ctau">) in order to improve their precision over an unpooled estimate. On the other hand, if there is a meaningful difference between the groups<sup>11</sup>, this model can still represent that, unlike the unpooled model.</p>
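<p>To make the borrowing-of-information idea concrete, here is a small illustration in base R (my own sketch, not part of the monkey analysis). When <img src="https://latex.codecogs.com/png.latex?%5Csigma"> and <img src="https://latex.codecogs.com/png.latex?%5Ctau"> are known, the conditional posterior mean of a group intercept is a precision-weighted average of that group’s sample mean and the overall mean, so a small <img src="https://latex.codecogs.com/png.latex?%5Ctau"> pulls the groups together while a large one leaves them roughly unpooled.</p>

```r
# Illustration only: partial pooling with known sigma and tau.
# The conditional posterior mean of mu_j is a precision-weighted average
#   (n_j/sigma^2 * ybar_j + 1/tau^2 * mu) / (n_j/sigma^2 + 1/tau^2).
partial_pool <- function(ybar_j, n_j, mu, sigma, tau) {
  w <- (n_j / sigma^2) / (n_j / sigma^2 + 1 / tau^2)
  w * ybar_j + (1 - w) * mu
}

# With two observations per group (like two days per monkey):
partial_pool(ybar_j = 1, n_j = 2, mu = 0, sigma = 1, tau = 0.1)  # shrunk hard towards 0
partial_pool(ybar_j = 1, n_j = 2, mu = 0, sigma = 1, tau = 10)   # barely shrunk at all
```

In the full Bayesian model <img src="https://latex.codecogs.com/png.latex?%5Ctau"> is of course not fixed; the posterior averages over it, so the amount of shrinkage itself is learnt from the data.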
<p>In our context, however, we need a tiny bit more. We have a <em>group-level covariate</em> (specifically <code>age</code>) that we think is going to affect the group mean. So the model we want is <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Ay_%7Bij%7D%20%20%5Cmid%20%5Cmu_j,%5Cbeta,%20%5Csigma%20&amp;%5Csim%20N(%5Cmu_j%20+%20%5Cbeta_%5Ctext%7Bday%7D%5C,%20%5Ctext%7Bday%7D_%7Bij%7D%20+%20%5Cbeta_%5Ctext%7Bage,day%7D%5C,%20%5Ctext%7B%5Bage*day%5D%7D_%7Bij%7D%20,%20%5Csigma%5E2)%20%5C%5C%0A%5Cmu_j%5Cmid%20%5Ctau,%20%5Cmu,%5Cbeta%20&amp;%5Csim%20N(%5Cmu%20+%20%20%5Cbeta_%5Ctext%7Bage%7D%5C,%20%5Ctext%7Bage%7D_j,%20%5Ctau%5E2)%20%5C%5C%0A%5Cmu%20&amp;%5Csim%20p(%5Cmu)%5C%5C%0A%5Cbeta%20&amp;%5Csim%20p(%5Cbeta)%5C%5C%0A%5Ctau%20&amp;%20%5Csim%20p(%5Ctau)%20%5C%5C%0A%5Csigma%20&amp;%5Csim%20p(%5Csigma).%0A%5Cend%7Balign*%7D"></p>
<p>In order to fully specify the model we need to set the four prior distributions.</p>
<p>This is an example of a <em>multilevel</em><sup>12</sup> <em>model</em>. The name comes from the data having multiple levels (in this case two: the observation level and the group level). Both levels have an appropriate model for their mean.</p>
<p>This mathematical representation does a good job in separating out the two different levels. However, there are a lot of other ways of writing multilevel models. An important example is the extended formula notation created<sup>13</sup> by R’s <code>lme4</code> package. In their notation, we would write this model as</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb8-1">formula <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> active_bins_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> age_centred<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>day <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> monkey)</span></code></pre></div>
</div>
<p>The first bit of this formula is the same as the formula used in linear regression. The interesting bit is the <code>(1 | monkey)</code>. This is the way to tell R that the intercept (aka <code>1</code> in formula notation) is going to be grouped by <code>monkey</code> and we are going to put an exchangeable normal prior on it. For more complex models there are more complex variations on this theme, but for the moment we won’t go any further.</p>
</section>
<section id="reasoning-out-some-prior-distributions" class="level2">
<h2 class="anchored" data-anchor-id="reasoning-out-some-prior-distributions">Reasoning out some prior distributions</h2>
<p>We need to set priors. The canny amongst you may have noticed that I did not set priors in the previous two examples. There are two reasons for this: firstly I didn’t feel like it, and secondly none but the most terrible prior distributions would have meaningfully changed the conclusions. This is, it turns out, one of the great truths when it comes to prior distributions: <em>they do not matter until they do</em><sup>14</sup>.</p>
<p>In particular, if you have a parameter that <em>directly</em> sees the data (eg it’s in the likelihood) and there is nothing weird going on<sup>15</sup>, then the prior distribution will usually not do much as any prior will be quickly overwhelmed by the data.</p>
<p>The problem is that we have one parameter in our model (<img src="https://latex.codecogs.com/png.latex?%5Ctau">) that does not directly see the data. Instead of directly telling us about an observation, it tells us about how different the <em>groups</em> of observations are. There is usually less information in the data about this type of parameter and, consequently, the prior distribution will be more important. This is especially true when you have more than one grouping variable, or when a variable only has a small number of groups.</p>
<p>So let’s pay some proper attention to the priors.</p>
<p>To begin with, let’s set priors on <img src="https://latex.codecogs.com/png.latex?%5Cmu">, <img src="https://latex.codecogs.com/png.latex?%5Cbeta">, and <img src="https://latex.codecogs.com/png.latex?%5Csigma"> (aka the data-level parameters). This is a <em>considerably</em> easier task if the data is scaled. Otherwise, you need to encode information about the usual scale<sup>16</sup> of the data into your priors. Sometimes this is a sensible and easy thing to do, but usually it’s easier to simply scale the data. (A lot of software will simply scale your data for you, but it is <em>always</em> better to do it yourself!)</p>
<p>So let’s scale our data. We have two variables that need scaling: <code>age</code> (aka the covariate that isn’t categorical) and <code>active_bins</code> (aka the response). For age, we are going to want to measure it as either <em>years from the youngest monkey</em> or <em>years from the average monkey</em>. I think, in this situation, the first version could make a lot of sense, but we are going with the second. This allows us to interpret <img src="https://latex.codecogs.com/png.latex?%5Cmu"> as the overall mean. Otherwise, <img src="https://latex.codecogs.com/png.latex?%5Cmu"> would tell us about the overall average activity of 4 year old monkeys, and we would use <img src="https://latex.codecogs.com/png.latex?%5Cbeta(%5Ctext%7Bage%7D_j%20-%204)"> to estimate how much the activity changes, on average keeping all other aspects constant, as the monkey ages.</p>
<p>On the other hand, we have no sensible baseline for activity, so deviation from the average seems like a sensible scaling. I also don’t know, <em>a priori</em>, how variable activity is going to be, so I might want to scale<sup>17</sup> it by its standard deviation. In this case, I’m not going to do that because we have a sensible fixed<sup>18</sup> upper limit (8), which I can scale by.</p>
<p>One important thing here is that if we scale the data by data-dependent quantities (like the minimum, the mean, or the standard deviation) we <em>must</em> keep track of this information. This is because <em>any</em> future data we try to predict with this model will need to be transformed <em>the same way using the same</em><sup>19</sup> <em>numbers</em>! This has particular implications when you are doing things like test/training set validation or cross validation: in the first case, the test set needs to be scaled in the same way the training set was; in the second case, each cross validation training set needs to be scaled independently and that scaling needs to be used on the corresponding left-out data<sup>20</sup>.
<div class="cell">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb9-1">age_centre <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(activity_2mins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>age)</span>
<span id="cb9-2">age_scale <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">diff</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">range</span>(activity_2mins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>age))<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb9-3">active_bins_centre <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span></span>
<span id="cb9-4"></span>
<span id="cb9-5">activity_2mins_scaled <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> activity_2mins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">monkey =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(monkey),</span>
<span id="cb9-7">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">day =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(day),</span>
<span id="cb9-8">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">age_centred =</span> (age <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> age_centre)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>age_scale,</span>
<span id="cb9-9">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">active_bins_scaled =</span> (active_bins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> active_bins_centre)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span>
<span id="cb9-10"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glimpse</span>(activity_2mins_scaled)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Rows: 485
Columns: 7
$ monkey             &lt;fct&gt; 0, 0, 88, 88, 636, 636, 760, 760, 1257, 1257, 1607,…
$ day                &lt;fct&gt; 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, …
$ total              &lt;dbl&gt; 495, 1003, 2642, 524, 199, 282, 363, 445, 96, 495, …
$ active_bins        &lt;int&gt; 6, 6, 8, 6, 2, 3, 3, 4, 3, 8, 6, 5, 3, 3, 6, 5, 8, …
$ age                &lt;dbl&gt; 29, 29, 29, 29, 28, 28, 30, 30, 27, 27, 27, 27, 27,…
$ age_centred        &lt;dbl&gt; 1.1054718, 1.1054718, 1.1054718, 1.1054718, 1.02854…
$ active_bins_scaled &lt;dbl&gt; 0.50, 0.50, 1.00, 0.50, -0.50, -0.25, -0.25, 0.00, …</code></pre>
</div>
</div>
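<p>To make the keep-your-scaling-constants point concrete, here is a short sketch of how stored constants would be applied to future observations rather than recomputed from them. (The numeric centre and scale values and the new monkeys below are stand-ins I made up for illustration, not the ones computed from the real data.)</p>

```r
# Hypothetical example: reuse the training-data scaling constants on new data.
# These values are stand-ins for the ones computed from the training set.
age_centre <- 27.9
age_scale  <- 1.5
active_bins_centre <- 4   # fixed, from the known 0-8 range of active_bins

new_monkeys <- data.frame(age = c(26, 31), active_bins = c(2, 7))

# Transform the new data with the *stored* constants, never recomputed ones.
new_monkeys$age_centred        <- (new_monkeys$age - age_centre) / age_scale
new_monkeys$active_bins_scaled <- (new_monkeys$active_bins - active_bins_centre) / 4
new_monkeys
```

Recomputing the centre and scale on the new data would silently change what the model’s coefficients mean, which is exactly the failure mode the paragraph above warns about.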
<p>With our scaling completed, we can now start thinking about prior distributions. The trick with priors is to make them wide enough to cover all plausible values of a parameter without making them so wide that they put a whole bunch of weight on essentially silly values.</p>
<p>We know, for instance, that our unscaled activity will go between 0 and 8. That means that it’s unlikely for the mean of the scaled process to be much bigger than 3 or 4. These considerations, along with the fact that we have centred the data so the mean should be closer to zero, suggest that a <img src="https://latex.codecogs.com/png.latex?N(0,1)"> prior should be appropriate for <img src="https://latex.codecogs.com/png.latex?%5Cmu">.</p>
<p>As we normalised our age data relative to the average age, we should think more carefully about the scaling of <img src="https://latex.codecogs.com/png.latex?%5Cbeta">. Macaques live for 20-30<sup>21</sup> years, so we need to think about, for instance, an ordinarily aged macaque that would be 15 years older than the baseline. Thanks to our scaling, the largest change that we can have is around 1, which strongly suggests that if <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> were much larger than <img src="https://latex.codecogs.com/png.latex?1%2F8"> we would be in unreasonable territory. So let’s put a <img src="https://latex.codecogs.com/png.latex?N(0,0.2%5E2)"> prior<sup>22</sup> on <img src="https://latex.codecogs.com/png.latex?%5Cbeta_%5Ctext%7Bage%7D"> and <img src="https://latex.codecogs.com/png.latex?%5Cbeta_%5Ctext%7Bage,day%7D">. For <img src="https://latex.codecogs.com/png.latex?%5Cbeta_%5Ctext%7Bday%7D"> we can use a <img src="https://latex.codecogs.com/png.latex?N(0,1)"> prior.</p>
<p>Similarly, the scaling of <code>active_bins</code> suggests that a <img src="https://latex.codecogs.com/png.latex?N(0,1)"> prior would be sufficient for the data-level standard deviation <img src="https://latex.codecogs.com/png.latex?%5Csigma">.</p>
<p>That just leaves us with our choice of prior for the standard deviation of the intercept<sup>23</sup> <img src="https://latex.codecogs.com/png.latex?%5Cmu_j">, <img src="https://latex.codecogs.com/png.latex?%5Ctau">. Thankfully, we considered this case in detail <a href="https://dansblog.netlify.app/posts/2022-08-29-priors4/priors4.html">in the previous blog post</a>. There I argued that a sensible prior for <img src="https://latex.codecogs.com/png.latex?%5Ctau"> would be an exponential prior. To be quite honest with you, a half-normal or a half-t also would be fine. But I’m going to stick to my guns. For the scaling, again, it would be a touch surprising (given the scaling of the data) if the group means were more than 3 apart, so choosing <img src="https://latex.codecogs.com/png.latex?%5Clambda=1"> in the exponential distribution should give a relatively weak prior without being so wide that we are putting prior mass on a bunch of values that we would never actually want to put prior mass on.</p>
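<p>As a quick sanity check of that exponential choice (this is just my illustration of the prior’s tail behaviour in base R, not code from the post), roughly 95% of the prior mass for <img src="https://latex.codecogs.com/png.latex?%5Ctau"> sits below 3, which matches the “group means within about 3 of each other” reasoning above.</p>

```r
# Tail behaviour of the exponential(1) prior on tau:
# P(tau <= 3) = 1 - exp(-3), so values above 3 are a priori rare.
pexp(3, rate = 1)       # prior mass below 3: about 0.95
qexp(0.95, rate = 1)    # 95th percentile: about 3
```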
<p>We can then fit the model with <code>brms</code>. In this case, I’m using the <code>cmdstanr</code> back end, because it’s fast and I like it.</p>
<p>To specify the model, we use the <code>lme4</code>-style formula notation discussed above.</p>
<p>To set the priors, we will use <code>brms</code>. Now, if you are Paul you might be able to remember how to set priors in <code>brms</code> without having to look it up, but I am sadly not Paul<sup>24</sup>, so every time I need to set priors in <code>brms</code> I write the formula and use the convenient <code>get_prior</code> function.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(cmdstanr)</span>
<span id="cb11-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(brms)</span>
<span id="cb11-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_prior</span>(formula, activity_2mins_scaled)</span></code></pre></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["prior"],"name":[1],"type":["chr"],"align":["left"]},{"label":["class"],"name":[2],"type":["chr"],"align":["left"]},{"label":["coef"],"name":[3],"type":["chr"],"align":["left"]},{"label":["group"],"name":[4],"type":["chr"],"align":["left"]},{"label":["resp"],"name":[5],"type":["chr"],"align":["left"]},{"label":["dpar"],"name":[6],"type":["chr"],"align":["left"]},{"label":["nlpar"],"name":[7],"type":["chr"],"align":["left"]},{"label":["lb"],"name":[8],"type":["chr"],"align":["left"]},{"label":["ub"],"name":[9],"type":["chr"],"align":["left"]},{"label":["source"],"name":[10],"type":["chr"],"align":["left"]}],"data":[{"1":"","2":"b","3":"","4":"","5":"","6":"","7":"","8":"","9":"","10":"default"},{"1":"","2":"b","3":"age_centred","4":"","5":"","6":"","7":"","8":"","9":"","10":"default"},{"1":"","2":"b","3":"age_centred:day2","4":"","5":"","6":"","7":"","8":"","9":"","10":"default"},{"1":"","2":"b","3":"day2","4":"","5":"","6":"","7":"","8":"","9":"","10":"default"},{"1":"student_t(3, 0, 2.5)","2":"Intercept","3":"","4":"","5":"","6":"","7":"","8":"","9":"","10":"default"},{"1":"student_t(3, 0, 2.5)","2":"sd","3":"","4":"","5":"","6":"","7":"","8":"0","9":"","10":"default"},{"1":"","2":"sd","3":"","4":"monkey","5":"","6":"","7":"","8":"","9":"","10":"default"},{"1":"","2":"sd","3":"Intercept","4":"monkey","5":"","6":"","7":"","8":"","9":"","10":"default"},{"1":"student_t(3, 0, 2.5)","2":"sigma","3":"","4":"","5":"","6":"","7":"","8":"0","9":"","10":"default"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>From this, we can see that the default prior on <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> is an improper flat prior, while the default prior on the intercept is a Student-t with 3 degrees of freedom, centred at zero, with scale 2.5. The same prior (restricted to positive numbers) is put on all of the standard deviation parameters. These default prior distributions are, to be honest, probably fine in this context<sup>25</sup>, but it is good practice to always set your prior.</p>
<p>We do this as follows. (Note that <code>brms</code> uses Stan, which parameterises the normal distribution by its mean and <em>standard deviation</em>!)</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb12-1">priors <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prior</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">normal</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">coef =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"age_centred"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb12-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prior</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">normal</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">coef =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"age_centred:day2"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb12-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prior</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">normal</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">coef =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"day2"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb12-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prior</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">normal</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">class =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sigma"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb12-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prior</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exponential</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">class =</span> sd) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># tau</span></span>
<span id="cb12-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prior</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">normal</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">class =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Intercept"</span>)</span>
<span id="cb12-7">priors</span></code></pre></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["prior"],"name":[1],"type":["chr"],"align":["left"]},{"label":["class"],"name":[2],"type":["chr"],"align":["left"]},{"label":["coef"],"name":[3],"type":["chr"],"align":["left"]},{"label":["group"],"name":[4],"type":["chr"],"align":["left"]},{"label":["resp"],"name":[5],"type":["chr"],"align":["left"]},{"label":["dpar"],"name":[6],"type":["chr"],"align":["left"]},{"label":["nlpar"],"name":[7],"type":["chr"],"align":["left"]},{"label":["lb"],"name":[8],"type":["chr"],"align":["left"]},{"label":["ub"],"name":[9],"type":["chr"],"align":["left"]},{"label":["source"],"name":[10],"type":["chr"],"align":["left"]}],"data":[{"1":"normal(0, 0.2)","2":"b","3":"age_centred","4":"","5":"","6":"","7":"","8":"NA","9":"NA","10":"user"},{"1":"normal(0, 0.2)","2":"b","3":"age_centred:day2","4":"","5":"","6":"","7":"","8":"NA","9":"NA","10":"user"},{"1":"normal(0, 1)","2":"b","3":"day2","4":"","5":"","6":"","7":"","8":"NA","9":"NA","10":"user"},{"1":"normal(0, 1)","2":"sigma","3":"","4":"","5":"","6":"","7":"","8":"NA","9":"NA","10":"user"},{"1":"exponential(1)","2":"sd","3":"","4":"","5":"","6":"","7":"","8":"NA","9":"NA","10":"user"},{"1":"normal(0, 1)","2":"Intercept","3":"","4":"","5":"","6":"","7":"","8":"NA","9":"NA","10":"user"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
</section>
<section id="pre-experiment-prophylaxis" class="level2">
<h2 class="anchored" data-anchor-id="pre-experiment-prophylaxis">Pre-experiment prophylaxis</h2>
<p>So we have specified some priors using the power of <em>our thoughts</em>. But we should probably check to see if they are broadly sensible. A great thing about Bayesian modelling is that we are explicitly specifying our <em>a priori</em> (or pre-data) assumptions about the data generating process. That means that we can do a fast validation of our priors by simulating from them and checking that they’re not too wild.</p>
<p>There are lots of ways to do this, but the easiest<sup>26</sup> is to use the <code>sample_prior = "only"</code> option in the <code>brm()</code> function.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb13-1">prior_draws <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">brm</span>(formula, </span>
<span id="cb13-2">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> activity_2mins_scaled,</span>
<span id="cb13-3">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prior =</span> priors,</span>
<span id="cb13-4">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample_prior =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"only"</span>,</span>
<span id="cb13-5">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">backend =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cmdstanr"</span>,</span>
<span id="cb13-6">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cores =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,</span>
<span id="cb13-7">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">refresh =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Start sampling</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>Running MCMC with 4 parallel chains...

Chain 1 finished in 0.7 seconds.
Chain 2 finished in 0.7 seconds.
Chain 3 finished in 0.7 seconds.
Chain 4 finished in 0.7 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.7 seconds.
Total execution time: 0.9 seconds.</code></pre>
</div>
</div>
<p>Now that we have samples from the prior distribution, we can assemble them to work out what our prior would, pre-data, predict for the number of active bins for a single monkey (in this case, a monkey<sup>27</sup> that is 10 years older than the baseline).</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb16-1">pred_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">age_centred =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">day =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">monkey =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"88"</span>) </span>
<span id="cb16-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pred =</span> brms<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">posterior_predict</span>(prior_draws, </span>
<span id="cb16-3">                                      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">newdata =</span> pred_data )) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(pred)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb16-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">after_stat</span>(density)), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lightgrey"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb16-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_vline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xintercept =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dashed"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb16-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_vline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xintercept =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dashed"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb16-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlim</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb16-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>()</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/everybodys-got-something-to-hide-except-me-and-my-monkey_files/figure-html/unnamed-chunk-10-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>The vertical lines are (approximately) the minimum and maximum of the data. This<sup>28</sup> suggests that the implied prior predictive distribution is definitely wider than our observed data, but not by several orders of magnitude. This is a good situation to be in: the priors leave enough room for our specification to be wrong while still ruling out truly wild values of the parameters (and of the implied predictive distribution). One could even go so far as to say that the prior is weakly informative.</p>
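<p>The logic of this kind of prior predictive check can be sketched outside of <code>brms</code> as well. The following Python sketch uses <em>hypothetical</em> prior scales (not the exact priors above) for a random-intercept Gaussian model and checks where the bulk of the prior predictive falls relative to the data range of roughly <code>[-1, 1]</code>.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
S = 4000  # number of prior draws

# Hypothetical weakly informative priors (illustrative, not the post's exact priors):
beta0 = rng.normal(0.0, 1.0, S)          # intercept
beta_age = rng.normal(0.0, 0.2, S)       # slope for centred age
tau = np.abs(rng.normal(0.0, 1.0, S))    # between-monkey sd (half-normal)
sigma = np.abs(rng.normal(0.0, 1.0, S))  # residual sd

# Prior predictive draws for one new monkey, 10 years above the baseline age
u = rng.normal(0.0, tau)                 # monkey-level intercept, one per draw
mu = beta0 + beta_age * 10 + u
y_rep = rng.normal(mu, sigma)

# Compare the bulk of the prior predictive to the data range (about [-1, 1])
lo, hi = np.quantile(y_rep, [0.01, 0.99])
print(lo, hi)  # wider than the data, but not absurdly so
```

<p>If this interval spanned thousands of units, the priors would be far too vague; if it barely covered the data, they would be too restrictive.</p>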
<p>Let’s compare this to the default priors on the standard deviation parameters. (The default priors on the regression parameters are improper so we can’t simulate from them. So I replaced the improper prior with a much narrower <img src="https://latex.codecogs.com/png.latex?N(0,10%5E2)"> prior. If you make the prior on the <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> wider the prior predictive distribution also gets wider.)</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb17-1">priors_default <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prior</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">normal</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">class =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"b"</span>)</span>
<span id="cb17-2">prior_draws_default <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">brm</span>(formula, </span>
<span id="cb17-3">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> activity_2mins_scaled,</span>
<span id="cb17-4">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prior =</span> priors_default,</span>
<span id="cb17-5">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample_prior =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"only"</span>,</span>
<span id="cb17-6">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">backend =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cmdstanr"</span>,</span>
<span id="cb17-7">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cores =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,</span>
<span id="cb17-8">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">refresh =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Running MCMC with 4 parallel chains...

Chain 1 finished in 0.6 seconds.
Chain 2 finished in 0.6 seconds.
Chain 3 finished in 0.6 seconds.
Chain 4 finished in 0.6 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.6 seconds.
Total execution time: 0.8 seconds.</code></pre>
</div>
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pred =</span> brms<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">posterior_predict</span>(prior_draws_default, </span>
<span id="cb19-2">                                      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">newdata =</span> pred_data )) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb19-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(pred)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">after_stat</span>(density)), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lightgrey"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_vline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xintercept =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dashed"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb19-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_vline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xintercept =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dashed"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>()</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/everybodys-got-something-to-hide-except-me-and-my-monkey_files/figure-html/unnamed-chunk-11-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>This is considerably wider.</p>
</section>
<section id="fitting-the-data-or-do-my-monkeys-get-less-interesting-as-they-age" class="level2">
<h2 class="anchored" data-anchor-id="fitting-the-data-or-do-my-monkeys-get-less-interesting-as-they-age">Fitting the data; or do my monkeys get less interesting as they age</h2>
<p>With all of that in hand, we can now fit the data. Hooray. This is done with the same command (minus the <code>sample_prior</code> bit).</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb20-1">posterior_draws <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">brm</span>(formula, </span>
<span id="cb20-2">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> activity_2mins_scaled,</span>
<span id="cb20-3">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prior =</span> priors,</span>
<span id="cb20-4">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">backend =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cmdstanr"</span>,</span>
<span id="cb20-5">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cores =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,</span>
<span id="cb20-6">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">refresh =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Start sampling</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>Running MCMC with 4 parallel chains...

Chain 1 finished in 1.7 seconds.
Chain 3 finished in 1.8 seconds.
Chain 2 finished in 1.8 seconds.
Chain 4 finished in 1.8 seconds.

All 4 chains finished successfully.
Mean chain execution time: 1.8 seconds.
Total execution time: 2.0 seconds.</code></pre>
</div>
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb23-1">posterior_draws</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code> Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: active_bins_scaled ~ age_centred * day + (1 | monkey) 
   Data: activity_2mins_scaled (Number of observations: 485) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Group-Level Effects: 
~monkey (Number of levels: 243) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.31      0.03     0.25     0.37 1.00     1070     1766

Population-Level Effects: 
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept           -0.04      0.03    -0.11     0.02 1.00     4222     3171
age_centred          0.02      0.07    -0.11     0.14 1.00     3671     3150
day2                 0.10      0.04     0.03     0.18 1.00     8022     2911
age_centred:day2     0.07      0.07    -0.08     0.22 1.00     6170     2584

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.43      0.02     0.39     0.47 1.00     1613     2430

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).</code></pre>
</div>
</div>
<p>There doesn’t seem to be much of an effect of age in this data.</p>
<p>If you’re curious, this matches well<sup>29</sup> with the output of <code>lme4</code>, which is a nice sense check for simple models. Generally speaking, if they’re the same then they’re both fine. If they are different<sup>30</sup>, then you’ve got to look deeper.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb25-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(lme4)</span>
<span id="cb25-2">fit_lme4 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lmer</span>(formula, activity_2mins_scaled)</span>
<span id="cb25-3">fit_lme4</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Linear mixed model fit by REML ['lmerMod']
Formula: active_bins_scaled ~ age_centred * day + (1 | monkey)
   Data: activity_2mins_scaled
REML criterion at convergence: 734.9096
Random effects:
 Groups   Name        Std.Dev.
 monkey   (Intercept) 0.3091  
 Residual             0.4253  
Number of obs: 485, groups:  monkey, 243
Fixed Effects:
     (Intercept)       age_centred              day2  age_centred:day2  
        -0.04114           0.01016           0.10507           0.08507  </code></pre>
</div>
</div>
<p>We can also compare the fit using leave-one-out cross validation. This is similar to AIC, but more directly interpretable. It is the average of <img src="https://latex.codecogs.com/png.latex?%0A%5Clog%20p_%5Ctext%7Bposterior%20predictive%7D(y_%7Bij%7D%20%5Cmid%20y_%7B-ij%7D)%20=%20%5Clog%20%5Cleft(%5Cint_%5Ctheta%20p(y_%7Bij%7D%20%5Cmid%20%5Ctheta)p(%5Ctheta%20%5Cmid%20y_%7B-ij%7D)%5C,%20d%5Ctheta%5Cright),%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> is a vector of all of the parameters in the model. The notation <img src="https://latex.codecogs.com/png.latex?y_%7B-ij%7D"> is the data <em>without</em> the <img src="https://latex.codecogs.com/png.latex?ij">th observation. This average is sometimes called the <em>expected log predictive density</em> or elpd.</p>
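<p>To make the elpd concrete, here is a hedged Python sketch (not the author's code, and not what <code>loo</code> actually does internally). It estimates the in-sample pointwise log predictive density from a hypothetical <code>S × n</code> matrix <code>loglik</code> of log-likelihood values evaluated at posterior draws; the leave-one-out version reported by <code>loo</code> additionally corrects for reusing each observation.</p>

```python
import numpy as np

def lppd_from_loglik(loglik):
    """Monte Carlo estimate of the pointwise log predictive density.

    loglik: (S, n) array with loglik[s, i] = log p(y_i | theta_s),
    where theta_s are posterior draws.
    Returns sum_i log( (1/S) * sum_s p(y_i | theta_s) ),
    using the log-sum-exp trick for numerical stability.
    """
    max_ll = loglik.max(axis=0)
    lppd_i = max_ll + np.log(np.exp(loglik - max_ll).mean(axis=0))
    return lppd_i.sum()

# Toy check: if every draw gives the same log density for one observation,
# the estimate is just that log density.
loglik = np.full((1000, 1), -1.3)
print(lppd_from_loglik(loglik))  # ≈ -1.3
```
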
<p>To compare it with the two linear regression models, I need to fit them in <code>brms</code>. I will use a <img src="https://latex.codecogs.com/png.latex?N(0,1)"> prior for the monkey intercepts and the same priors as the previous model for the other parameters.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb27-1">priors_lm <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span>  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prior</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">normal</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">class =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"b"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb27-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prior</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">normal</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">coef =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"age_centred"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb27-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prior</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">normal</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">coef =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"age_centred:day2"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb27-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prior</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">normal</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">coef =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"day2"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb27-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prior</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">normal</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">class =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Intercept"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb27-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prior</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">normal</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">class =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sigma"</span>)</span>
<span id="cb27-7"></span>
<span id="cb27-8">posterior_nopool <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">brm</span>(</span>
<span id="cb27-9">  active_bins_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> age_centred <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> day <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> monkey, </span>
<span id="cb27-10">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> activity_2mins_scaled,</span>
<span id="cb27-11">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prior =</span> priors_lm,</span>
<span id="cb27-12">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">backend =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cmdstanr"</span>,</span>
<span id="cb27-13">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cores =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,</span>
<span id="cb27-14">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">refresh =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Running MCMC with 4 parallel chains...

Chain 1 finished in 4.5 seconds.
Chain 3 finished in 4.5 seconds.
Chain 2 finished in 4.5 seconds.
Chain 4 finished in 4.5 seconds.

All 4 chains finished successfully.
Mean chain execution time: 4.5 seconds.
Total execution time: 4.7 seconds.</code></pre>
</div>
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb29-1">posterior_pool <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">brm</span>(</span>
<span id="cb29-2">  active_bins_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> age_centred <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> day, </span>
<span id="cb29-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> activity_2mins_scaled,</span>
<span id="cb29-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prior =</span> priors_lm,</span>
<span id="cb29-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">backend =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cmdstanr"</span>,</span>
<span id="cb29-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cores =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,</span>
<span id="cb29-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">refresh =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Running MCMC with 4 parallel chains...

Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.
Chain 3 finished in 0.1 seconds.
Chain 4 finished in 0.1 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.3 seconds.</code></pre>
</div>
</div>
<p>We can now use the <code>loo_compare</code> function to compare the models. By default, the best model is listed first and the other models are listed below it, along with their difference in elpd. To do this, we first need to tell <code>brms</code> to compute the <code>loo</code> criterion using the <code>add_criterion</code> function.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb31-1">posterior_draws <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_criterion</span>(posterior_draws, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"loo"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: Found 2 observations with a pareto_k &gt; 0.7 in model 'posterior_draws'.
It is recommended to set 'moment_match = TRUE' in order to perform moment
matching for problematic observations.</code></pre>
</div>
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb33-1">posterior_nopool <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_criterion</span>(posterior_nopool, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"loo"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: Found 63 observations with a pareto_k &gt; 0.7 in model
'posterior_nopool'. It is recommended to set 'moment_match = TRUE' in order to
perform moment matching for problematic observations.</code></pre>
</div>
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb35-1">posterior_pool <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_criterion</span>(posterior_pool, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"loo"</span>)</span>
<span id="cb35-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">loo_compare</span>(posterior_draws, posterior_nopool, posterior_pool)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>                 elpd_diff se_diff
posterior_draws    0.0       0.0  
posterior_pool   -29.0       7.4  
posterior_nopool -53.3       9.0  </code></pre>
</div>
</div>
<p>There are some warnings there suggesting that a few of the pointwise estimates should be recomputed with a slower, more reliable method (<code>moment_match = TRUE</code>), but for the purposes of today I’m not going to do that and I shall declare that the multilevel model performs <em>far better</em> than the other two models.</p>
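<p>For the curious: the <code>pareto_k</code> warnings arise because <code>loo</code> approximates leave-one-out predictions by importance sampling, with ratios proportional to <code>1 / p(y_i | theta_s)</code>; when those ratios have a heavy right tail (flagged by a fitted generalized Pareto shape <code>k &gt; 0.7</code>), the estimate is unreliable. The plain (unsmoothed) importance-sampling version can be sketched as follows; this is illustrative only, and again assumes a hypothetical <code>S × n</code> log-likelihood matrix.</p>

```python
import numpy as np

def is_loo_elpd(loglik):
    """Plain importance-sampling LOO (no Pareto smoothing).

    loglik: (S, n) with loglik[s, i] = log p(y_i | theta_s), evaluated
    at draws from the full-data posterior. The importance ratio for
    leaving out y_i is r_s = 1 / p(y_i | theta_s), which gives
    elpd_loo_i = log(S) - logsumexp_s(-loglik[s, i]).
    """
    S = loglik.shape[0]
    neg = -loglik
    m = neg.max(axis=0)
    lse = m + np.log(np.exp(neg - m).sum(axis=0))  # logsumexp over draws
    return (np.log(S) - lse).sum()

# Toy check: identical draws give back the common log density.
print(is_loo_elpd(np.full((100, 1), -1.3)))  # ≈ -1.3
```

<p>PSIS (what <code>loo</code> actually uses) stabilises the largest ratios by replacing them with quantiles of a fitted generalized Pareto distribution, which is why the fix it recommends for high-<code>k</code> observations is moment matching or exact refitting rather than more draws.</p>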
</section>
<section id="post-experiment-prophylaxis" class="level2">
<h2 class="anchored" data-anchor-id="post-experiment-prophylaxis">Post-experiment prophylaxis</h2>
<p>Of course, we would be fools to assume that, just because we fit a model and compared it to some other models, the model is a good representation of the data. To check that, we need to look at some posterior predictive checks.</p>
<p>The easiest thing to look at is the predictions themselves.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb37-1">fitted <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> activity_2mins_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb37-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cbind</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">posterior_predict</span>(posterior_draws,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ndraws =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb37-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">207</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"draw"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"fitted"</span>)</span>
<span id="cb37-4"></span>
<span id="cb37-5">day_labs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Day 1"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Day 2"</span>)</span>
<span id="cb37-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(day_labs) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"1"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2"</span>)</span>
<span id="cb37-7"></span>
<span id="cb37-8">violin_plot <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> fitted <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb37-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>( <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>age, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>fitted <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> active_bins_centre, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">group =</span> age)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb37-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_violin</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lightgrey"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb37-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> active_bins), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb37-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>day, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labeller =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labeller</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">day =</span> day_labs)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb37-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>() </span>
<span id="cb37-14">violin_plot</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/everybodys-got-something-to-hide-except-me-and-my-monkey_files/figure-html/unnamed-chunk-16-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>That appears to be a reasonably good fit, although it’s possible that the prediction intervals are a bit wide. Later we will also look at the plot of the posterior residuals against the fitted values, where the fitted values are the mean of the posterior predictive distribution.</p>
<p>Next, let’s check for evidence of non-linearity in <code>age</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb38-1">plot_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> activity_2mins_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb38-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fitted_mean =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colMeans</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">posterior_epred</span>(posterior_draws,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ndraws =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)))</span>
<span id="cb38-3"></span>
<span id="cb38-4">age_plot <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> plot_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb38-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> age, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> active_bins_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> fitted_mean)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb38-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb38-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>()</span>
<span id="cb38-8">age_plot</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/everybodys-got-something-to-hide-except-me-and-my-monkey_files/figure-html/unnamed-chunk-17-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>There doesn’t seem to be any obvious evidence of non-linearity in the residuals, which suggests the linear model for age was sufficient.</p>
<p>We can also check the distributional assumption<sup>31</sup> that the residuals <img src="https://latex.codecogs.com/png.latex?%0Ar_%7Bij%7D%20=%20y_%7Bij%7D%20-%20%5Cmu_j%0A"> have a Gaussian distribution. Here we are using the posterior mean to define our residuals, and a qq-plot to see how we’re doing with normality.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb39-1">distribution_plot <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> plot_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample =</span> (active_bins_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> fitted_mean)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sd</span>(active_bins_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> fitted_mean))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb39-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_qq</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb39-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">slope =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">intercept =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dashed"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb39-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_classic</span>()</span>
<span id="cb39-5">distribution_plot</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/everybodys-got-something-to-hide-except-me-and-my-monkey_files/figure-html/unnamed-chunk-18-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>That’s not too bad. A bit of a deviation from normality in the tails but nothing that would make me weep. It could well be an artifact of how I defined and normalised the residuals.</p>
<p>We can also look at the so-called k-hat plot of the Pareto-k diagnostics from PSIS-LOO, which can be useful for finding high-leverage observations in general models.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb40-1">loo_posterior <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">LOO</span>(posterior_draws) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#warnings suppressed</span></span>
<span id="cb40-2">loo_posterior</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Computed from 4000 by 485 log-likelihood matrix

         Estimate   SE
elpd_loo   -349.8 12.4
p_loo       117.8  5.2
looic       699.7 24.7
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     418   86.2%   902       
 (0.5, 0.7]   (ok)        65   13.4%   443       
   (0.7, 1]   (bad)        1    0.2%   272       
   (1, Inf)   (very bad)   1    0.2%   59        
See help('pareto-k-diagnostic') for details.</code></pre>
</div>
<div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb42-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(loo_posterior)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/everybodys-got-something-to-hide-except-me-and-my-monkey_files/figure-html/unnamed-chunk-19-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>This suggests that observations 393 and 394 are potentially high leverage and we should check them more carefully. I won’t be doing that today.</p>
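<p>If you did want to chase them down, the <code>loo</code> package has a helper that pulls out the flagged indices directly. A sketch:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Indices of observations with pareto_k above the threshold,
# then pull the corresponding rows of the data for inspection.
high_k &lt;- loo::pareto_k_ids(loo_posterior, threshold = 0.7)
activity_2mins_scaled[high_k, ]</code></pre></div>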
<p>Finally, let’s look at the residuals vs the fitted values. This is a commonly used diagnostic plot in linear regression and it can be very useful for visually detecting non-linear patterns and heteroskedasticity in the residuals. So let’s make the plot<sup>32</sup>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb43-1">problem_plot <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> plot_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb43-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> fitted_mean, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> active_bins_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> fitted_mean)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb43-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb43-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_smooth</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">se =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dashed"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb43-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>day) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb43-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span>  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb43-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlim</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb43-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylim</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb43-9">problem_plot</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/everybodys-got-something-to-hide-except-me-and-my-monkey_files/figure-html/unnamed-chunk-20-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Hmmmm. That’s not <em>excellent</em>. The stripes are related to the 8 distinct values the response can take, but there is definitely a trend in the residuals. In particular, we are under-predicting small values and over-predicting large values. <em>There is something here and we will look into it</em>!</p>
</section>
<section id="understanding-diagnostic-plots-from-multilevel-models" class="level2">
<h2 class="anchored" data-anchor-id="understanding-diagnostic-plots-from-multilevel-models">Understanding diagnostic plots from multilevel models</h2>
<p>The thing is, multilevel models are notorious for having patterns that are essentially a product of the data design and not of any type of statistical misspecification. In a really great paper that you should all read, <a href="https://arxiv.org/pdf/1502.06988.pdf">Adam Loy, Heike Hofmann, and Di Cook</a> talk extensively about the challenges with interpreting diagnostic plots for linear mixed effects models<sup>33</sup>.</p>
<p>I’m not going to fully follow their recommendations, mostly because I’m too lazy<sup>34</sup> to write a for loop, but I am going to appropriate the guts of their idea.</p>
<p>They note that strange patterns can occur in diagnostic plots <em>even for correctly specified models</em>. Moreover, we simply do not know what these patterns will be. It’s too complex a function of the design, the structure, the data, and the potential misspecification. That sounds bad, but they note that <em>we don’t need to know what pattern to expect</em>. Why not? Because we can simulate it!</p>
<p>So this is the idea: Let’s simulate some fake<sup>35</sup> data from a correctly specified model that otherwise matches our data. We can then compare the diagnostic plots from the fake data with diagnostic plots from the real data and see if the patterns are meaningfully different.</p>
<p>In order to do this, we should have a method to construct <em>multiple</em> fake data sets. Why? Well, a plot is nothing but another test statistic, and we <em>must</em> take its sampling variability into account.</p>
<p>(That said, do what I say, not what I do. This is a blog. I’m not going to code well enough to make this clean and straightforward, so I’m just going to do one.)</p>
<p>There is an entire theory of <a href="https://royalsocietypublishing.org/doi/10.1098/rsta.2009.0120"><em>visual inference</em></a> built around these lineups of diagnostic plots, where one plot uses the real data and the rest use realisations of the null data. It is really quite interesting and <em>well</em> beyond the scope of this post. But if you want to know more, read the <a href="https://arxiv.org/pdf/1502.06988.pdf">Loy, Hofmann, and Cook</a> paper!</p>
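<p>If I were less lazy, the loop would look roughly like this. This is only a sketch: <code>simulate_fake()</code> is a hypothetical helper wrapping the simulation code in the next section.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Sketch (not run): fit the model to 19 null data sets and collect the
# residuals-vs-fitted plots; the real plot gets slipped in at a random
# position in the lineup.
null_plots &lt;- purrr::map(1:19, function(i) {
  data_null &lt;- simulate_fake()  # hypothetical helper
  fit_null &lt;- update(posterior_draws, newdata = data_null, refresh = 0)
  data_null |&gt;
    mutate(fitted_mean = colMeans(posterior_epred(fit_null))) |&gt;
    ggplot(aes(x = fitted_mean, y = active_bins_scaled - fitted_mean)) +
    geom_point() + theme_bw()
})</code></pre></div>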
<section id="making-new-data" class="level3">
<h3 class="anchored" data-anchor-id="making-new-data">Making new data</h3>
<p>The first thing that we need to do is to work out how to simulate fake data from a correctly specified model with the same structure. Following the Loy et al. paper, I’m going to do a simple parametric bootstrap, where I take the posterior medians of the fitted distribution and simulate data from them.</p>
<p>That said, there are a bunch of other options. Specifically, we have a whole bag of samples from our posterior distribution and it would be possible to use that to select values of<sup>36</sup> <img src="https://latex.codecogs.com/png.latex?(%5Cmu,%20%5Cbeta,%20%5Ctau,%20%5Csigma)"> for our simulation.</p>
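<p>A sketch of that alternative: grab the posterior draws and pick a row at random to use as the simulation parameters. (The exact column names depend on the formula; check <code>names(draws)</code>.)</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Not run: use a random posterior draw instead of the posterior medians.
draws &lt;- posterior::as_draws_df(posterior_draws)
one_draw &lt;- draws[sample(nrow(draws), 1), ]
# Columns like b_Intercept, sigma, and sd_monkey__Intercept (brms naming
# conventions; verify with names(draws)) would then replace the
# hard-coded values in the next chunk.</code></pre></div>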
<p>So let’s make some fake data and fit the model to it!</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb44-1">monkey_effect <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">monkey =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>(activity_2mins_scaled<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>monkey), </span>
<span id="cb44-2">                        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">monkey_effect =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">243</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.31</span>))</span>
<span id="cb44-3">data_fake <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> activity_2mins_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb44-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(monkey_effect, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"monkey"</span>)  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb44-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">active_bins_scaled =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(age_centred),</span>
<span id="cb44-6">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.04</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> age_centred <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb44-7">              monkey_effect <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">if_else</span>(day <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2"</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.085</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>age_centred, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>), </span>
<span id="cb44-8">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.43</span>))</span>
<span id="cb44-9">                                              </span>
<span id="cb44-10">posterior_draws_fake <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">brm</span>(formula, </span>
<span id="cb44-11">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> data_fake,</span>
<span id="cb44-12">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prior =</span> priors,</span>
<span id="cb44-13">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">backend =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cmdstanr"</span>,</span>
<span id="cb44-14">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cores =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,</span>
<span id="cb44-15">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">refresh =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Running MCMC with 4 parallel chains...

Chain 1 finished in 1.6 seconds.
Chain 2 finished in 1.6 seconds.
Chain 3 finished in 1.6 seconds.
Chain 4 finished in 1.6 seconds.

All 4 chains finished successfully.
Mean chain execution time: 1.6 seconds.
Total execution time: 1.8 seconds.</code></pre>
</div>
</div>
</section>
<section id="the-good-plots" class="level3">
<h3 class="anchored" data-anchor-id="the-good-plots">The good plots</h3>
<p>First up, let’s look at the violin plot.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb46" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb46-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(cowplot)</span>
<span id="cb46-2">fitted_fake <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> data_fake <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb46-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cbind</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">posterior_predict</span>(posterior_draws_fake,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ndraws =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb46-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">207</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"draw"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"fitted"</span>)</span>
<span id="cb46-5"></span>
<span id="cb46-6">day_labs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Day 1"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Day 2"</span>)</span>
<span id="cb46-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(day_labs) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"1"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2"</span>)</span>
<span id="cb46-8"></span>
<span id="cb46-9">violin_fake <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> fitted_fake <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb46-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>( <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>age, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>fitted <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> active_bins_centre, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">group =</span> age)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb46-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_violin</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lightgrey"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb46-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> active_bins), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb46-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>day, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labeller =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labeller</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">day =</span> day_labs)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb46-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>() </span>
<span id="cb46-15">  </span>
<span id="cb46-16"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_grid</span>(violin_plot, violin_fake, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Real"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Fake"</span>))</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/everybodys-got-something-to-hide-except-me-and-my-monkey_files/figure-html/unnamed-chunk-22-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>That’s very similar to our data plot.</p>
<p>Next up, we will look at the residuals ordered by age.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb47" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb47-1">plot_data_fake <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> data_fake <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb47-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fitted_mean =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colMeans</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">posterior_epred</span>(posterior_draws_fake,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ndraws =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)))</span>
<span id="cb47-3"></span>
<span id="cb47-4">age_fake <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> plot_data_fake <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb47-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> age, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> active_bins_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> fitted_mean)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb47-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb47-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>()</span>
<span id="cb47-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_grid</span>(age_plot, age_fake, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Real"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Fake"</span>))</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/everybodys-got-something-to-hide-except-me-and-my-monkey_files/figure-html/unnamed-chunk-23-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Fabulous!</p>
<p>Now let’s check the distributional assumption on the residuals!</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb48" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb48-1">distribution_fake <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> plot_data_fake <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb48-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample =</span> (active_bins_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> fitted_mean)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sd</span>(active_bins_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> fitted_mean))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb48-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_qq</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb48-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">slope =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">intercept =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dashed"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb48-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_classic</span>()</span>
<span id="cb48-6"></span>
<span id="cb48-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_grid</span>(distribution_plot, distribution_fake, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Real"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Fake"</span>))</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/everybodys-got-something-to-hide-except-me-and-my-monkey_files/figure-html/unnamed-chunk-24-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Excellent!</p>
<p>Finally, we can look at the k-hat plot. Because I’m lazy, I’m not going to put them side by side. You can scroll.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb49" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb49-1">loo_fake <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">LOO</span>(posterior_draws_fake)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: Found 4 observations with a pareto_k &gt; 0.7 in model
'posterior_draws_fake'. It is recommended to set 'moment_match = TRUE' in order
to perform moment matching for problematic observations.</code></pre>
</div>
<div class="sourceCode cell-code" id="cb51" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb51-1">loo_fake</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Computed from 4000 by 485 log-likelihood matrix

         Estimate   SE
elpd_loo   -372.1 14.9
p_loo       115.4  6.1
looic       744.2 29.7
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     422   87.0%   579       
 (0.5, 0.7]   (ok)        59   12.2%   220       
   (0.7, 1]   (bad)        4    0.8%   118       
   (1, Inf)   (very bad)   0    0.0%   &lt;NA&gt;      
See help('pareto-k-diagnostic') for details.</code></pre>
</div>
<div class="sourceCode cell-code" id="cb53" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb53-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(loo_fake)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/everybodys-got-something-to-hide-except-me-and-my-monkey_files/figure-html/unnamed-chunk-25-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>And look: we get some extreme values. (Depending on the run, we get more or fewer.) This suggests that while it would be useful to look at the data points flagged by the k-hat statistic, the extreme values may just be sampling variation.</p>
<p>All of this suggests our model assumptions are not being grossly violated. All except for that residual vs fitted values plot…</p>
</section>
<section id="the-haunted-residual-vs-fitted-plot" class="level3">
<h3 class="anchored" data-anchor-id="the-haunted-residual-vs-fitted-plot">The haunted residual vs fitted plot</h3>
<p>Now let’s look at our residual vs fitted plot.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb54" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb54-1">problem_fake <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> plot_data_fake <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb54-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> fitted_mean, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> active_bins_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> fitted_mean)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb54-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb54-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_smooth</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">se =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dashed"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb54-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>day) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb54-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span>  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb54-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlim</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb54-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylim</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb54-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_grid</span>(problem_plot, problem_fake, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Real"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Fake"</span>))</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/everybodys-got-something-to-hide-except-me-and-my-monkey_files/figure-html/unnamed-chunk-26-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>And what do you know! They look the same. (Well, minus the discretisation artefacts.)</p>
</section>
<section id="so-what-the-hell-is-going-on" class="level3">
<h3 class="anchored" data-anchor-id="so-what-the-hell-is-going-on">So what the hell is going on?</h3>
<p>Great question! It turns out that this is one of those cases where our intuition from linear models <em>does not</em> transfer over to multilevel models.</p>
<p>We can actually reason this out by thinking about a model where we have no covariates.</p>
<p>If we have no pooling then the observations for every monkey are, essentially, averaged to get our estimate of <img src="https://latex.codecogs.com/png.latex?%5Cmu_j">. If we repeat this, we will find that our <img src="https://latex.codecogs.com/png.latex?%5Cmu_j"> are basically<sup>37</sup> unbiased and the corresponding residual <img src="https://latex.codecogs.com/png.latex?%0Ar_%7Bij%7D%20=%20y_%7Bij%7D%20-%20%5Cmu_j%0A"> will have mean zero.</p>
<p>But that’s not what happens when we have partial pooling. When we have partial pooling we are <em>combining</em> our naive average<sup>38</sup> <img src="https://latex.codecogs.com/png.latex?%5Cbar%20y_j"> with the global average <img src="https://latex.codecogs.com/png.latex?%5Cmu"> in a way that accounts for the size of group <img src="https://latex.codecogs.com/png.latex?j"> relative to other groups as well as the within-group variability relative to the between-group variability.</p>
<details>
<summary>
Expand for maths. Just a little
</summary>
There is, in fact, a formula for it. Just in case you’re a formula sort of person. The posterior estimate for a Gaussian multilevel model with an intercept but no covariates is <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B1%7D%7B1%20+%5Cfrac%7B%5Csigma%5E2/n%7D%7B%5Ctau%5E2%7D%7D%5Cleft(%5Cbar%7By%7D_j%20+%20%5Cfrac%7B%5Csigma%5E2/n%7D%7B%5Ctau%5E2%7D%20%5Cmu%5Cright).%0A"> When <img src="https://latex.codecogs.com/png.latex?%5Csigma/%5Csqrt%7Bn%7D"> is small, which happens when the sampling standard deviation of <img src="https://latex.codecogs.com/png.latex?%5Cbar%20y_j"> is small relative to the between group variation <img src="https://latex.codecogs.com/png.latex?%5Ctau">, this is almost equal to <img src="https://latex.codecogs.com/png.latex?%5Cbar%7By%7D_j"> and there is almost no pooling. On the other hand, when <img src="https://latex.codecogs.com/png.latex?%5Csigma/%5Csqrt%7Bn%7D"> is large relative to <img src="https://latex.codecogs.com/png.latex?%5Ctau">, then the estimate of <img src="https://latex.codecogs.com/png.latex?%5Cmu_j"> will be very close to the overall mean <img src="https://latex.codecogs.com/png.latex?%5Cmu">.
</details>
<p>The short version is that there is some magical number <img src="https://latex.codecogs.com/png.latex?%5Calpha">, which depends on <img src="https://latex.codecogs.com/png.latex?%5Ctau">, <img src="https://latex.codecogs.com/png.latex?%5Csigma">, and <img src="https://latex.codecogs.com/png.latex?n_j"> such that <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%20%5Cmu_j%20=%20%5Calpha%20%5Cbar%7By%7D_j%20+%20(1-%5Calpha)%20%5Cmu.%0A"> Because of this, the residuals <img src="https://latex.codecogs.com/png.latex?%0Ar_%7Bij%7D%20=%20y_%7Bij%7D%20-%20%5Calpha%20%5Cbar%7By%7D_j%20-%20(1-%5Calpha)%5Cmu%0A"> are suddenly <em>not</em> going to have mean zero.</p>
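<p>To make that concrete, here is a tiny standalone numerical check of the shrinkage formula. (It is in Python rather than the R used elsewhere in this post, just to keep it self-contained, and every number in it is made up purely for illustration.) For one group that sits above the global mean, the no-pooling residuals average to exactly zero, while the partially pooled residuals have mean <img src="https://latex.codecogs.com/png.latex?(1-%5Calpha)(%5Cbar%7By%7D_j%20-%20%5Cmu)%20%3E%200">.</p>

```python
import numpy as np

# Hypothetical numbers, chosen only for illustration:
# global mean mu, between-group sd tau, within-group sd sigma, group size n
mu, tau, sigma, n = 0.0, 1.0, 2.0, 5

# Five observations from one group that sits well above the global mean
y = np.array([2.0, 3.1, 1.7, 2.8, 2.4])
ybar = y.mean()  # 2.4

# Shrinkage weight: alpha = tau^2 / (tau^2 + sigma^2 / n)
alpha = tau**2 / (tau**2 + sigma**2 / n)
mu_hat = alpha * ybar + (1 - alpha) * mu  # partially pooled estimate

# No pooling: residuals around the raw group mean average to zero by construction
print(np.isclose((y - ybar).mean(), 0.0))  # True

# Partial pooling: the estimate is dragged towards mu, so the residuals
# have mean (1 - alpha) * (ybar - mu), which is positive for this group
print(np.isclose((y - mu_hat).mean(), (1 - alpha) * (ybar - mu)))  # True
print((y - mu_hat).mean() > 0)  # True
```

<p>The same arithmetic goes through for any group: the further <img src="https://latex.codecogs.com/png.latex?%5Cbar%7By%7D_j"> is from <img src="https://latex.codecogs.com/png.latex?%5Cmu">, the further the residual mean is from zero.</p>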
<p>In fact, if we think about it a bit more, we will realise that the model will drag extreme groups to the centre, which accounts for the positive slope in the residuals vs the fitted values.</p>
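<p>A quick simulation makes the slope visible. (Again a Python sketch with made-up values of <img src="https://latex.codecogs.com/png.latex?%5Ctau">, <img src="https://latex.codecogs.com/png.latex?%5Csigma">, and the group sizes; it applies the shrinkage formula directly rather than fitting the full Bayesian model.) Groups with large fitted values have been shrunk down, so their residuals are positive, and vice versa, which produces a positive regression slope of residuals on fitted values.</p>

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical values: tiny groups (n = 2), echoing the monkey data
mu, tau, sigma, n, J = 0.0, 1.0, 2.0, 2, 500

alpha = tau**2 / (tau**2 + sigma**2 / n)  # shrinkage weight, here 1/3

mu_j = rng.normal(mu, tau, size=J)                 # true group means
y = rng.normal(mu_j[:, None], sigma, size=(J, n))  # n observations per group

ybar = y.mean(axis=1)
fitted = alpha * ybar + (1 - alpha) * mu           # partially pooled fits

resid = (y - fitted[:, None]).ravel()
fitted_rep = np.repeat(fitted, n)  # one fitted value per observation

# Least-squares slope of residuals on fitted values; for this setup the
# theoretical value is (1 - alpha) / alpha = 2, so it is clearly positive
slope = np.polyfit(fitted_rep, resid, 1)[0]
print(slope > 0)  # True: extreme groups are dragged towards the centre
```

<p>Note the slope is larger when <img src="https://latex.codecogs.com/png.latex?%5Calpha"> is small, which is exactly the heavy-shrinkage regime of small groups and large within-group noise.</p>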
<p>The slope in this example is quite extreme because the groups are very small (only one or two individuals). But it is a general phenomenon and it’s discussed extensively in Chapter 7 of <a href="http://www.biostat.umn.edu/~hodges/RPLMBook/RPLMBookpage.htm">Jim Hodges’ excellent book</a>. His suggestion is that there isn’t really a good, general way to remove the trend. But that doesn’t mean the plot is useless. It is still able to pinpoint outliers and heteroskedasticity. You’ve just got to tilt your head.</p>
<p>But for the purposes of today we can notice that there don’t seem to be any extreme outliers so everything is probably ok.</p>
</section>
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>So what have we done? Well, we’ve gone through the process of fitting and scrutinising a simple Bayesian multilevel model. We’ve talked about some of the challenges associated with graphical diagnostics for structured data. And we’ve all<sup>39</sup> learnt something about the residual-vs-fitted plot for a multilevel model.</p>
<p>Most importantly, we’ve all learnt the value of using fake data simulated from the posterior model to help us understand our diagnostics.</p>
<p>There is more to the scientific story here. It turns out that while there is no effect over 2 minutes, there is <a href="https://royalsocietypublishing.org/doi/10.1098/rsos.200316">a slight effect over 20 minutes</a>. So the conceptual replication failed, but the study still found some interesting things.</p>
<p>Of course, I’ve ignored one big elephant in the room: the data were discrete. In the end, our distributional diagnostics didn’t throw up any massive red flags, but nevertheless it could be an interesting exercise to see what happens if we use a more problem-adapted likelihood.</p>
<p>Last, and certainly not least, I barely scratched the surface<sup>40</sup> of the <a href="https://arxiv.org/pdf/1502.06988.pdf">Loy, Hoffman, and Cook</a> paper. Anyone who is interested in fitting Gaussian multilevel models should definitely give it a read.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Mark insisted that I link to his google scholar rather than his website. He’s cute that way.↩︎</p></li>
<li id="fn2"><p>Mark wants me to tell you that he’s not vain he’s just moving. Sure Jan.↩︎</p></li>
<li id="fn3"><p>I know that marmosets suffer from lesbian bed death, but I’m told that a marmoset is not a macaque, which in turn is not a macaw. Ecology is fascinating.↩︎</p></li>
<li id="fn4"><p>A real problem in the world is that there aren’t enough monkeys for animal research at the best of times. Once you need aged monkeys, it’s an even smaller population. Non-human primate research is <em>hard</em>.↩︎</p></li>
<li id="fn5"><p>Actually 244, but one of them turned out to be blind. Animal research is a journey.↩︎</p></li>
<li id="fn6"><p>It turns out that some of the monkeys didn’t want to give up the puzzle after 20 minutes. One held out for 72 minutes before the data collection ended. Cheeky monkeys.↩︎</p></li>
<li id="fn7"><p>Did Mark make me do unspeakable, degrading, borderline immoral things to get the data? No.&nbsp;It’s open source. Truly the first time I’ve been disappointed that something was open source.↩︎</p></li>
<li id="fn8"><p>If statisticians abandoned linear regression we would have nothing left. We would be desiccated husks propping up the bar at 3am talking about how we used to do loads of lines in the 80s.↩︎</p></li>
<li id="fn9"><p>Our perfect amount of pool? I don’t know how metaphors work↩︎</p></li>
<li id="fn10"><p>They <em>always</em> take this form if there is a countable collection of exchangeable random variables. For a finite set there are a few more options. But no one talks about those.↩︎</p></li>
<li id="fn11"><p>monkeys↩︎</p></li>
<li id="fn12"><p>Also known as a mixed effects or a linear mixed effects model.↩︎</p></li>
<li id="fn13"><p>There are <em>many</em> other ways to represent Gaussian multilevel models. My former colleague Emi Tanaka and Francis Hui wrote a <a href="https://arxiv.org/abs/1911.08628">great paper</a> on this topic.↩︎</p></li>
<li id="fn14"><p>Some particularly bold and foolish people take this to mean that priors aren’t important. They usually get their arse handed to them the moment they try to fit an even mildly complex model.↩︎</p></li>
<li id="fn15"><p>A non-exhaustive set of weird things: categorical regressors with a rare category, tail parameters, mixture models↩︎</p></li>
<li id="fn16"><p>There are situations where this is not true. For instance if you have a log or logit link function you can put reasonable bounds on your coefficients regardless of the scaling of your data. That said, the computational procedures <em>always</em> appreciate a bit of scaling. If there’s one thing that computers hate more than big numbers it’s small numbers.↩︎</p></li>
<li id="fn17"><p>Of course, we know that the there are only 8 fifteen second intervals in two minutes, so we could use this information to make a data-independent scaling. To be brutally francis with you, that’s what you should probably do in this situation, but I’m trying to be pedagogical so let’s at least think about scaling it by the standard deviation.↩︎</p></li>
<li id="fn18"><p>Fixed scaling is always easier than data-dependent scaling↩︎</p></li>
<li id="fn19"><p>A real trick for young players is scaling new data by the mean and standard deviation of the new data rather than the old data. That’s a very subtle bug that can be <em>very</em> hard to squash.↩︎</p></li>
<li id="fn20"><p>The <code>tidymodels</code> package in R is a great example of an ecosystem that does this properly. <a href="https://www.tmwr.org">Max and Julia’s book</a> on using <code>tidymodels</code> is very excellent and well worth a read.↩︎</p></li>
<li id="fn21"><p>Of all of the things in this post, this has been the most aggressively fact checked one↩︎</p></li>
<li id="fn22"><p>In prior width and on grindr, you should always expect that he’s rounding up.↩︎</p></li>
<li id="fn23"><p>In some places, we would call this a random effect.↩︎</p></li>
<li id="fn24"><p>He is very lovely. Many people would prefer that I was him.↩︎</p></li>
<li id="fn25"><p>It’s possible that the prior on <img src="https://latex.codecogs.com/png.latex?%5Ctau"> might be too wide. If we were doing a logistic regression, these priors would definitely be too wide. And if we had a lot of different random terms (eg if we had lots of different species or lots of different labs) then they would also probably be too wide. But they are better than not having priors.↩︎</p></li>
<li id="fn26"><p>Not the most computationally efficient, but the easiest. Also because it’s the same code we will later use to fit the model, we are evaluating the priors that are actually used and not the ones that we think we’re using.↩︎</p></li>
<li id="fn27"><p> It’s number 88, but because our prior is exchangeable it does not matter which monkey we do this for!↩︎</p></li>
<li id="fn28"><p>I also checked different values of <code>age</code> as well as looking at the posterior mean (via <code>posterior_epred</code>) and the conclusions stay the same.↩︎</p></li>
<li id="fn29"><p>The numbers will never be exactly equal, but they are of similar orders of magnitude.↩︎</p></li>
<li id="fn30"><p>Or if you get some sort of error or warning from <code>lme4</code>↩︎</p></li>
<li id="fn31"><p>So there’s a wrinkle here. Technically, all of the residuals have different variances, which is annoying. You typically studentise them using the leverage scores, but this is a touch trickier for multilevel models. Chapter 7 of <a href="https://www.google.com/search?client=safari&amp;rls=en&amp;q=richly+parametrized+linear+models&amp;ie=UTF-8&amp;oe=UTF-8">Jim Hodges’s excellent book</a> contains a really good discussion.↩︎</p></li>
<li id="fn32"><p>Once again, we are not studentizing the residuals. I’m sorry.↩︎</p></li>
<li id="fn33"><p>Another name for a multilevel model with a Gaussian response↩︎</p></li>
<li id="fn34"><p>Also because all of my data plots are gonna be stripey as hell, and that kinda destroys the point of visual inference.↩︎</p></li>
<li id="fn35"><p>They call it <em>null data</em>.↩︎</p></li>
<li id="fn36"><p>Note that I am <em>not</em> using values of <img src="https://latex.codecogs.com/png.latex?%5Cmu_j">! I will simulate those from the normal distribution to ensure correct model specification. For the same reason, I am not using a residual bootstrap. The aim here is not to assess uncertainty so much as it is to generate data from a correctly specified model.↩︎</p></li>
<li id="fn37"><p>This is a bit more complex when you’re Bayesian, but the intuition still holds. The difference is that now it is asymptotic↩︎</p></li>
<li id="fn38"><p>This is the average of all observations in group j. <img src="https://latex.codecogs.com/png.latex?%0A%5Cbar%20y_j%20=%20%5Cfrac%7B1%7D%7Bn_j%7D%20%5Csum_%7Bi=1%7D%5E%7Bn_j%7D%20y_%7Bij%7D.%0A">↩︎</p></li>
<li id="fn39"><p>I mean, some of us knew this. Personally, I only remembered after I saw it and swore a bit.↩︎</p></li>
<li id="fn40"><p>In particular, they have an interesting discussion on assessing the distributional assumption for <img src="https://latex.codecogs.com/png.latex?%5Cmu_j">.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2022,
  author = {Simpson, Dan},
  title = {A First Look at Multilevel Regression; or {Everybody’s} Got
    Something to Hide Except Me and My Macaques},
  date = {2022-09-06},
  url = {https://dansblog.netlify.app/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2022" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2022. <span>“A First Look at Multilevel Regression; or
Everybody’s Got Something to Hide Except Me and My Macaques.”</span>
September 6, 2022. <a href="https://dansblog.netlify.app/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey.html">https://dansblog.netlify.app/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey.html</a>.
</div></div></section></div> ]]></description>
  <category>Multilevel models</category>
  <category>Visual diagnostics</category>
  <category>Prior distributions</category>
  <category>fundamentals</category>
  <guid>https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/everybodys-got-something-to-hide-except-me-and-my-monkey.html</guid>
  <pubDate>Mon, 05 Sep 2022 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2022-09-04-everybodys-got-something-to-hide-except-me-and-my-monkey/mark.png" medium="image" type="image/png" height="81" width="144"/>
</item>
<item>
  <title>Priors part 4: Specifying priors that appropriately penalise complexity</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2022-08-29-priors4/priors4.html</link>
  <description><![CDATA[ 





<p>At some point in the distant past, I wrote three posts about prior distributions. The <a href="https://dansblog.netlify.app/posts/2021-10-14-priors1/priors1.html">first</a> was very basic, because why not. The <a href="https://dansblog.netlify.app/posts/2021-10-14-priors2/priors2.html">second</a> one talked about conjugate priors. The <a href="https://dansblog.netlify.app/posts/2021-10-15-priors3/priors3.html">third</a> one talked about so-called objective priors.</p>
<p>I am suddenly<sup>1</sup> of a mood to write some more on this<sup>2</sup> topic.</p>
<p>The thing is, so far I’ve only really talked about methods for setting prior distributions that I don’t particularly care for. Fuck that. Let’s talk about things I like. There is enough negative energy<sup>3</sup> in the world.</p>
<p>So let’s talk about priors. But the good stuff. The aim is to give my answer to the question “how should you set a prior distribution?”.</p>
<section id="bro-do-you-even-know-what-a-parameter-is" class="level2">
<h2 class="anchored" data-anchor-id="bro-do-you-even-know-what-a-parameter-is">Bro do you even know what a parameter is?</h2>
<p>You don’t. No one does. They’re not real.</p>
<p>Parameters are polite fictions that we use to get through the day. They’re our weapons of mass destruction. They’re the magazines we only bought for the articles. They are our girlfriends who live in Canada<sup>4</sup>.</p>
<p>One way we can see this is to ask ourselves a simple<sup>5</sup> question: how many parameters are there in this model <img src="https://latex.codecogs.com/png.latex?%0Ay_i%20%5Csim%20%5Ctext%7BNegative-Binomial%7D(%5Cmu,%20%5Calpha),%20%5Cqquad%20i%20=%201,%5Cldots,%20n%5Ctext%7B?%7D%0A"></p>
<p>The answer<sup>6</sup> <sup>7</sup> would be two.</p>
<p>But let me ask a different question. How many parameters are there in this model<sup>8</sup> <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Ay_i%5Cmid%20u_i%20&amp;%5Csim%20%5Ctext%7BPoisson%7D(%5Cmu%20u_i)%20%5C%5C%0Au_i%20&amp;%5Csim%20%5Ctext%7BGamma%7D(%5Calpha%5E%7B-1%7D,%20%5Calpha%5E%7B-1%7D),%5Cqquad%20i=1,%5Cldots,%20n%5Ctext%7B?%7D%0A%5Cend%7Balign*%7D"></p>
<p>One answer to this question would be <img src="https://latex.codecogs.com/png.latex?n+2">. In this interpretation of the question everything in the model that isn’t directly observed is a parameter.</p>
<p>But there is another view.</p>
<p>Mathematically, these two models are equivalent. That is, if you marginalise<sup>9</sup> out the <img src="https://latex.codecogs.com/png.latex?u_i"> you get <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5CPr(y=k)%20&amp;=%5Cfrac%7B%5Cmu%5Ek%5Calpha%5E%7B-1/%5Calpha%7D%7D%7B%5CGamma(%5Calpha%5E%7B-1%7D)%5CGamma(k+1)%7D%20%5Cint_0%5E%5Cinfty%20u%5Ek%20e%5E%7B-%5Cmu%20u%7D%20u%5E%7B1/%5Calpha-1%7De%5E%7B-u/%5Calpha%7D%5C,du%20%5C%5C%0A&amp;=%20%5Cfrac%7B%5Cmu%5Ek%5Calpha%5E%7B-1/%5Calpha%7D%7D%7B%5CGamma(%5Calpha%5E%7B-1%7D)%5CGamma(k+1)%7D%5Cint_0%5E%5Cinfty%20u%5E%7Bk%20+%201/%5Calpha-1%7De%5E%7B-(%5Cmu%20+%20%5Calpha%5E%7B-1%7D)u%7D%5C,du%20%5C%5C%0A&amp;=%20%5Cfrac%7B%5Cmu%5Ek%5Calpha%5E%7B-1/%5Calpha%7D%7D%7B%5CGamma(%5Calpha%5E%7B-1%7D)%5CGamma(k+1)%7D%5Cint_0%5E%5Cinfty%20%5Cleft(%5Cfrac%7Bt%7D%7B%5Cmu+%5Calpha%5E%7B-1%7D%7D%5Cright)%5E%7Bk%20+%201/%5Calpha-1%7De%5E%7B-t%7D%5Cfrac%7B1%7D%7B%5Cmu%20+%20%5Calpha%5E%7B-1%7D%7D%5C,dt%20%5C%5C%0A&amp;=%5Cfrac%7B%5CGamma(k%20+%20%5Calpha%5E%7B-1%7D)%7D%7B%5CGamma(%5Calpha%5E%7B-1%7D)%5CGamma(k+1)%7D%20%5Cleft(%5Cfrac%7B%5Cmu%7D%7B%5Cmu%20+%20%5Calpha%5E%7B-1%7D%7D%5Cright)%5Ek%20%5Cleft(%5Cfrac%7B%5Calpha%5E%7B-1%7D%7D%7B%5Cmu%20+%20%5Calpha%5E%7B-1%7D%7D%5Cright)%5E%7B1/%5Calpha%7D%20.%0A%5Cend%7Balign*%7D"> This is <em>exactly</em> the negative binomial distribution with mean <img src="https://latex.codecogs.com/png.latex?%5Cmu"> and variance <img src="https://latex.codecogs.com/png.latex?%5Cmu(1%20+%20%5Calpha%20%5Cmu)">.</p>
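<p>If you don’t feel like checking that integral by hand, a quick simulation makes the same point. Here’s a little sketch of my own (not from the original post): draw from the Poisson–Gamma mixture and check that the sample mean and variance match the negative binomial’s mean and variance.</p>

```python
# A simulation sanity check (my own sketch): draw from the Poisson-Gamma
# mixture and compare the sample mean and variance with the negative
# binomial values mu and mu * (1 + alpha * mu).
import numpy as np

rng = np.random.default_rng(2022)
mu, alpha, n = 5.0, 0.5, 200_000

# Gamma(1/alpha, 1/alpha) in shape-rate form is shape = 1/alpha, scale = alpha:
# it has mean 1 and variance alpha.
u = rng.gamma(shape=1 / alpha, scale=alpha, size=n)
y = rng.poisson(mu * u)

print(y.mean())  # close to mu = 5
print(y.var())   # close to mu * (1 + alpha * mu) = 17.5
```

<p>With 200,000 draws the sample variance lands comfortably near 17.5; a plain Poisson with the same mean would have variance 5, so the overdispersion is doing real work here.</p>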
<p>So maybe there are two parameters.</p>
<p>Does it make a difference? Sometimes. For instance, if you were following ordinary practice in Bayesian machine learning, you would (approximately) marginalise out <img src="https://latex.codecogs.com/png.latex?(%5Cmu,%20%5Calpha)"> in the first model, but in the second model you’d probably treat them as tuning hyper-parameters<sup>10</sup> and optimise<sup>11</sup> over them.</p>
<p>Moreover, in the second model we can ask <em>what other priors could we put on the</em> <img src="https://latex.codecogs.com/png.latex?u_i"><em>?</em>. There is no equivalent question for the first model. This could be useful, for instance, if we believe that the overdispersion may differ among population groups. It is considerably easier to extend the random effects formulation into a multilevel model.</p>
<p>Ok. So it doesn’t really matter too much. When you’re breaking your model into <em>things that we need to set priors for</em> and <em>things where the priors are a structural part of the model</em>, it really depends on what you’re going to do with the model.</p>
</section>
<section id="a-hello-boys-into-a-party-date-on-flexibility" class="level2">
<h2 class="anchored" data-anchor-id="a-hello-boys-into-a-party-date-on-flexibility">A hello boys into a party date: on flexibility</h2>
<p>There are a lot of ways to set prior distributions. I’ve covered some in previous posts and there are certainly more. But today I’m going to focus on one constructive method that I’m particularly fond of: <a href="https://projecteuclid.org/journals/statistical-science/volume-32/issue-1/Penalising-Model-Component-Complexity--A-Principled-Practical-Approach-to/10.1214/16-STS576.full">penalised complexity priors</a>.</p>
<p>These priors fall out from a certain way of seeing parameters. The idea is that some parameters in a model function as <em>flexibility parameters</em>. These naturally have a base value, which corresponds to the simplest model that they index. I’ll refer to the distribution you get when the parameter takes its base value as the <em>base model</em>.</p>
<div id="exm-neg-binom" class="theorem example">
<p><span class="theorem-title"><strong>Example 1 (Overdispersion of a negative binomial)</strong></span> The negative binomial distribution has two parameters: a mean <img src="https://latex.codecogs.com/png.latex?%5Cmu"> and an overdispersion parameter <img src="https://latex.codecogs.com/png.latex?%5Calpha"> so the variance is <img src="https://latex.codecogs.com/png.latex?%5Cmu(1%20+%20%5Calpha%20%5Cmu)">. The mean parameter is <em>not</em> a flexibility parameter. Conceptually, changing the mean<sup>12</sup> does not make a distribution more or less complex, it simply shuttles it around.</p>
<p>On the other hand, the overdispersion parameter <img src="https://latex.codecogs.com/png.latex?%5Calpha"> <em>is</em> a flexibility parameter. Its special value is <img src="https://latex.codecogs.com/png.latex?%5Calpha%20=0">, which corresponds to a Poisson distribution, the base model for the negative binomial distribution.</p>
</div>
<div id="exm-student-t" class="theorem example">
<p><span class="theorem-title"><strong>Example 2 (Student-t degrees of freedom)</strong></span> The three-parameter student-t distribution has density (parameterised by its standard deviation assuming <img src="https://latex.codecogs.com/png.latex?%5Cnu%20%3E%202">!) <img src="https://latex.codecogs.com/png.latex?%0Ap(y%20%5Cmid%20%5Cmu,%20%5Csigma,%20%5Cnu)%20=%20%5Cfrac%7B%5CGamma%5Cleft(%5Cfrac%7B%5Cnu%20+%201%7D%7B2%7D%5Cright)%7D%7B%5Csigma%5Cnu%20%5Csqrt%7B%5Cfrac%7B%5Cpi%7D%7B%5Cnu-2%7D%7D%20%5CGamma%5Cleft(%5Cfrac%7B%5Cnu%7D%7B2%7D%5Cright)%7D%5Cleft(1%20+%20%5Cfrac%7B%5Cfrac%7B%5Cnu-2%7D%7B%5Cnu%7D%5Cleft(%5Cfrac%7By%20-%20%5Cmu%7D%7B%5Csigma%7D%5Cright)%5E2%7D%7B%5Cnu%7D%5Cright)%5E%7B-%5Cfrac%7B%5Cnu+1%7D%7B2%7D%7D,%20%5Cqquad%20%5Cnu%20%3E%202.%0A"> This has mean <img src="https://latex.codecogs.com/png.latex?%5Cmu"> and variance <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2">. The slightly strange parameterisation and the restriction to <img src="https://latex.codecogs.com/png.latex?%5Cnu%3E2"> are useful because it lets us specify a prior on the <em>variance</em> itself and not some parameter that is the variance divided by some function<sup>13</sup> of <img src="https://latex.codecogs.com/png.latex?%5Cnu">.</p>
<p>The natural base model here is <img src="https://latex.codecogs.com/png.latex?N(%5Cmu,%20%5Csigma%5E2)">, which corresponds to <img src="https://latex.codecogs.com/png.latex?%5Cnu%20=%20%5Cinfty">.</p>
</div>
<div id="exm-gaussian" class="theorem example">
<p><span class="theorem-title"><strong>Example 3 (Variance of a Gaussian random effect)</strong></span> A Gaussian distribution has two parameters: a mean <img src="https://latex.codecogs.com/png.latex?%5Cmu"> and a standard deviation <img src="https://latex.codecogs.com/png.latex?%5Ctau">. Once again, <img src="https://latex.codecogs.com/png.latex?%5Cmu"> is not a flexibility parameter, but in some circumstances <img src="https://latex.codecogs.com/png.latex?%5Ctau"> can be.</p>
<p>To see this, imagine that we have a simple random intercept model <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Ay_%7Bij%7D%20%5Cmid%20u_j%20&amp;%5Csim%20N(u_j,%20%5Csigma%5E2),%5Cqquad%20i=1,%5Cldots,n,%20j%20=1,%5Cldots,J%20%5C%5C%0Au_j%20&amp;%5Csim%20N(%5Cmu,%20%5Ctau).%0A%5Cend%7Balign*%7D"> In this case, we don’t really view <img src="https://latex.codecogs.com/png.latex?%5Csigma"> as a flexibility parameter, but <img src="https://latex.codecogs.com/png.latex?%5Ctau"> is. Why the distinction? Well let’s think about what happens at special value <img src="https://latex.codecogs.com/png.latex?0">.</p>
<p>When <img src="https://latex.codecogs.com/png.latex?%5Csigma%20=%200"> we are saying that there is no variability in the data if we know the corresponding <img src="https://latex.codecogs.com/png.latex?u_j">. This is, frankly, quite weird and it’s not necessarily a base model we would believe<sup>14</sup> in.</p>
<p>On the other hand, if <img src="https://latex.codecogs.com/png.latex?%5Ctau%20=0">, then we are saying that all of the groups have the same mean. This is a useful and interesting base model that could absolutely happen in most data. So we say that while <img src="https://latex.codecogs.com/png.latex?%5Csigma"> isn’t necessarily a flexibility parameter in the model, <img src="https://latex.codecogs.com/png.latex?%5Ctau"> definitely is.</p>
<p>In this case the base model is the degenerate distribution<sup>15</sup> where the mean of each group is equal to <img src="https://latex.codecogs.com/png.latex?%5Cmu">.</p>
</div>
<p>The last example shows that the idea of a flexibility parameter is deeply contextual. Once again, we run into the idea that Statistical Arianism<sup>16</sup> is bad. <em>Parameters and their prior distributions can only be fully understood if you know their context within the entire model.</em></p>
</section>
<section id="sure-youre-flexible-but-lets-not-over-do-the-dutch-wink" class="level2">
<h2 class="anchored" data-anchor-id="sure-youre-flexible-but-lets-not-over-do-the-dutch-wink">Sure you’re flexible, but let’s not over-do the Dutch wink</h2>
<p>Now that we have the concept of a flexibility parameter, let’s think about how we should use it. In particular, we should ask exactly what we want our prior to do. In <a href="https://projecteuclid.org/journals/statistical-science/volume-32/issue-1/Penalising-Model-Component-Complexity--A-Principled-Practical-Approach-to/10.1214/16-STS576.full">the paper</a> we listed 8 things that we want the prior to do:</p>
<ol type="1">
<li>The prior should contain information<sup>17</sup> <sup>18</sup> <sup>19</sup></li>
<li>The prior should be aware of model structure</li>
<li>If we move our model to a new application, it should be clear how we can change the information contained in our prior. We can do this by <em>explicitly</em> including specific information in the prior.</li>
<li>The prior should limit<sup>20</sup> the flexibility of an overparameterised model</li>
<li>Restrictions of the prior to identifiable sub-manifolds<sup>21</sup> of the parameter space should be sensible.</li>
<li>The prior should be specified to control what a parameter <em>does</em> in the context<sup>22</sup> of the model (rather than its numerical value)</li>
<li>The prior should be computationally<sup>23</sup> feasible</li>
<li>The prior should perform well<sup>24</sup>.</li>
</ol>
<p>These desiderata are <em>aspirational</em> and I in no way claim that we successfully satisfied them. But we tried. And we came up with a pretty useful proposal.</p>
<p>The idea is simple: if our model has a flexibility parameter we should put a prior on it that <em>penalises the complexity</em> of the model. That is, we want most of the prior mass to be near<sup>25</sup> the base value.</p>
<p>In practice, we try to do this by penalising the complexity of each <em>component</em> of a model. For instance, consider the following model for a flexible regression: <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Ay_i%20%5Cmid%20f,%20u_i%20&amp;%5Csim%20N(u_i%20+f(z_i),%20%5Csigma%5E2)%20%5C%5C%0Af%20&amp;%5Csim%20%5Ctext%7BSmoothing-spline%7D(%5Clambda)%5C%5C%0Au_i%20&amp;%5Csim%20N(%20%5Cmu%20+%20x_i%5ET%5Cbeta%20,%20%5Ctau%5E2).%0A%5Cend%7Balign*%7D"> The exact definition<sup>26</sup> of a smoothing spline that we are using is not wildly important, but it is specified<sup>27</sup> by a smoothing parameter <img src="https://latex.codecogs.com/png.latex?%5Clambda">, and when <img src="https://latex.codecogs.com/png.latex?%5Clambda=%5Cinfty"> we get our base model (a function that is equal to zero everywhere). This model has two components (<img src="https://latex.codecogs.com/png.latex?f"> and <img src="https://latex.codecogs.com/png.latex?u">) and they each have one smoothing parameter (<img src="https://latex.codecogs.com/png.latex?%5Clambda">, with base model at <img src="https://latex.codecogs.com/png.latex?%5Clambda%20=%20%5Cinfty">, and <img src="https://latex.codecogs.com/png.latex?%5Ctau">, with base model at <img src="https://latex.codecogs.com/png.latex?%5Ctau%20=%200">).</p>
<p>The nice thing about splitting a model up into components and building priors for each component is that we can build generic priors for each component that can potentially be tuned to make them appropriate for the global model. Is this a perfect way to realise our second aim? No.&nbsp;But it’s an ok place to start<sup>28</sup>.</p>
</section>
<section id="the-speed-of-a-battered-sav-proximity-to-the-base-model" class="level2">
<h2 class="anchored" data-anchor-id="the-speed-of-a-battered-sav-proximity-to-the-base-model">The speed of a battered sav: proximity to the base model</h2>
<p>Ok. So you’re Brad Pitt. Wait. No.</p>
<p>Ok. So we need to build a prior that penalises complexity by putting most of its prior mass near the base model. In order to do this we need to first specify what we mean by <em>near</em>.</p>
<p>There are <em>a lot</em> of things that we could mean. The easiest choice would be to just use the natural distance from the base model in the parameter space. But this isn’t necessarily a good idea. Firstly, it falls flat when the base model is at infinity. But more importantly, it violates our 6th aim by ignoring the context of the parameter and just setting a prior on its numerical value.</p>
<p>So instead we are going to parameterise distance by asking ourselves a simple question: for a component with flexibility parameter <img src="https://latex.codecogs.com/png.latex?%5Cxi">, how much more complex would our model component be if we used the value <img src="https://latex.codecogs.com/png.latex?%5Cxi"> instead of the base value <img src="https://latex.codecogs.com/png.latex?%5Cxi_%5Ctext%7Bbase%7D">?</p>
<p>We can measure this complexity using the Kullback-Leibler divergence (or KL divergence if you’re nasty) <img src="https://latex.codecogs.com/png.latex?%0A%5Coperatorname%7BKL%7D(f%20%7C%7C%20g)%20=%20%5Cint_%5CTheta%20f(t)%20%5Clog%5Cleft(%5Cfrac%7Bf(t)%7D%7Bg(t)%7D%5Cright)%5C,dt.%0A"> This is a quantity from information theory that directly measures how much information would be lost<sup>29</sup> if we replaced the more complex model <img src="https://latex.codecogs.com/png.latex?f"> with the simpler model <img src="https://latex.codecogs.com/png.latex?g">. The more information that would be lost, the more complex <img src="https://latex.codecogs.com/png.latex?f"> is relative to <img src="https://latex.codecogs.com/png.latex?g">.</p>
<p>While the Kullback-Leibler divergence looks a bit intimidating the first time you see it, it’s got a lot of nice properties:</p>
<ul>
<li><p>It’s always non-negative.</p></li>
<li><p>It doesn’t depend on how you parameterise the distribution. If you do a smooth, invertible change of variables to both distributions, the KL divergence remains unchanged.</p></li>
<li><p>It’s related to the information matrix and the Fisher distance. In particular, let <img src="https://latex.codecogs.com/png.latex?f(%5Ctheta%20%5Cmid%20%5Cxi)"> be a family of distributions parameterised by <img src="https://latex.codecogs.com/png.latex?%5Cxi">. Then, near <img src="https://latex.codecogs.com/png.latex?%5Cxi_0">, <img src="https://latex.codecogs.com/png.latex?%0A%5Coperatorname%7BKL%7D(f(%5Ccdot%20%5Cmid%20%5Cxi_0%20+%5Cdelta)%20%20%7C%7C%20f(%5Ccdot%20%5Cmid%20%5Cxi_0))%20=%20%5Cfrac%7B%5Cdelta%5E2%7D%7B2%7D%20I(%5Cxi_0)%20+%20o(%5Cdelta%5E2),%0A"> where <img src="https://latex.codecogs.com/png.latex?I(%5Cxi)%20=%20%5Cmathbb%7BE%7D%5Cleft%5B%5Cleft(%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%5Cxi%7D%5Clog%20f(y%20%5Cmid%20%5Cxi)%5Cright)%5E2%5Cright%5D"> is the Fisher information. The quantity on the right hand side is the square of a distance from the base model.</p></li>
<li><p>It can be related to the total variation distance<sup>30</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5C%7Cf%20-%20g%5C%7C_%5Ctext%7BTV%7D%20%5Cleq%20%5Csqrt%7B%5Cfrac%7B1%7D%7B2%7D%20%5Coperatorname%7BKL%7D(f%20%7C%7C%20g)%7D.%0A"></p></li>
</ul>
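<p>Two of these properties are easy to check numerically. Here’s a little sketch of my own (not from the original paper): the KL divergence between two Gaussians separated by a small mean shift <code>delta</code> is exactly <code>delta**2 / 2</code> (matching the Fisher-information expansion, since a unit-variance Gaussian mean has Fisher information 1), and pushing both densities through the smooth invertible map <code>x -&gt; exp(x)</code> (giving lognormals) leaves the divergence unchanged.</p>

```python
# Numerical check (my own sketch) of two KL properties: the quadratic
# Fisher-information approximation and reparameterisation invariance.
import numpy as np
from scipy import stats
from scipy.integrate import quad

def kl(f, g, lo, hi):
    """KL(f || g) = int f log(f/g), guarding tail underflow to zero."""
    def integrand(x):
        fx, gx = f(x), g(x)
        return 0.0 if fx == 0.0 or gx == 0.0 else fx * np.log(fx / gx)
    return quad(integrand, lo, hi)[0]

delta = 0.1  # a small mean shift away from the N(0, 1) base model

# KL(N(delta, 1) || N(0, 1)) is exactly delta^2 / 2.
kl_norm = kl(stats.norm(delta, 1).pdf, stats.norm(0, 1).pdf, -np.inf, np.inf)

# Push both densities through x -> exp(x): the divergence between the
# resulting lognormals is the same.
kl_lognorm = kl(stats.lognorm(s=1, scale=np.exp(delta)).pdf,
                stats.lognorm(s=1, scale=1).pdf, 0, np.inf)

print(kl_norm, kl_lognorm, delta**2 / 2)
```

<p>All three printed numbers agree, which is a nice concrete view of the invariance property.</p>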
<p>But it also has some less charming properties:</p>
<ul>
<li>The KL divergence is <em>not</em> a distance!</li>
<li>The KL divergence is <em>not</em> symmetric, that is <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7BKL%7D(f%20%7C%7C%20g)%20%5Cneq%20%5Coperatorname%7BKL%7D(g%20%7C%7C%20f)"></li>
</ul>
<p>The first of these properties is irrelevant to us. The second is interesting. I’d argue that it is an advantage. We can think of an analogy: if your base model is a point at the bottom of a valley, there is a big practical difference between how much effort it takes to get from the base model to another model that is on top of a hill compared to the amount of effort it takes to go in the other direction. This type of asymmetry is relevant to us: it’s easier for data to tell a simple model that it should be more complex than it is to tell a complex model to be simpler. We want our prior information to somewhat even this out, so we put less prior mass on models that are more complex and more on models that are simpler.</p>
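<p>You can see the asymmetry directly with a small numerical sketch of my own: take a Gamma(2, 1) density (the more complex model) and an Exponential(1) density (the simpler one) and compute the divergence in both directions.</p>

```python
# The asymmetry in action (my own sketch): the KL divergence between a
# Gamma(2, 1) and an Exponential(1) depends on which way round you take it.
import numpy as np
from scipy import stats
from scipy.integrate import quad

def kl(f, g):
    # integrate f log(f/g) over (0, inf), guarding tail underflow to zero
    def integrand(x):
        fx, gx = f(x), g(x)
        return 0.0 if fx == 0.0 or gx == 0.0 else fx * np.log(fx / gx)
    return quad(integrand, 0, np.inf)[0]

f = stats.gamma(a=2).pdf  # the more complex density
g = stats.expon().pdf     # the simpler density

# Analytically these are 1 - gamma and gamma (the Euler-Mascheroni
# constant), so the two directions genuinely differ.
print(kl(f, g), kl(g, f))
```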
<p>There is one more little annoyance: if you look at the two distance measures that the KL divergence is related to, you’ll notice that in both cases, the KL divergence is related to the <em>square</em> of the distance and not the distance itself.</p>
<p>If we use the KL divergence itself as a distance proxy, it will increase too sharply<sup>31</sup> and we may end up over-penalising. To that end, we use the following “distance” measure <img src="https://latex.codecogs.com/png.latex?%0Ad(%5Cxi)%20=%20%5Csqrt%7B2%20%5Coperatorname%7BKL%7D(f(%5Ccdot%20%5Cmid%20%5Cxi)%20%7C%7C%20f(%5Ccdot%20%5Cmid%20%5Cxi_0))%7D.%0A"> If you’re wondering about that 2, it doesn’t really matter but it makes a couple of things ever so slightly cleaner down the road.</p>
<p>Ok. Let’s compute some of these distances!</p>
<div id="exm-neg-binom2" class="theorem example">
<p><span class="theorem-title"><strong>Example 4 (Overdispersion of a negative binomial (continued))</strong></span> The negative binomial distribution is discrete so <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Bmultline%7D%0A%5Cfrac%7B1%7D%7B2%7Dd%5E2(%5Calpha)%20=%20%5Csum_%7Bk=1%7D%5E%5Cinfty%20%5Cfrac%7B%5CGamma(k%20+%20%5Calpha%5E%7B-1%7D)%7D%7B%5CGamma(%5Calpha%5E%7B-1%7D)%5CGamma(k+1)%7D%20%20%5Cleft(%5Cfrac%7B%5Cmu%7D%7B%5Cmu%20+%20%5Calpha%5E%7B-1%7D%7D%5Cright)%5Ek%20%5Cleft(%5Cfrac%7B%5Calpha%5E%7B-1%7D%7D%7B%5Cmu%20+%20%5Calpha%5E%7B-1%7D%7D%5Cright)%5E%7B1/%5Calpha%7D%20%5C%5C%0A%5Ctimes%20%5Cleft%5B%5Clog%20%5CGamma(k%20%20+%5Calpha%5E%7B-1%7D)%20-%20%5Clog%20%5CGamma(%5Calpha%5E%7B-1%7D)%20%20-%20k%20%5Clog(%5Cmu%20+%20%5Calpha%5E%7B-1%7D)%5Cright.%20%5C%5C%20%5Cleft.%20+%20%5Calpha%5E%7B-1%7D%5Clog%20%5Cleft(%5Calpha%5E%7B-1%7D(%5Cmu%20+%20%5Calpha%5E%7B-2%7D)%5Cright)%20%20+%20%5Cmu%20%5Cright%5D.%0A%5Cend%7Bmultline%7D"> This has two problems: I can’t work out what it is and it might<sup>32</sup> end up depending on <img src="https://latex.codecogs.com/png.latex?%5Cmu">.</p>
<p>Thankfully we can use our alternative representation of the negative binomial to note that <img src="https://latex.codecogs.com/png.latex?u_i%20%5Csim%20%5Ctext%7BGamma%7D(%5Calpha%5E%7B-1%7D,%20%5Calpha%5E%7B-1%7D)"> and so we could just as well consider <img src="https://latex.codecogs.com/png.latex?u_i"> the model component that we want to penalise the complexity of. In this case we need the KL divergence<sup>33</sup> <a href="https://en.wikipedia.org/wiki/Gamma_distribution#Kullback–Leibler_divergence">between Gamma distributions</a> <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Coperatorname%7BKL%7D(%5Ctext%7BGamma%7D(a%5E%7B-1%7D,a%5E%7B-1%7D)%20%7C%7C%20%5Ctext%7BGamma%7D(b%5E%7B-1%7D,b%5E%7B-1%7D))%20=&amp;%20(a%5E%7B-1%7D-b%5E%7B-1%7D)%20%5Cpsi(a%5E%7B-1%7D)%20%5C%5C%20&amp;%5Cquad-%20%5Clog%5CGamma(a%5E%7B-1%7D)%20+%20%5Clog%5CGamma(b%5E%7B-1%7D)%20%5C%5C%20&amp;%5Cquad%0A+%20b%5E%7B-1%7D(%5Clog%20a%5E%7B-1%7D%20-%20%5Clog%20b%5E%7B-1%7D)%5C%5C%20&amp;%5Cquad%20+%20b%5E%7B-1%7D-a%5E%7B-1%7D,%0A%5Cend%7Balign*%7D"> where <img src="https://latex.codecogs.com/png.latex?%5Cpsi(a)"> is the <a href="https://en.wikipedia.org/wiki/Digamma_function">digamma function</a>.</p>
<p>As <img src="https://latex.codecogs.com/png.latex?b%5Crightarrow%200">, the KL divergence becomes<sup>34</sup> <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A&amp;b%5E%7B-1%7D%20%20(%5Clog(a%5E%7B-1%7D)%20-%20%5Cpsi(a%5E%7B-1%7D))%20+%20%5Clog%5CGamma(b%5E%7B-1%7D)%20-%20b%5E%7B-1%7D%5Clog%20b%5E%7B-1%7D%20+%20b%5E%7B-1%7D%20%20+%20o(b%5E%7B-1%7D)%5C%5C%0A=&amp;%20b%5E%7B-1%7D%20(%5Clog(a%5E%7B-1%7D)%20-%20%5Cpsi(a%5E%7B-1%7D))%20+%20b%5E%7B-1%7D%20%5Clog%20b%5E%7B-1%7D%20-%20b%5E%7B-1%7D%20-%20b%5E%7B-1%7D%5Clog%20b%5E%7B-1%7D%20+%20b%5E%7B-1%7D%20%5C%5C%0A=%20&amp;b%5E%7B-1%7D%20%20(%5Clog(a%5E%7B-1%7D)%20-%20%5Cpsi(a%5E%7B-1%7D))%20+%20o(b%5E%7B-1%7D).%0A%5Cend%7Balign*%7D"></p>
<p>Now, you will notice that as <img src="https://latex.codecogs.com/png.latex?b%5Crightarrow%200"> the KL divergence heads off to infinity. This happens a lot when the base model is much simpler than the flexible model. Thankfully, we will see later that we can ignore the factor of <img src="https://latex.codecogs.com/png.latex?b%5E%7B-1%7D"> and get a PC prior that’s valid against the base model <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BGamma%7D(b%5E%7B-1%7D,%20b%5E%7B-1%7D)"> for <em>all</em> sufficiently small <img src="https://latex.codecogs.com/png.latex?b%3E0">. This is not legally the same thing as having one for <img src="https://latex.codecogs.com/png.latex?b=0">, but it is morally the same.</p>
<p>With this, we get <img src="https://latex.codecogs.com/png.latex?%0Ad(%5Calpha)%20=%20%5Csqrt%7B2%5Clog(%5Calpha%5E%7B-1%7D)%20-%202%5Cpsi(%5Calpha%5E%7B-1%7D)%20%7D.%0A"></p>
<p>If the digamma function is a bit too hardcore for you, the <a href="https://functions.wolfram.com/GammaBetaErf/PolyGamma/06/02/">approximation</a> <img src="https://latex.codecogs.com/png.latex?%0A%5Cpsi(%5Calpha%5E%7B-1%7D)%20=%20%5Clog(%5Calpha%5E%7B-1%7D)%20-%20%5Cfrac%7B%5Calpha%7D%7B2%7D%20+%20%5Cmathcal%7BO%7D(%5Calpha%5E2)%0A"> gives the approximate distance <img src="https://latex.codecogs.com/png.latex?%0Ad(%5Calpha)%20%5Capprox%20%5Csqrt%7B%5Calpha%7D.%0A"> That is, the distance we are using is approximately the <em>standard deviation</em> of <img src="https://latex.codecogs.com/png.latex?u_i">.</p>
<p>Let’s see if this approximation<sup>35</sup> is any good.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length.out =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>),</span>
<span id="cb1-3">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">exact =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>alpha) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">digamma</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>alpha)),</span>
<span id="cb1-4">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">approx =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(alpha)</span>
<span id="cb1-5">       ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> alpha, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> exact)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb1-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> approx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dashed"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>()</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-08-29-priors4/priors4_files/figure-html/unnamed-chunk-1-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>It’s ok but it’s not perfect.</p>
</div>
<div id="exm-student-t-2" class="theorem example">
<p><span class="theorem-title"><strong>Example 5 (Student-t degrees of freedom (Continued))</strong></span> In our original paper, we computed the distance for the degrees of freedom numerically. However, <a href="https://arxiv.org/pdf/1811.08042.pdf">Yongqiang Tang</a> derived an analytic expression for it. <img src="https://latex.codecogs.com/png.latex?%0Ad(%5Cnu)%20=%20%5Csqrt%7B1%20+%20%20%5Clog%5Cleft(%5Cfrac%7B2%5CGamma((%5Cnu+1)/2)%5E2%7D%7B(%5Cnu-2)%5CGamma(%5Cnu/2)%5E2%7D%5Cright)%20-%20(%5Cnu%20+%201)(%5Cpsi((%5Cnu+1)/2)%20-%20%5Cpsi(%5Cnu/2))%7D.%0A"></p>
<p>If we note that <img src="https://latex.codecogs.com/png.latex?%0A%5Clog(%5CGamma(z))%20=%20%5Cleft(z-%20%5Cfrac%7B1%7D%7B2%7D%5Cright)%5Clog%20z%20-%20z%20+%20%5Cfrac%7B1%7D%7B2%7D%5Clog(2%5Cpi)%20%20+%20%5Cfrac%7B1%7D%7B12z%7D%20+%20%5Cmathcal%7BO%7D(z%5E%7B-3%7D),%0A"> we can use this (together with the asymptotic expansion of the digamma function used above) to get <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Ad(%5Cnu)%5E2%20%5Capprox&amp;%20%7B%7D%201%20+%20%5Clog%20%5Cleft(%5Cfrac%7B2%7D%7B%5Cnu-2%7D%5Cright)%20%5C%5C%0A&amp;%5Cquad%20%7B%7D%20+%202%5Cleft(%5Cfrac%7B%5Cnu%7D%7B2%7D%5Clog%20%5Cfrac%7B%5Cnu+1%7D%7B2%7D%20-%20%5Cfrac%7B%5Cnu+1%7D%7B2%7D%20+%20%5Cfrac%7B1%7D%7B2%7D%5Clog(2%5Cpi)%20%20+%20%5Cfrac%7B1%7D%7B6(%5Cnu+1)%7D%5Cright)%20%5C%5C%0A&amp;%5Cquad%20-2%5Cleft(%5Cfrac%7B%5Cnu-1%7D%7B2%7D%5Clog%20%5Cfrac%7B%5Cnu%7D%7B2%7D%20-%20%5Cfrac%7B%5Cnu%7D%7B2%7D%20+%20%5Cfrac%7B1%7D%7B2%7D%5Clog(2%5Cpi)%20%20+%20%5Cfrac%7B1%7D%7B6%5Cnu%7D%5Cright)%20%5C%5C%0A&amp;%5Cquad%20%7B%7D%20-%20(%5Cnu%20+%201)(%5Clog((%5Cnu+1)/2)%20-%20%5Cfrac%7B1%7D%7B%5Cnu+1%7D-%20%5Clog(%5Cnu/2)%20+%20%5Cfrac%7B1%7D%7B%5Cnu%7D)%20%5C%5C%0A=&amp;%20%5Clog%20%5Cleft(%5Cfrac%7B%5Cnu%5E2%7D%7B(%5Cnu+1)(%5Cnu-2)%7D%5Cright)%20%20%20-%20%5Cfrac%7B%5Cnu%20+2%7D%7B3%5Cnu(%5Cnu+1)%7D.%0A%5Cend%7Balign*%7D"></p>
<p>Let’s check this approximation numerically.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nu =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length.out =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>),</span>
<span id="cb2-2">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">exact =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>(nu<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">-2</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb2-3">                      <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lgamma</span>((nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lgamma</span>(nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> </span>
<span id="cb2-4">                      (nu <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">digamma</span>((nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span></span>
<span id="cb2-5">                                   <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">digamma</span>(nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))),</span>
<span id="cb2-6">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">approx =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>((nu<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">-2</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> (nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)))</span>
<span id="cb2-7">       ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> nu, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> exact)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb2-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> approx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dashed"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>()</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-08-29-priors4/priors4_files/figure-html/unnamed-chunk-2-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Once again, this is not a terrible approximation, but it’s also not an excellent one.</p>
</div>
<div id="exm-gaussian-2" class="theorem example">
<p><span class="theorem-title"><strong>Example 6 (Variance of a Gaussian random effect (Continued))</strong></span> The distance calculation for the standard deviation of a Gaussian random effect has a very similar structure to the negative binomial case. We note, via Wikipedia, that <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Coperatorname%7BKL%7D(N(%5Cmu,%20%5Ctau%5E2)%20%7C%7C%20N(%5Cmu,%20%5Cepsilon%5E2))%20&amp;=%20%5Clog%20%5Cfrac%7B%5Cepsilon%7D%7B%5Ctau%7D%20+%20%5Cfrac%7B%5Ctau%5E2%7D%7B2%5Cepsilon%5E2%7D%20-%20%5Cfrac%7B1%7D%7B2%7D%20%20%5C%5C%0A&amp;=%20%5Cfrac%7B%5Ctau%5E2%7D%7B2%5Cepsilon%5E2%7D%5Cleft(1%20+%20%5Cfrac%7B2%5Cepsilon%5E2%7D%7B%5Ctau%5E2%7D%5Clog%20%5Cfrac%7B%5Cepsilon%7D%7B%5Ctau%7D%20-%20%5Cfrac%7B%5Cepsilon%5E2%7D%7B%5Ctau%5E2%7D%5Cright).%0A%5Cend%7Balign*%7D"></p>
<p>This implies that <img src="https://latex.codecogs.com/png.latex?%0Ad(%5Ctau)%20=%20%5Cepsilon%5E%7B-1%7D%5Ctau%20+%20o(%5Cepsilon%5E%7B-1%7D).%0A"> We shall see later that the scaling on the <img src="https://latex.codecogs.com/png.latex?%5Ctau"> doesn’t matter, so for all intents and purposes <img src="https://latex.codecogs.com/png.latex?%0Ad(%5Ctau)%20=%20%5Ctau.%0A"></p>
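<p>As a quick numerical sanity check of this limit (a sketch in Python rather than R; the function names are mine), we can verify that scaling the distance by the base-model standard deviation recovers the flexibility parameter, i.e. that <code>eps * d(tau)</code> tends to <code>tau</code> as <code>eps</code> shrinks:</p>

```python
import math

# KL(N(mu, tau^2) || N(mu, eps^2)) for equal means.
def kl_gauss(tau, eps):
    return math.log(eps / tau) + tau**2 / (2 * eps**2) - 0.5

# The PC-prior distance is the square root of twice the KL divergence.
def distance(tau, eps):
    return math.sqrt(2 * kl_gauss(tau, eps))

tau = 1.7
for eps in [1e-2, 1e-4, 1e-6]:
    # eps * d(tau) should approach tau as eps -> 0
    print(eps, eps * distance(tau, eps))
```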
</div>
</section>
<section id="spinning-off-the-flute-into-a-flat-bag-turning-a-distance-into-a-prior" class="level2">
<h2 class="anchored" data-anchor-id="spinning-off-the-flute-into-a-flat-bag-turning-a-distance-into-a-prior">Spinning off the flute into a flat bag: Turning a distance into a prior</h2>
<p>So now that we have a distance measure, we need to turn it into a prior. There are lots of ways we can do this. Essentially any prior we put on the distance <img src="https://latex.codecogs.com/png.latex?d(%5Cxi)"> can be transformed into a prior on the flexibility parameter <img src="https://latex.codecogs.com/png.latex?%5Cxi">. We do this through the change of variables formula <img src="https://latex.codecogs.com/png.latex?%0Ap_%5Cxi(%5Cxi)%20=%20p_d(d(%5Cxi))%5Cleft%7C%5Cfrac%7Bd%7D%7Bd%5Cxi%7D%20d(%5Cxi)%5Cright%7C,%0A"> where <img src="https://latex.codecogs.com/png.latex?p_d(%5Ccdot)"> is the prior density for the distance parameterisation.</p>
<p>But which prior should we use on the distance? A good default choice is a prior that penalises at a constant rate. That is, we want <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7Bp_d(d%20+%20%5Cdelta)%7D%7Bp_d(d)%7D%20=%20r%5E%7B%5Cdelta%7D%0A"> for some <img src="https://latex.codecogs.com/png.latex?0%3Cr%3C1">. This condition says that the rate at which the density decreases does not change as we move through the parameter space. This is extremely useful because any other (monotone) distribution is going to have a point at which the bulk changes to the tail. As we are putting our prior on <img src="https://latex.codecogs.com/png.latex?d">, we won’t necessarily be able to reason about this point.</p>
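<p>To spell out why constant-rate penalisation pins down a single family (a short derivation sketch):</p>

```latex
\frac{p_d(d+\delta)}{p_d(d)} = r^{\delta} \ \text{ for all } d, \delta > 0
\;\Longrightarrow\;
\log p_d(d) = \log p_d(0) + d \log r
\;\Longrightarrow\;
p_d(d) = \lambda e^{-\lambda d}, \qquad \lambda = -\log r > 0,
```

where the first implication says that the log-density is affine in the distance, and the last step normalises the density over <img src="https://latex.codecogs.com/png.latex?(0,%20%5Cinfty)">.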
<p>Constant-rate penalisation implies that the prior on the distance scale is an exponential distribution and, hence, we get our generic PC prior for a flexibility parameter <img src="https://latex.codecogs.com/png.latex?%5Cxi"> <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Cxi)%20=%20%5Clambda%20e%5E%7B-%5Clambda%20d(%5Cxi)%7D%5Cleft%7C%5Cfrac%7Bd%7D%7Bd%5Cxi%7D%20d(%5Cxi)%5Cright%7C.%0A"></p>
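<p>Equivalently, we can sample from this prior by drawing the distance from an exponential and inverting the distance map. A small Python sketch (helper names are mine) for the illustrative distance <img src="https://latex.codecogs.com/png.latex?d(%5Cxi)%20=%20%5Csqrt%7B%5Cxi%7D">, which is the approximate negative-binomial distance from earlier, checks both the density and the sampling route:</p>

```python
import math
import random

lam = 1.5

# Generic PC prior for the illustrative distance d(xi) = sqrt(xi):
# p(xi) = lam * exp(-lam * d(xi)) * |d'(xi)| = lam/(2 sqrt(xi)) * exp(-lam sqrt(xi))
def pc_density(xi):
    return lam / (2 * math.sqrt(xi)) * math.exp(-lam * math.sqrt(xi))

# The density should integrate to 1 (midpoint rule on (0, 60);
# the tail beyond 60 is negligible for lam = 1.5).
n = 200_000
h = 60 / n
total = sum(pc_density((i + 0.5) * h) * h for i in range(n))
print(total)  # close to 1

# Sampling route: draw D ~ Exp(lam) on the distance scale, set xi = D^2.
random.seed(1)
draws = [(-math.log(1 - random.random()) / lam) ** 2 for _ in range(200_000)]
frac = sum(x < 1.0 for x in draws) / len(draws)
# P(xi < 1) = P(D < 1) = 1 - exp(-lam)
print(frac, 1 - math.exp(-lam))
```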
<div id="exm-neg-bin-3" class="theorem example">
<p><span class="theorem-title"><strong>Example 7 (Overdispersion of a negative binomial (continued))</strong></span> The exact PC prior for the overdispersion parameter in the negative binomial distribution is <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Calpha)%20=%20%5Cfrac%7B%5Clambda%7D%7B%5Calpha%5E%7B2%7D%7D%5Cfrac%7B%5Cleft%7C%5Cpsi'%5Cleft(%5Calpha%5E%7B-1%7D%5Cright)-%5Calpha%5Cright%7C%7D%7B%20%5Csqrt%7B2%20%5Clog%20(%5Calpha%5E%7B-1%7D)%20-%202%20%5Cpsi(%5Calpha%5E%7B-1%7D)%7D%7D%20%5Cexp%20%5Cleft%5B%20-%5Clambda%20%5Csqrt%7B2%20%5Clog%20(%5Calpha%5E%7B-1%7D)%20-%202%20%5Cpsi(%5Calpha%5E%7B-1%7D)%7D%5Cright%5D,%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Cpsi'(%5Ccdot)"> is the derivative of the digamma function.</p>
<p>On the other hand, if we use the approximate distance we get <img src="https://latex.codecogs.com/png.latex?%0Ap_%5Ctext%7Bapprox%7D(%5Calpha)%20=%20%5Cfrac%7B%5Clambda%7D%7B2%5Csqrt%7B%5Calpha%7D%7D%20e%5E%7B-%5Clambda%20%5Csqrt%7B%5Calpha%7D%7D.%0A"></p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb3-1">lambda <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb3-2">dat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length.out =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>),</span>
<span id="cb3-3">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">exact =</span> lambda <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> alpha<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">trigamma</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>alpha) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> alpha)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span></span>
<span id="cb3-4">         <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>alpha) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span></span>
<span id="cb3-5">                <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">digamma</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>alpha))<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span></span>
<span id="cb3-6">         <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>lambda<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>alpha) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> </span>
<span id="cb3-7">                            <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">digamma</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>alpha))),</span>
<span id="cb3-8">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">approx =</span> lambda<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(alpha))<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>lambda<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(alpha))</span>
<span id="cb3-9">       ) </span>
<span id="cb3-10">dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> alpha, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> exact)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb3-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> approx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dashed"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>()</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-08-29-priors4/priors4_files/figure-html/unnamed-chunk-3-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb4-1">dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> alpha, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> exact <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> approx)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb4-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>()</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-08-29-priors4/priors4_files/figure-html/unnamed-chunk-3-2.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>That’s a pretty good agreement!</p>
</div>
<div id="exm-student-t-3" class="theorem example">
<p><span class="theorem-title"><strong>Example 8 (Student-t degrees of freedom (Continued))</strong></span> An interesting feature of the PC prior (and any prior where the density on the distance scale takes its maximum at the base model) is that the implied prior on <img src="https://latex.codecogs.com/png.latex?%5Cnu"> has no finite moments. In fact, if your prior on <img src="https://latex.codecogs.com/png.latex?%5Cnu"> has finite moments, the density on the distance scale is zero at zero!</p>
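<p>A heuristic for the missing moments, using the approximate distance from the previous example (a sketch, not a proof):</p>

```latex
% For large \nu the approximate squared distance behaves like
d(\nu)^2 = \log\frac{\nu^2}{(\nu+1)(\nu-2)} - \frac{\nu+2}{3\nu(\nu+1)}
         \approx \frac{1}{\nu} - \frac{1}{3\nu} = \frac{2}{3\nu},
% so d(\nu) \asymp \nu^{-1/2} and |d'(\nu)| \asymp \nu^{-3/2}.
% Near the base model e^{-\lambda d(\nu)} \to 1, so the PC prior has the
% polynomial tail
p(\nu) \asymp \nu^{-3/2},
% and \int^{\infty} \nu \, \nu^{-3/2} \, d\nu = \infty: not even the mean exists.
```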
<p>The exact PC prior for the degrees of freedom in a Student-t distribution is <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Cnu)%20=%20%5Clambda%20%5Cfrac%7B%5Cfrac%7B1%7D%7B%5Cnu-2%7D%20+%20%5Cfrac%7B%5Cnu+1%7D%7B2%7D%5Cleft%5B%5Cpsi'%5Cleft(%5Cfrac%7B%5Cnu+1%7D%7B2%7D%5Cright)-%5Cpsi'%5Cleft(%5Cfrac%7B%5Cnu%7D%7B2%7D%5Cright)%5Cright%5D%7D%7B4d(%5Cnu)%7De%5E%7B-%5Clambda%20d(%5Cnu)%7D,%0A"> where <img src="https://latex.codecogs.com/png.latex?d(%5Cnu)"> is given above.</p>
<p>The approximate PC prior is <img src="https://latex.codecogs.com/png.latex?%0Ap_%5Ctext%7Bapprox%7D(%5Cnu)%20=%20%5Clambda%5Cfrac%7B%5Cnu(%5Cnu+2)(2%5Cnu+9)%20+%204%7D%7B3%5Cnu%5E2(%5Cnu+1)%5E2(%5Cnu-2)%7D%20%5Cleft(%5Cfrac%7B%5Cnu%5E2%7D%7B(%5Cnu+1)(%5Cnu-2)%7D%5Cright)%5E%5Clambda%20e%5E%7B%20%20%20-%20%5Clambda%5Cfrac%7B%5Cnu%20+2%7D%7B3%5Cnu(%5Cnu+1)%7D%7D.%0A"> Let’s look at the difference.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb5-1">dist_ex <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> \(nu) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>(nu<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">-2</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb5-2">                      <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lgamma</span>((nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lgamma</span>(nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> </span>
<span id="cb5-3">                      (nu <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">digamma</span>((nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span></span>
<span id="cb5-4">                                   <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">digamma</span>(nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)))</span>
<span id="cb5-5">dist_ap <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> \(nu) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>((nu<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">-2</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> (nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)))</span>
<span id="cb5-6"></span>
<span id="cb5-7">lambda <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb5-8">dat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nu =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length.out =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>),</span>
<span id="cb5-9">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">exact =</span> lambda <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>(nu<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">-2</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> (nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">trigamma</span>((nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">trigamma</span>(nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)))<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dist_ex</span>(nu)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>lambda<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dist_ex</span>(nu)),</span>
<span id="cb5-10">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">approx =</span> lambda <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>nu <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(nu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(nu<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">-2</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>lambda<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dist_ap</span>(nu))</span>
<span id="cb5-11">       ) </span>
<span id="cb5-12">dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> nu, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> exact)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb5-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> approx), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dashed"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>()</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-08-29-priors4/priors4_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb6-1">dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> nu, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> exact <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> approx)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb6-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>()</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-08-29-priors4/priors4_files/figure-html/unnamed-chunk-4-2.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>The approximate prior isn’t so good for <img src="https://latex.codecogs.com/png.latex?%5Cnu"> near 2. In the original paper, the distance was tabulated for <img src="https://latex.codecogs.com/png.latex?%5Cnu%20%3C%209"> and a different high-precision asymptotic expansion was given for <img src="https://latex.codecogs.com/png.latex?%5Cnu%3E9">.</p>
<p>In the <a href="https://projecteuclid.org/journals/statistical-science/volume-32/issue-1/Penalising-Model-Component-Complexity--A-Principled-Practical-Approach-to/10.1214/16-STS576.full">original paper</a>, we also plotted some common priors for the degrees of freedom on the distance scale to show just how informative flat-ish priors on <img src="https://latex.codecogs.com/png.latex?%5Cnu"> can be! Note that the wider the uniform prior on <img src="https://latex.codecogs.com/png.latex?%5Cnu"> is, the more informative it is on the distance scale.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-08-29-priors4/student_t.png" class="img-fluid figure-img"></p>
<figcaption>(Left) Exponential priors on <img src="https://latex.codecogs.com/png.latex?%5Cnu"> shown on the distance scale, from right to left the mean of the prior increases (5, 10, 20). (Right) <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BUniform%7D%5B2,%20M%5D"> priors on <img src="https://latex.codecogs.com/png.latex?%5Cnu"> shown on the distance scale. From left to right <img src="https://latex.codecogs.com/png.latex?M"> increases (20, 50, 100).</figcaption>
</figure>
</div>
</div>
<div id="exm-gaussian-3" class="theorem example">
<p><span class="theorem-title"><strong>Example 9 (Variance of a Gaussian random effect (Continued))</strong></span> This is the easy one because the distance is equal to the standard deviation! The PC prior for the standard deviation of a Gaussian distribution is an exponential prior <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Csigma)%20=%20%5Clambda%20e%5E%7B-%5Clambda%20%5Csigma%7D.%0A"> More generally, if <img src="https://latex.codecogs.com/png.latex?u%20%5Csim%20N(0,%20%5Csigma%5E2%20R)"> is a multivariate normal distribution, then the PC prior for <img src="https://latex.codecogs.com/png.latex?%5Csigma"> is still <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Csigma)%20=%20%5Clambda%20e%5E%7B-%5Clambda%20%5Csigma%7D.%0A"> The corresponding prior on <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2"> is <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Csigma%5E2)%20=%20%5Cfrac%7B%5Clambda%7D%7B2%5Csqrt%7B%5Csigma%5E2%7D%7De%5E%7B-%5Clambda%5Csqrt%7B%5Csigma%5E2%7D%7D.%0A"> Sometimes, for instance if you’re converting a model from BUGS or you’re looking at the smoothing parameter of a smoothing spline, you might specify your normal distribution in terms of the precision, which is the inverse of the variance. If <img src="https://latex.codecogs.com/png.latex?u%20%5Csim%20N(0,%20%5Cgamma%5E%7B-1%7DQ%5E%7B-1%7D)">, then the corresponding PC prior (using the change of variables <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20=%20%5Csigma%5E%7B-2%7D">) is <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Cgamma)%20=%20%5Cfrac%7B%5Clambda%7D%7B2%7D%5Cgamma%5E%7B-3/2%7D%20e%5E%7B-%5Clambda%20%5Cgamma%5E%7B-1/2%7D%7D.%0A"></p>
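<p>The change of variables to the precision is easy to check by simulation. Here is a minimal sketch (in Python rather than the R used above; the rate and threshold are arbitrary illustrative values): if the standard deviation has an exponential prior with rate lambda, the precision gamma = sigma^(-2) should have tail probability P(gamma &gt; g) = 1 - exp(-lambda / sqrt(g)).</p>

```python
import numpy as np

# Sample sigma from its exponential PC prior, transform to the precision,
# and compare the empirical tail probability to the analytic one implied
# by the change of variables gamma = sigma^(-2).
rng = np.random.default_rng(42)
lam = 2.0                                            # illustrative rate
sigma = rng.exponential(scale=1 / lam, size=200_000) # sigma ~ Exp(rate = lam)
gamma = sigma ** -2                                  # precision

g = 4.0                                              # illustrative threshold
empirical = (gamma > g).mean()
exact = 1 - np.exp(-lam / np.sqrt(g))                # = P(sigma < 1/sqrt(g))
```

<p>The two tail probabilities agree to Monte Carlo error, which is a cheap way to catch sign or Jacobian mistakes in this sort of derivation.</p>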
<p>This case was explored extensively in the context of structured additive regression models (think GAMs but moreso) by <a href="https://projecteuclid.org/journals/bayesian-analysis/volume-11/issue-4/Scale-Dependent-Priors-for-Variance-Parameters-in-Structured-Additive-Distributional/10.1214/15-BA983.full">Klein and Kneib</a>, who found that the choice of exponential prior on the distance scale gave more consistent performance than either a half-normal or a half-Cauchy distribution.</p>
</div>
</section>
<section id="closing-the-door-how-to-choose-lambda" class="level2">
<h2 class="anchored" data-anchor-id="closing-the-door-how-to-choose-lambda">Closing the door: How to choose <img src="https://latex.codecogs.com/png.latex?%5Clambda"></h2>
<p>The big unanswered question is: how do we choose <img src="https://latex.codecogs.com/png.latex?%5Clambda">? The scaling of a prior distribution is <em>vital</em> to its success, so this is an important question.</p>
<p>And I will just say this: work it out your damn self.</p>
<p>The thing about prior distributions that shamelessly include information is that, at some point, you need to include<sup>36</sup> some information. And there is no way for anyone other than the data analyst to know what information to include.</p>
<p>But I can outline a general procedure.</p>
<p>Imagine that for your flexibility parameter <img src="https://latex.codecogs.com/png.latex?%5Cxi"> you have some interpretable transformation of it <img src="https://latex.codecogs.com/png.latex?Q(%5Cxi)">. For instance if <img src="https://latex.codecogs.com/png.latex?%5Cxi%20=%20%5Csigma%5E2">, then a good choice for <img src="https://latex.codecogs.com/png.latex?Q(%5Ccdot)"> would be <img src="https://latex.codecogs.com/png.latex?Q(%5Csigma%5E2)=%5Csigma">. This is because standard deviations are on the same scale as the observations<sup>37</sup>, and we have intuition about what happens one standard deviation from the mean.</p>
<p>Problem-specific information can then help us set a natural scale for <img src="https://latex.codecogs.com/png.latex?Q(%5Cxi)">. We do this by choosing <img src="https://latex.codecogs.com/png.latex?%5Clambda"> so that <img src="https://latex.codecogs.com/png.latex?%0A%5CPr(Q(%5Cxi)%20%3E%20U)%20=%20%5Calpha%0A"> for some <img src="https://latex.codecogs.com/png.latex?U">, which we would consider large<sup>38</sup> for our problem, and some <img src="https://latex.codecogs.com/png.latex?0%3C%5Calpha%3C1">.</p>
<p>From the properties of the exponential distribution, we can satisfy this by choosing <img src="https://latex.codecogs.com/png.latex?%0A%5Clambda%20=%20-%20%5Cfrac%7B%5Clog(%5Calpha)%7D%7Bd%5E%7B-1%7D(Q%5E%7B-1%7D(U))%7D.%0A"> This can be found numerically if it needs to be.</p>
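<p>A minimal sketch of this calibration (in Python for illustration, assuming the case where <code>d^{-1}(Q^{-1}(U)) = U</code>, as it is for a standard deviation; <code>U</code> and <code>alpha</code> are made-up values):</p>

```python
import numpy as np

# Choose lambda so that Pr(sigma > U) = alpha under the exponential
# PC prior, then verify the calibration by simulation.
U, alpha = 3.0, 0.05
lam = -np.log(alpha) / U                             # closed form

rng = np.random.default_rng(1)
sigma = rng.exponential(scale=1 / lam, size=500_000)
tail = (sigma > U).mean()                            # should be close to alpha
```

<p>When <code>d^{-1}(Q^{-1}(U))</code> has no closed form, the same equation can be handed to a one-dimensional root finder instead.</p>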
<p>The simplest case is the standard deviation of the normal distribution, because in this case <img src="https://latex.codecogs.com/png.latex?Q(%5Csigma)%20=%20%5Csigma"> and <img src="https://latex.codecogs.com/png.latex?d%5E%7B-1%7D(Q%5E%7B-1%7D(U))%20=%20U">. In general, if <img src="https://latex.codecogs.com/png.latex?u%20%5Csim%20N(0,%20%5Csigma%5E2%20R)"> and <img src="https://latex.codecogs.com/png.latex?R"> is not a correlation matrix, you should take into account the diagonal of <img src="https://latex.codecogs.com/png.latex?R"> when choosing <img src="https://latex.codecogs.com/png.latex?Q">. For instance, choosing <img src="https://latex.codecogs.com/png.latex?Q"> to be the geometric mean<sup>39</sup> of the marginal variances of the <img src="https://latex.codecogs.com/png.latex?u_i"> is a good idea!</p>
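<p>Concretely, the geometric-mean summary looks like this (a Python sketch with a made-up 3&#215;3 matrix <code>R</code>, chosen only so the arithmetic is easy to eyeball):</p>

```python
import numpy as np

# When R is not a correlation matrix, summarise its diagonal (the marginal
# variances) with a geometric mean -- averaging on the natural scale -- to
# get an interpretable quantity Q on which to set the scale U.
R = np.array([[4.0, 0.5, 0.00],
              [0.5, 1.0, 0.20],
              [0.0, 0.2, 0.25]])
geo_mean_var = np.exp(np.mean(np.log(np.diag(R))))  # geometric mean of 4, 1, 0.25
typical_sd = np.sqrt(geo_mean_var)                   # back on the sigma scale
```

<p>With these values the marginal variances multiply to 1, so the geometric mean is exactly 1 even though the arithmetic mean would be pulled up by the large first entry.</p>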
<p>When a model has more than one component, or a component has more than one flexibility parameter, it can be the case that <img src="https://latex.codecogs.com/png.latex?Q(%5Ccdot)"> depends on multiple parameters. For instance, if I hadn’t reparameterised the Student-t distribution to have variance independent of <img src="https://latex.codecogs.com/png.latex?%5Cnu">, a PC prior on <img src="https://latex.codecogs.com/png.latex?%5Csigma"> would have a quantity of interest that depends on <img src="https://latex.codecogs.com/png.latex?%5Cnu">. We will also see this if I ever get around to writing about priors for Gaussian processes.</p>
</section>
<section id="the-dream-pc-priors-in-practice" class="level2">
<h2 class="anchored" data-anchor-id="the-dream-pc-priors-in-practice">The Dream: PC priors in practice</h2>
<p>Thus we can put together a PC prior as the unique prior that satisfies the following four principles:</p>
<ol type="1">
<li><p>Occam’s razor: We have a base model that represents simplicity and we prefer our base model.</p></li>
<li><p>Measuring complexity: We define the prior using the square root of the KL divergence between the base model and the more flexible model. The square root ensures that the divergence is on a similar scale to a distance, but we maintain the asymmetry of the divergence as a feature (not a bug).</p></li>
<li><p>Constant penalisation: We use an exponential prior on the distance scale to ensure that our prior mass decreases evenly as we move farther away from the base model.</p></li>
<li><p>User-defined scaling: We need the user to specify a quantity of interest <img src="https://latex.codecogs.com/png.latex?Q(%5Cxi)"> and a scale <img src="https://latex.codecogs.com/png.latex?U">. We choose the scaling of the prior so that <img src="https://latex.codecogs.com/png.latex?%5CPr(Q(%5Cxi)%20%3E%20U)%20=%20%5Calpha">. This ensures that when we move to a new context, we are able to modify the prior by using the relevant information about <img src="https://latex.codecogs.com/png.latex?Q(%5Cxi)">.</p></li>
</ol>
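<p>The second principle is easy to see in numbers. A small Python sketch (the two standard deviations are arbitrary illustrative values): for zero-mean Gaussians the KL divergence has the closed form log(s_base/s_flex) + s_flex^2/(2 s_base^2) - 1/2, and the asymmetry survives the square root.</p>

```python
import numpy as np

# Distance from the base model: d = sqrt(2 * KL(flexible || base)),
# computed both ways round to show that the asymmetry is preserved.
s_base, s_flex = 1.0, 2.0
kl = np.log(s_base / s_flex) + s_flex**2 / (2 * s_base**2) - 0.5
d = np.sqrt(2 * kl)

kl_rev = np.log(s_flex / s_base) + s_base**2 / (2 * s_flex**2) - 0.5
d_rev = np.sqrt(2 * kl_rev)   # different from d: over-flexibility costs more
```

<p>The flexible-relative-to-base distance is larger than its reverse, which is exactly the direction of penalisation we want.</p>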
<p>These four principles define a PC prior. I think the value of laying them out explicitly is that users and critics can clearly and cleanly identify if these principles are relevant to their problem and, if they are, they can implement them. Furthermore, if you need to modify the principles (say by choosing a different distance measure), there is a clear way to do that.</p>
<p>I’ve come to the end of my energy for this blog post, so I’m going to try to wrap it up. I will write more on the topic later, but for now there are a couple of things I want to say.</p>
<p>These priors can seem quite complex, but I assure you that they are a) useful, b) used, and c) not too terrible in practice. Why? Well, fundamentally because you usually don’t have to derive them yourself. Moreover, a lot of that complexity is the price we pay for dealing with densities. We think this is worth it, and the lesson that the parameterisation you are given may not be the correct parameterisation to use when specifying your prior is an important one!</p>
<p>The <a href="https://projecteuclid.org/journals/statistical-science/volume-32/issue-1/Penalising-Model-Component-Complexity--A-Principled-Practical-Approach-to/10.1214/16-STS576.full">original paper</a> contains a bunch of other examples. The paper was discussed and we wrote a <a href="https://projecteuclid.org/journals/statistical-science/volume-32/issue-1/You-Just-Keep-on-Pushing-My-Love-over-the-Borderline/10.1214/17-STS576REJ.full">rejoinder</a><sup>40</sup>, which contains an out-of-date list of other PC priors people have derived. If you are interested in some other people’s views of this idea, a good place to start is <a href="https://projecteuclid.org/journals/statistical-science/volume-32/issue-1">the discussion of the original paper</a>.</p>
<p>There are also PC priors for <a href="https://arxiv.org/abs/1503.00256">Gaussian Processes</a>, <a href="https://arxiv.org/abs/1601.01180">disease mapping models</a>, <a href="https://arxiv.org/abs/1608.08941">AR(p) processes</a>, <a href="https://arxiv.org/abs/1902.00242">variance parameters in multilevel models</a>, and many more applications.</p>
<p>PC priors are all over the <a href="https://r-inla.org">INLA</a> software package and its documentation contains a bunch more examples.</p>
<p>Try them out. They’ll make you happy.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>I’ve not turned on my computer for six weeks and tbh I finished 3 games and I’m caught up on TV and the weather is shite.↩︎</p></li>
<li id="fn2"><p>“But what about sparse matrices?!” exactly 3 people ask. I’ll get back to them. But this is what I’m feeling today.↩︎</p></li>
<li id="fn3"><p>I am told my Mercury is in Libra and truly I am not living that with those posts. Maybe Mercury was in Gatorade when I wrote them. So if we can’t be balanced at least let’s like things.↩︎</p></li>
<li id="fn4"><p>Our weapons of ass destruction that lives in Canada?↩︎</p></li>
<li id="fn5"><p>Negative binomial parameterised by mean and overdispersion so that its mean is <img src="https://latex.codecogs.com/png.latex?%5Cmu"> and the variance is <img src="https://latex.codecogs.com/png.latex?%5Cmu(1+%5Calpha%20%5Cmu)"> because we are not flipping fucking coins here↩︎</p></li>
<li id="fn6"><p>Hello and welcome to Statistics for Stupid Children. My name is Daniel and I will be your host today.↩︎</p></li>
<li id="fn7"><p>If we didn’t have stupid children we’d never get dumb adults and then who would fuck me? You? You don’t have that sort of time. You’ve got a mortgage to service and interest rates are going up. You’ve got your Warhammer collection and it is simply not going to paint itself. You’ve been meaning to learn how to cook Thai food. You simply do not have the time. (I’m on SSRIs so it’s never clear what will come first: the inevitable decay and death of you and your children and your children’s children; the interest, eventual disinterest, and inevitable death of the family archivist from the far future who digs up your name from the digital graveyard; the death of the final person who will ever think of you, thereby removing you from the mortal realm entirely; the death of the universe; or me. Fucking me is a real time commitment.)↩︎</p></li>
<li id="fn8"><p>Gamma is parameterised by shape and rate, so <img src="https://latex.codecogs.com/png.latex?u_i"> has mean 1 and variance <img src="https://latex.codecogs.com/png.latex?%5Calpha">.↩︎</p></li>
<li id="fn9"><p>integrate↩︎</p></li>
<li id="fn10"><p>Sometimes, people still refer to these as <em>hyperparameters</em> and put priors on them, which would clarify things, but like everything in statistics there’s no real agreed upon usage. Because why would anyone want that?↩︎</p></li>
<li id="fn11"><p>somehow↩︎</p></li>
<li id="fn12"><p>location parameter↩︎</p></li>
<li id="fn13"><p>This is critical: we <em>do not know</em> <img src="https://latex.codecogs.com/png.latex?%5Cnu"> so the only way we can put a sensible prior on the scaling parameter is if we disentangle the role of these two parameters!↩︎</p></li>
<li id="fn14"><p>In fact, if my model estimated the data-level variance to be nearly zero I would assume I’ve fucked something up elsewhere and my model is either over-fitting or I have a redundancy in my model (like if <img src="https://latex.codecogs.com/png.latex?J%20=%20n">).↩︎</p></li>
<li id="fn15"><p>There are some mathematical peculiarities that we will run into later when the base model is singular. But they’re not too bad.↩︎</p></li>
<li id="fn16"><p>The Arianist heresy is that God, Jesus, and the Holy Spirit are three separate beings rather than consubstantial. It’s the reason for that bit of the Nicene. The statistical version most commonly occurs when you consider your model for your data conditional on the parameters (your likelihood) and your model for the parameters (your prior) as separate objects. This can lead to really dumb priors and bad inferences.↩︎</p></li>
<li id="fn17"><p>Complaining that a prior is adding information is like someone complaining to you that his boyfriend has stopped fucking him and you subsequently discovering that this is because his boyfriend died a few weeks ago. Like I’m sorry Jonathan, I know even the sight of a traffic cone sets your bussy a-quiverin’, but there really are bigger concerns and I’m gonna need you to focus.↩︎</p></li>
<li id="fn18"><p>In this story, the bigger concerns are things like misspecification, incorrect assumptions, data problems etc etc, the traffic cone is an unbiased estimator, Jonathan is our stand in for a generic data analyst, and Jonathan’s bussy is said data scientist’s bussy.↩︎</p></li>
<li id="fn19"><p>Yes, I know that there are problems with giving my generic data analyst a male name. Did I carefully think through the gender and power dynamics in my bussy simile? I think the answer to that is obvious.↩︎</p></li>
<li id="fn20"><p>We use priors for the same reason that other people use penalties: we don’t want to go into a weird corner of our model space <em>unless</em> our data explicitly drags us there↩︎</p></li>
<li id="fn21"><p>This is a bit technical. When a model is over-parameterised, it’s not always possible to recover all of the parameters. So we ideally want to make sure that if there are a bunch of asymptotically equivalent parameters, our prior operates sensibly on that set. An example of this will come in a future post where I’ll talk about priors for the parameters of a Gaussian process.↩︎</p></li>
<li id="fn22"><p>That Arianism thing creeping in again!↩︎</p></li>
<li id="fn23"><p>There are examples of theoretically motivated priors where it’s wildly expensive to compute their densities. We will see one in a later post about GPs.↩︎</p></li>
<li id="fn24"><p>Sure, Jan.&nbsp;Of course we want that. But we believed that it was important to include this in a list of desiderata because we <em>never</em> want to say “our prior has motivation X and therefore it is good”. It is not enough to be pure, you actually have to work.↩︎</p></li>
<li id="fn25"><p>What do I mean by near? Read on McDuff.↩︎</p></li>
<li id="fn26"><p>Think of it as a P-spline if you must. The important thing is that the weights of the basis functions are jointly normal with mean zero and precision matrix <img src="https://latex.codecogs.com/png.latex?%5Clambda%20Q">.↩︎</p></li>
<li id="fn27"><p>Given the knots, which are fixed↩︎</p></li>
<li id="fn28"><p>I might talk about <a href="https://arxiv.org/abs/2105.09712">more advanced solutions</a> at some point.↩︎</p></li>
<li id="fn29"><p>Strictly, how many bits we would need.↩︎</p></li>
<li id="fn30"><p>The largest absolute difference between the probability that an event <img src="https://latex.codecogs.com/png.latex?A"> happens under <img src="https://latex.codecogs.com/png.latex?f"> and <img src="https://latex.codecogs.com/png.latex?g">.↩︎</p></li>
<li id="fn31"><p>When performing the battered sav, it’s important to not speed up too quickly lest you over-batter.↩︎</p></li>
<li id="fn32"><p>It also might not. I don’t care to work it out.↩︎</p></li>
<li id="fn33"><p>The “easy” way to get this is to use the fact that the Gamma is in the exponential family and use the general formula for KL divergences in exponential families. The easier way is to look it up on Wikipedia↩︎</p></li>
<li id="fn34"><p>Using <a href="https://functions.wolfram.com/GammaBetaErf/LogGamma/06/03/">asymptotic expansions</a> for the log of a Gamma function at infinity↩︎</p></li>
<li id="fn35"><p>I’ll be dead before I declare that something is an approximation without bloody checking how good it is.↩︎</p></li>
<li id="fn36"><p>We have already included information that <img src="https://latex.codecogs.com/png.latex?%5Cxi"> is a flexibility parameter with base model <img src="https://latex.codecogs.com/png.latex?%5Cxi_%5Ctext%7Bbase%7D">, but that is model-specific information. Now we move on to <em>problem</em> specific information.↩︎</p></li>
<li id="fn37"><p>They have the same units.↩︎</p></li>
<li id="fn38"><p>The same thing happens if we want a particular quantity not to be too small: just swap the signs.↩︎</p></li>
<li id="fn39"><p>Always average on the natural scale. For non-negative parameters geometric means make a lot more sense than arithmetic means!↩︎</p></li>
<li id="fn40"><p>Homosexually titled <em>You just keep on pushing my love over the borderline: a rejoinder</em>. I’m still not sure how I got away with that.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2022,
  author = {Simpson, Dan},
  title = {Priors Part 4: {Specifying} Priors That Appropriately
    Penalise Complexity},
  date = {2022-09-03},
  url = {https://dansblog.netlify.app/2022-08-29-priors4/2022-08-29-priors4.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2022" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2022. <span>“Priors Part 4: Specifying Priors That
Appropriately Penalise Complexity.”</span> September 3, 2022. <a href="https://dansblog.netlify.app/2022-08-29-priors4/2022-08-29-priors4.html">https://dansblog.netlify.app/2022-08-29-priors4/2022-08-29-priors4.html</a>.
</div></div></section></div> ]]></description>
  <category>Prior distributions</category>
  <category>fundamentals</category>
  <category>PC priors</category>
  <guid>https://dansblog.netlify.app/posts/2022-08-29-priors4/priors4.html</guid>
  <pubDate>Fri, 02 Sep 2022 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2022-08-29-priors4/tina.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Tail stabilization of importance sampling estimators: A bit of theory</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2022-06-03-that-psis-proof/that-psis-proof.html</link>
  <description><![CDATA[ 





<p>Imagine you have a target probability distribution <img src="https://latex.codecogs.com/png.latex?p(%5Ctheta)"> and you want to estimate the expectation <img src="https://latex.codecogs.com/png.latex?I_h%20=%20%5Cint%20h(%5Ctheta)%20p(%5Ctheta)%5C,d%5Ctheta">. That’s lovely and everything, but if it was easy none of us would have jobs. High-dimensional quadrature is a pain in the arse.</p>
<p>A very simple way to get a decent estimate of <img src="https://latex.codecogs.com/png.latex?I_h"> is to use <em>importance sampling</em>, that is, taking draws <img src="https://latex.codecogs.com/png.latex?%5Ctheta_s">, <img src="https://latex.codecogs.com/png.latex?s%20=%201,%5Cldots,%20S"> from some proposal distribution <img src="https://latex.codecogs.com/png.latex?%5Ctheta_s%20%5Csim%20g(%5Ctheta)">. Then, noting that <img src="https://latex.codecogs.com/png.latex?%0AI_h%20=%20%5Cint%20h(%5Ctheta)%20p%20(%5Ctheta)%5C,d%5Ctheta%20=%20%5Cint%20h(%5Ctheta)%20%5Cunderbrace%7B%5Cfrac%7Bp(%5Ctheta)%7D%7Bg(%5Ctheta)%7D%7D_%7Br(%5Ctheta)%7Dg(%5Ctheta)%5C,d%5Ctheta,%0A"> we can use Monte Carlo to estimate the second integral. This leads to the importance sampling estimator <img src="https://latex.codecogs.com/png.latex?%0AI_h%5ES%20=%20%5Cfrac%7B1%7D%7BS%7D%5Csum_%7Bs=1%7D%5ES%20h(%5Ctheta_s)%20r(%5Ctheta_s).%0A"></p>
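<p>Mechanically, that estimator is just a weighted Monte Carlo average. Here is a minimal Python sketch (the target, proposal, and choice of h are toy assumptions of mine, not from the post, whose own code is in R):</p>

```python
# A toy sketch of plain importance sampling (my example, not from the post):
# estimate E_p[h(theta)] for target p = N(0, 1) with h(theta) = theta^2
# (true value 1), taking draws from the wider proposal g = N(0, 2^2).
import math
import random

random.seed(1)


def p_density(x):
    # target density: standard normal
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def g_density(x):
    # proposal density: normal with standard deviation 2
    return math.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * math.sqrt(2.0 * math.pi))


def importance_sampling(h, S=100_000):
    total = 0.0
    for _ in range(S):
        theta = random.gauss(0.0, 2.0)           # theta_s ~ g
        r = p_density(theta) / g_density(theta)  # importance ratio r(theta_s)
        total += h(theta) * r
    return total / S


estimate = importance_sampling(lambda t: t * t)
```

<p>Because the proposal here is wider than the target, the ratios are bounded and the estimator behaves; the whole point of the post is what happens when they are not.</p>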
<p>This all seems marvellous, but there is a problem. Even though <img src="https://latex.codecogs.com/png.latex?h"> is probably a very pleasant function and <img src="https://latex.codecogs.com/png.latex?g"> is a nice friendly distribution, <img src="https://latex.codecogs.com/png.latex?r(%5Ctheta)"> can be an absolute beast. Why? Well it’s<sup>1</sup> the ratio of two densities and there is no guarantee that the ratio of two nice functions is itself a nice function. In particular, if the bulk of the distributions <img src="https://latex.codecogs.com/png.latex?p"> and <img src="https://latex.codecogs.com/png.latex?g"> are in different places, you’ll end up with the situation where for most draws <img src="https://latex.codecogs.com/png.latex?r(%5Ctheta_s)"> is very small<sup>2</sup> and a few will be HUGE<sup>3</sup>.</p>
<p>This will lead to an extremely unstable estimator.</p>
<p>It is pretty well known that the raw importance sampler <img src="https://latex.codecogs.com/png.latex?I_h%5ES"> will behave nicely (that is, it will be unbiased with finite variance) precisely when the distribution of <img src="https://latex.codecogs.com/png.latex?r_s%20=%20r(%5Ctheta_s)"> has finite variance.</p>
<p>Elementary treatments stop there, but they miss two very big problems. The most obvious one is that it’s basically impossible to check if the variance of <img src="https://latex.codecogs.com/png.latex?r_s"> is finite. A second, much larger but much more subtle problem is that the variance can be finite but <em>massive</em>. This is probably the most common case in high dimensions. MacKay has an excellent example where the importance ratios are <em>bounded</em>, but that bound is so large that it is, for all intents and purposes, infinite.</p>
<p>All of which is to say that importance sampling doesn’t work unless you work on it.</p>
<section id="truncated-importance-sampling" class="level2">
<h2 class="anchored" data-anchor-id="truncated-importance-sampling">Truncated importance sampling</h2>
<p>If the problem is the fucking ratios then by gum we will fix the fucking ratios. Or so the saying goes.</p>
<p>The trick turns out to be modifying the largest ratios enough that we stabilise the variance, but not so much as to overly bias the estimate.</p>
<p>The first version of this was <a href="https://www.jstor.org/stable/27594308?seq=1">truncated importance sampling</a> (TIS), which selects a threshold <img src="https://latex.codecogs.com/png.latex?T"> and estimates the expectation as <img src="https://latex.codecogs.com/png.latex?%0AI_%5Ctext%7BTIS%7D%5ES%20=%20%5Cfrac%7B1%7D%7BS%7D%5Csum_%7Bs=%201%7D%5ES%20h(%5Ctheta_s)%20%5Cmin%5C%7Br(%5Ctheta_s),%20T%5C%7D.%0A"> It’s pretty obvious that <img src="https://latex.codecogs.com/png.latex?I%5ES_%5Ctext%7BTIS%7D"> has finite variance for any fixed <img src="https://latex.codecogs.com/png.latex?T">, but we should be pretty worried about the bias. Unsurprisingly, there is going to be a trade-off between the variance and the bias. So let’s explore that.</p>
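<p>In code, TIS is a one-line change to the raw estimator: cap each ratio at the threshold before averaging. Another toy Python sketch (again, the densities and the threshold sequence are my own assumptions, not the post’s):</p>

```python
# A toy sketch of TIS (my own densities, not the post's): truncate each
# importance ratio at T before averaging. Taking T = sqrt(S) satisfies the
# conditions derived below: T_S -> infinity and T_S = o(S).
import math
import random

random.seed(2)


def normal_density(x, sd):
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def tis(h, S=50_000):
    # target p = N(0, 1.5^2), proposal g = N(0, 1): the ratios are unbounded
    T = math.sqrt(S)  # truncation level T_S
    total = 0.0
    for _ in range(S):
        theta = random.gauss(0.0, 1.0)
        r = normal_density(theta, 1.5) / normal_density(theta, 1.0)
        total += h(theta) * min(r, T)  # modify only the largest ratios
    return total / S


estimate = tis(lambda t: t * t)  # true value E_p[theta^2] = 2.25
```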
<section id="the-bias-of-tis" class="level3">
<h3 class="anchored" data-anchor-id="the-bias-of-tis">The bias of TIS</h3>
<p>To get an expression for the bias, first let us write <img src="https://latex.codecogs.com/png.latex?r_s%20=%20r(%5Ctheta_s)"> and <img src="https://latex.codecogs.com/png.latex?h_s%20=%20h(%5Ctheta_s)"> for <img src="https://latex.codecogs.com/png.latex?%5Ctheta_s%20%5Csim%20g">. Occasionally we will talk about the joint distribution of <img src="https://latex.codecogs.com/png.latex?(r_s,h_s)%20%5Csim%20(R,H)">. Sometimes we will also need to use the indicator variables <img src="https://latex.codecogs.com/png.latex?z_s%20=%201_%7Br_s%20%5Cleq%20T%7D">.</p>
<p>Then, we can write<sup>4</sup> <img src="https://latex.codecogs.com/png.latex?%0AI%20=%20%5Cmathbb%7BE%7D(HR%20%5Cmid%20R%20%5Cleq%20T)%20%5CPr(R%20%5Cleq%20T)%20+%20%5Cmathbb%7BE%7D(HR%20%5Cmid%20R%20%3E%20T)%20%5CPr(R%20%3E%20T).%0A"></p>
<p>How does this relate to TIS? Well. Let <img src="https://latex.codecogs.com/png.latex?M%20=%20%5Csum_%7Bs=1%7D%5ES%20(1-z_s)"> be the random variable denoting the number of times <img src="https://latex.codecogs.com/png.latex?r_s%20%3E%20T">. Then, <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Cmathbb%7BE%7D(I_%5Ctext%7BTIS%7D%5ES)%20&amp;=%20%5Cmathbb%7BE%7D%5Cleft(%20%5Cfrac%7B1%7D%7BS%7D%5Csum_%7Bs=1%7D%5ESz_sh_sr_s%5Cright)%20%20+%20%5Cmathbb%7BE%7D%5Cleft(%20%5Cfrac%7BT%7D%7BS%7D%5Csum_%7Bs=1%7D%5ES(1-z_s)h_s%5Cright)%20%5C%5C%0A&amp;=%5Cmathbb%7BE%7D_M%5Cleft%5B%5Cfrac%7BS-M%7D%7BS%7D%5Cmathbb%7BE%7D(HR%20%5Cmid%20R%20%5Cleq%20T)%20+%20%5Cfrac%7BMT%7D%7BS%7D%5Cmathbb%7BE%7D(H%20%5Cmid%20R%20%3E%20T)%5Cright%5D%20%5C%5C%0A&amp;=%5Cmathbb%7BE%7D(HR%20%5Cmid%20R%20%5Cleq%20T)%20%5CPr(R%20%5Cleq%20T)%20+%20T%5Cmathbb%7BE%7D(H%20%5Cmid%20R%20%3E%20T)%20%5CPr(R%20%3E%20T).%0A%5Cend%7Balign*%7D"></p>
<p>Hence the bias in TIS is <img src="https://latex.codecogs.com/png.latex?%0AI%20-%20%5Cmathbb%7BE%7D(I_%5Ctext%7BTIS%7D%5ES)%20=%20%5Cmathbb%7BE%7D(H(R-T)%20%5Cmid%20R%20%3E%20T)%20%5CPr(R%20%3E%20T).%0A"></p>
<p>To be honest, this doesn’t look phenomenally interesting for fixed <img src="https://latex.codecogs.com/png.latex?T">, however if we let <img src="https://latex.codecogs.com/png.latex?T%20=%20T_S"> depend on the sample size then as long as <img src="https://latex.codecogs.com/png.latex?T_S%20%5Crightarrow%20%5Cinfty"> we get vanishing bias.</p>
<p>We can get more specific if we make an assumption about the tail of the importance ratios. In particular, we will assume that<sup>5</sup> <img src="https://latex.codecogs.com/png.latex?1-R(r)%20=%20%5CPr(R%20%3E%20r)%20=%20cr%5E%7B-1/k%7D(1+o(1))"> for some<sup>6</sup> <img src="https://latex.codecogs.com/png.latex?k%3C1">.</p>
<p>While it seems like this will only be useful for estimating <img src="https://latex.codecogs.com/png.latex?%5CPr(R%3ET)">, it turns out that under some mild<sup>7</sup> technical conditions, the conditional excess distribution function<sup>8</sup> <img src="https://latex.codecogs.com/png.latex?%0AR_T(y)%20=%20%5CPr(R%20-%20T%20%5Cleq%20y%20%5Cmid%20R%20%3E%20T)%20=%20%5Cfrac%7BR(T%20+%20y)%20-%20R(T)%7D%7B1-R(T)%7D,%0A"> is well approximated by a Generalised Pareto Distribution as <img src="https://latex.codecogs.com/png.latex?T%5Crightarrow%20%5Cinfty">. Or, in maths, as <img src="https://latex.codecogs.com/png.latex?T%5Crightarrow%20%5Cinfty">, <img src="https://latex.codecogs.com/png.latex?%0AR_T(y)%20%5Crightarrow%20%5Cbegin%7Bcases%7D%201-%20%5Cleft(1%20+%20%5Cfrac%7Bky%7D%7B%5Csigma%7D%5Cright)%5E%7B-1/k%7D,%20%5Cquad%20&amp;%20k%20%5Cneq%200%20%5C%5C%0A1-%20%5Cmathrm%7Be%7D%5E%7B-y/%5Csigma%7D,%20%5Cquad%20&amp;k%20=%200,%0A%5Cend%7Bcases%7D%0A"> for some <img src="https://latex.codecogs.com/png.latex?%5Csigma%20%3E%200"> and <img src="https://latex.codecogs.com/png.latex?k%20%5Cin%20%5Cmathbb%7BR%7D">. The shape<sup>9</sup> parameter <img src="https://latex.codecogs.com/png.latex?k"> is very important for us, as it tells us how many moments the distribution has. In particular, if a distribution <img src="https://latex.codecogs.com/png.latex?X"> has shape parameter <img src="https://latex.codecogs.com/png.latex?k">, then <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D%7CX%7C%5E%5Calpha%20%3C%20%5Cinfty,%20%5Cquad%20%5Cforall%20%5Calpha%20%3C%20%5Cfrac%7B1%7D%7Bk%7D.%0A"> We will focus exclusively on the case where <img src="https://latex.codecogs.com/png.latex?k%20%3C%201">. When <img src="https://latex.codecogs.com/png.latex?k%20%3C%201/2">, the distribution has finite variance.</p>
<p>If <img src="https://latex.codecogs.com/png.latex?1-%20R(r)%20=%20cr%5E%7B-1/k%7D(1+%20%20o(1))">, then the conditional exceedence function is <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0AR_T(y)%20&amp;=%20%20%5Cfrac%7BcT%5E%7B-1/k%7D(1+%20%20o(1))%20-%20c(T+y)%5E%7B-1/k%7D(1+%20%20o(1))%7D%7BcT%5E%7B-1/k%7D(1+%20%20o(1)))%7D%20%5C%5C%0A&amp;=%20%5Cleft%5B1%20-%20%5Cleft(1%20+%20%5Cfrac%7By%7D%7BT%7D%5Cright)%5E%7B-1/k%7D%5Cright%5D(1%20+%20o(1)),%0A%5Cend%7Balign*%7D"> which suggests that as <img src="https://latex.codecogs.com/png.latex?T%5Crightarrow%20%5Cinfty">, <img src="https://latex.codecogs.com/png.latex?R_T"> converges to a generalised Pareto distribution with shape parameter <img src="https://latex.codecogs.com/png.latex?k"> and scale parameter <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(T)">.</p>
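<p>As a quick sanity check (my own construction, not in the post): when the tail is <em>exactly</em> Pareto, so that the survival function is r to the power -1/k with c = 1, the conditional excess distribution is exactly generalised Pareto with shape k and scale sigma = kT, with no o(1) slop. A small Python check:</p>

```python
# Check (my construction, not the post's code): for an exact Pareto tail
# Pr(R > r) = r^(-1/k), the conditional excess cdf R_T equals the
# generalised Pareto cdf with shape k and scale sigma = k * T exactly.
def pareto_tail(r, k):
    # survival function Pr(R > r) = r^(-1/k), valid for r >= 1
    return r ** (-1.0 / k)


def conditional_excess_cdf(y, T, k):
    # R_T(y) = (R(T + y) - R(T)) / (1 - R(T)), with R the cdf of the ratios
    return (pareto_tail(T, k) - pareto_tail(T + y, k)) / pareto_tail(T, k)


def gpd_cdf(y, k, sigma):
    # generalised Pareto cdf for shape k != 0
    return 1.0 - (1.0 + k * y / sigma) ** (-1.0 / k)


k, T = 0.7, 50.0
max_gap = max(
    abs(conditional_excess_cdf(y, T, k) - gpd_cdf(y, k, k * T))
    for y in [0.0, 0.5, 1.0, 10.0, 100.0, 1000.0]
)
```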
<p>All of this work lets us approximate the distribution of <img src="https://latex.codecogs.com/png.latex?(R-T%20%5Cmid%20R%3ET%20)"> and use the formula for the mean of a generalised Pareto distribution. This gives us the estimate <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(R-%20T%20%5Cmid%20R%3ET)%20%5Capprox%20%5Cfrac%7BkT%7D%7B1-k%7D,%0A"> which estimates the bias when <img src="https://latex.codecogs.com/png.latex?h(%5Ctheta)"> is constant<sup>10</sup> as <img src="https://latex.codecogs.com/png.latex?%0AI%20-%20%5Cmathbb%7BE%7D(I_%5Ctext%7BTIS%7D%5ES)%20%5Capprox%20%5Cmathcal%7BO%7D%5Cleft(T%5E%7B1-1/k%7D%5Cright).%0A"></p>
<p>For what it’s worth, Ionides got the same result more directly in the TIS paper, but he wasn’t trying to do what I’m trying to do.</p>
</section>
<section id="the-variance-in-tis" class="level3">
<h3 class="anchored" data-anchor-id="the-variance-in-tis">The variance in TIS</h3>
<p>The variance is a little bit more annoying. We want it to go to zero.</p>
<p>As before, we condition on <img src="https://latex.codecogs.com/png.latex?z_s"> (or, equivalently, <img src="https://latex.codecogs.com/png.latex?M">) and then use the law of total variance. We know from the bias calculation that <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(I_%5Ctext%7BTIS%7D%5ES%20%5Cmid%20M)%20=%5Cfrac%7BS-M%7D%7BS%7D%5Cmathbb%7BE%7D(HR%20%5Cmid%20R%20%5Cleq%20T)%20+%20%5Cfrac%7BTM%7D%7BS%7D%5Cmathbb%7BE%7D(H%20%5Cmid%20R%3ET).%0A"></p>
<p>A similarly quick calculation tells us that <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BV%7D(I_%5Ctext%7BTIS%7D%5ES%20%5Cmid%20M)%20=%20%5Cfrac%7BS-M%7D%7BS%5E2%7D%5Cmathbb%7BV%7D(HR%20%5Cmid%20R%20%5Cleq%20T)%20+%5Cfrac%7BMT%5E2%7D%7BS%5E2%7D%5Cmathbb%7BV%7D(H%20%5Cmid%20R%3ET).%0A"> To close it out, we recall that <img src="https://latex.codecogs.com/png.latex?M"> is the sum of Bernoulli random variables so <img src="https://latex.codecogs.com/png.latex?%0AM%20%5Csim%20%5Ctext%7BBinomial%7D(S,%20%5CPr(R%20%3E%20T)).%0A"></p>
<p>With this, we can get an expression for the unconditional variance. To simplify the expression, let’s write <img src="https://latex.codecogs.com/png.latex?p_T%20=%20%5CPr(R%20%3E%20T)">. Then, <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Cmathbb%7BV%7D(I_%5Ctext%7BTIS%7D%5ES)%20&amp;=%5Cmathbb%7BE%7D_M%5Cmathbb%7BV%7D(I_%5Ctext%7BTIS%7D%5ES%20%5Cmid%20M)%20+%20%5Cmathbb%7BV%7D_M%5Cmathbb%7BE%7D(I_%5Ctext%7BTIS%7D%5ES%20%5Cmid%20M)%20%5C%5C%0A&amp;=%20S%5E%7B-1%7D(1-p_T)%5Cmathbb%7BV%7D(HR%20%5Cmid%20R%20%5Cleq%20T)%20+S%5E%7B-1%7DT%5E2p_T%5Cmathbb%7BV%7D(H%20%5Cmid%20R%3ET)%5C%5C%0A&amp;%5Cquad%20+%20S%5E%7B-1%7Dp_T(1-p_T)%5Cleft%5BT%5Cmathbb%7BE%7D(H%20%5Cmid%20R%3ET)%20-%20%5Cmathbb%7BE%7D(HR%20%5Cmid%20R%20%5Cleq%20T)%5Cright%5D%5E2.%0A%5Cend%7Balign*%7D"></p>
<p>There are three terms in the variance. The first is clearly harmless: it goes to zero no matter how we choose <img src="https://latex.codecogs.com/png.latex?T_S">. Our problem terms are the second and the third, which both contain a factor of <img src="https://latex.codecogs.com/png.latex?T%5E2">. It turns out that choosing <img src="https://latex.codecogs.com/png.latex?T_S%20=%20o(S)"> is enough to tame them. To see this, we note that <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0ATp_T%5Cmathbb%7BV%7D(H%5Cmid%20R%3ET)%20&amp;%5Cleq%20Tp_T%5Cmathbb%7BE%7D(H%5E2%20%5Cmid%20R%3ET)%5C%5C%0A&amp;%5Cleq%20p_T%5Cmathbb%7BE%7D(H%5E2%20R%5Cmid%20R%3ET)%20%5C%5C%0A&amp;%5Cleq%20%5Cmathbb%7BE%7D(H%5E2%20R)%5C%5C%0A&amp;=%20%5Cint%20h(%5Ctheta)%5E2%20p(%5Ctheta)%5C,d%5Ctheta%20%3C%20%5Cinfty,%0A%5Cend%7Balign*%7D"> where the second inequality uses the fact that <img src="https://latex.codecogs.com/png.latex?R%3ET"> and the third comes from the law of total expectation. This bounds the second term by a constant multiple of <img src="https://latex.codecogs.com/png.latex?T/S">, which vanishes when <img src="https://latex.codecogs.com/png.latex?T_S%20=%20o(S)">. Expanding the square in the third term, Jensen’s inequality gives <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(H%20%5Cmid%20R%3ET)%5E2%20%5Cleq%20%5Cmathbb%7BE%7D(H%5E2%20%5Cmid%20R%3ET)">, so the same bound handles it too.</p>
<p>So the TIS estimator has vanishing bias and variance as long as the truncation <img src="https://latex.codecogs.com/png.latex?T_S%20%5Crightarrow%20%5Cinfty"> and <img src="https://latex.codecogs.com/png.latex?T_S%20=%20o(S)">. Once again, this is in the TIS paper, where it is proved in a much more compact way.</p>
</section>
<section id="asymptotic-properties" class="level3">
<h3 class="anchored" data-anchor-id="asymptotic-properties">Asymptotic properties</h3>
<p>It can also be useful to have an understanding of how wild the fluctuations <img src="https://latex.codecogs.com/png.latex?I%20-%20I_%5Ctext%7BTIS%7D%5ES"> are. For traditional importance sampling, we know that if <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(R%5E2)"> is finite, then the fluctuations are, asymptotically, normally distributed with mean zero. Non-asymptotic results were given by <a href="https://arxiv.org/abs/1511.01437">Chatterjee and Diaconis</a>; these hold even when the estimator has infinite variance.</p>
<p>For TIS, it’s pretty obvious that for fixed <img src="https://latex.codecogs.com/png.latex?T"> and <img src="https://latex.codecogs.com/png.latex?h%20%5Cgeq%200">, <img src="https://latex.codecogs.com/png.latex?I_%5Ctext%7BTIS%7D%5ES"> will be asymptotically normal (it is, after all, the sum of bounded random variables). For growing sequences <img src="https://latex.codecogs.com/png.latex?T_S"> it’s a tiny bit more involved: it is now a triangular array<sup>11</sup> rather than a sequence of random variables. But in the end very classical results tell us that for bounded<sup>12</sup> <img src="https://latex.codecogs.com/png.latex?h">, the fluctuations of the TIS estimator are asymptotically normal.</p>
<p>It’s worth saying that when <img src="https://latex.codecogs.com/png.latex?h(%5Ctheta)"> is unbounded, it <em>might</em> be necessary to truncate the product <img src="https://latex.codecogs.com/png.latex?h_ir_i"> rather than just <img src="https://latex.codecogs.com/png.latex?r_i">. This is especially relevant if <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(H%20%5Cmid%20R=r)"> grows rapidly with <img src="https://latex.codecogs.com/png.latex?r">. Personally, I can’t think of a case where this happens: <img src="https://latex.codecogs.com/png.latex?r(%5Ctheta)"> usually grows (super-)exponentially in <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> while <img src="https://latex.codecogs.com/png.latex?h(%5Ctheta)"> usually grows polynomially, which implies <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(H%20%5Cmid%20R=r)"> grows (poly-)logarithmically.</p>
<p>The other important edge case is that when <img src="https://latex.codecogs.com/png.latex?h(%5Ctheta)"> can be both positive and negative, it might be necessary to truncate <img src="https://latex.codecogs.com/png.latex?h_ir_i"> both above <em>and</em> below.</p>
</section>
</section>
<section id="winsorised-importance-sampling" class="level2">
<h2 class="anchored" data-anchor-id="winsorised-importance-sampling">Winsorised importance sampling</h2>
<p>TIS has lovely theoretical properties, but it’s a bit challenging to use in practice. The problem is, there’s really no practical guidance on how to choose the truncation sequence.</p>
<p>So let’s do this differently. What if, instead of specifying a threshold directly, we decided that the largest <img src="https://latex.codecogs.com/png.latex?M"> values are potentially problematic and should be modified? Recall that for TIS, the number of samples that exceeded the threshold, <img src="https://latex.codecogs.com/png.latex?M">, was random while the threshold was fixed. This is the opposite situation: the number of exceedances is fixed but the threshold is random.</p>
<p>The threshold is now the <img src="https://latex.codecogs.com/png.latex?M">th largest value of <img src="https://latex.codecogs.com/png.latex?r_s">. We denote this using order statistics notation: we re-order the sample so that <img src="https://latex.codecogs.com/png.latex?%0Ar_%7B1:S%7D%20%5Cleq%20r_%7B2:S%7D%5Cleq%20%5Cldots%20r_%7BS:S%7D.%0A"> With this notation, the threshold is <img src="https://latex.codecogs.com/png.latex?T%20=%20r_%7BS-M+1:S%7D"> and the Winsorized importance sampler (WIS) is <img src="https://latex.codecogs.com/png.latex?%0AI%5ES_%5Ctext%7BWIS%7D%20=%20%5Cfrac%7B1%7D%7BS%7D%5Csum_%7Bs%20=%201%7D%5E%7BS-M%7D%20h_%7Bs:S%7Dr_%7Bs:S%7D%20+%20%5Cfrac%7Br_%7BS-M+1:S%7D%7D%7BS%7D%5Csum_%7Bs=S-M+1%7D%5ES%20h_%7Bs:S%7D,%0A"> where <img src="https://latex.codecogs.com/png.latex?(r_%7Bs:S%7D,%20h_%7Bs:S%7D)"> are the <img src="https://latex.codecogs.com/png.latex?(r_s,%20h_s)"> pairs <em>ordered</em> so that <img src="https://latex.codecogs.com/png.latex?r_%7B1:S%7D%20%5Cleq%20r_%7B2:S%7D%5Cleq%20%5Ccdots%20%5Cleq%20r_%7BS:S%7D">. Note that <img src="https://latex.codecogs.com/png.latex?h_%7Bs:S%7D"> are not necessarily in increasing order: they are known as <em>concomitants</em> of <img src="https://latex.codecogs.com/png.latex?r_%7Bs:S%7D">, which is just a fancy way to say that they’re along for the ride. It’s <em>very</em> important that we reorder the <img src="https://latex.codecogs.com/png.latex?h_s"> when we reorder the <img src="https://latex.codecogs.com/png.latex?r_s">, otherwise we won’t preserve the joint distribution and we’ll end up with absolute rubbish.</p>
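<p>The concomitant bookkeeping is easy to get wrong, so here is a literal Python transcription of the formula above (mine, not the post’s; the post’s own experiments are in R):</p>

```python
# Sketch of the Winsorised importance sampler: sort the (r, h) pairs by r
# (so the h values travel along as concomitants), replace the M largest
# ratios by the threshold r_{S-M+1:S}, and average.
def wis(h_vals, r_vals, M):
    pairs = sorted(zip(r_vals, h_vals))            # ascending in r
    S = len(pairs)
    T = pairs[S - M][0]                            # threshold r_{S-M+1:S}
    bulk = sum(h * r for r, h in pairs[:S - M])    # untouched terms
    tail = T * sum(h for _, h in pairs[S - M:])    # M largest, weight T each
    return (bulk + tail) / S


# With constant h = 1 and ratios 1..5, M = 2: the threshold is 4 and the
# estimate is (1 + 2 + 3 + 4 + 4) / 5 = 2.8.
```

<p>Note that sorting the pairs, rather than the two vectors separately, is exactly what keeps the joint distribution of the concomitants intact.</p>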
<p>We can already see that this is both much nicer and much wilder than the TIS distribution. It is <em>convenient</em> that <img src="https://latex.codecogs.com/png.latex?M"> is no longer random! But what the hell are we going to do about those order statistics? Well, the answer is very much the same thing as before: condition on them and hope for the best.</p>
<p>Conditioned on the event<sup>13</sup> <img src="https://latex.codecogs.com/png.latex?%5C%7Br_%7BS-M+1:S%7D%20=%20T%5C%7D">, we get <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D%5Cleft(I_%5Ctext%7BWIS%7D%5ES%20%5Cmid%20r_%7BS-M+1:S%7D%20=%20T%5Cright)%20=%20%5Cleft(1%20-%20%5Cfrac%7BM%7D%7BS%7D%5Cright)%5Cmathbb%7BE%7D(RH%20%5Cmid%20R%20%3C%20T)%20+%20%5Cfrac%7BMT%7D%7BS%7D%20%5Cmathbb%7BE%7D(H%20%5Cmid%20R%20%5Cgeq%20T).%0A"> From this, we get that the bias, conditional on <img src="https://latex.codecogs.com/png.latex?r_%7BS-M+1:S%7D%20=%20T"> is <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Bmultline*%7D%0A%5Cleft%7CI%20-%20%5Cmathbb%7BE%7D%5Cleft(I_%5Ctext%7BWIS%7D%5ES%20%5Cmid%20r_%7BS-M+1:S%7D%20=%20T%5Cright)%5Cright%7C%20=%5Cleft%7C%5Cleft%5B%5CPr(R%20%3C%20T)%20-%20%5Cleft(1%20-%20%5Cfrac%7BM%7D%7BS%7D%5Cright)%5Cright%5D%5Cmathbb%7BE%7D(RH%20%5Cmid%20R%20%3C%20T)%20%5Cright.%5C%5C%0A%5Cleft.+%20%5Cleft%5B%5CPr(R%20%5Cgeq%20T)%20-%20%5Cfrac%7BM%7D%7BS%7D%5Cright%5D%20%5Cmathbb%7BE%7D(H(R%20-%20T)%20%5Cmid%20R%20%5Cgeq%20T)%5Cright%7C.%0A%5Cend%7Bmultline*%7D"></p>
<p>You should immediately notice that we are in quite a different situation from TIS, where only the tail contributed to the bias. By fixing <img src="https://latex.codecogs.com/png.latex?M"> and randomising the threshold, we have bias contributions from both the bulk (due, essentially, to a weighting error) and from the tail (due to both the weighting error and the truncation). This is going to require us to be a bit creative.</p>
<p>We could probably do something more subtle and clever here, but that is not my way. Instead, let’s use the triangle inequality to say <img src="https://latex.codecogs.com/png.latex?%0A%5Cleft%7C%5Cmathbb%7BE%7D(RH%20%5Cmid%20R%20%3C%20T)%5Cright%7C%20%5Cleq%20%5Cfrac%7B%5Cmathbb%7BE%7D(R%20%7CH%7C%201(R%3CT))%7D%7B%5CPr(R%20%3CT)%7D%20%5Cleq%20%5Cfrac%7B%5C%7Ch%5C%7C_%7BL%5E1(p)%7D%7D%7B%5CPr(R%20%20%3CT)%7D%0A"> and so the first term in the bias can be bounded if we can bound the relative error <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D%5Cleft%7C1%20-%20%5Cfrac%7B1-%20M/S%7D%7B%5CPr(R%20%3C%20r_%7BS-M+1:S%7D)%7D%5Cright%7C.%0A"></p>
<p>Now the more sensible among you will say <em><a href="https://www.youtube.com/watch?v=R-HryG35A2E">Daniel, No!</a> That’s a ratio! That’s going to be hard to bound</em>. And, of course, you are right. But here’s the thing: if <img src="https://latex.codecogs.com/png.latex?M"> is small relative to <img src="https://latex.codecogs.com/png.latex?S">, it is <em>tremendously</em> unlikely that <img src="https://latex.codecogs.com/png.latex?r_%7BS-M+1:S%7D"> is anywhere near zero. This is intuitively true, but also mathematically true.</p>
<p>To attack this expectation, we are going to look at a slightly different quantity that has the good grace of being non-negative.</p>
<div id="lem-lem1" class="theorem lemma">
<p><span class="theorem-title"><strong>Lemma 1</strong></span> Let <img src="https://latex.codecogs.com/png.latex?X_s">, <img src="https://latex.codecogs.com/png.latex?s=%201,%20%5Cldots%20S"> be an iid sample from <img src="https://latex.codecogs.com/png.latex?F_X">, let <img src="https://latex.codecogs.com/png.latex?1%5Cleq%20k%5Cleq%20S"> be an integer, and let <img src="https://latex.codecogs.com/png.latex?p%20%3E%200">. Then <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7Bp%7D%7BF_X(x_%7Bk:S%7D)%7D%20-p%20%5Cstackrel%7Bd%7D%7B=%7D%20%5Cfrac%7Bp(S-k+1)%7D%7Bk%7D%20%5Cmathcal%7BF%7D,%0A"> and <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B1-p%7D%7B1-%20F_X(x_%7Bk:S%7D)%7D%20-%20(1-p)%20%5Cstackrel%7Bd%7D%7B=%7D%20%5Cfrac%7Bk(1-p)%7D%7BS-k+1%7D%5Cmathcal%7BF%7D%5E%7B-1%7D%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BF%7D"> is an F-distributed random variable with parameters <img src="https://latex.codecogs.com/png.latex?(2(S-k+1),%202k)">.</p>
</div>
<div class="proof">
<p><span class="proof-title"><em>Proof</em>. </span>For any <img src="https://latex.codecogs.com/png.latex?t%5Cgeq%200">, <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5CPr%5Cleft(%5Cfrac%7Bp%7D%7BF_X(x_%7Bk:S%7D)%7D%20-%20p%20%5Cleq%20t%5Cright)%20&amp;=%5CPr%5Cleft(p%20-%20pF_X(x_%7Bk:S%7D)%20%5Cleq%20tF_X(x_%7Bk:S%7D)%5Cright)%20%5C%5C%0A&amp;=%20%5CPr%5Cleft(p%20%20%5Cleq%20(t+p)F_X(x_%7Bk:S%7D)%5Cright)%20%5C%5C%0A&amp;=%5CPr%5Cleft(F_X(x_%7Bk:S%7D)%20%5Cgeq%20%5Cfrac%7Bp%7D%7Bp+t%7D%5Cright)%5C%5C%0A&amp;=%20%5CPr%5Cleft(x_%7Bk:S%7D%20%5Cgeq%20F_X%5E%7B-1%7D%5Cleft(%5Cfrac%7Bp%7D%7Bp+t%7D%5Cright)%5Cright)%5C%5C%0A&amp;=%201-%20I_%7B%5Cfrac%7Bp%7D%7Bp+t%7D%7D(k,%20S-k+1)%20%5C%5C%0A&amp;=%20I_%7B%5Cfrac%7Bt%7D%7Bp+t%7D%7D(S-k+1,%20k),%0A%5Cend%7Balign*%7D"> where <img src="https://latex.codecogs.com/png.latex?I_p(a,b)"> is the incomplete Beta function.</p>
<p>You could, quite reasonably, ask where the hell that incomplete Beta function came from. And if I had thought to look this up, I would say that it came from Equation 2.1.5 in David and Nagaraja’s book on order statistics. Unfortunately, I did not look this up. I derived it, which is honestly not very difficult. The trick is to basically note that the event <img src="https://latex.codecogs.com/png.latex?%5C%7Bx_%7Bk:S%7D%20%5Cleq%20%5Ctau%5C%7D"> is the same as the event that at least <img src="https://latex.codecogs.com/png.latex?k"> of the samples <img src="https://latex.codecogs.com/png.latex?x_s"> are less than or equal to <img src="https://latex.codecogs.com/png.latex?%5Ctau">. Because the <img src="https://latex.codecogs.com/png.latex?x_s"> are independent, this is the probability of observing at least <img src="https://latex.codecogs.com/png.latex?k"> heads from a coin with the probability of a head <img src="https://latex.codecogs.com/png.latex?%5CPr(x%20%5Cleq%20%5Ctau)%20=%20F_X(%5Ctau)">. If you look this up on Wikipedia<sup>14</sup> you see<sup>15</sup> that it is <img src="https://latex.codecogs.com/png.latex?I_%7BF_X(%5Ctau)%7D(k,S-k+1)">. The rest just comes from noting that <img src="https://latex.codecogs.com/png.latex?%5Ctau%20=%20F_X%5E%7B-1%7D(p/(p+t))"> and using the symmetry <img src="https://latex.codecogs.com/png.latex?1-I_p(a,b)%20=%20I_%7B1-p%7D(b,a)">.</p>
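<p>The binomial/incomplete-Beta identity behind this is easy to check numerically. A Python sketch (my own check, not the post’s code; the regularised incomplete Beta function is computed here by crude midpoint integration because the standard library doesn’t provide it):</p>

```python
# Check the identity Pr(Z >= k) = I_p(k, S - k + 1) for Z ~ Binomial(S, p),
# where I_p(a, b) is the regularised incomplete Beta function.
import math


def binomial_tail(S, p, k):
    # Pr(Z >= k), computed exactly from the binomial pmf
    return sum(
        math.comb(S, j) * p**j * (1.0 - p) ** (S - j) for j in range(k, S + 1)
    )


def reg_inc_beta(p, a, b, n=100_000):
    # I_p(a, b) via midpoint-rule integration of the Beta(a, b) density on [0, p]
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    dx = p / n
    acc = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        acc += x ** (a - 1.0) * (1.0 - x) ** (b - 1.0)
    return norm * acc * dx


S, p, k = 20, 0.3, 7
gap = abs(binomial_tail(S, p, k) - reg_inc_beta(p, k, S - k + 1))
```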
<p>To finish this off, we note that <img src="https://latex.codecogs.com/png.latex?%0A%5CPr(%5Cmathcal%7BF%7D%20%5Cleq%20x)%20=%20I_%7B%5Cfrac%7B(S-k+1)x%7D%7B(S-k+1)x+%20k%7D%7D(S-k+1,k).%0A"> From this, we see that <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5CPr%5Cleft(%5Cfrac%7Bp%7D%7BF_X(x_%7Bk:S%7D)%7D%20-%20p%20%5Cleq%20t%5Cright)%20&amp;=%5CPr%5Cleft(%5Cmathcal%7BF%7D%20%5Cleq%20%5Cfrac%7Bk%7D%7Bp(S-k+1)%7Dt%5Cright)%20%5C%5C%0A&amp;=%20%5CPr%5Cleft(%5Cfrac%7Bp(S-k+1)%7D%7Bk%7D%5Cmathcal%7BF%7D%20%5Cleq%20t%5Cright).%0A%5Cend%7Balign*%7D"></p>
<p>The second result follows in the same way, noting that <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BF%7D%5E%7B-1%7D"> is also F-distributed, with parameters <img src="https://latex.codecogs.com/png.latex?(2k,%202(S-k+1))">.</p>
<p><em>The proof has ended</em></p>
</div>
<p>Now, obviously, in this house we do not trust mathematics. Which is to say that I made a stupid mistake the first time I did this and forgot that when <img src="https://latex.codecogs.com/png.latex?Z"> is binomial, <img src="https://latex.codecogs.com/png.latex?%5CPr(Z%20%5Cgeq%20k)%20=%201%20-%20%5CPr(Z%20%5Cleq%20k-1)"> and had a persistent off-by-one error in my derivation. But we test out our results so we don’t end up doing the dumb thing.</p>
<p>So let’s do that. For this example, we will use generalised Pareto-distributed <img src="https://latex.codecogs.com/png.latex?X">.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2">xi <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span></span>
<span id="cb1-3">s <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb1-4">u <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span></span>
<span id="cb1-5"></span>
<span id="cb1-6">samp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(S, k, p, </span>
<span id="cb1-7">                 <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Q =</span> \(x) u <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> s<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>((<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>x)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>xi)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>xi, </span>
<span id="cb1-8">                 <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">F =</span> \(x) <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> xi<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> u)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>s)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>xi)) {</span>
<span id="cb1-9">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use theory to draw x_{k:S}</span></span>
<span id="cb1-10">  xk <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Q</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbeta</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, k, S <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> k <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb1-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> p <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">F</span>(xk), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>p)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">F</span>(xk)))</span>
<span id="cb1-12">}</span>
<span id="cb1-13"></span>
<span id="cb1-14">S <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span></span>
<span id="cb1-15">M <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span></span>
<span id="cb1-16">k <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> S <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> M <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb1-17">p <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>M<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>S</span>
<span id="cb1-18">N <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100000</span></span>
<span id="cb1-19"></span>
<span id="cb1-20">fs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rf</span>(N, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (S <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> k <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> k )</span>
<span id="cb1-21"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">theoretical =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>p <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> p <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> fs <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (S <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> k <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>k,</span>
<span id="cb1-22">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xks =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>N, \(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">samp</span>(S, k, p)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_ecdf</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> xks), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb1-24">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_ecdf</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> theoretical), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dashed"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-25">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggtitle</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">frac</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>M<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>S , <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">R</span>(r[S<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>M<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>S]))))</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-06-03-that-psis-proof/that-psis-proof_files/figure-html/unnamed-chunk-1-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">theoretical =</span> p <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>p) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> k<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>(fs <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (S <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> k <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)),</span>
<span id="cb2-2">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xks =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>N, \(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">samp</span>(S, k, p)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>])) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb2-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_ecdf</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> xks), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb2-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_ecdf</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> theoretical), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dashed"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggtitle</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">frac</span>(M<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>S , <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">R</span>(r[S<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>M<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>S]))))</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-06-03-that-psis-proof/that-psis-proof_files/figure-html/unnamed-chunk-1-2.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Fabulous. It then follows that <img src="https://latex.codecogs.com/png.latex?%0A%5Cleft%7C1%20-%20%5Cfrac%7B1-M/S%7D%7BR(r_%7BS-M+1%7D)%7D%20%5Cright%7C%20%5Cstackrel%7Bd%7D=%20%5Cleft%7C%5Cfrac%7BM%7D%7BS%7D%20-%20%20%5Cfrac%7BM(S-M)%7D%7BS(S-M-1)%7D%5Cmathcal%7BF%7D%5Cright%7C%20%5Cleq%20%5Cfrac%7BM%7D%7BS%7D%20+%20%20%5Cfrac%7BM(S-M)%7D%7BS(S-M-1)%7D%20%5Cmathcal%7BF%7D,%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BF%7D"> has an F-distribution with <img src="https://latex.codecogs.com/png.latex?(M,%20S-M+1)"> degrees of freedom. As <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(%5Cmathcal%7BF%7D)%20=%201%20+%201/(S-M-1)">, it follows that this term goes to zero as long as <img src="https://latex.codecogs.com/png.latex?M%20=%20o(S)">. This shows that the first term in the bias goes to zero.</p>
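As a quick numerical sanity check on this bound, here is a short Monte Carlo sketch. It is in Python rather than R; the degrees of freedom follow the `rf` call in the code above, and `S` and `M` are the same illustrative values. The exact parametrisation does not change the order of magnitude.

```python
import random

# Illustrative values matching the R code earlier in the post.
S, M, N = 1000, 50, 50_000
random.seed(1)

def rand_f(d1, d2):
    # An F(d1, d2) variate built from two chi-squared (i.e. gamma) draws.
    x1 = random.gammavariate(d1 / 2, 2.0)
    x2 = random.gammavariate(d2 / 2, 2.0)
    return (x1 / d1) / (x2 / d2)

c = M * (S - M) / (S * (S - M - 1))
draws = [abs(M / S - c * rand_f(2 * M, 2 * (S - M + 1))) for _ in range(N)]
mean_abs = sum(draws) / N

# The bound says this expectation is at most roughly 2*M/S; in practice it
# is far smaller, consistent with the bound being very sloppy.
assert mean_abs < 2 * M / S
```

In this regime the simulated mean is an order of magnitude below the bound, which matches the observation that the rate is far from tight.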
<p>It’s worth noting here that we’ve also calculated that the bias is <em>at most</em> <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(M/S)">; however, this rate is extremely sloppy. That upper bound we just computed is <em>unlikely</em> to be tight. A better person than me would probably check, but honestly I just don’t give a shit.<sup>16</sup></p>
<p>The second term in the bias is <img src="https://latex.codecogs.com/png.latex?%0A%5Cleft%5B%5CPr(R%20%5Cgeq%20T)%20-%20%5Cfrac%7BM%7D%7BS%7D%5Cright%5D%20%5Cmathbb%7BE%7D(H(R%20-%20T)%20%5Cmid%20R%20%5Cgeq%20T).%0A"> As before, we can write this as <img src="https://latex.codecogs.com/png.latex?%0A%5Cleft(1%20-%20%5Cfrac%7BM/S%7D%7B1-R(T)%7D%5Cright)%7C%5Cmathbb%7BE%7D(H(R%20-%20T)%201_%7BR%20%5Cgeq%20T%7D)%7C%20%5Cleq%20%5Cleft%7C1%20-%20%5Cfrac%7BM/S%7D%7B1-R(T)%7D%5Cright%7C%5C%7Ch%5C%7C_%7BL%5E1(p)%7D.%0A"> By our lemma, we know that the distribution of the term in the absolute value when <img src="https://latex.codecogs.com/png.latex?T%20=%20r_%7BS-M+1%7D"> is the same as <img src="https://latex.codecogs.com/png.latex?%0A1-%5Cfrac%7BM%7D%7BS%7D%20-%5Cleft(1%20-%20%5Cfrac%7BM%7D%7BS%7D%20+%20%5Cfrac%7B1%7D%7BS%7D%5Cright)%5Cmathcal%7BF%7D%20=%20(%5Cmu_F-%5Cmathcal%7BF%7D)%20%20+%5Cfrac%7BM%7D%7BS%7D(%5Cmathcal%7BF%7D-%5Cmu_F)%20-%20%5Cfrac%7B1%7D%7BS%7D%5Cmathcal%7BF%7D%20+%20%20%5Cfrac%7B1%7D%7BM-1%7D%5Cleft(%5Cfrac%7BM%7D%7BS%7D%20-%201%5Cright),%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BF%7D%20%5Csim%20%5Ctext%7BF%7D_%7B2(S-M+1),%202M%7D">, which has mean <img src="https://latex.codecogs.com/png.latex?%5Cmu_F%20=%201+(M-1)%5E%7B-1%7D"> and variance <img src="https://latex.codecogs.com/png.latex?%0A%5Csigma%5E2_F%20=%20%5Cfrac%7BM%5E2S%7D%7B(S-M+1)(M-1)%5E2(M-2)%7D%20=%20%5Cfrac%7B1%7D%7BM%7D(1%20+%20%5Cmathcal%7BO%7D(M%5E%7B-1%7D%20+%20MS%5E%7B-1%7D).%0A"> From Jensen’s inequality, we get <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(%7C%5Cmathcal%7BF%7D%20-%20%5Cmu_F%7C)%20%5Cleq%20%5Csigma_F%20=%20M%5E%7B-1/2%7D(1%20+%20o(1)).%0A"> It follows that <img 
src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D%5Cleft%7C1%20-%20%5Cfrac%7BM/S%7D%7B1-R(r_%7BS-M+1:S%7D)%7D%5Cright%7C%20%5Cleq%20M%5E%7B-1/2%7D(1+o(1))M%5E%7B1/2%7DS%5E%7B-1%7D(1%20+%20o(1))%20+%20S%5E%7B-1%7D(1+%20o(1))%20+%20(M-1)%5E%7B-1%7D(1+o(1)),%0A"> and so we get vanishing bias as long as <img src="https://latex.codecogs.com/png.latex?M%5Crightarrow%20%5Cinfty"> and <img src="https://latex.codecogs.com/png.latex?M/S%20%5Crightarrow%200">.</p>
<p>Once again, I make no claims of tightness<sup>17</sup>. Just because it’s a bit sloppy at this point doesn’t mean the job isn’t done.</p>
<div id="thm-thm1" class="theorem">
<p><span class="theorem-title"><strong>Theorem 1</strong></span> Let <img src="https://latex.codecogs.com/png.latex?%5Ctheta_s">, <img src="https://latex.codecogs.com/png.latex?s%20=%201,%5Cldots,%20S"> be an iid sample from <img src="https://latex.codecogs.com/png.latex?G"> and let <img src="https://latex.codecogs.com/png.latex?r_s%20=%20r(%5Ctheta_s)%20%5Csim%20R">. Assume that</p>
<ol type="1">
<li><p><img src="https://latex.codecogs.com/png.latex?R"> is absolutely continuous</p></li>
<li><p><img src="https://latex.codecogs.com/png.latex?M%20%20%5Crightarrow%20%5Cinfty"> and <img src="https://latex.codecogs.com/png.latex?S%5E%7B-1%7DM%20%5Crightarrow%200"></p></li>
<li><p><img src="https://latex.codecogs.com/png.latex?h%20%5Cin%20L%5E1(p)"></p></li>
</ol>
<p>Then Winsorized importance sampling converges in <img src="https://latex.codecogs.com/png.latex?L%5E1"> and is asymptotically unbiased.</p>
</div>
<p>Ok so that’s nice. But you’ll notice that I did not mention our piss-poor rate. That’s because there is absolutely no way in hell that the bias is <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(M%5E%7B-1/2%7D)">! That rate is an artefact of a <em>very</em> sloppy bound on <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%7C1-%5Cmathcal%7BF%7D%7C">.</p>
<p>Unfortunately, Mathematica couldn’t help me out. Its asymptotic abilities shit the bed at the sight of <img src="https://latex.codecogs.com/png.latex?%7B%7D_2F_1(a,b;c;z))">, which is everywhere in the exact expression (which I’ve put below in the fold).</p>
<details>
<summary>
Mathematica expression for <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%7C1-%5Cmathcal%7BF%7D%7C">.
</summary>
<pre><code>-(((M/(1 + S))^(-(1/2) - S/2)*Gamma[(1 + S)/2]*
     (6*(M/(1 + S))^(1/2 + M/2 + S/2)*((1 + S)/(1 - M + S))^(M/2 + S/2) - 
        5*M*(M/(1 + S))^(1/2 + M/2 + S/2)*((1 + S)/(1 - M + S))^(M/2 + S/2) + 
        M^2*(M/(1 + S))^(1/2 + M/2 + S/2)*((1 + S)/(1 - M + S))^(M/2 + S/2) + 
        8*S*(M/(1 + S))^(1/2 + M/2 + S/2)*((1 + S)/(1 - M + S))^(M/2 + S/2) - 
        6*M*S*(M/(1 + S))^(1/2 + M/2 + S/2)*((1 + S)/(1 - M + S))^(M/2 + S/2) + 
        M^2*S*(M/(1 + S))^(1/2 + M/2 + S/2)*((1 + S)/(1 - M + S))^(M/2 + S/2) + 
        2*S^2*(M/(1 + S))^(1/2 + M/2 + S/2)*((1 + S)/(1 - M + S))^(M/2 + S/2) - 
        M*S^2*(M/(1 + S))^(1/2 + M/2 + S/2)*((1 + S)/(1 - M + S))^(M/2 + S/2) - 
         6*Sqrt[-(M/(-1 + M - S))]*Sqrt[(-1 - S)/(-1 + M - S)]*
        (M/(1 - M + S))^(M/2 + S/2)*Hypergeometric2F1[1, (1/2)*(-1 + M - S), 
                                                      M/2, M/(-1 + M - S)] + 8*M*Sqrt[-(M/(-1 + M - S))]*
        Sqrt[(-1 - S)/(-1 + M - S)]*(M/(1 - M + S))^(M/2 + S/2)*
        Hypergeometric2F1[1, (1/2)*(-1 + M - S), M/2, M/(-1 + M - S)] - 
        2*M^2*Sqrt[-(M/(-1 + M - S))]*Sqrt[(-1 - S)/(-1 + M - S)]*
        (M/(1 - M + S))^(M/2 + S/2)*Hypergeometric2F1[1, (1/2)*(-1 + M - S), 
                                                      M/2, M/(-1 + M - S)] - 8*Sqrt[-(M/(-1 + M - S))]*
        Sqrt[(-1 - S)/(-1 + M - S)]*S*(M/(1 - M + S))^(M/2 + S/2)*
        Hypergeometric2F1[1, (1/2)*(-1 + M - S), M/2, M/(-1 + M - S)] + 
        4*M*Sqrt[-(M/(-1 + M - S))]*Sqrt[(-1 - S)/(-1 + M - S)]*S*
        (M/(1 - M + S))^(M/2 + S/2)*Hypergeometric2F1[1, (1/2)*(-1 + M - S), 
                                                      M/2, M/(-1 + M - S)] - 2*Sqrt[-(M/(-1 + M - S))]*
        Sqrt[(-1 - S)/(-1 + M - S)]*S^2*(M/(1 - M + S))^(M/2 + S/2)*
        Hypergeometric2F1[1, (1/2)*(-1 + M - S), M/2, M/(-1 + M - S)] + 
        6*M*(M/(1 + S))^(M/2)*((1 + S)/(1 - M + S))^(M/2 + S/2)*
        Hypergeometric2F1[(1 + S)/2, (1/2)*(1 - M + S), (1/2)*(3 - M + S), 
                          (-1 + M - S)/M] - 5*M^2*(M/(1 + S))^(M/2)*((1 + S)/(1 - M + S))^
        (M/2 + S/2)*Hypergeometric2F1[(1 + S)/2, (1/2)*(1 - M + S), 
                                      (1/2)*(3 - M + S), (-1 + M - S)/M] + M^3*(M/(1 + S))^(M/2)*
        ((1 + S)/(1 - M + S))^(M/2 + S/2)*Hypergeometric2F1[(1 + S)/2, 
                                                            (1/2)*(1 - M + S), (1/2)*(3 - M + S), (-1 + M - S)/M] + 
        2*M*S*(M/(1 + S))^(M/2)*((1 + S)/(1 - M + S))^(M/2 + S/2)*
        Hypergeometric2F1[(1 + S)/2, (1/2)*(1 - M + S), (1/2)*(3 - M + S), 
                          (-1 + M - S)/M] - M^2*S*(M/(1 + S))^(M/2)*((1 + S)/(1 - M + S))^
        (M/2 + S/2)*Hypergeometric2F1[(1 + S)/2, (1/2)*(1 - M + S), 
                                      (1/2)*(3 - M + S), (-1 + M - S)/M] - 2*M*(M/(1 + S))^(M/2)*
        ((1 + S)/(1 - M + S))^(M/2 + S/2)*Hypergeometric2F1[(1 + S)/2, 
                                                            (1/2)*(3 - M + S), (1/2)*(5 - M + S), (-1 + M - S)/M] + 
        3*M^2*(M/(1 + S))^(M/2)*((1 + S)/(1 - M + S))^(M/2 + S/2)*
        Hypergeometric2F1[(1 + S)/2, (1/2)*(3 - M + S), (1/2)*(5 - M + S), 
                          (-1 + M - S)/M] - M^3*(M/(1 + S))^(M/2)*((1 + S)/(1 - M + S))^
        (M/2 + S/2)*Hypergeometric2F1[(1 + S)/2, (1/2)*(3 - M + S), 
                                      (1/2)*(5 - M + S), (-1 + M - S)/M] - 2*M*S*(M/(1 + S))^(M/2)*
        ((1 + S)/(1 - M + S))^(M/2 + S/2)*Hypergeometric2F1[(1 + S)/2, 
                                                            (1/2)*(3 - M + S), (1/2)*(5 - M + S), (-1 + M - S)/M] + 
        M^2*S*(M/(1 + S))^(M/2)*((1 + S)/(1 - M + S))^(M/2 + S/2)*
        Hypergeometric2F1[(1 + S)/2, (1/2)*(3 - M + S), (1/2)*(5 - M + S), 
                          (-1 + M - S)/M]))/(((1 + S)/(1 - M + S))^S*
                                               (2*(-2 + M)*M*Sqrt[(-1 - S)/(-1 + M - S)]*Gamma[M/2]*
                                                  Gamma[(1/2)*(5 - M + S)])))</code></pre>
</details>
<p>But do not fear: we can recover, at the cost of an assumption about the tails of <img src="https://latex.codecogs.com/png.latex?R">. (We’re also going to assume that <img src="https://latex.codecogs.com/png.latex?h"> is bounded because it makes things ever so slightly easier, although unbounded <img src="https://latex.codecogs.com/png.latex?h"> is ok<sup>18</sup> as long as it doesn’t grow too quickly relative to <img src="https://latex.codecogs.com/png.latex?r">.)</p>
<p>We are going to make the assumption that <img src="https://latex.codecogs.com/png.latex?R%20-%20T%20%5Cmid%20R%5Cgeq%20T"> is in the domain of attraction of a generalized Pareto distribution with shape parameter <img src="https://latex.codecogs.com/png.latex?k">. A sufficient condition, due to von Mises, is that <img src="https://latex.codecogs.com/png.latex?%0A%5Clim_%7Br%5Crightarrow%20%5Cinfty%7D%20%5Cfrac%7Br%20R'(r)%7D%7B1-R(r)%7D%20=%20%5Cfrac%7B1%7D%7Bk%7D.%0A"></p>
<p>This seems like a weird condition, but it’s basically just a regularity condition at infinity. For example if <img src="https://latex.codecogs.com/png.latex?1-R(r)"> is regularly varying at infinity<sup>19</sup> and <img src="https://latex.codecogs.com/png.latex?R'(r)"> is, eventually, monotone<sup>20</sup> decreasing, then this condition holds.</p>
<p>The von Mises condition is very natural for us, as <a href="https://projecteuclid.org/journals/annals-of-probability/volume-21/issue-3/Von-Mises-Conditions-Revisited/10.1214/aop/1176989120.full">Falk and Marohn (1993)</a> show that the relative error we get when approximating the tail of <img src="https://latex.codecogs.com/png.latex?R"> by a generalised Pareto density is the same as the relative error in the von Mises condition. That is, if <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7BrR'(r)%7D%7B1-R(r)%7D%20=%20%5Cfrac%7B1%7D%7Bk%7D(1%20+%20%5Cmathcal%7BO%7D(r%5E%7B-%5Calpha%7D))%0A"> then <img src="https://latex.codecogs.com/png.latex?%0AR'(r)%20=%20c%20w(cr%20-%20d)(1%20+%20%5Cmathcal%7BO%7D(r%5E%7B-%5Calpha%7D)),%0A"> where <img src="https://latex.codecogs.com/png.latex?c,d"> are constants and <img src="https://latex.codecogs.com/png.latex?w"> is the density of a generalised Pareto distribution.</p>
<p>Anyway, under those two assumptions, we can swap out the density of <img src="https://latex.codecogs.com/png.latex?(R-T)%5Cmid%20R%3ET"> with its asymptotic approximation and get that, conditional on <img src="https://latex.codecogs.com/png.latex?T=%20%20r_%7BS-M+1:S%7D">, <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(H(R-T)%20%5Cmid%20R%3ET)%20=%20(k-1)%5E%7B-1%7DT.%0A"></p>
<p>Hence, the second term in the bias goes to zero if <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D%5Cleft(r_%7BS-M+1:S%7D%5Cleft(1%20-%20R(r_%7Bs-M+1:S%7D)%20-%20%5Cfrac%7BM%7D%7BS%7D%5Cright)%5Cright)%0A"> goes to zero.</p>
<p>Now this is not particularly pleasant, but it helps to recognise that even if a distribution doesn’t have finite moments, its order statistics (away from the extremes) always do. This means that we can use Cauchy-Schwarz to get <img src="https://latex.codecogs.com/png.latex?%0A%5Cleft%7C%5Cmathbb%7BE%7D%5Cleft(r_%7BS-M+1:S%7D%5Cleft(1%20-%20R(r_%7Bs-M+1:S%7D)%20-%20%5Cfrac%7BM%7D%7BS%7D%5Cright)%5Cright)%5Cright%7C%20%5Cleq%5Cmathbb%7BE%7D%5Cleft(r_%7BS-M+1:S%7D%5E2%5Cright)%5E%7B1/2%7D%5Cmathbb%7BE%7D%5Cleft%5B%5Cleft(1%20-%20R(r_%7Bs-M+1:S%7D)%20-%20%5Cfrac%7BM%7D%7BS%7D%5Cright)%5E2%5Cright%5D%5E%7B1/2%7D.%0A"></p>
<p>Arguably, the most alarming term is the first one, but that can<sup>21</sup> be tamed. To do this, we lean into a result from <a href="https://projecteuclid.org/proceedings/berkeley-symposium-on-mathematical-statistics-and-probability/Proceedings-of-the-Fifth-Berkeley-Symposium-on-Mathematical-Statistics-and/Chapter/Some-contributions-to-the-theory-of-order-statistics/bsmsp/1200513012">Bickel (1967)</a>: if you examine the proof, translate some obscurely-stated conditions, and fix a typo<sup>22</sup>, you get that <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(r_%7Bk:M%7D%5E2)%20%5Cleq%20C%20k%5Cbegin%7Bpmatrix%7D%20S%20%5C%5C%20k%5Cend%7Bpmatrix%7D%20%5Cint_0%5E1%20t%5E%7Bk-2-1%7D(1-t)%5E%7BS-k-2%7D%5C,dt.%0A"> You might worry that this is going to grow too quickly. But it doesn’t. Noting that <img src="https://latex.codecogs.com/png.latex?B(n,m)%20=%20%5CGamma(n)%5CGamma(m)/%5CGamma(n+m)">, we can rewrite the upper bound in terms of the Beta function to get <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(r_%7Bk:M%7D%5E2)%20%5Cleq%20C%20%5Cfrac%7B%5CGamma(S+1)%7D%7B%5CGamma(S-3)%7D%20%5Cfrac%7B%5CGamma(k-2)%7D%7B%5CGamma(k+1)%7D%5Cfrac%7B%5CGamma(S-k-1)%7D%7B%5CGamma(S-k+1)%7D.%0A"></p>
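The Beta-function rewrite can be spot-checked numerically. A sketch with small, hypothetical values of S and k, chosen so the integral is easy to evaluate by quadrature (unlike the asymptotic regime we actually care about):

```python
import math

S, k = 30, 10  # small hypothetical values for the check

# Left side: midpoint-rule quadrature of the integral in the bound,
# with integrand t^(k-3) * (1-t)^(S-k-2).
n = 200_000
integral = sum(
    ((i + 0.5) / n) ** (k - 3) * (1 - (i + 0.5) / n) ** (S - k - 2)
    for i in range(n)
) / n

# Right side: B(k-2, S-k-1) = Gamma(k-2) Gamma(S-k-1) / Gamma(S-3),
# computed stably on the log scale.
beta = math.exp(math.lgamma(k - 2) + math.lgamma(S - k - 1) - math.lgamma(S - 3))

assert abs(integral - beta) / beta < 1e-3
```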
<p>To show that this doesn’t grow too quickly, we use the identity <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5CGamma(x%20+%20a)%7D%7B%5CGamma(x%20+%20b)%7D%20%5Cpropto%20x%5E%7Ba-b%7D(1%20+%20%5Cmathcal%7BO%7D(x%5E%7B-1%7D)).%0A"> From this, it follows that <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(r_%7Bk:M%7D%5E2)%20%5Cleq%20C%20S%5E4k%5E%7B-3%7D(S-k)%5E%7B-2%7D(1+%20%5Cmathcal%7BO%7D(S%5E%7B-1%7D))(1+%20%5Cmathcal%7BO%7D(k%5E%7B-1%7D))(1+%20%5Cmathcal%7BO%7D((S+k)%5E%7B-1%7D)).%0A"> In this case, we are interested in <img src="https://latex.codecogs.com/png.latex?k%20=%20S-M+1">, so <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(r_%7Bk:M%7D%5E2)%20%5Cleq%20C%20S%5E4S%5E%7B-3%7DM%5E%7B-2%7D(1%20-%20M/S%20+%201/S)%5E%7B-3%7D(1%20-%201/M)%5E%7B-2%7D(1+%20%5Cmathcal%7BO%7D(S%5E%7B-1%7D))(1+%20%5Cmathcal%7BO%7D(S%5E%7B-1%7D))(1+%20%5Cmathcal%7BO%7D(M%5E%7B-1%7D)).%0A"></p>
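The gamma-ratio identity itself is easy to verify numerically; a and b below are arbitrary hypothetical exponents, and the 1 + O(1/x) correction visibly shrinks as x grows:

```python
import math

a, b = 2.5, 0.5  # arbitrary exponents for the check

def gamma_ratio_over_power(x):
    # Gamma(x+a)/Gamma(x+b) divided by x^(a-b); tends to 1 as x -> infinity.
    return math.exp(math.lgamma(x + a) - math.lgamma(x + b) - (a - b) * math.log(x))

# The deviation from 1 is O(1/x).
assert abs(gamma_ratio_over_power(10.0) - 1.0) < 0.5
assert abs(gamma_ratio_over_power(1e4) - 1.0) < 1e-3
assert abs(gamma_ratio_over_power(1e4) - 1.0) < abs(gamma_ratio_over_power(10.0) - 1.0)
```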
<p>Hence we get that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(r_%7Bk:M%7D%5E2)%20=%20%5Cmathcal%7BO%7D(SM%5E%7B-2%7D)">. This is increasing<sup>23</sup> in <img src="https://latex.codecogs.com/png.latex?S">, but we will see that it is not going up too fast.</p>
<p>For the second half of this shindig, we are going to attack <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D%5Cleft%5B%5Cleft(1%20-%20R(r_%7Bs-M+1:S%7D)%20-%20%5Cfrac%7BM%7D%7BS%7D%5Cright)%5E2%5Cright%5D%20=%20%5Cmathbb%7BE%7D%5Cleft%5B%5Cleft(1%20-%20R(r_%7Bs-M+1:S%7D)%5Cright)%5E2%20-%202%5Cleft(1%20-%20R(r_%7Bs-M+1:S%7D)%5Cright)%5Cfrac%7BM%7D%7BS%7D%20+%5Cleft(%5Cfrac%7BM%7D%7BS%7D%5Cright)%5E2%5Cright%5D.%0A"> A standard result<sup>24</sup> from extreme value theory is that <img src="https://latex.codecogs.com/png.latex?R(r_%7Bk:S%7D)"> has the same distribution as the <img src="https://latex.codecogs.com/png.latex?k">th order statistic from a sample of <img src="https://latex.codecogs.com/png.latex?S"> iid <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BUniform%7D(%5B0,1%5D)"> random variables. Hence<sup>25</sup>, <img src="https://latex.codecogs.com/png.latex?%0AR(r_%7BS-M+1:S%7D)%20%5Csim%20%5Ctext%7BBeta%7D(S-M+1,%20M).%0A"> It follows<sup>26</sup> that <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(1-%20R(r_%7BS-M+1:S%7D))%20=%20%5Cfrac%7BM%7D%7BS+1%7D%20=%20%5Cfrac%7BM%7D%7BS%7D%5Cfrac%7B1%7D%7B1+S%5E%7B-1%7D%7D%0A"> and <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D((1-%20R(r_%7BS-M+1:S%7D))%5E2)%20=%20%5Cfrac%7BM(M+1)%7D%7B(S+1)(S+2)%7D%20=%20%5Cfrac%7BM%5E2%7D%7BS%5E2%7D%5Cleft(%5Cfrac%7B1%20+%20M%5E%7B-1%7D%7D%7B1%20+%203S%5E%7B-1%7D%20+%202S%5E%7B-2%7D%7D%5Cright).%0A"> Adding these together and doing some asymptotic expansions, we get <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D%5Cleft%5B%5Cleft(1%20-%20R(r_%7Bs-M+1:S%7D)%20-%20%5Cfrac%7BM%7D%7BS%7D%5Cright)%5E2%5Cright%5D%20=%20%5Cfrac%7BM%5E2%7D%7BS%5E2%7D%20+%20%5Cmathcal%7BO%7D%5Cleft(%5Cfrac%7BM%7D%7BS%5E2%7D%5Cright),%0A"> which goes to zero<sup>27</sup> like <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(S%5E%7B-1%7D)"> if <img 
src="https://latex.codecogs.com/png.latex?M%20=%20%5Cmathcal%7BO%7D(S%5E%7B1/2%7D)">.</p>
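<p>None of this is deep, so it is easy to sanity-check by simulation. The snippet below is my own code (not from any package): since each R(r_s) is uniform, R(r_{S-M+1:S}) is a uniform order statistic and its Beta distribution pins down E(1 - R(r_{S-M+1:S})) = M/(S+1).</p>

```python
import numpy as np

# Sanity check (my code, not from the post): since each R(r_s) is Uniform(0, 1),
# R(r_{S-M+1:S}) is the (S - M + 1)th uniform order statistic, which is
# Beta(S - M + 1, M). In particular, E(1 - R(r_{S-M+1:S})) = M / (S + 1).
rng = np.random.default_rng(0)
S, M, reps = 1000, 30, 5000

u = rng.uniform(size=(reps, S))
order_stat = np.sort(u, axis=1)[:, S - M]   # the (S - M + 1)th order statistic
empirical = np.mean(1.0 - order_stat)
theoretical = M / (S + 1)
# empirical and theoretical should agree to about three decimal places
```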
<p>Multiplying these rates together, we get that the second term in the bias is bounded above by <img src="https://latex.codecogs.com/png.latex?%0A%5Cleft%5B%5Cleft(%5Cfrac%7BS%7D%7BM%5E2%7D%20(1%20+%20%5Cmathcal%7BO%7D(M%5E%7B-1%7D%20+%20MS%5E%7B-1%7D))%5Cright)%5Cleft(%5Cfrac%7BM%5E2%7D%7BS%5E2%7D%20(1%20+%20%5Cmathcal%7BO%7D(M%5E%7B-1%7D%20+%20MS%5E%7B-1%7D)%5Cright)%5Cright%5D%5E%7B1/2%7D%20=%20S%5E%7B-1/2%7D(1%20+%20o(1)).%0A"></p>
<p>Putting all of this together we have proved the following Corollary.</p>
<div id="cor-cor1" class="theorem corollary">
<p><span class="theorem-title"><strong>Corollary 1</strong></span> Let <img src="https://latex.codecogs.com/png.latex?%5Ctheta_s">, <img src="https://latex.codecogs.com/png.latex?s%20=%201,%5Cldots,%20S"> be an iid sample from <img src="https://latex.codecogs.com/png.latex?G"> and let <img src="https://latex.codecogs.com/png.latex?r_s%20=%20r(%5Ctheta_s)%20%5Csim%20R">. Assume that</p>
<ol type="1">
<li><p><img src="https://latex.codecogs.com/png.latex?R"> is absolutely continuous and satisfies the von Mises condition<sup>28</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7BrR'(r)%7D%7B1-R(r)%7D%20=%20%5Cfrac%7B1%7D%7Bk%7D(1%20+%5Cmathcal%7BO%7D(r%5E%7B-1%7D)).%0A"></p></li>
<li><p><img src="https://latex.codecogs.com/png.latex?M%20%20=%20o(S)"></p></li>
<li><p><img src="https://latex.codecogs.com/png.latex?h"> is bounded<sup>29</sup></p></li>
</ol>
<p>Then Winsorized importance sampling converges in <img src="https://latex.codecogs.com/png.latex?L%5E1"> at a rate of at most <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(MS%5E%7B-1%7D%20+%20S%5E%7B-1/2%7D)">, which is balanced when <img src="https://latex.codecogs.com/png.latex?M%20=%20%5Cmathcal%7BO%7D(S%5E%7B1/2%7D)">. Hence, WIS is<sup>30</sup> <img src="https://latex.codecogs.com/png.latex?%5Csqrt%7Bn%7D">-consistent.</p>
</div>
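<p>To make the object of the corollary concrete, here is a minimal numpy sketch (my code; the names are mine) of the Winsorized importance sampling estimator, which clips every importance ratio at the (S-M+1)th order statistic. The toy target/proposal pair is the exponential example from the footnotes, for which plain importance sampling has infinite variance.</p>

```python
import numpy as np

# A minimal sketch (my code, not from any package) of Winsorized importance
# sampling: clip every importance ratio at the (S - M + 1)th order statistic,
#   I_WIS = (1/S) * sum_s h(theta_s) * min(r(theta_s), r_{S-M+1:S}).
def winsorized_is(h_vals, r_vals, M):
    S = len(r_vals)
    threshold = np.sort(r_vals)[S - M]          # r_{S-M+1:S}
    return np.mean(h_vals * np.minimum(r_vals, threshold))

# Toy example: target p = Exp(1), proposal g = Exp with rate lam > 1, so the
# ratio r(theta) = exp((lam - 1) * theta) / lam is heavy-tailed (k = 1 - 1/lam)
# and plain importance sampling has infinite variance.
rng = np.random.default_rng(1)
lam, S = 2.0, 100_000
theta = rng.exponential(scale=1.0 / lam, size=S)
r = np.exp((lam - 1.0) * theta) / lam
h = np.cos(theta)                               # a bounded test function
estimate = winsorized_is(h, r, M=int(np.sqrt(S)))
# The target value is E_p[cos(theta)] = 1/2.
```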
<section id="variance-of-winsorized-importance-sampling" class="level3">
<h3 class="anchored" data-anchor-id="variance-of-winsorized-importance-sampling">Variance of Winsorized Importance Sampling</h3>
<p>Right, that was a bit of a journey, but let’s keep going to the variance.</p>
<p>It turns out that following the route I thought I was going to follow does not end well. That lovely set of tricks breaking up the variance into two conditional terms turns out to be very very unnecessary. Which is good, because I thoroughly failed to make the argument work.</p>
<p>If you’re curious, the problem is that the random variable <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7BMr_%7BS-M+1:S%7D%7D%7BS%7D%20%5Cmathbb%7BE%7D(H%20%5Cmid%20R%20%5Cgeq%20r_%7BS-M+1:S%7D)%20=%20%5Cfrac%7BMr_%7BS-M+1:S%7D%7D%7BS(1-R(r_%7BS-M+1:S%7D))%7D%20%5Cmathbb%7BE%7D(H%201_%7BR%20%5Cgeq%20r_%7BS-M+1:S%7D%7D)%0A"> is an absolute <em>bastard</em> to bound. The problem is that <img src="https://latex.codecogs.com/png.latex?1-%20R(%7Br_%7BS-M+1:S%7D%7D)%20%5Capprox%20M/S"> and so the usual trick of bounding that truncated expectation by <img src="https://latex.codecogs.com/png.latex?%5C%7Ch%5C%7C"> or some such thing will prove that the variance is <em>finite</em> but not that it goes to zero. There is a solid chance that the Cauchy-Schwarz inequality <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7BMr_%7BS-M+1:S%7D%5E%7B1/2%7D%7D%7BS(1-R(r_%7BS-M+1:S%7D))%7D%20%5Cmathbb%7BE%7D(r_%7BS-M+1:S%7D%5E%7B1/2%7DH%201_%7BR%20%5Cgeq%20r_%7BS-M+1:S%7D%7D)%20%5Cleq%5Cfrac%7BMr_%7BS-M+1:S%7D%5E%7B1/2%7D%7D%7BS(1-R(r_%7BS-M+1:S%7D))%7DR(r_%7BS-M+1:S%7D)%5C%7Ch%5C%7C_%7BL%5E2(p)%7D%0A"> would work. But truly that is just bloody messy<sup>31</sup>.</p>
<p>So let’s do it the easy way, shall we? Fundamentally, we will use <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BV%7D%5Cleft(I_%5Ctext%7BWIS%7D%5ES%5Cright)%20%5Cleq%20%5Cmathbb%7BE%7D%5Cleft(%5BI_%5Ctext%7BWIS%7D%5ES%5D%5E2%5Cright).%0A"> Note that we can write <img src="https://latex.codecogs.com/png.latex?I_%5Ctext%7BWIS%7D%5ES"> compactly as <img src="https://latex.codecogs.com/png.latex?%0AI_%5Ctext%7BWIS%7D%5ES%20=%20%5Cfrac%7B1%7D%7BS%7D%5Csum_%7Bs=1%7D%5ES%20h(%5Ctheta_s)%5Cmin%5C%7Br(%5Ctheta_s),%20r_%7BS-M+1:S%7D%5C%7D.%0A"> Hence, <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Cmathbb%7BE%7D%5Cleft(%5BI_%5Ctext%7BWIS%7D%5ES%5D%5E2%5Cright)%20&amp;=%20%5Cmathbb%7BE%7D_%7BT%5Csim%20r_%7BS-M+1:S%7D%7D%5Cleft%5B%5Cmathbb%7BE%7D%5Cleft(%5BI_%5Ctext%7BWIS%7D%5ES%5D%5E2%20%5Cmid%20r_%7BS-M+1:S%7D%20=%20T%5Cright)%5Cright%5D%5C%5C%0A&amp;=%5Cfrac%7B1%7D%7BS%5E2%7D%5Cmathbb%7BE%7D_%7BT%5Csim%20r_%7BS-M+1:S%7D%7D%5Cleft%5B%5Cmathbb%7BE%7D%5Cleft(H%5E2%20%5Cmin%5C%7BR%5E2,T%5E2%5C%7D%20%5Cmid%20r_%7BS-M+1:S%7D%20=%20T%5Cright)%5Cright%5D%5C%5C%0A&amp;%5Cleq%5Cfrac%7B1%7D%7BS%5E2%7D%5Cmathbb%7BE%7D_%7BT%5Csim%20r_%7BS-M+1:S%7D%7D%5Cleft%5B%5Cmathbb%7BE%7D%5Cleft(RTH%5E2%20%5Cmid%20r_%7BS-M+1:S%7D%20=%20T%5Cright)%5Cright%5D%20%5C%5C%0A&amp;%5Cleq%5Cfrac%7B1%7D%7BS%5E2%7D%5Cmathbb%7BE%7D_%7BT%5Csim%20r_%7BS-M+1:S%7D%7D%5Cleft%5BT%5C%7Ch%5C%7C_%7BL%5E2(p)%7D%5E2%5Cright%5D%0A%5Cend%7Balign*%7D"></p>
<p>This goes to zero as long as <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(r_%7BS-M+1:S%7D)%20=%20o(S%5E2)">.</p>
<p><a href="https://projecteuclid.org/proceedings/berkeley-symposium-on-mathematical-statistics-and-probability/Proceedings-of-the-Fifth-Berkeley-Symposium-on-Mathematical-Statistics-and/Chapter/Some-contributions-to-the-theory-of-order-statistics/bsmsp/1200513012">Bickel (1967)</a> shows that, noting that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(R)%20%3C%20%5Cinfty">, <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(r_%7BS-M+1:S%7D)%20%5Cleq%20C%20(S-M+1)%5Cfrac%7B%5CGamma(S+1)%5CGamma(S-M+1-1)%5CGamma(M)%7D%7B%5CGamma(S-M+1+1)%5CGamma(M+1)%5CGamma(S-1)%7D%20=%20%5Cfrac%7BS%7D%7BM%7D(1%20+%20o(1)),%0A"> and so the variance goes to zero.</p>
<p>The previous argument shows that the variance is <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(M%5E%7B-1%7DS%5E%7B-1%7D)">. We can refine that if we assume the von Mises condition holds. In that case we know that <img src="https://latex.codecogs.com/png.latex?R(r)%20=%201-%20cr%5E%7B-1/k%7D%20+%20o(1)"> as <img src="https://latex.codecogs.com/png.latex?r%5Crightarrow%20%5Cinfty"> and therefore <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0AR%5Cleft(R%5E%7B-1%7D%5Cleft(1-%5Cfrac%7BM%7D%7BS+1%7D%5Cright)%5Cright)%20&amp;=%201-%5Cfrac%7BM%7D%7BS+1%7D%5C%5C%0A1%20-%20cR%5E%7B-1%7D%5Cleft(1-%5Cfrac%7BM%7D%7BS+1%7D%5Cright)%5E%7B-1/k%7D(1+o(1))%20&amp;=%201-%20%5Cfrac%7BM%7D%7BS+1%7D%20%5C%5C%0AR%5E%7B-1%7D%5Cleft(1-%5Cfrac%7BM%7D%7BS+1%7D%5Cright)%20&amp;=%20c%5E%7B-k%7D%5Cleft(%5Cfrac%7BM%7D%7BS+1%7D%5Cright)%5E%7B-k%7D(1%20+%20o(1)).%0A%5Cend%7Balign*%7D"> Bickel (1967) shows that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(r_%7BS-M+1:S%7D)%20=%20R%5E%7B-1%7D(1-M/(S+1))%20+%20o(1)"> so combining this with the previous result gives a variance of <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D((M/S)%5E%7B2-k%7D)">. If we take <img src="https://latex.codecogs.com/png.latex?M%20=%5Cmathcal%7BO%7D(S%5E%7B1/2%7D)">, this gives <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(S%5E%7Bk/2-1%7D)">, which is smaller than the previous bound for <img src="https://latex.codecogs.com/png.latex?k%3C1">. Either way, the variance goes to zero.</p>
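<p>The threshold asymptotic in the middle of that argument is also easy to eyeball numerically. A quick check (my code) for the pure Pareto tail R(r) = 1 - r^(-1/k), for which R^(-1)(1 - M/(S+1)) = (M/(S+1))^(-k):</p>

```python
import numpy as np

# Numerical check (my code) of the threshold asymptotic: for the Pareto tail
# R(r) = 1 - r**(-1/k) we have R^{-1}(q) = (1 - q)**(-k), so the expectation
# E(r_{S-M+1:S}) should be close to (M / (S + 1))**(-k).
rng = np.random.default_rng(4)
k, S, M, reps = 0.5, 2000, 50, 2000

r = rng.uniform(size=(reps, S)) ** (-k)        # inverse-CDF draws from R
thresholds = np.sort(r, axis=1)[:, S - M]      # r_{S-M+1:S} in each replicate
empirical = thresholds.mean()
theoretical = (M / (S + 1)) ** (-k)
# the two agree to within a few percent
```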
<p>The argument that we used here is a modification of the argument in the TIS paper. This led to a great deal of panic: did I just make my life extremely difficult? Could I have modified the TIS proof to show the bias goes to zero? To be honest, someone might be able to, but I can’t.</p>
<p>So anyway, we’ve proved the following theorem.</p>
<div id="thm-thm2" class="theorem">
<p><span class="theorem-title"><strong>Theorem 2</strong></span> Let <img src="https://latex.codecogs.com/png.latex?%5Ctheta_s">, <img src="https://latex.codecogs.com/png.latex?s%20=%201,%5Cldots,%20S"> be an iid sample from <img src="https://latex.codecogs.com/png.latex?G"> and let <img src="https://latex.codecogs.com/png.latex?r_s%20=%20r(%5Ctheta_s)%20%5Csim%20R">. Assume that</p>
<ol type="1">
<li><p><img src="https://latex.codecogs.com/png.latex?R"> is absolutely continuous</p></li>
<li><p><img src="https://latex.codecogs.com/png.latex?M%20%5Crightarrow%20%5Cinfty"> and <img src="https://latex.codecogs.com/png.latex?MS%5E%7B-1%7D%20%5Crightarrow%200"></p></li>
<li><p><img src="https://latex.codecogs.com/png.latex?h%20%5Cin%20L%5E2(p)">.</p></li>
</ol>
<p>Then the variance of Winsorized importance sampling is at most <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(M%5E%7B-1%7DS%5E%7B-1%7D)">.</p>
</div>
</section>
</section>
<section id="pareto-smoothed-importance-sampling" class="level2">
<h2 class="anchored" data-anchor-id="pareto-smoothed-importance-sampling">Pareto-smoothed importance sampling</h2>
<p>Pareto-smoothed importance sampling (or PSIS) takes the observation that the tails are approximately Pareto distributed and uses it to add some bias correction to the mix. Essentially, it works by noting that <img src="https://latex.codecogs.com/png.latex?%0A(1-R(r_%7BS-M+1:S%7D))%5Cmathbb%7BE%7D(HR%20%5Cmid%20R%3Er_%7BS-M+1:S%7D)%20%5Capprox%20%5Cfrac%7B1%7D%7BS%7D%5Csum_%7Bm=1%7D%5EM%20w_m%20h_%7BS-M+m:S%7D,%0A"> where <img src="https://latex.codecogs.com/png.latex?w_m"> is the median<sup>32</sup> of the <img src="https://latex.codecogs.com/png.latex?m">th order statistic in an iid sample of <img src="https://latex.codecogs.com/png.latex?M"> Generalised Pareto random variables with tail parameters fitted to the distribution.</p>
<p>This is a … funky … quadrature rule. To see that, we can write <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(HR%20%5Cmid%20R%3ET)%20=%20%5Cmathbb%7BE%7D(R%20%5Cmathbb%7BE%7D(H%20%5Cmid%20R)).%0A"> PSIS then approximates the conditional distribution of <img src="https://latex.codecogs.com/png.latex?R"> given <img src="https://latex.codecogs.com/png.latex?R%20%3E%20T"> by <img src="https://latex.codecogs.com/png.latex?%0A%5Ctilde%7BR%7D_%5Ctext%7BPSIS%7D(r)%20=%20%5Cfrac%7B1%7D%7BM%7D%5Csum_%7Bm=1%7D%5EM%201(%20w_m%3Cr)%0A"> and the conditional probability by <img src="https://latex.codecogs.com/png.latex?%0A%5CPr(H%20%3C%20h%5Cmid%20R%20=%20w_m)%20%5Capprox%201(h_%7BS-M+m:S%7D%3C%20h).%0A"></p>
<p>Empirically, this is a very good choice (with the mild caveat that you need to truncate the largest expected order statistic by the observed maximum in order to avoid some variability issues). I would love to have a good analysis of why that is so, but honestly I do not.</p>
<p>But, to return to the issue of this blog post: the convergence and vanishing variance still hold. To see this, we note that <img src="https://latex.codecogs.com/png.latex?%0Aw_m%20=%20r_%7BS-M+1:S%7D%20+%20k%5E%7B-1%7D%5Csigma%5Cleft%5B%5Cleft(1-%5Cfrac%7Bm-1/2%7D%7BM%7D%5Cright)%5E%7B-k%7D%20-1%5Cright%5D.%0A"> So we are just re-weighting our tail <img src="https://latex.codecogs.com/png.latex?H"> samples by <img src="https://latex.codecogs.com/png.latex?%0A1%20+%20%5Cfrac%7B%5Csigma%7D%7Bkr_%7BS-M+1:S%7D%7D%5Cleft%5B%5Cleft(1-%5Cfrac%7Bm-1/2%7D%7BM%7D%5Cright)%5E%7B-k%7D%20-1%5Cright%5D.%0A"></p>
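<p>For concreteness, here is one way the smoothing step could look in numpy. This is my own sketch under the assumption that k and sigma are already known (as in the theorem below); a real implementation fits them to the top-M ratios.</p>

```python
import numpy as np

# A sketch (my code) of the smoothing step: overwrite the M largest importance
# ratios with the expected GPD order statistics
#   w_m = u + (sigma / k) * ((1 - (m - 1/2) / M)**(-k) - 1),
# where u = r_{S-M+1:S}. Here k and sigma are taken as known; a real
# implementation fits them to the tail sample.
def pareto_smooth(r, M, k, sigma):
    S = len(r)
    order = np.argsort(r)
    u = r[order[S - M]]                # the winsorization threshold r_{S-M+1:S}
    m = np.arange(1, M + 1)
    w = u + (sigma / k) * ((1.0 - (m - 0.5) / M) ** (-k) - 1.0)
    smoothed = r.copy()
    smoothed[order[S - M:]] = w        # replace the top M ratios, in order
    # truncate at the observed maximum to tame the largest smoothed weight
    return np.minimum(smoothed, r.max())

rng = np.random.default_rng(3)
r = rng.pareto(2.0, size=1000) + 1.0   # Pareto(2) ratios, i.e. tail index k = 0.5
smoothed = pareto_smooth(r, M=100, k=0.5, sigma=1.0)
```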
<p>Recalling that when <img src="https://latex.codecogs.com/png.latex?R(r)%20=%201-%20cr%5E%7B-1/k%7D(1+%20o(1))">, we had <img src="https://latex.codecogs.com/png.latex?%5Csigma%20=%20%5Cmathcal%7BO%7D(r_%7BS-M+1:S%7D)">, this term is at most <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(1%20+%20M%5E%7B-k%7D)">. This will not trouble either of our convergence proofs.</p>
<p>This leads to the following modification of our previous results.</p>
<div id="thm-thm3" class="theorem">
<p><span class="theorem-title"><strong>Theorem 3</strong></span> Let <img src="https://latex.codecogs.com/png.latex?%5Ctheta_s">, <img src="https://latex.codecogs.com/png.latex?s%20=%201,%5Cldots,%20S"> be an iid sample from <img src="https://latex.codecogs.com/png.latex?G"> and let <img src="https://latex.codecogs.com/png.latex?r_s%20=%20r(%5Ctheta_s)%20%5Csim%20R">. Assume that</p>
<ol type="1">
<li><p><img src="https://latex.codecogs.com/png.latex?R"> is absolutely continuous.</p></li>
<li><p><img src="https://latex.codecogs.com/png.latex?M%20%20=%20%5Cmathcal%7BO%7D(S%5E%7B1/2%7D)"></p></li>
<li><p><img src="https://latex.codecogs.com/png.latex?h%20%5Cin%20L%5E2(p)"></p></li>
<li><p><img src="https://latex.codecogs.com/png.latex?k"> and <img src="https://latex.codecogs.com/png.latex?%5Csigma"> are known with <img src="https://latex.codecogs.com/png.latex?%5Csigma%20=%20%5Cmathcal%7BO%7D(r_%7BS-M+1:S%7D)">.</p></li>
</ol>
<p>Then Pareto smoothed importance sampling converges in <img src="https://latex.codecogs.com/png.latex?L%5E1"> and its variance goes to zero; in particular, it is consistent and asymptotically unbiased.</p>
</div>
<div id="cor-cor2" class="theorem corollary">
<p><span class="theorem-title"><strong>Corollary 2</strong></span> Assume further that</p>
<ol type="1">
<li><p>R satisfies the von Mises condition<sup>33</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7BrR'(r)%7D%7B1-R(r)%7D%20=%20%5Cfrac%7B1%7D%7Bk%7D(1%20+%5Cmathcal%7BO%7D(r%5E%7B-1%7D)).%0A"></p></li>
<li><p><img src="https://latex.codecogs.com/png.latex?h"> is bounded<sup>34</sup>.</p></li>
</ol>
<p>Then the <img src="https://latex.codecogs.com/png.latex?L%5E1"> convergence occurs at a rate of at most <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(S%5E%7B-1/2%7D)">. Furthermore, the variance of the PSIS estimator goes to zero at least as fast as <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(S%5E%7Bk/2-1%7D)">.</p>
</div>
<p>Hence, under these additional conditions PSIS is<sup>35</sup> <img src="https://latex.codecogs.com/png.latex?%5Csqrt%7Bn%7D">-consistent.</p>
</section>
<section id="final-thoughts" class="level2">
<h2 class="anchored" data-anchor-id="final-thoughts">Final thoughts</h2>
<p>So that’s what truncation and winsorization do to importance sampling estimates. I haven’t touched on the fairly important topic of asymptotic normality. Essentially, <a href="https://www.sciencedirect.com/science/article/pii/0304414988900312">Griffin (1988)</a>, in a fairly complex<sup>36</sup> paper, suggests that if you winsorize the product <img src="https://latex.codecogs.com/png.latex?(h(%5Ctheta_s)r(%5Ctheta_s))"> <em>and</em> winsorize it at both ends, the von Mises condition<sup>37</sup> implies that the WIS estimator is asymptotically normal.</p>
<p>Why is this important? Well, the same proof shows that doubly winsorized importance sampling (dWIS) applied to the vector-valued function <img src="https://latex.codecogs.com/png.latex?%5Ctilde%20h(%5Ctheta)%20=%20(h(%5Ctheta),1)"> will also be asymptotically normal, which implies, via the delta method, that the <em>self-normalized</em> dWIS estimator <img src="https://latex.codecogs.com/png.latex?%0AI%5ES_%5Ctext%7BSN-IS%7D%20=%20%5Cfrac%7B%5Csum_%7Bs=1%7D%5ES%5Cmax%5C%7B%5Cmin%5C%7Bh(%5Ctheta_s)%20r(%5Ctheta_s),T_%7BS-M+1:S%7D%5C%7D,%20T_%7BM:S%7D%5C%7D%7D%7B%5Csum_%7Bs=1%7D%5ES%5Cmax%5C%7B%5Cmin%5C%7Br(%5Ctheta_s),T_%7BS-M+1:S%7D%5C%7D,T_%7BM:S%7D%5C%7D%7D%0A"> is consistent, where <img src="https://latex.codecogs.com/png.latex?T_%7Bm:S%7D"> is the <img src="https://latex.codecogs.com/png.latex?m">th order statistic of <img src="https://latex.codecogs.com/png.latex?%5Cmax%5C%7Bh(%5Ctheta_s)r(%5Ctheta_s),%20r(%5Ctheta_s)%5C%7D">.</p>
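<p>Since that estimator is a bit of a mouthful, here is a naive numpy transcription (my code, and nothing more than a literal reading of the display above):</p>

```python
import numpy as np

# A direct (and deliberately naive) transcription of the self-normalized doubly
# winsorized estimator: both h*r and r are clipped to [T_{M:S}, T_{S-M+1:S}],
# where the T's are order statistics of max(h*r, r). My sketch, not library code.
def sn_dwis(h_vals, r_vals, M):
    S = len(r_vals)
    T = np.sort(np.maximum(h_vals * r_vals, r_vals))
    lo, hi = T[M - 1], T[S - M]            # T_{M:S} and T_{S-M+1:S}
    num = np.clip(h_vals * r_vals, lo, hi).sum()
    den = np.clip(r_vals, lo, hi).sum()
    return num / den

# Sanity check: when h is identically 1 the numerator and denominator coincide,
# so the estimator is exactly 1 whatever the weights are.
rng = np.random.default_rng(5)
r = rng.pareto(2.0, size=100) + 1.0
print(sn_dwis(np.ones(100), r, M=5))  # 1.0
```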
<p>It is very very likely that this can be shown (perhaps under some assumptions) for something closer to the version of PSIS we use in practice. But that is an open question.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>proportional to↩︎</p></li>
<li id="fn2"><p>because <img src="https://latex.codecogs.com/png.latex?p(%5Ctheta_s)"> is very small↩︎</p></li>
<li id="fn3"><p>because <img src="https://latex.codecogs.com/png.latex?p(%5Ctheta_s)"> is a reasonable size, but <img src="https://latex.codecogs.com/png.latex?g(%5Ctheta_s)"> is tiny.↩︎</p></li>
<li id="fn4"><p>I have surreptitiously dropped the <img src="https://latex.codecogs.com/png.latex?h"> subscript because I am gay and sneaky.↩︎</p></li>
<li id="fn5"><p>That it’s parameterised by <img src="https://latex.codecogs.com/png.latex?1/k"> is an artefact of history.↩︎</p></li>
<li id="fn6"><p>We need <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(R)"> to be finite, so we need <img src="https://latex.codecogs.com/png.latex?k%3C1">.↩︎</p></li>
<li id="fn7"><p>very fucking complex↩︎</p></li>
<li id="fn8"><p>I have used that old trick of using the same letter for the CDF as the random variable when I have a lot of random variables. ↩︎</p></li>
<li id="fn9"><p>aka the tail index↩︎</p></li>
<li id="fn10"><p>This is a relevant case. But if you think a little bit about it, our problem happens when <img src="https://latex.codecogs.com/png.latex?r(%5Ctheta)"> grows <em>much</em> faster than <img src="https://latex.codecogs.com/png.latex?h(%5Ctheta)">. For example if <img src="https://latex.codecogs.com/png.latex?P%20=%20%5Coperatorname%7BExp%7D(1)"> and <img src="https://latex.codecogs.com/png.latex?G%20=%20%5Coperatorname%7BExp%7D(1/%5Clambda)"> for <img src="https://latex.codecogs.com/png.latex?%5Clambda%3E1">, then <img src="https://latex.codecogs.com/png.latex?k%20=%201-1/%5Clambda">, <img src="https://latex.codecogs.com/png.latex?r(%5Ctheta)%20=%20%5Cexp((%5Clambda-1)%5Ctheta)"> and if <img src="https://latex.codecogs.com/png.latex?%7Ch(%5Ctheta)%7C%20%3C%20%7C%5Ctheta%7C%5E%5Calpha">, then <img src="https://latex.codecogs.com/png.latex?%7Ch(%5Ctheta)%7C%20%5Cleq%20C%20%5Clog(r)%5E%5Calpha">, which is a slowly growing function.↩︎</p></li>
<li id="fn11"><p>Because the truncation depends on <img src="https://latex.codecogs.com/png.latex?S">, moving from the <img src="https://latex.codecogs.com/png.latex?S">th partial sum to the <img src="https://latex.codecogs.com/png.latex?S+1">th partial sum changes the distribution of <img src="https://latex.codecogs.com/png.latex?z_ih_ir_i">. This is exactly why the dead Russians gifted us with triangular arrays.↩︎</p></li>
<li id="fn12"><p>Also practical unbounded <img src="https://latex.codecogs.com/png.latex?h">, but it’s just easier for bounded <img src="https://latex.codecogs.com/png.latex?h">↩︎</p></li>
<li id="fn13"><p>Shut up. I know. Don’t care.↩︎</p></li>
<li id="fn14"><p>or, hell, even in a book↩︎</p></li>
<li id="fn15"><p>Straight up, though, I spent 2 days dicking around with tail bounds on sums of Bernoulli random variables for some bloody reason before I just looked at the damn formula.↩︎</p></li>
<li id="fn16"><p>Ok. I checked. And yeah. Same technique as below using Jensen in its <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(%7CX-%5Cmathbb%7BE%7D(X)%7C)%5E2%20%5Cleq%20%5Cmathbb%7BV%7D(X)">. If you put that together you get something that goes to zero like <img src="https://latex.codecogs.com/png.latex?M%5E%7B1/2%7DS%5E%7B-1%7D">, which is <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(S%5E%7B-3/4%7D)"> for our usual choice of <img src="https://latex.codecogs.com/png.latex?M">. Which confirms the suspicion that the first term in the bias goes to zero <em>much</em> faster than the second (remembering, of course, that Jensen’s inequality is notoriously loose!).↩︎</p></li>
<li id="fn17"><p>It’s Pride month↩︎</p></li>
<li id="fn18"><p>The result holds exactly if <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(H%20%5Cmid%20R=r)%20=%20%5Cmathcal%7BO%7D(%5Clog%5Ek(r))"> and with a <img src="https://latex.codecogs.com/png.latex?k"> turning up somewhere if it’s <img src="https://latex.codecogs.com/png.latex?o(r%5E%7B1/k%20-%201%7D)">.↩︎</p></li>
<li id="fn19"><p><img src="https://latex.codecogs.com/png.latex?1-R(r)%20%5Csim%20c%20r%5E%7B(-1/k)%7D%5Cmathcal%7BL(r)%7D"> for a slowly varying function (eg a power of a logarithm) <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BL%7D(r)">.↩︎</p></li>
<li id="fn20"><p>A property that implies this is that <img src="https://latex.codecogs.com/png.latex?1-R(r)"> is differentiable and <em>convex at infinity</em>, which is to say that there is some finite <img src="https://latex.codecogs.com/png.latex?r_0"> such that <img src="https://latex.codecogs.com/png.latex?R'(r)"> exists for all <img src="https://latex.codecogs.com/png.latex?r%20%5Cgeq%20r_0"> and <img src="https://latex.codecogs.com/png.latex?1-R(r)"> is a monotone function on <img src="https://latex.codecogs.com/png.latex?%5Br_0,%20%5Cinfty)">.↩︎</p></li>
<li id="fn21"><p>There’s a condition here that <img src="https://latex.codecogs.com/png.latex?S"> has to be large enough, but it’s enough if <img src="https://latex.codecogs.com/png.latex?(S-M+1)%20%3E%202">.↩︎</p></li>
<li id="fn22"><p>The first <img src="https://latex.codecogs.com/png.latex?k"> in the equation below is missing in the paper. If you miss this, you suddenly get the expected value converging to zero, which would be <em>very</em> surprising. Always sense-check the proofs, people. Even if a famous person did it in the 60s.↩︎</p></li>
<li id="fn23"><p>We need to take <img src="https://latex.codecogs.com/png.latex?M%20=%20%5Cmathcal%7BO%7D(S%5E%7B1/2%7D)"> to be able to estimate the tail index <img src="https://latex.codecogs.com/png.latex?k"> from a sample, which gives an upper bound by a constant.↩︎</p></li>
<li id="fn24"><p>Note that if <img src="https://latex.codecogs.com/png.latex?U%20%5Csim%20%5Ctext%7BUnif%7D(0,1)">, then <img src="https://latex.codecogs.com/png.latex?R%5E%7B-1%7D(U)%20%5Csim%20R">. Because this is monotone, it doesn’t change ordering of the sample↩︎</p></li>
<li id="fn25"><p>This is, incidentally, how Bickel got the upper bound on the moments. He combined this with an upper bound on the quantile function.↩︎</p></li>
<li id="fn26"><p>Save the cheerleader, save the world. Except it’s one minus a beta is still beta but with the parameters reversed.↩︎</p></li>
<li id="fn27"><p>As long as <img src="https://latex.codecogs.com/png.latex?M%20=%20o(S)">↩︎</p></li>
<li id="fn28"><p>The rate here is probably not optimal, but it will guarantee that the error in the Pareto approximation doesn’t swamp the other terms.↩︎</p></li>
<li id="fn29"><p>Or <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(h(%5Ctheta)%20%5Cmid%20r(%5Ctheta)%20=%20r)"> doesn’t grow too quickly, with some modification of the rates in the unlikely case that it grows polynomially.↩︎</p></li>
<li id="fn30"><p>almost, there’s an epsilon gap but I don’t give a shit↩︎</p></li>
<li id="fn31"><p>And girl do not get me started on messy. I ended up going down a route where I used the <a href="https://www.sciencedirect.com/science/article/pii/0167715288900077">inequality</a> <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BV%7D(g(U))%20%5Cleq%20%5Cmathbb%7BE%7D(U)%5Cint_0%5E1%5Cleft%5BF_U(u)%20-%20%5Cfrac%7B%5Cmathbb%7BE%7D(U1_%7BU%5Cleq%20u%7D)%7D%7B%5Cmathbb%7BE%7D(U)%7D%5Cright%5D%5Bg'(u)%5D%5E2%5C,du%0A"> which holds for any <img src="https://latex.codecogs.com/png.latex?U"> supported on <img src="https://latex.codecogs.com/png.latex?%5B0,1%5D"> with differentiable density. And let me tell you. If you dick around with enough beta distributions you can get something. Is it what you want? Fucking no. It is <em>a lot</em> of work, including having to differentiate the conditional expectation, and it gives you sweet bugger all.↩︎</p></li>
<li id="fn32"><p>Or, the expected within <img src="https://latex.codecogs.com/png.latex?o(S%5E%7B-1/2%7D)">↩︎</p></li>
<li id="fn33"><p>The rate here is probably not optimal, but it will guarantee that the error in the Pareto approximation doesn’t swamp the other terms.↩︎</p></li>
<li id="fn34"><p>Or <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(h(%5Ctheta)%20%5Cmid%20r(%5Ctheta)%20=%20r)"> doesn’t grow too quickly, with some modification of the rates in the unlikely case that it grows polynomially.↩︎</p></li>
<li id="fn35"><p>almost, there’s an epsilon gap but I don’t give a shit↩︎</p></li>
<li id="fn36"><p>I mean, the tools are elementary. It’s just a lot of detailed estimates and Berry-Esseen as far as the eye can see.↩︎</p></li>
<li id="fn37"><p>and more general things↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2022,
  author = {Simpson, Dan},
  title = {Tail Stabilization of Importance Sampling Estimators: {A} Bit
    of Theory},
  date = {2022-06-15},
  url = {https://dansblog.netlify.app/2022-06-03-that-psis-proof},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2022" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2022. <span>“Tail Stabilization of Importance Sampling
Estimators: A Bit of Theory.”</span> June 15, 2022. <a href="https://dansblog.netlify.app/2022-06-03-that-psis-proof">https://dansblog.netlify.app/2022-06-03-that-psis-proof</a>.
</div></div></section></div> ]]></description>
  <category>Importance sampling</category>
  <category>Computation</category>
  <category>Truncated importance sampling</category>
  <category>Winsorized importance sampling</category>
  <category>Pareto smoothed importance sampling</category>
  <category>PSIS</category>
  <guid>https://dansblog.netlify.app/posts/2022-06-03-that-psis-proof/that-psis-proof.html</guid>
  <pubDate>Tue, 14 Jun 2022 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2022-06-03-that-psis-proof/judy.JPG" medium="image"/>
</item>
<item>
  <title>Sparse matrices 6: To catch a derivative, first you’ve got to think like a derivative</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2022-05-20-to-catch-a-derivative-first-youve-got-to-think-like-a-derivative/to-catch-a-derivative-first-youve-got-to-think-like-a-derivative.html</link>
  <description><![CDATA[ 





<p>Welcome to part six!!! of our ongoing series on making sparse linear algebra differentiable in JAX with the eventual hope to be able to do some <a href="https://dansblog.netlify.app/posts/2022-03-22-a-linear-mixed-effects-model/">cool statistical shit</a>. We are <em>nowhere near done</em>.</p>
<p><a href="https://dansblog.netlify.app/posts/2022-05-14-sparse4-some-primatives/">Last time</a>, we looked at making JAX primitives. We built four of them. Today we are going to implement the corresponding differentiation rules! For three<sup>1</sup> of them.</p>
<p>So strap yourselves in. This is gonna be detailed.</p>
<p>If you’re interested in the code<sup>2</sup>, the git repo for this post is linked at the bottom and in there you will find a folder with the python code in a python file.</p>
<section id="she-is-beauty-and-she-is-grace.-she-is-queen-of-50-states.-she-is-elegance-and-taste.-she-is-miss-autodiff" class="level2">
<h2 class="anchored" data-anchor-id="she-is-beauty-and-she-is-grace.-she-is-queen-of-50-states.-she-is-elegance-and-taste.-she-is-miss-autodiff">She is beauty and she is grace. She is queen of 50 states. She is elegance and taste. She is miss autodiff</h2>
<p>Derivatives are computed in JAX through the glory and power of automatic differentiation. If you came to this blog hoping for a great description of how autodiff works, I am terribly sorry but I absolutely do not have time for that. Might I suggest google? Or maybe flick through <a href="https://arxiv.org/abs/1811.05031">this survey by Charles Margossian</a>.</p>
<p>The most important thing to remember about algorithmic differentiation is that it is <em>not</em> symbolic differentiation. That is, it does not create the functional form of the derivative of the function and compute that. Instead, it is a system for cleverly composing derivatives in each bit of the program to compute the <em>value</em> of the derivative of the function.</p>
<p>But for that to work, we need to implement those clever little mini-derivatives. In particular, every function <img src="https://latex.codecogs.com/png.latex?f(%5Ccdot):%20%5Cmathbb%7BR%7D%5En%20%5Crightarrow%20%5Cmathbb%7BR%7D%5Em"> needs to have a function to compute the corresponding Jacobian-vector product <img src="https://latex.codecogs.com/png.latex?%0A(%5Ctheta,%20v)%20%5Crightarrow%20J(%5Ctheta)%20v,%0A"> where the <img src="https://latex.codecogs.com/png.latex?m%20%5Ctimes%20n"> matrix <img src="https://latex.codecogs.com/png.latex?J(%5Ctheta)"> has entries <img src="https://latex.codecogs.com/png.latex?%0AJ(%5Ctheta)_%7Bij%7D%20=%20%5Cfrac%7B%5Cpartial%20f_i%7D%7B%5Cpartial%20%5Ctheta_j%7D.%0A"></p>
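<p>If that definition feels abstract, a small finite-difference check makes it concrete. This is my own toy example, not JAX code: build the Jacobian of a tiny function by hand and check that J(θ)v matches a directional difference quotient.</p>

```python
import numpy as np

# Toy illustration (my code, not from the post): a Jacobian-vector product for
# f(theta) = (theta[0]**2, theta[0]*theta[1], sin(theta[1])), checked against
# the directional finite difference (f(theta + eps*v) - f(theta)) / eps.
def f(theta):
    return np.array([theta[0] ** 2, theta[0] * theta[1], np.sin(theta[1])])

def f_jvp(theta, v):
    # Rows are partial f_i / partial theta_j, so J is (m x n) = (3 x 2).
    J = np.array([
        [2 * theta[0], 0.0],
        [theta[1], theta[0]],
        [0.0, np.cos(theta[1])],
    ])
    return J @ v

theta = np.array([1.5, -0.3])
v = np.array([0.2, 0.7])
eps = 1e-6
fd = (f(theta + eps * v) - f(theta)) / eps
print(np.allclose(f_jvp(theta, v), fd, atol=1e-4))  # True
```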
<p>Ok. So let’s get onto this. We are going to derive and implement some Jacobian-vector products. And all of the assorted accoutrement. And by crikey. We are going to do it all in a JAX-traceable way.</p>
</section>
<section id="jvp-number-one-the-linear-solve." class="level2">
<h2 class="anchored" data-anchor-id="jvp-number-one-the-linear-solve.">JVP number one: The linear solve.</h2>
<p>The first of the derivatives that we need to work out is the derivative of a linear solve <img src="https://latex.codecogs.com/png.latex?A%5E%7B-1%7Db">. Now, intrepid readers, the obvious thing to do is look the damn derivative up. You get exactly no hero points for computing it yourself.</p>
<p>But I’m not you, I’m a dickhead.</p>
<p>So I’m going to derive it. I could pretend there are reasons<sup>3</sup>, but that would just be lying. I’m doing it because I can.</p>
<p>Beyond the obvious fun of working out a matrix derivative from first principles, this is fun because we have <em>two</em> arguments instead of just one. Double the fun.</p>
<p>And we really should make sure the function is differentiable with respect to every reasonable argument. Why? Because if you write code other people might use, you don’t get to control how they use it (or what they will email you about). So it’s always good practice to limit surprises (like a function not being differentiable wrt some argument) to cases<sup>4</sup> where it is absolutely necessary. This reduces the emails.</p>
<p>To that end, let’s take an arbitrary SPD matrix <img src="https://latex.codecogs.com/png.latex?A"> with a <em>fixed</em> sparsity pattern. Let’s take another symmetric matrix <img src="https://latex.codecogs.com/png.latex?%5CDelta"> with <em>the same sparsity pattern</em> and assume that <img src="https://latex.codecogs.com/png.latex?%5CDelta"> is small enough<sup>5</sup> that <img src="https://latex.codecogs.com/png.latex?A%20+%20%5CDelta"> is still symmetric positive definite. We also need a vector <img src="https://latex.codecogs.com/png.latex?%5Cdelta"> with a small <img src="https://latex.codecogs.com/png.latex?%5C%7C%5Cdelta%5C%7C">.</p>
<p>Now let’s get algebraing. <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Af(A%20+%20%5CDelta,%20b%20+%20%5Cdelta)%20&amp;=%20(A+%5CDelta)%5E%7B-1%7D(b%20+%20%5Cdelta)%20%5C%5C%0A&amp;=%20(I%20+%20A%5E%7B-1%7D%5CDelta)%5E%7B-1%7DA%5E%7B-1%7D(b%20+%20%5Cdelta)%20%5C%5C%0A&amp;=%20(I%20-%20A%5E%7B-1%7D%5CDelta%20+%20o(%5C%7C%5CDelta%5C%7C))A%5E%7B-1%7D(b%20+%20%5Cdelta)%20%5C%5C%0A&amp;=%20A%5E%7B-1%7Db%20+%20A%5E%7B-1%7D(%5Cdelta%20-%20%5CDelta%20A%5E%7B-1%7Db%20)%20+%20o(%5C%7C%5CDelta%5C%7C%20+%20%5C%7C%5Cdelta%5C%7C)%0A%5Cend%7Balign*%7D"></p>
<p>Easy<sup>6</sup> as.</p>
<p>We’ve actually calculated the derivative now, but it’s a little more work to recognise it.</p>
<p>To do that, we need to remember the practical definition of the Jacobian of a function <img src="https://latex.codecogs.com/png.latex?f(x)"> that takes an <img src="https://latex.codecogs.com/png.latex?n">-dimensional input and produces an <img src="https://latex.codecogs.com/png.latex?m">-dimensional output. It is the <img src="https://latex.codecogs.com/png.latex?m%20%5Ctimes%20n"> matrix <img src="https://latex.codecogs.com/png.latex?J_f(x)"> such that <img src="https://latex.codecogs.com/png.latex?%0Af(x%20+%20%5Cdelta)%20%20=%20f(x)%20+%20J_f(x)%5Cdelta%20+%20o(%5C%7C%5Cdelta%5C%7C).%0A"></p>
<p>The formulas further simplify if we write <img src="https://latex.codecogs.com/png.latex?c%20=%20A%5E%7B-1%7Db">. Then, if we want the Jacobian-vector product for the first argument, it is <img src="https://latex.codecogs.com/png.latex?%0A-A%5E%7B-1%7D%5CDelta%20c,%0A"> while the Jacobian-vector product for the second argument is <img src="https://latex.codecogs.com/png.latex?%0AA%5E%7B-1%7D%5Cdelta.%0A"></p>
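<p>As a quick sanity check (this is a dense, illustrative sketch, not the post’s sparse machinery), the two Jacobian-vector products can be verified against finite differences:</p>

```python
# Dense finite-difference check of the two JVP formulas above.
# All names here are illustrative; the post itself works with sparse matrices.
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # an SPD matrix
b = rng.standard_normal(n)
c = np.linalg.solve(A, b)        # c = A^{-1} b

S = rng.standard_normal((n, n))
Delta = (S + S.T) / 2            # symmetric perturbation of A
delta = rng.standard_normal(n)   # perturbation of b

eps = 1e-6
# First argument: finite difference vs the predicted -A^{-1} (Delta c)
fd_A = (np.linalg.solve(A + eps * Delta, b) - c) / eps
assert np.allclose(fd_A, -np.linalg.solve(A, Delta @ c), atol=1e-4)

# Second argument: finite difference vs the predicted A^{-1} delta
fd_b = (np.linalg.solve(A, b + eps * delta) - c) / eps
assert np.allclose(fd_b, np.linalg.solve(A, delta), atol=1e-4)
```

The second check is exact up to floating point because the solve is linear in <code>b</code>; the first has an <img src="https://latex.codecogs.com/png.latex?O(%5Cepsilon)"> truncation error.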
<p>The only wrinkle in doing this is that we need to remember that we are only storing the lower triangle of <img src="https://latex.codecogs.com/png.latex?A">. Because we need to represent <img src="https://latex.codecogs.com/png.latex?%5CDelta"> the same way, it is represented as a vector <code>Delta_x</code> that contains only the lower triangle of <img src="https://latex.codecogs.com/png.latex?%5CDelta">. So we need to form the <em>whole</em> matrix before we do the matrix-vector product <img src="https://latex.codecogs.com/png.latex?%5CDelta%20c">!</p>
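<p>To make that concrete, here is a small helper (the helper and its names are my own, purely for illustration) that expands lower-triangular CSC storage of a symmetric matrix into the full matrix before the product:</p>

```python
import numpy as np
from scipy import sparse

def full_from_lower(indices, indptr, x, n):
    """Illustrative helper: expand lower-triangular CSC storage of a
    symmetric matrix into the full matrix as L + L^T - diag(L), so the
    upper triangle is filled in and the diagonal isn't double-counted."""
    L = sparse.csc_array((x, indices, indptr), shape=(n, n))
    diag = sparse.csc_array((L.diagonal(), (np.arange(n), np.arange(n))),
                            shape=(n, n))
    return L + L.T - diag

# The symmetric matrix [[2, 1], [1, 3]] stored as its lower triangle
L = sparse.csc_array(np.array([[2.0, 0.0], [1.0, 3.0]]))
Delta = full_from_lower(L.indices, L.indptr, L.data, 2)
c = np.array([1.0, 1.0])
# The product sees the whole matrix, upper triangle included
assert np.allclose(Delta @ c, [3.0, 4.0])
```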
<p>But otherwise, the implementation is going to be pretty straightforward. The Jacobian-vector product costs one additional linear solve (beyond the one needed to compute the value <img src="https://latex.codecogs.com/png.latex?c%20=%20A%5E%7B-1%7Db">).</p>
<p>In the language of JAX (and autodiff in general), we refer to <img src="https://latex.codecogs.com/png.latex?%5CDelta"> and <img src="https://latex.codecogs.com/png.latex?%5Cdelta"> as <em>tangent vectors</em>. In search of a moderately coherent naming convention, we are going to refer to the tangent associated with the variable <code>x</code> as <code>xt</code>.</p>
<p>So let’s implement this. Remember: it needs<sup>7</sup> to be JAX traceable.</p>
</section>
<section id="primitive-two-the-triangular-solve" class="level2">
<h2 class="anchored" data-anchor-id="primitive-two-the-triangular-solve">Primitive two: The triangular solve</h2>
<p>For some sense of continuity, we are going to keep the naming of the primitives from the last blog post, but we are <em>not</em> going to attack them in the same order. Why not? Because we work in order of complexity.</p>
<p>So first off we are going to do the triangular solve. As I have yet to package up the code (I promise, that will happen next<sup>8</sup>), I’m just putting it here under the fold.</p>
<details>
<summary>
The primal implementation
</summary>
<div class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> scipy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> sparse</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> jnp</span>
<span id="cb1-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> core</span>
<span id="cb1-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax._src <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> abstract_arrays</span>
<span id="cb1-6"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> core</span>
<span id="cb1-7"></span>
<span id="cb1-8">sparse_triangular_solve_p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> core.Primitive(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sparse_triangular_solve"</span>)</span>
<span id="cb1-9"></span>
<span id="cb1-10"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_triangular_solve(L_indices, L_indptr, L_x, b, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>, transpose: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">bool</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>):</span>
<span id="cb1-11">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""A JAX traceable sparse  triangular solve"""</span></span>
<span id="cb1-12">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_triangular_solve_p.bind(L_indices, L_indptr, L_x, b, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> transpose)</span>
<span id="cb1-13"></span>
<span id="cb1-14"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@sparse_triangular_solve_p.def_impl</span></span>
<span id="cb1-15"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_triangular_solve_impl(L_indices, L_indptr, L_x, b, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>):</span>
<span id="cb1-16">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""The implementation of the sparse triangular solve. This is not JAX traceable."""</span></span>
<span id="cb1-17">  L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.csc_array((L_x, L_indices, L_indptr)) </span>
<span id="cb1-18">  </span>
<span id="cb1-19">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> L.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb1-20">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> L.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> b.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb1-21">  </span>
<span id="cb1-22">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> transpose:</span>
<span id="cb1-23">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse.linalg.spsolve_triangular(L.T, b, lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb1-24">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb1-25">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse.linalg.spsolve_triangular(L.tocsr(), b, lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb1-26"></span>
<span id="cb1-27"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@sparse_triangular_solve_p.def_abstract_eval</span></span>
<span id="cb1-28"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_triangular_solve_abstract_eval(L_indices, L_indptr, L_x, b, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>):</span>
<span id="cb1-29">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> L_indices.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_x.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb1-30">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> b.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_indptr.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb1-31">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> abstract_arrays.ShapedArray(b.shape, b.dtype)</span></code></pre></div>
</div>
</details>
<section id="the-jacobian-vector-product" class="level3">
<h3 class="anchored" data-anchor-id="the-jacobian-vector-product">The Jacobian-vector product</h3>
<div class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax._src <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> ad_util</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax.interpreters <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> ad</span>
<span id="cb2-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> lax</span>
<span id="cb2-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax.experimental <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> sparse <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> jsparse</span>
<span id="cb2-5"></span>
<span id="cb2-6"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_triangular_solve_value_and_jvp(arg_values, arg_tangent, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>, transpose):</span>
<span id="cb2-7">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb2-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  A jax-traceable jacobian-vector product. In order to make it traceable, </span></span>
<span id="cb2-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  we use the experimental sparse CSC matrix in JAX.</span></span>
<span id="cb2-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  </span></span>
<span id="cb2-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  Input:</span></span>
<span id="cb2-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    arg_values:   A tuple of (L_indices, L_indptr, L_x, b) that describe</span></span>
<span id="cb2-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                  the triangular matrix L and the rhs vector b</span></span>
<span id="cb2-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    arg_tangent:  A tuple of tangent values (same length as arg_values).</span></span>
<span id="cb2-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                  The first two values are nonsense - we don't differentiate</span></span>
<span id="cb2-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                  wrt integers!</span></span>
<span id="cb2-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    transpose:    (boolean) If true, solve L^Tx = b. Otherwise solve Lx = b.</span></span>
<span id="cb2-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  Output:         A tuple containing the maybe_transpose(L)^{-1}b and the corresponding</span></span>
<span id="cb2-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                  Jacobian-vector product.</span></span>
<span id="cb2-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  """</span></span>
<span id="cb2-21">  L_indices, L_indptr, L_x, b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> arg_values</span>
<span id="cb2-22">  _, _, L_xt, bt <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> arg_tangent</span>
<span id="cb2-23">  value <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve(L_indices, L_indptr, L_x, b, transpose<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>transpose)</span>
<span id="cb2-24">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(bt) <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> ad.Zero <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">and</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(L_xt) <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> ad.Zero:</span>
<span id="cb2-25">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># I legit do not think this ever happens. But I'm honestly not sure.</span></span>
<span id="cb2-26">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I have arrived!"</span>)</span>
<span id="cb2-27">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> value, lax.zeros_like_array(value) </span>
<span id="cb2-28">  </span>
<span id="cb2-29">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(L_xt) <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> ad.Zero:</span>
<span id="cb2-30">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># L is variable</span></span>
<span id="cb2-31">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> transpose:</span>
<span id="cb2-32">      Delta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jsparse.CSC((L_xt, L_indices, L_indptr), shape <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (b.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>], b.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])).transpose()</span>
<span id="cb2-33">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb2-34">      Delta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jsparse.CSC((L_xt, L_indices, L_indptr), shape <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (b.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>], b.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]))</span>
<span id="cb2-35"></span>
<span id="cb2-36">    jvp_Lx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve(L_indices, L_indptr, L_x, Delta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> value, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> transpose) </span>
<span id="cb2-37">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb2-38">    jvp_Lx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.zeros_like_array(value) </span>
<span id="cb2-39"></span>
<span id="cb2-40">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(bt) <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> ad.Zero:</span>
<span id="cb2-41">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># b is variable</span></span>
<span id="cb2-42">    jvp_b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve(L_indices, L_indptr, L_x, bt, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> transpose)</span>
<span id="cb2-43">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb2-44">    jvp_b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.zeros_like_array(value)</span>
<span id="cb2-45"></span>
<span id="cb2-46">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> value, jvp_b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> jvp_Lx</span>
<span id="cb2-47"></span>
<span id="cb2-48">ad.primitive_jvps[sparse_triangular_solve_p] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve_value_and_jvp</span></code></pre></div>
</div>
<p>Before we see if this works, let’s first talk about the structure of the function I just wrote. Generally speaking, we want a function that takes in the primals and tangents as tuples and then returns the value and the<sup>9</sup> Jacobian-vector product.</p>
<p>The main thing you will notice in the code is that there is <em>a lot</em> of checking for <code>ad.Zero</code>. This is a special type defined in JAX that is, essentially, telling the autodiff system that we are not differentiating wrt that variable. This is different to a tangent that just happens to be numerically equal to zero. Any code for a Jacobian-vector product needs to handle this special value.</p>
<p>As we have two arguments, there are three interesting cases:</p>
<ol type="1">
<li><p>Both <code>L_xt</code> and <code>bt</code> are <code>ad.Zero</code>: This means the function is a constant and the derivative is zero. I am fairly certain that we do not need to manually handle this case, but because I don’t know and I do not like surprises, it’s in there.</p></li>
<li><p><code>L_xt</code> is <em>not</em> <code>ad.Zero</code>: This means that we need to differentiate wrt the matrix. In this case we need to compute <img src="https://latex.codecogs.com/png.latex?%5CDelta%20c"> or <img src="https://latex.codecogs.com/png.latex?%5CDelta%5ET%20c">, depending on the <code>transpose</code> argument. In order to do this, I used the <code>jax.experimental.sparse.CSC</code> class, which has some very limited sparse matrix support (basically matrix-vector products). This is <em>extremely</em> convenient because it means I don’t need to write the matrix-vector product myself!</p></li>
<li><p><code>bt</code> is <em>not</em> <code>ad.Zero</code>: This means that we need to differentiate wrt the rhs vector. This part of the formula is pretty straightforward: just an application of the primal.</p></li>
</ol>
<p>In the case that either <code>L_xt</code> or <code>bt</code> is <code>ad.Zero</code>, we simply set the corresponding contribution to the JVP to zero.</p>
<p>It’s worth saying that you can bypass all of this <code>ad.Zero</code> logic by writing separate functions for the JVP contribution from each input and then using<sup>10</sup> <code>ad.defjvp2()</code> to <a href="https://github.com/google/jax/blob/41417d70c03b6089c93a42325111a0d8348c2fa3/jax/_src/lax/linalg.py#L791">chain them together</a>. This is what the <code>lax.linalg.triangular_solve()</code> implementation does.</p>
<p>So why didn’t I do this? I avoided this because in the other primitives I have to implement, there are expensive computations (like Cholesky factorisations) that I want to share between the primal and the various tangent calculations. The <code>ad.defjvp</code> frameworks don’t allow for that. So I decided not to demonstrate/learn two separate patterns.</p>
</section>
<section id="transposition" class="level3">
<h3 class="anchored" data-anchor-id="transposition">Transposition</h3>
<p>Now I’ve never actively wanted a Jacobian-vector product in my whole life. I’m sorry. I want a gradient. Gimme a gradient. I am the Veruca Salt of gradients.</p>
<p>In many autodiff systems, if you want<sup>11</sup> a gradient, you need to implement vector-Jacobian products<sup>12</sup> explicitly.</p>
<p>One of the odder little innovations in JAX is that instead of forcing you to implement this as well<sup>13</sup>, you only need to implement half of it.</p>
<p>You see, some clever analysis that, as far as I can tell<sup>14</sup>, is detailed in <a href="https://arxiv.org/abs/2204.10923">this paper</a> shows that you only need to form explicit vector-Jacobian products for the structurally linear arguments of the function.</p>
<p>In JAX (and maybe elsewhere), this is known as a <em>transposition rule</em>. The combination of a transposition rule and a JAX-traceable Jacobian-vector product is enough for JAX to compute all of the directional derivatives and gradients we could ever hope for.</p>
<p>As far as I understand, it is all about functions that are <em>structurally linear</em> in some arguments. For instance, if <img src="https://latex.codecogs.com/png.latex?A(x)"> is a matrix-valued function and <img src="https://latex.codecogs.com/png.latex?x"> and <img src="https://latex.codecogs.com/png.latex?y"> are vectors, then the function <img src="https://latex.codecogs.com/png.latex?%0Af(x,%20y)%20=%20A(x)y%20+%20g(x)%0A"> is structurally linear in <img src="https://latex.codecogs.com/png.latex?y"> in the sense that for every fixed value of <img src="https://latex.codecogs.com/png.latex?x">, the function <img src="https://latex.codecogs.com/png.latex?%0Af_x(y)%20=%20A(x)%20y%20+%20g(x)%0A"> is linear in <img src="https://latex.codecogs.com/png.latex?y">. The resulting transposition rule is then</p>
<div class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> f_transpose(x, y):</span>
<span id="cb3-2">  Ax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A(x)</span>
<span id="cb3-3">  gx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> g(x)</span>
<span id="cb3-4">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, Ax.T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> gx)</span></code></pre></div>
</div>
<p>The first element of the return is <code>None</code> because <img src="https://latex.codecogs.com/png.latex?f(x,y)"> is not<sup>15</sup> structurally linear in <img src="https://latex.codecogs.com/png.latex?x"> so there is nothing to transpose. The second element simply takes the matrix in the linear function and transposes it.</p>
<p>If you know anything about autodiff, you’ll think “this doesn’t <em>feel</em> like enough” and it’s not. JAX deals with the non-linear part of <img src="https://latex.codecogs.com/png.latex?f(x,y)"> by tracing the evaluation tree for its Jacobian-vector product and … manipulating<sup>16</sup> it.</p>
<p>We already built the abstract evaluation function last time around, so the tracing part can be done. All we need is the transposition rule.</p>
<p>The linear solve <img src="https://latex.codecogs.com/png.latex?f(A,%20b)%20=%20A%5E%7B-1%7Db"> is non-linear in the first argument but linear in the second argument. So we only need to implement <img src="https://latex.codecogs.com/png.latex?%0AJ%5ET_b(A,b)w%20=%20A%5E%7B-T%7Dw,%0A"> where the subscript <img src="https://latex.codecogs.com/png.latex?b"> indicates we’re only computing the Jacobian wrt <img src="https://latex.codecogs.com/png.latex?b">.</p>
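<p>Before looking at the code, it might help to see the adjoint identity this rule encodes, checked numerically with small dense matrices (illustrative only):</p>

```python
# The b-transpose rule encodes the adjoint identity
#   <w, A^{-1} delta> = <A^{-T} w, delta>   for all delta,
# i.e. pairing the output cotangent w with the (linear-in-b) JVP is the
# same as pairing delta with a solve against the transposed matrix.
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)
delta = rng.standard_normal(n)   # input tangent for b
w = rng.standard_normal(n)       # output cotangent

lhs = w @ np.linalg.solve(A, delta)
rhs = np.linalg.solve(A.T, w) @ delta
assert np.allclose(lhs, rhs)
```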
<p>Initially, I struggled to work out what needed to be implemented here. The thing that clarified the process for me was looking at JAX’s <a href="https://github.com/google/jax/blob/41417d70c03b6089c93a42325111a0d8348c2fa3/jax/_src/lax/linalg.py#L747">internal implementation</a> of the Jacobian-vector product for a dense matrix. From there, I understood what this had to look like for a vector-valued function and this is the result.</p>
<div class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_triangular_solve_transpose_rule(cotangent, L_indices, L_indptr, L_x, b, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>, transpose):</span>
<span id="cb4-2">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb4-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  Transposition rule for the triangular solve. </span></span>
<span id="cb4-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  Translated from here https://github.com/google/jax/blob/41417d70c03b6089c93a42325111a0d8348c2fa3/jax/_src/lax/linalg.py#L747.</span></span>
<span id="cb4-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  Inputs:</span></span>
<span id="cb4-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    cotangent: Output cotangent (aka adjoint). (produced by JAX)</span></span>
<span id="cb4-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    L_indices, L_indptr, L_x: Representation of sparse matrix. L_x should be concrete</span></span>
<span id="cb4-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    b: The right hand side. Must be a jax.interpreters.ad.UndefinedPrimal</span></span>
<span id="cb4-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    transpose: (boolean) True: solve $L^Tx = b$. False: Solve $Lx = b$.</span></span>
<span id="cb4-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  Output:</span></span>
<span id="cb4-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    A 4-tuple with the adjoints (None, None, None, b_adjoint)</span></span>
<span id="cb4-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  """</span></span>
<span id="cb4-13">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> ad.is_undefined_primal(L_x) <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">and</span> ad.is_undefined_primal(b)</span>
<span id="cb4-14">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(cotangent) <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> ad_util.Zero:</span>
<span id="cb4-15">    cot_b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ad_util.Zero(b.aval)</span>
<span id="cb4-16">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb4-17">    cot_b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve(L_indices, L_indptr, L_x, cotangent, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> transpose)</span>
<span id="cb4-18">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, cot_b</span>
<span id="cb4-19"></span>
<span id="cb4-20">ad.primitive_transposes[sparse_triangular_solve_p] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve_transpose_rule</span></code></pre></div>
</div>
<p>If this doesn’t make a lot of sense to you, that’s because it’s confusing.</p>
<p>One way to think of it is in terms of the more ordinary notation. Mike Giles has <a href="https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf">a classic paper</a> that covers these results for basic linear algebra. The idea is to imagine that, as part of your larger program, you need to compute <img src="https://latex.codecogs.com/png.latex?c%20=%20A%5E%7B-1%7Db">.</p>
<p>Forward-mode autodiff computes the <em>sensitivity</em> of <img src="https://latex.codecogs.com/png.latex?c">, usually denoted <img src="https://latex.codecogs.com/png.latex?%5Cdot%20c">, from the sensitivities <img src="https://latex.codecogs.com/png.latex?%5Cdot%20A"> and <img src="https://latex.codecogs.com/png.latex?%5Cdot%20b">, which have already been computed. The formula in Giles is <img src="https://latex.codecogs.com/png.latex?%0A%5Cdot%20c%20=%20A%5E%7B-1%7D(%5Cdot%20b%20-%20%5Cdot%20A%20c).%0A"> The canny reader will recognise this as exactly<sup>17</sup> the formula for the Jacobian-vector product.</p>
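<p>If you want to see that formula in action, here is a small self-contained numpy sketch (mine, not from Giles or from the implementation above, and using a dense matrix rather than the sparse solver) that checks it against a forward-difference approximation.</p>
<pre class="sourceCode python"><code>import numpy as np

# Illustrative check of Giles' forward-mode formula for c = A^{-1} b:
#   c_dot = A^{-1} (b_dot - A_dot c)
rng = np.random.default_rng(12345)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)  # comfortably invertible
b = rng.standard_normal(n)
A_dot = rng.standard_normal((n, n))              # sensitivity of A
b_dot = rng.standard_normal(n)                   # sensitivity of b

c = np.linalg.solve(A, b)
c_dot = np.linalg.solve(A, b_dot - A_dot @ c)    # Giles' formula

# Forward-difference approximation along the (A_dot, b_dot) direction
eps = 1e-6
c_dot_fd = (np.linalg.solve(A + eps * A_dot, b + eps * b_dot) - c) / eps
print(np.max(np.abs(c_dot - c_dot_fd)))          # tiny</code></pre>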
<p>So what does reverse-mode autodiff do? Well it moves through the program in the other direction. So instead of starting with the sensitivities <img src="https://latex.codecogs.com/png.latex?%5Cdot%20A"> and <img src="https://latex.codecogs.com/png.latex?%5Cdot%20b"> already computed, we instead start with the<sup>18</sup> <em>adjoint sensitivity</em> <img src="https://latex.codecogs.com/png.latex?%5Cbar%20c">. Our aim is to compute <img src="https://latex.codecogs.com/png.latex?%5Cbar%20A"> and <img src="https://latex.codecogs.com/png.latex?%5Cbar%20b"> from <img src="https://latex.codecogs.com/png.latex?%5Cbar%20c">.</p>
<p>The details of how to do this are<sup>19</sup> <em>beyond the scope</em>, but without tooooooo much effort you can show that <img src="https://latex.codecogs.com/png.latex?%0A%5Cbar%20b%20=%20A%5E%7B-T%7D%20%5Cbar%20c,%0A"> which you should recognise as the equation that was just implemented.</p>
<p>The thing that we <em>do not</em> have to implement in JAX is the other adjoint that, for dense matrices<sup>20</sup>, is <img src="https://latex.codecogs.com/png.latex?%0A%5Cbar%7BA%7D%20=%20-%5Cbar%7Bb%7Dc%5ET.%0A"> Through the healing power of … something? (Truly, I do not know.) JAX can work that bit out itself. Woo.</p>
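<p>If you would rather not take my word for either adjoint formula, you can sanity-check both without any autodiff at all by finite-differencing the scalar <code>c_bar @ c</code>. This dense numpy sketch is mine (illustrative names, not part of the implementation).</p>
<pre class="sourceCode python"><code>import numpy as np

# Check the reverse-mode adjoints for c = A^{-1} b:
#   b_bar = A^{-T} c_bar    and    A_bar = -b_bar c^T
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
c_bar = rng.standard_normal(n)                 # incoming adjoint of c

c = np.linalg.solve(A, b)
b_bar = np.linalg.solve(A.T, c_bar)            # claimed adjoint of b
A_bar = -np.outer(b_bar, c)                    # claimed adjoint of A

# Finite differences of phi(A, b) = c_bar . (A^{-1} b)
eps = 1e-6
phi = c_bar @ c
b_bar_fd = np.array([(c_bar @ np.linalg.solve(A, b + eps * e) - phi) / eps
                     for e in np.eye(n)])
E = np.zeros((n, n))
E[1, 2] = 1.0                                  # perturb a single entry of A
A_bar_fd = (c_bar @ np.linalg.solve(A + eps * E, b) - phi) / eps
print(np.max(np.abs(b_bar - b_bar_fd)), abs(A_bar[1, 2] - A_bar_fd))</code></pre>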
</section>
<section id="testing-the-numerical-implementation-of-the-jacobian-vector-product" class="level3">
<h3 class="anchored" data-anchor-id="testing-the-numerical-implementation-of-the-jacobian-vector-product">Testing the numerical implementation of the Jacobian-vector product</h3>
<p>So let’s see if this works. I’m not going to lie: I’m flying by the seat of my pants here. I’m not super familiar with the JAX internals, so I have written a lot of test cases. You may wish to skip this part. But rest assured that almost every single one of these cases was useful to me in working out how this thing actually works!</p>
<div class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb5-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> make_matrix(n):</span>
<span id="cb5-2">    one_d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.diags([[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>n, [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)], [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb5-3">    A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (sparse.kronsum(one_d, one_d) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> sparse.eye(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>n)).tocsc()</span>
<span id="cb5-4">    A_lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.tril(A, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">format</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"csc"</span>)</span>
<span id="cb5-5">    A_index <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_lower.indices</span>
<span id="cb5-6">    A_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_lower.indptr</span>
<span id="cb5-7">    A_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_lower.data</span>
<span id="cb5-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (A_index, A_indptr, A_x, A)</span>
<span id="cb5-9"></span>
<span id="cb5-10">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span></code></pre></div>
</div>
<p>This is the same test case as the last blog post. We will just use the lower triangle of <img src="https://latex.codecogs.com/png.latex?A"> as the test matrix.</p>
<p>First things first, let’s check out the numerical implementation of the function. We will do that by comparing the implemented Jacobian-vector product with the <em>definition</em> of the Jacobian-vector product (aka the forward<sup>21</sup> difference approximation).</p>
<p>There are lots of things that we could do here to turn these into <em>actual</em> tests. For instance, the test suite inside JAX has a lot of nice convenience functions for checking implementations of derivatives. But I went with homespun because that was how I was feeling.</p>
<p>You’ll also notice that I’m using random numbers here, which is fine for a blog. Not so fine for a test that you don’t want to be potentially<sup>22</sup> flaky.</p>
<p>The choice of <code>eps = 1e-4</code> is roughly<sup>23</sup> because it’s the square root of the single-precision machine epsilon<sup>24</sup>. A very rough back-of-the-envelope calculation for the forward difference approximation to the derivative shows that the square root of the machine epsilon is about the size you want your perturbation to be.</p>
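<p>For the record, the relevant square roots are easy to check. (This snippet is mine; the exact constant doesn’t matter much, just its order of magnitude.)</p>
<pre class="sourceCode python"><code>import numpy as np

# Forward differences have O(eps) truncation error plus O(machine_eps / eps)
# roundoff error; balancing the two gives eps around sqrt(machine_eps).
print(np.sqrt(np.finfo(np.float32).eps))   # roughly 3.45e-04, hence eps = 1e-4
print(np.sqrt(np.finfo(np.float64).eps))   # roughly 1.49e-08</code></pre>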
<div class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb6-1">b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.standard_normal(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb6-2"></span>
<span id="cb6-3">bt <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.standard_normal(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb6-4">bt <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/=</span> np.linalg.norm(bt)</span>
<span id="cb6-5"></span>
<span id="cb6-6">A_xt <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.standard_normal(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_x))</span>
<span id="cb6-7">A_xt <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/=</span> np.linalg.norm(A_xt)</span>
<span id="cb6-8"></span>
<span id="cb6-9">arg_values <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (A_indices, A_indptr, A_x, b )</span>
<span id="cb6-10"></span>
<span id="cb6-11">arg_tangent_A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, A_xt, ad.Zero(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(b)))</span>
<span id="cb6-12">arg_tangent_b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, ad.Zero(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(A_xt)), bt)</span>
<span id="cb6-13">arg_tangent_Ab <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, A_xt, bt)</span>
<span id="cb6-14"></span>
<span id="cb6-15">p, t_A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve_value_and_jvp(arg_values, arg_tangent_A, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb6-16">_, t_b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve_value_and_jvp(arg_values, arg_tangent_b, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb6-17">_, t_Ab <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve_value_and_jvp(arg_values, arg_tangent_Ab, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb6-18">pT, t_AT <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve_value_and_jvp(arg_values, arg_tangent_A, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb6-19">_, t_bT <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve_value_and_jvp(arg_values, arg_tangent_b, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb6-20"></span>
<span id="cb6-21">eps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1e-4</span></span>
<span id="cb6-22">tt_A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (sparse_triangular_solve(A_indices, A_indptr, A_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> eps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> A_xt, b) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> p) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>eps</span>
<span id="cb6-23">tt_b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (sparse_triangular_solve(A_indices, A_indptr, A_x, b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> eps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> bt) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> p) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> eps</span>
<span id="cb6-24">tt_Ab <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (sparse_triangular_solve(A_indices, A_indptr, A_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> eps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> A_xt, b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> eps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> bt) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> p) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> eps</span>
<span id="cb6-25">tt_AT <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (sparse_triangular_solve(A_indices, A_indptr, A_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> eps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> A_xt, b, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> pT) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> eps</span>
<span id="cb6-26">tt_bT <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (sparse_triangular_solve(A_indices, A_indptr, A_x, b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> eps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> bt, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> pT) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> eps</span>
<span id="cb6-27"></span>
<span id="cb6-28"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"""</span></span>
<span id="cb6-29"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Transpose = False:</span></span>
<span id="cb6-30"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Error A varying: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(t_A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> tt_A)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb6-31"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Error b varying: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(t_b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> tt_b)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb6-32"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Error A and b varying: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(t_Ab <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> tt_Ab)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb6-33"></span>
<span id="cb6-34"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Transpose = True:</span></span>
<span id="cb6-35"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Error A varying: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(t_AT <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> tt_AT)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb6-36"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Error b varying: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(t_bT <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> tt_bT)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb6-37"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Transpose = False:
  Error A varying:  1.08e-07
  Error b varying:  0.00e+00
  Error A and b varying:  4.19e-07

Transpose = True:
  Error A varying:  1.15e-07
  Error b varying:  0.00e+00
</code></pre>
</div>
</div>
<p>Brilliant! Everything correct within single precision!</p>
</section>
<section id="checking-on-the-plumbing" class="level3">
<h3 class="anchored" data-anchor-id="checking-on-the-plumbing">Checking on the plumbing</h3>
<p>Making the numerical implementation work is only half the battle. We also have to make it work <em>in the context of JAX</em>.</p>
<p>Now I would be lying if I pretended this process went smoothly. But the first time is for experience. It’s mostly a matter of just reading the documentation carefully and going through similar examples that have already been implemented.</p>
<p>And testing. I learnt how this was supposed to work by testing it.</p>
<p>(For full disclosure, I also wrote a big block f-string in the <code>sparse_triangular_solve()</code> function at one point that told me the types, shapes, and what <code>transpose</code> was, which was how I worked out that my code was breaking because I forgot the first two <code>None</code> outputs in the transposition rule. When in doubt, print shit.)</p>
<p>As you will see from my testing code, I was not going for elegance. I was running the damn permutations. If you’re looking for elegance, look elsewhere.</p>
<div class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> jvp, grad</span>
<span id="cb8-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> scipy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> jsp</span>
<span id="cb8-3"></span>
<span id="cb8-4"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> f(theta):</span>
<span id="cb8-5">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(A_x)</span>
<span id="cb8-6">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Ax_theta.at[A_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>]].add(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb8-7">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Ax_theta.at[A_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>]].add(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb8-8">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.ones(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb8-9">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_triangular_solve(A_indices, A_indptr, Ax_theta, b, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb8-10"></span>
<span id="cb8-11"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> f_jax(theta):</span>
<span id="cb8-12">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(sparse.tril(A).todense())</span>
<span id="cb8-13">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Ax_theta.at[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>].add(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb8-14">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Ax_theta.at[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>].add(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb8-15">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.ones(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb8-16">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> jsp.linalg.solve_triangular(Ax_theta, b, lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, trans <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"T"</span>)</span>
<span id="cb8-17"></span>
<span id="cb8-18"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> g(theta):</span>
<span id="cb8-19">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(A_x)</span>
<span id="cb8-20">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.ones(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb8-21">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> b.at[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb8-22">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> b.at[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">51</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb8-23">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_triangular_solve(A_indices, A_indptr, Ax_theta, b, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb8-24"></span>
<span id="cb8-25"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> g_jax(theta):</span>
<span id="cb8-26">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(sparse.tril(A).todense())</span>
<span id="cb8-27">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.ones(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb8-28">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> b.at[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb8-29">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> b.at[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">51</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb8-30">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> jsp.linalg.solve_triangular(Ax_theta, b, lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, trans <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"T"</span>)</span>
<span id="cb8-31"></span>
<span id="cb8-32"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> h(theta):</span>
<span id="cb8-33">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(A_x)</span>
<span id="cb8-34">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Ax_theta.at[A_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>]].add(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]) </span>
<span id="cb8-35">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.ones(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb8-36">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> b.at[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">51</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb8-37">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_triangular_solve(A_indices, A_indptr, Ax_theta, b, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb8-38"></span>
<span id="cb8-39"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> h_jax(theta):</span>
<span id="cb8-40">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(sparse.tril(A).todense())</span>
<span id="cb8-41">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Ax_theta.at[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>].add(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb8-42">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.ones(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb8-43">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> b.at[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">51</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb8-44">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> jsp.linalg.solve_triangular(Ax_theta, b, lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, trans <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"N"</span>)</span>
<span id="cb8-45"></span>
<span id="cb8-46"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> no_diff(theta):</span>
<span id="cb8-47">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_triangular_solve(A_indices, A_indptr, A_x, jnp.ones(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>), transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb8-48"></span>
<span id="cb8-49"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> no_diff_jax(theta):</span>
<span id="cb8-50">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> jsp.linalg.solve_triangular(jnp.array(sparse.tril(A).todense()), jnp.ones(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>), lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, trans <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"N"</span>)</span>
<span id="cb8-51"></span>
<span id="cb8-52">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb8-53">primal1, jvp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(f, (jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]),))</span>
<span id="cb8-54">primal2, jvp2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(f_jax, (jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]),))</span>
<span id="cb8-55">grad1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: jnp.mean(f(x)))(jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]))</span>
<span id="cb8-56">grad2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: jnp.mean(f_jax(x)))(jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]))</span>
<span id="cb8-57"></span>
<span id="cb8-58">primal3, jvp3 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(g, (jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]),))</span>
<span id="cb8-59">primal4, jvp4 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(g_jax, (jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]),))</span>
<span id="cb8-60">grad3 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: jnp.mean(g(x)))(jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]))</span>
<span id="cb8-61">grad4 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: jnp.mean(g_jax(x)))(jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]))  </span>
<span id="cb8-62"></span>
<span id="cb8-63">primal5, jvp5 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(h, (jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]),))</span>
<span id="cb8-64">primal6, jvp6 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(h_jax, (jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]),))</span>
<span id="cb8-65">grad5 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: jnp.mean(h(x)))(jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]))</span>
<span id="cb8-66">grad6 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: jnp.mean(h_jax(x)))(jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]))</span>
<span id="cb8-67"></span>
<span id="cb8-68">primal7, jvp7 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(no_diff, (jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]),))</span>
<span id="cb8-69">primal8, jvp8 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(no_diff_jax, (jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]),))</span>
<span id="cb8-70">grad7 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: jnp.mean(no_diff(x)))(jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]))</span>
<span id="cb8-71">grad8 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: jnp.mean(no_diff_jax(x)))(jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]))</span>
<span id="cb8-72"></span>
<span id="cb8-73"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"""</span></span>
<span id="cb8-74"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Variable L:</span></span>
<span id="cb8-75"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Primal difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(primal1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> primal2)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb8-76"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  JVP difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(jvp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> jvp2)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb8-77"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Gradient difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(grad1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> grad2)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb8-78"></span>
<span id="cb8-79"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Variable b:</span></span>
<span id="cb8-80"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Primal difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(primal3 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> primal4)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb8-81"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  JVP difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(jvp3 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> jvp4)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb8-82"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Gradient difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(grad3 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> grad4)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span></span>
<span id="cb8-83"></span>
<span id="cb8-84"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Variable L and b:</span></span>
<span id="cb8-85"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Primal difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(primal5 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> primal6)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb8-86"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  JVP difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(jvp5 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> jvp6)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb8-87"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Gradient difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(grad5 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> grad6)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb8-88"></span>
<span id="cb8-89"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">No diff:</span></span>
<span id="cb8-90"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Primal difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(primal7 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> primal8)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb8-91"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  JVP difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(jvp7 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> jvp8)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb8-92"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Gradient difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(grad7 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> grad8)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb8-93"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Variable L:
  Primal difference:  1.98e-07
  JVP difference:  2.58e-12
  Gradient difference:  0.00e+00

Variable b:
  Primal difference:  7.94e-06
  JVP difference:  1.83e-08
  Gradient difference:  3.29e-10 

Variable L and b:
  Primal difference:  2.08e-06
  JVP difference:  1.08e-08
  Gradient difference:  2.33e-10

No diff:
  Primal difference: 2.2101993124579167e-07
  JVP difference: 0.0
  Gradient difference: 0.0
</code></pre>
</div>
</div>
<p>Stunning!</p>
</section>
</section>
<section id="primitive-one-the-general-a-1b" class="level2">
<h2 class="anchored" data-anchor-id="primitive-one-the-general-a-1b">Primitive one: The general <img src="https://latex.codecogs.com/png.latex?A%5E%7B-1%7Db"></h2>
<p>Ok. So this is a very similar problem to the one that we just solved. But, as fate would have it, the solution is going to look quite different. Why? Because we need to compute a Cholesky factorisation.</p>
<p>First things first, though, we are going to need a JAX-traceable way to compute a Cholesky factor. This means that we need<sup>25</sup> to tell our <code>sparse_solve</code> function how many non-zeros the sparse Cholesky factor will have. Why? Well, it has to do with how the function is used.</p>
<p>When <code>sparse_cholesky()</code> is called with concrete inputs<sup>26</sup>, it can quite happily work out the sparsity structure of <img src="https://latex.codecogs.com/png.latex?L">. But when JAX is preparing to transform the code, e.g.&nbsp;when it’s building a gradient, it calls <code>sparse_cholesky()</code> with abstract arguments that carry only the shape information of the inputs. This is <em>not</em> enough to compute the sparsity structure. We <em>need</em> the <code>indices</code> and <code>indptr</code> arrays.</p>
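<p>To see the concrete-vs-abstract distinction in isolation, here is a minimal, self-contained sketch (not from this post’s code; the function name is made up) of a function whose output size depends on an input <em>value</em>. Called directly it works, but under <code>jax.jit</code> the function only receives an abstract tracer, so the value-dependent size cannot be computed:</p>

```python
import jax
import jax.numpy as jnp

def value_dependent_size(x):
    # int() needs a concrete value; an abstract tracer cannot supply one.
    return jnp.zeros(int(x.sum()))

# Concrete call: x.sum() is an actual number, so this works.
assert value_dependent_size(jnp.array([1.0, 2.0])).shape == (3,)

# Traced call: jit hands the function an abstract tracer, so int() fails.
try:
    jax.jit(value_dependent_size)(jnp.array([1.0, 2.0]))
    raised = False
except jax.errors.JAXTypeError:
    raised = True
assert raised
```

This is exactly why <code>L_nse</code> has to be passed in explicitly: the number of non-zeros in the factor is a value-dependent quantity that an abstract trace cannot recover.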
<p>This means that we need <code>sparse_cholesky()</code> to throw an error if <code>L_nse</code> isn’t passed. This wasn’t implemented well last time, so here it is done properly.</p>
<p>(If you’re wondering about that <code>None</code> argument, it is the identity transform. So if <code>A_indices</code> is a concrete value, <code>ind = A_indices</code>. Otherwise an error is raised.)</p>
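<p>As a rough illustration (a hypothetical stand-in, not JAX’s actual implementation), <code>concrete_or_error(transform, val, msg)</code> behaves something like the following: concrete values pass through <code>transform</code> (with <code>None</code> meaning the identity), while abstract tracers trigger the error message:</p>

```python
class FakeTracer:
    """Hypothetical stand-in for a JAX abstract tracer."""
    pass

def concrete_or_error_sketch(transform, val, err_msg):
    # Sketch of jax.core.concrete_or_error's behaviour: abstract values
    # raise; concrete values pass through `transform` (None = identity).
    if isinstance(val, FakeTracer):
        raise ValueError(err_msg)
    return val if transform is None else transform(val)

# Concrete value with the identity (None) transform: returned as-is.
assert concrete_or_error_sketch(None, [0, 2, 5], "need L_nse") == [0, 2, 5]

# Abstract value: the error string is what the caller sees.
try:
    concrete_or_error_sketch(None, FakeTracer(), "need L_nse")
    raised = False
except ValueError:
    raised = True
assert raised
```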
<div class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb10-1">sparse_cholesky_p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> core.Primitive(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sparse_cholesky"</span>)</span>
<span id="cb10-2"></span>
<span id="cb10-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_cholesky(A_indices, A_indptr, A_x, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>, L_nse: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>):</span>
<span id="cb10-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""A JAX traceable sparse cholesky decomposition"""</span></span>
<span id="cb10-5">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> L_nse <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb10-6">    err_string <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You need to pass a value to L_nse when doing fancy sparse_cholesky."</span></span>
<span id="cb10-7">    ind <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> core.concrete_or_error(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, A_indices, err_string)</span>
<span id="cb10-8">    ptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> core.concrete_or_error(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, A_indptr, err_string)</span>
<span id="cb10-9">    L_ind, _ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _symbolic_factor(ind, ptr)</span>
<span id="cb10-10">    L_nse <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_ind)</span>
<span id="cb10-11">  </span>
<span id="cb10-12">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_cholesky_p.bind(A_indices, A_indptr, A_x, L_nse <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_nse)</span></code></pre></div>
</div>
<details>
<summary>
The rest of the Cholesky code
</summary>
<div class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb11-1"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@sparse_cholesky_p.def_impl</span></span>
<span id="cb11-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_cholesky_impl(A_indices, A_indptr, A_x, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>, L_nse):</span>
<span id="cb11-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""The implementation of the sparse Cholesky. This is not JAX traceable."""</span></span>
<span id="cb11-4">  </span>
<span id="cb11-5">  L_indices, L_indptr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _symbolic_factor(A_indices, A_indptr)</span>
<span id="cb11-6">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> L_nse <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb11-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_indices) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_nse</span>
<span id="cb11-8">    </span>
<span id="cb11-9">  L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _structured_copy(A_indices, A_indptr, A_x, L_indices, L_indptr)</span>
<span id="cb11-10">  L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _sparse_cholesky_impl(L_indices, L_indptr, L_x)</span>
<span id="cb11-11">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_indices, L_indptr, L_x</span>
<span id="cb11-12"></span>
<span id="cb11-13"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> _symbolic_factor(A_indices, A_indptr):</span>
<span id="cb11-14">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Assumes A_indices and A_indptr index the lower triangle of $A$ ONLY.</span></span>
<span id="cb11-15">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb11-16">  L_sym <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [np.array([], dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n)]</span>
<span id="cb11-17">  children <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [np.array([], dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n)]</span>
<span id="cb11-18">  </span>
<span id="cb11-19">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n):</span>
<span id="cb11-20">    L_sym[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_indices[A_indptr[j]:A_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb11-21">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> child <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> children[j]:</span>
<span id="cb11-22">      tmp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_sym[child][L_sym[child] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> j]</span>
<span id="cb11-23">      L_sym[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.unique(np.append(L_sym[j], tmp))</span>
<span id="cb11-24">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_sym[j]) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:</span>
<span id="cb11-25">      p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_sym[j][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb11-26">      children[p] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.append(children[p], j)</span>
<span id="cb11-27">        </span>
<span id="cb11-28">  L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb11-29">  L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum([<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> L_sym])</span>
<span id="cb11-30">  L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.concatenate(L_sym)</span>
<span id="cb11-31">  </span>
<span id="cb11-32">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_indices, L_indptr</span>
<span id="cb11-33"></span>
<span id="cb11-34"></span>
<span id="cb11-35"></span>
<span id="cb11-36"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> _structured_copy(A_indices, A_indptr, A_x, L_indices, L_indptr):</span>
<span id="cb11-37">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb11-38">  L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_indices))</span>
<span id="cb11-39">  </span>
<span id="cb11-40">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n):</span>
<span id="cb11-41">    copy_idx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.nonzero(np.in1d(L_indices[L_indptr[j]:L_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]],</span>
<span id="cb11-42">                                  A_indices[A_indptr[j]:A_indptr[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]))[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb11-43">    L_x[L_indptr[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> copy_idx] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_x[A_indptr[j]:A_indptr[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb11-44">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_x</span>
<span id="cb11-45"></span>
<span id="cb11-46"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> _sparse_cholesky_impl(L_indices, L_indptr, L_x):</span>
<span id="cb11-47">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb11-48">  descendant <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [[] <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n)]</span>
<span id="cb11-49">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n):</span>
<span id="cb11-50">    tmp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_x[L_indptr[j]:L_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb11-51">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> bebe <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> descendant[j]:</span>
<span id="cb11-52">      k <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bebe[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb11-53">      Ljk<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_x[bebe[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb11-54">      pad <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.nonzero(                                                       <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb11-55">          L_indices[L_indptr[k]:L_indptr[k<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_indices[L_indptr[j]])[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb11-56">      update_idx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.nonzero(np.in1d(                                        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb11-57">                    L_indices[L_indptr[j]:L_indptr[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]],                     <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb11-58">                    L_indices[(L_indptr[k] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> pad):L_indptr[k<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]))[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb11-59">      tmp[update_idx] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tmp[update_idx] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>                                     <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb11-60">                        Ljk <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> L_x[(L_indptr[k] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> pad):L_indptr[k <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb11-61">            </span>
<span id="cb11-62">    diag <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.sqrt(tmp[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb11-63">    L_x[L_indptr[j]] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> diag</span>
<span id="cb11-64">    L_x[(L_indptr[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>):L_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tmp[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> diag</span>
<span id="cb11-65">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> idx <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(L_indptr[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, L_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]):</span>
<span id="cb11-66">      descendant[L_indices[idx]].append((j, idx))</span>
<span id="cb11-67">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_x</span>
<span id="cb11-68"></span>
<span id="cb11-69"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@sparse_cholesky_p.def_abstract_eval</span></span>
<span id="cb11-70"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_cholesky_abstract_eval(A_indices, A_indptr, A_x, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>, L_nse):</span>
<span id="cb11-71">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> core.ShapedArray((L_nse,), A_indices.dtype),                   <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb11-72">         core.ShapedArray(A_indptr.shape, A_indptr.dtype),             <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb11-73">         core.ShapedArray((L_nse,), A_x.dtype)</span></code></pre></div>
</div>
</details>
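<p>As a quick aside on why the symbolic phase has to track children at all: the factor can have nonzeros where <img src="https://latex.codecogs.com/png.latex?A"> has none (fill-in). A tiny dense check, with a matrix invented purely for illustration:</p>

```python
import numpy as np

# "Arrow" matrix: A[1, 2] == A[2, 1] == 0, but eliminating column 0
# couples rows 1 and 2, so the factor gains an entry A never had.
A = np.array([[4.0, 1.0, 1.0],
              [1.0, 3.0, 0.0],
              [1.0, 0.0, 2.0]])
L = np.linalg.cholesky(A)
# L[2, 1] is nonzero even though A[2, 1] == 0: that is fill-in,
# and it is exactly what the children/L_sym bookkeeping above predicts.
```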
<section id="why-do-we-need-a-new-pattern-for-this-very-very-similar-problem" class="level3">
<h3 class="anchored" data-anchor-id="why-do-we-need-a-new-pattern-for-this-very-very-similar-problem">Why do we need a new pattern for this very very similar problem?</h3>
<p>Ok. So now on to the details. If we try to repeat our previous pattern, it would look like this.</p>
<div class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb12-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_solve_value_and_jvp(arg_values, arg_tangents, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>, L_nse):</span>
<span id="cb12-2">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">""" </span></span>
<span id="cb12-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  JAX-traceable Jacobian-vector product implementation for sparse_solve.</span></span>
<span id="cb12-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  """</span></span>
<span id="cb12-5">  </span>
<span id="cb12-6">  A_indices, A_indptr, A_x, b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> arg_values</span>
<span id="cb12-7">  _, _, A_xt, bt <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> arg_tangents</span>
<span id="cb12-8"></span>
<span id="cb12-9">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Needed for shared computation</span></span>
<span id="cb12-10">  L_indices, L_indptr, L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_cholesky(A_indices, A_indptr, A_x)</span>
<span id="cb12-11"></span>
<span id="cb12-12">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make the primal</span></span>
<span id="cb12-13">  primal_out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve(L_indices, L_indptr, L_x, b, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb12-14">  primal_out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve(L_indices, L_indptr, L_x, primal_out, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb12-15"></span>
<span id="cb12-16">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(A_xt) <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> ad.Zero:</span>
<span id="cb12-17">    Delta_lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jsparse.CSC((A_xt, A_indices, A_indptr), shape <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (b.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>], b.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]))</span>
<span id="cb12-18">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># We need to do Delta @ primal_out, but we only have the lower triangle</span></span>
<span id="cb12-19">    rhs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Delta_lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> primal_out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Delta_lower.transpose() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> primal_out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> A_xt[A_indptr[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> primal_out</span>
<span id="cb12-20">    jvp_Ax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve(L_indices, L_indptr, L_x, rhs)</span>
<span id="cb12-21">    jvp_Ax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve(L_indices, L_indptr, L_x, jvp_Ax, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb12-22">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb12-23">    jvp_Ax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.zeros_like_array(primal_out)</span>
<span id="cb12-24"></span>
<span id="cb12-25">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(bt) <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> ad.Zero:</span>
<span id="cb12-26">    jvp_b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve(L_indices, L_indptr, L_x, bt)</span>
<span id="cb12-27">    jvp_b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve(L_indices, L_indptr, L_x, jvp_b, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb12-28">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb12-29">    jvp_b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lax.zeros_like_array(primal_out)</span>
<span id="cb12-30"></span>
<span id="cb12-31">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> primal_out, jvp_b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> jvp_Ax</span></code></pre></div>
</div>
<p>That’s all well and good. Nothing weird there.</p>
<p>The problem comes when you need to implement the transposition rule. Remembering that <img src="https://latex.codecogs.com/png.latex?%5Cbar%20b%20=%20A%5E%7B-T%7D%5Cbar%20c%20=%20A%5E%7B-1%7D%5Cbar%20c">, you might see the issue: we are going to need the Cholesky factorisation. <em>But we have no way to pass</em> <img src="https://latex.codecogs.com/png.latex?L"> <em>to the transpose function</em>.</p>
<p>This means that we would need to compute <em>two</em> Cholesky factorisations per gradient instead of one. As the Cholesky factorisation is our slowest operation, we do not want to do extra ones! We want to compute the Cholesky triangle once and pass it around like a party bottom<sup>27</sup>. We do not want each of our functions to have to make a deep and meaningful connection with the damn matrix<sup>28</sup>.</p>
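<p>For intuition, here is the dense analogue of what we want, sketched with scipy (the matrix is made up for illustration): factor once, then reuse the factor for every subsequent solve, including the transposed solves the gradients need.</p>

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5.0 * np.eye(5)  # a made-up SPD matrix

factor = cho_factor(A)  # the expensive step, done exactly once

# Both the primal solve and any later cotangent solve reuse `factor`;
# because A is symmetric, the "transposed" solve is the same solve.
x = cho_solve(factor, np.ones(5))
x_bar = cho_solve(factor, x)
```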
</section>
<section id="a-different-solution" class="level3">
<h3 class="anchored" data-anchor-id="a-different-solution">A different solution</h3>
<p>So how do we pass around our Cholesky triangle? Well, I do love a good class so my first thought was “fuck it. I’ll make a class and I’ll pass it that way”. But the developers of JAX had a <em>much</em> better idea.</p>
<p>Their idea was to abstract the idea of a linear solve and its gradients. They do this through <code>lax.custom_linear_solve</code>. This is a function that takes all of the bits that you would need to compute <img src="https://latex.codecogs.com/png.latex?A%5E%7B-1%7Db"> and all of its derivatives. In particular it takes<sup>29</sup>:</p>
<ul>
<li><code>matvec</code>: A function <code>matvec(x)</code> that computes <img src="https://latex.codecogs.com/png.latex?Ax">. This might seem a bit weird, but abstracting<sup>30</sup> a matrix to a linear mapping is the most common atrocity committed by mathematicians. So we might as well just suck it up.</li>
<li><code>b</code>: The right hand side vector<sup>31</sup></li>
<li><code>solve</code>: A function that takes the <code>matvec</code> and a vector so that<sup>32</sup> <code>solve(matvec, matvec(x)) == x</code></li>
<li><code>symmetric</code>: A boolean indicating if <img src="https://latex.codecogs.com/png.latex?A"> is symmetric.</li>
</ul>
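<p>A minimal dense use of this API might look like the following sketch (the system is invented for illustration):</p>

```python
import jax.numpy as jnp
from jax import lax
from jax.scipy.linalg import cho_factor, cho_solve

# A made-up 2x2 symmetric positive definite system.
A = jnp.array([[4.0, 1.0],
               [1.0, 3.0]])
b = jnp.array([1.0, 2.0])

factor = cho_factor(A)  # factor once; the solve closure reuses it

x = lax.custom_linear_solve(
    lambda v: A @ v,                              # matvec: v -> A v
    b,
    solve=lambda _, rhs: cho_solve(factor, rhs),  # reuses the factor
    symmetric=True)
```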
<p>The idea (happily copped from the implementation of <code>jax.scipy.linalg.solve</code>) is to wrap our Cholesky decomposition in the solve function, through the never-ending miracle of partial evaluation.</p>
<div class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb13-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> functools <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> partial</span>
<span id="cb13-2"></span>
<span id="cb13-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_solve(A_indices, A_indptr, A_x, b, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>, L_nse <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>):</span>
<span id="cb13-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb13-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  A JAX-traceable sparse solve. For the moment, only for vector b</span></span>
<span id="cb13-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  """</span></span>
<span id="cb13-7">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> b.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> A_indptr.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb13-8">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> b.ndim <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb13-9">  </span>
<span id="cb13-10">  L_indices, L_indptr, L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_cholesky(</span>
<span id="cb13-11">    lax.stop_gradient(A_indices), </span>
<span id="cb13-12">    lax.stop_gradient(A_indptr), </span>
<span id="cb13-13">    lax.stop_gradient(A_x), L_nse <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_nse)</span>
<span id="cb13-14">  </span>
<span id="cb13-15">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> chol_solve(L_indices, L_indptr, L_x, b):</span>
<span id="cb13-16">    out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve(L_indices, L_indptr, L_x, b, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb13-17">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_triangular_solve(L_indices, L_indptr, L_x, out, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb13-18">  </span>
<span id="cb13-19">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> matmult(A_indices, A_indptr, A_x, b):</span>
<span id="cb13-20">    A_lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jsparse.CSC((A_x, A_indices, A_indptr), shape <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (b.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>], b.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]))</span>
<span id="cb13-21">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> A_lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> A_lower.transpose() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> A_x[A_indptr[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> b</span>
<span id="cb13-22"></span>
<span id="cb13-23">  solver <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> partial(</span>
<span id="cb13-24">    lax.custom_linear_solve,</span>
<span id="cb13-25">    <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: matmult(A_indices, A_indptr, A_x, x),</span>
<span id="cb13-26">    solve <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> _, x: chol_solve(L_indices, L_indptr, L_x, x),</span>
<span id="cb13-27">    symmetric <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb13-28"></span>
<span id="cb13-29">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> solver(b)</span></code></pre></div>
</div>
<p>There are three things of note in that implementation.</p>
<ol type="1">
<li><p>The calls to <code>lax.stop_gradient()</code>: These tell JAX not to bother computing the gradient of these terms. The relevant parts of the derivatives are computed explicitly by <code>lax.custom_linear_solve</code> in terms of <code>matmult</code> and <code>solve</code>, neither of which needs the explicit derivative of the Cholesky factorisation.</p></li>
<li><p>That definition of <code>matmult()</code><sup>33</sup>: Look. I don’t know what to tell you. Neither addition nor indexing is implemented for <code>jsparse.CSC</code> objects. So we did it the semi-manual way. (I am thankful that matrix-vector multiplication is, at least, available.)</p></li>
<li><p>The definition of <code>solver()</code>: Partial evaluation is a wonderful wonderful thing. <code>functools.partial()</code> transforms <code>lax.custom_linear_solve()</code> from a function that takes 3 arguments (and some keywords), into a function <code>solver()</code> that takes one<sup>34</sup> argument<sup>35</sup> (<code>b</code>, the only positional argument of <code>lax.custom_linear_solve()</code> that isn’t specified).</p></li>
</ol>
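<p>As a minimal illustration of that last point (using a hypothetical three-argument function, nothing JAX-specific), <code>functools.partial</code> pins down some arguments and hands back a function of whatever is left over:</p>

```python
from functools import partial

# A hypothetical three-argument function, standing in for
# lax.custom_linear_solve with its matvec / b / solve arguments.
def affine(a, b, x):
    return a * x + b

# Pin down the first two positional arguments; what remains is a
# one-argument function, just like solver(b) above.
f = partial(affine, 2.0, 1.0)
print(f(3.0))  # 2.0 * 3.0 + 1.0 = 7.0
```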
</section>
<section id="does-it-work" class="level3">
<h3 class="anchored" data-anchor-id="does-it-work">Does it work?</h3>
<div class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb14-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> f(theta):</span>
<span id="cb14-2">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> A_x)</span>
<span id="cb14-3">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Ax_theta.at[A_indptr[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]].add(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb14-4">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.ones(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb14-5">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_solve(A_indices, A_indptr, Ax_theta, b)</span>
<span id="cb14-6"></span>
<span id="cb14-7"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> f_jax(theta):</span>
<span id="cb14-8">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> A.todense())</span>
<span id="cb14-9">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Ax_theta.at[np.arange(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>),np.arange(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)].add(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb14-10">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.ones(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb14-11">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> jsp.linalg.solve(Ax_theta, b)</span>
<span id="cb14-12"></span>
<span id="cb14-13"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> g(theta):</span>
<span id="cb14-14">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(A_x)</span>
<span id="cb14-15">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.ones(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb14-16">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> b.at[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb14-17">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> b.at[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">51</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb14-18">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_solve(A_indices, A_indptr, Ax_theta, b)</span>
<span id="cb14-19"></span>
<span id="cb14-20"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> g_jax(theta):</span>
<span id="cb14-21">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(A.todense())</span>
<span id="cb14-22">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.ones(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb14-23">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> b.at[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb14-24">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> b.at[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">51</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb14-25">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> jsp.linalg.solve(Ax_theta, b)</span>
<span id="cb14-26"></span>
<span id="cb14-27"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> h(theta):</span>
<span id="cb14-28">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(A_x)</span>
<span id="cb14-29">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Ax_theta.at[A_indptr[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]].add(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb14-30">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.ones(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb14-31">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> b.at[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">51</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb14-32">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_solve(A_indices, A_indptr, Ax_theta, b)</span>
<span id="cb14-33"></span>
<span id="cb14-34"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> h_jax(theta):</span>
<span id="cb14-35">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(A.todense())</span>
<span id="cb14-36">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Ax_theta.at[np.arange(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>),np.arange(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)].add(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb14-37">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.ones(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb14-38">  b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> b.at[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">51</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb14-39">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> jsp.linalg.solve(Ax_theta, b)</span>
<span id="cb14-40"></span>
<span id="cb14-41">primal1, jvp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(f, (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span>]),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]),))</span>
<span id="cb14-42">primal2, jvp2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(f_jax, (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span>]),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]),))</span>
<span id="cb14-43">grad1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: jnp.mean(f(x)))(jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span>]))</span>
<span id="cb14-44">grad2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: jnp.mean(f_jax(x)))(jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span>]))</span>
<span id="cb14-45"></span>
<span id="cb14-46"></span>
<span id="cb14-47">primal3, jvp3 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(g, (jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]),))</span>
<span id="cb14-48">primal4, jvp4 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(g_jax, (jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]),))</span>
<span id="cb14-49">grad3 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: jnp.mean(g(x)))(jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]))</span>
<span id="cb14-50">grad4 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: jnp.mean(g_jax(x)))(jnp.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">142.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]))</span>
<span id="cb14-51"></span>
<span id="cb14-52">primal5, jvp5 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(h, (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]),))</span>
<span id="cb14-53">primal6, jvp6 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(h_jax, (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]),))</span>
<span id="cb14-54">grad5 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: jnp.mean(f(x)))(jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]))</span>
<span id="cb14-55">grad6 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: jnp.mean(f_jax(x)))(jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">342.</span>]))</span>
<span id="cb14-56"></span>
<span id="cb14-57"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"""</span></span>
<span id="cb14-58"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Check the plumbing!</span></span>
<span id="cb14-59"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Variable A:</span></span>
<span id="cb14-60"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Primal difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(primal1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> primal2)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb14-61"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  JVP difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(jvp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> jvp2)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb14-62"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Gradient difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(grad1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> grad2)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb14-63"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  </span></span>
<span id="cb14-64"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Variable b:</span></span>
<span id="cb14-65"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Primal difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(primal3 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> primal4)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb14-66"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  JVP difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(jvp3 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> jvp4)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb14-67"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Gradient difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(grad3 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> grad4)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span></span>
<span id="cb14-68"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">    </span></span>
<span id="cb14-69"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Variable A and b:</span></span>
<span id="cb14-70"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Primal difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(primal5 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> primal6)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb14-71"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  JVP difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(jvp5 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> jvp6)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb14-72"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Gradient difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(grad5 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> grad6)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb14-73"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  """</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Check the plumbing!
Variable A:
  Primal difference:  1.98e-07
  JVP difference:  1.43e-07
  Gradient difference:  0.00e+00
  
Variable b:
  Primal difference:  4.56e-06
  JVP difference:  6.52e-08
  Gradient difference:  9.31e-10 
    
Variable A and b:
  Primal difference:  8.10e-06
  JVP difference:  1.83e-06
  Gradient difference:  1.82e-12
  </code></pre>
</div>
</div>
<p>Yes.</p>
</section>
<section id="why-is-this-better-than-just-differentiating-through-the-cholesky-factorisation" class="level3">
<h3 class="anchored" data-anchor-id="why-is-this-better-than-just-differentiating-through-the-cholesky-factorisation">Why is this better than just differentiating through the Cholesky factorisation?</h3>
<p>The other option for making this work would’ve been to implement the Cholesky factorisation as a primitive (~which we are about to do!~ which we will do another day) and then write the sparse solver directly as a pure JAX function.</p>
<div class="cell" data-execution_count="13">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb16-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_solve_direct(A_indices, A_indptr, A_x, b, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>, L_nse <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>):</span>
<span id="cb16-2">  L_indices, L_indptr, L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_cholesky(A_indices, A_indptr, A_x)</span>
<span id="cb16-3">  out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve(L_indices, L_indptr, L_x, b)</span>
<span id="cb16-4">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_triangular_solve(L_indices, L_indptr, L_x, out, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span></code></pre></div>
</div>
<p>This function is JAX-traceable<sup>36</sup> and, therefore, we could compute the gradient of it directly. It turns out that this is going to be a bad idea.</p>
<p>Why? Because the derivative of <code>sparse_cholesky</code>, which we would have to chain together with the derivatives from the solver, is pretty complicated. Basically, this means that we’d have to do a lot more work<sup>37</sup> than we do if we just implement the symbolic formula for the derivatives.</p>
</section>
</section>
<section id="primitive-three-the-dreaded-log-determinant" class="level2">
<h2 class="anchored" data-anchor-id="primitive-three-the-dreaded-log-determinant">Primitive three: The dreaded log determinant</h2>
<p>Ok, so now we get to the good one. The log-determinant of <img src="https://latex.codecogs.com/png.latex?A">. The first thing that we need to do is wrench out a derivative. This is not as easy as it was for the linear solve. So what follows is a modification for sparse matrices from Appendix A of <a href="https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf">Boyd’s convex optimisation book</a>.</p>
<p>It’s pretty easy to convince yourself that <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Clog(%7CA%20+%20%5CDelta%7C)%20&amp;=%20%5Clog%5Cleft(%20%5Cleft%7CA%5E%7B1/2%7D(I%20+%20A%5E%7B-1/2%7D%5CDelta%20A%5E%7B-1/2%7D)A%5E%7B1/2%7D%5Cright%7C%5Cright)%20%5C%5C%0A&amp;=%20%5Clog(%7CA%7C)%20+%20%5Clog%5Cleft(%20%5Cleft%7CI%20+%20A%5E%7B-1/2%7D%5CDelta%20A%5E%7B-1/2%7D%5Cright%7C%5Cright).%0A%5Cend%7Balign*%7D"></p>
<p>It is harder to convince yourself how this could possibly be a useful fact.</p>
<p>If we write <img src="https://latex.codecogs.com/png.latex?%5Clambda_i">, <img src="https://latex.codecogs.com/png.latex?i%20=%201,%20%5Cldots,%20n"> as the eigenvalues of <img src="https://latex.codecogs.com/png.latex?A%5E%7B-1/2%7D%5CDelta%20A%5E%7B-1/2%7D">, then we have <img src="https://latex.codecogs.com/png.latex?%0A%5Clog(%7CA%20+%20%5CDelta%20%7C)%20=%20%5Clog(%7CA%7C)%20+%20%5Csum_%7Bi=1%7D%5En%20%5Clog(%201%20+%20%5Clambda_i).%0A"> Remembering that <img src="https://latex.codecogs.com/png.latex?%5CDelta"> is very small, it follows that <img src="https://latex.codecogs.com/png.latex?A%5E%7B-1/2%7D%5CDelta%20A%5E%7B-1/2%7D"> will <em>also</em> be small. That translates to the eigenvalues of <img src="https://latex.codecogs.com/png.latex?A%5E%7B-1/2%7D%5CDelta%20A%5E%7B-1/2%7D"> all being small. Therefore, we can use the approximation <img src="https://latex.codecogs.com/png.latex?%5Clog(1%20+%20%5Clambda_i)%20%20=%20%5Clambda_i%20%20+%20%5Cmathcal%7BO%7D(%5Clambda_i%5E2)">.</p>
<p>This means that<sup>38</sup> <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Clog(%7CA%20+%20%5CDelta%20%7C)%20&amp;=%20%5Clog(%7CA%7C)%20+%20%5Csum_%7Bi=1%7D%5En%20%20%5Clambda_i%20+%20%5Cmathcal%7BO%7D%5Cleft(%5C%7C%5CDelta%5C%7C%5E2%5Cright)%20%5C%5C%0A&amp;=%5Clog(%7CA%7C)%20+%20%5Coperatorname%7Btr%7D%5Cleft(A%5E%7B-1/2%7D%20%5CDelta%20A%5E%7B-1/2%7D%20%5Cright)%20+%20%5Cmathcal%7BO%7D%5Cleft(%5C%7C%5CDelta%5C%7C%5E2%5Cright)%20%5C%5C%0A&amp;=%20%5Clog(%7CA%7C)%20+%20%5Coperatorname%7Btr%7D%5Cleft(A%5E%7B-1%7D%20%5CDelta%20%5Cright)%20+%20%5Cmathcal%7BO%7D%5Cleft(%5C%7C%5CDelta%5C%7C%5E2%5Cright),%0A%5Cend%7Balign*%7D"> which follows from the cyclic property of the trace.</p>
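<p>That first-order expansion is easy to sanity-check numerically. A quick sketch (plain NumPy on a small random SPD matrix, not code from this post):</p>

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)        # symmetric positive definite
D = rng.normal(size=(n, n))
Delta = 1e-6 * (D + D.T)           # small symmetric perturbation

_, logdet_A = np.linalg.slogdet(A)
_, logdet_AD = np.linalg.slogdet(A + Delta)

# log|A + Delta| ~= log|A| + tr(A^{-1} Delta), with O(||Delta||^2) error
first_order = logdet_A + np.trace(np.linalg.solve(A, Delta))
print(abs(logdet_AD - first_order))  # tiny: the error is second order in Delta
```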
<p>If we recall the formula from the last section defining the Jacobian-vector product, in our context <img src="https://latex.codecogs.com/png.latex?m%20=%201">, <img src="https://latex.codecogs.com/png.latex?x"> is the vector of non-zero entries of the lower triangle of <img src="https://latex.codecogs.com/png.latex?A"> stacked by column, and <img src="https://latex.codecogs.com/png.latex?%5Cdelta"> is the vector of non-zero entries of the lower triangle of <img src="https://latex.codecogs.com/png.latex?%5CDelta">. That means the Jacobian-vector product is <img src="https://latex.codecogs.com/png.latex?%0AJ(x)%5Cdelta%20=%20%5Coperatorname%7Btr%7D%5Cleft(A%5E%7B-1%7D%20%5CDelta%20%5Cright)%20=%20%5Csum_%7Bi=1%7D%5En%5Csum_%7Bj=1%7D%5En%5BA%5E%7B-1%7D%5D_%7Bij%7D%20%5CDelta_%7Bij%7D.%0A"></p>
<p>Remembering that <img src="https://latex.codecogs.com/png.latex?%5CDelta"> is sparse with the same sparsity pattern as <img src="https://latex.codecogs.com/png.latex?A">, we see that the Jacobian-vector product requires us to know the values of <img src="https://latex.codecogs.com/png.latex?A%5E%7B-1%7D"> that correspond to non-zero elements of <img src="https://latex.codecogs.com/png.latex?A">. That’s good news, because we will see that these entries are relatively cheap and easy to compute, whereas the full inverse is dense and very expensive to compute.</p>
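<p>For a tiny matrix you can see exactly which entries are needed with a dense reference computation (illustrative only, and with a made-up 3x3 matrix; the whole point here is to eventually get these entries <em>without</em> forming the dense inverse):</p>

```python
import numpy as np

# A small sparse-ish SPD matrix, dense here purely for illustration.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 2.0]])

Ainv = np.linalg.inv(A)
rows, cols = np.nonzero(A)       # sparsity pattern of A
partial_inv = Ainv[rows, cols]   # the only entries of A^{-1} the JVP needs

for r, c, v in zip(rows, cols, partial_inv):
    print(f"Ainv[{r},{c}] = {v:.4f}")
```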
<p>But before we get to that, I need to point out a trap for young players<sup>39</sup>. Lest your implementations go down faster than me when someone asks politely.</p>
<p>The problem comes from how we store our matrix. A mathematician would suggest that it’s our representation. A physicist<sup>40</sup> would shit on about being coordinate free with such passion that he<sup>41</sup> will keep going even after you quietly leave the room.</p>
<p>The problem is that we only store the non-zero entries of the lower-triangular part of <img src="https://latex.codecogs.com/png.latex?A">. This means that <em>we need to be careful</em>, when we compute the Jacobian-vector product, to properly compute the matrix-vector product.</p>
<p>Let <code>A_indices</code> and <code>A_indptr</code> define the sparsity structure of <img src="https://latex.codecogs.com/png.latex?A"> (and <img src="https://latex.codecogs.com/png.latex?%5CDelta">). If <img src="https://latex.codecogs.com/png.latex?A_x"> is our input and <img src="https://latex.codecogs.com/png.latex?v"> is our vector, then we need to do the following steps to compute the Jacobian-vector product:</p>
<ol type="1">
<li>Compute <code>Ainv_x</code> (aka the non-zero elements of <img src="https://latex.codecogs.com/png.latex?A%5E%7B-1%7D"> that correspond to the sparsity pattern of <img src="https://latex.codecogs.com/png.latex?A">)</li>
<li>Compute the matrix vector product as</li>
</ol>
<div class="cell" data-execution_count="14">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb17-1">jvp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(Ainv_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> v) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(Ainv_x[A_indptr[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> v[A_indptr[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]])</span></code></pre></div>
</div>
<p>Why does it look like that? Well, we need to add the contribution from the upper triangle as well as the lower triangle. One way to do that is to double the sum and then subtract off the diagonal terms that we’ve counted twice.</p>
<p>(I’m making a pretty big assumption here, which is fine in our context, that <img src="https://latex.codecogs.com/png.latex?A"> has a non-zero diagonal. If that doesn’t hold, you just change the indexing in the second term to pull out the diagonal terms.)</p>
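<p>The double-then-subtract-the-diagonal trick is easy to check on a small dense example (a toy sketch of my own, using dense matrices and <code>tril_indices</code> rather than the CSC vectors above):</p>

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
S = rng.normal(size=(n, n)); S = S + S.T          # stands in for A^{-1}
Delta = rng.normal(size=(n, n)); Delta = Delta + Delta.T

# The full double sum from the trace formula.
full = np.sum(S * Delta)

# Lower-triangle-only storage: double the sum, then subtract the
# diagonal products that were counted twice.
rows, cols = np.tril_indices(n)
lower = (2.0 * np.sum(S[rows, cols] * Delta[rows, cols])
         - np.sum(np.diag(S) * np.diag(Delta)))

err = abs(full - lower)
```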
<p>Using similar reasoning, we can compute the Jacobian as <img src="https://latex.codecogs.com/png.latex?%0A%5BJ_f(x)%5D_%7Bi1%7D%20=%20%5Cbegin%7Bcases%7D%0A%5Coperatorname%7Bpartial-inverse%7D(x)_i,%20%5Cqquad%20&amp;%20x_i%20%20%5Ctext%7B%20is%20a%20diagonal%20element%20of%20%7DA%20%5C%5C%0A2%5Coperatorname%7Bpartial-inverse%7D(x)_i,%20%5Cqquad%20&amp;%20%5Ctext%7Botherwise%7D,%0A%5Cend%7Bcases%7D%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7Bpartial-inverse%7D(x)"> is the vector that stacks the columns of the elements of <img src="https://latex.codecogs.com/png.latex?A%5E%7B-1%7D"> that correspond to the non-zero elements of <img src="https://latex.codecogs.com/png.latex?A">. (Yikes!)</p>
<section id="computing-the-partial-inverse" class="level3">
<h3 class="anchored" data-anchor-id="computing-the-partial-inverse">Computing the partial inverse</h3>
<p>So now we need to actually work out how to compute this <em>partial inverse</em> of a symmetric positive definite matrix <img src="https://latex.codecogs.com/png.latex?A">. To do this, we are going to steal a technique that goes back to Takahashi, Fagan, and Chen<sup>42</sup> in 1973. (For this presentation, I’m basically pillaging <a href="https://www.sciencedirect.com/science/article/pii/S0378375807000845">Håvard Rue and Sara Martino’s 2007 paper.</a>)</p>
<p>Their idea starts from writing <img src="https://latex.codecogs.com/png.latex?A%20=%20VDV%5ET">, where <img src="https://latex.codecogs.com/png.latex?V"> is a lower-triangular matrix with ones on the diagonal and <img src="https://latex.codecogs.com/png.latex?D"> is diagonal. This links up with our usual Cholesky factorisation through the identity <img src="https://latex.codecogs.com/png.latex?L%20=%20VD%5E%7B1/2%7D">. It follows that if <img src="https://latex.codecogs.com/png.latex?S%20=%20A%5E%7B-1%7D">, then <img src="https://latex.codecogs.com/png.latex?VDV%5ETS%20=%20I">. Then, we make some magic manipulations<sup>43</sup>. <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0AV%5ETS%20&amp;=%20D%5E%7B-1%7DV%5E%7B-1%7D%20%5C%5C%0AS%20+%20V%5ETS%20&amp;=%20S%20+%20D%5E%7B-1%7DV%5E%7B-1%7D%20%5C%5C%0AS%20&amp;=%20D%5E%7B-1%7DV%5E%7B-1%7D%20+%20(I%20-%20V%5ET)S.%0A%5Cend%7Balign*%7D"></p>
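<p>We can verify this identity numerically (a quick sketch of my own, recovering V and D from numpy's ordinary Cholesky factor):</p>

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
B = rng.normal(size=(n, n))
A = B @ B.T + n * np.eye(n)

# Recover V and D from the ordinary Cholesky factor L = V D^{1/2}.
L = np.linalg.cholesky(A)
d = np.diag(L)                 # d = diag(D)^{1/2}
V = L / d                      # unit lower triangular (divides column j by d[j])
D = np.diag(d**2)

# Check S = D^{-1} V^{-1} + (I - V^T) S, where S = A^{-1}.
S = np.linalg.inv(A)
rhs = np.linalg.inv(D) @ np.linalg.inv(V) + (np.eye(n) - V.T) @ S

err = np.max(np.abs(S - rhs))
```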
<p>Once again, this does not look super-useful. The trick is to notice two things.</p>
<ol type="1">
<li><p>Because <img src="https://latex.codecogs.com/png.latex?V"> is lower triangular, <img src="https://latex.codecogs.com/png.latex?V%5E%7B-1%7D"> is also lower triangular and the diagonal elements of <img src="https://latex.codecogs.com/png.latex?V%5E%7B-1%7D"> are the inverses of the diagonal elements of <img src="https://latex.codecogs.com/png.latex?V"> (aka they are all 1). Therefore, <img src="https://latex.codecogs.com/png.latex?D%5E%7B-1%7DV%5E%7B-1%7D"> is a lower triangular matrix with a diagonal given by the diagonal of <img src="https://latex.codecogs.com/png.latex?D%5E%7B-1%7D">.</p></li>
<li><p><img src="https://latex.codecogs.com/png.latex?I%20-%20V%5ET"> is an upper triangular matrix and <img src="https://latex.codecogs.com/png.latex?%5BI%20-%20V%5ET%5D_%7Bnn%7D%20=%200">.</p></li>
</ol>
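<p>The first observation is also easy to check numerically (a toy sketch of my own): the inverse of a unit lower-triangular matrix is again unit lower triangular.</p>

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
# A unit lower triangular matrix: ones on the diagonal, noise below.
V = np.tril(rng.normal(size=(n, n)), k=-1) + np.eye(n)
Vinv = np.linalg.inv(V)

upper_leak = np.max(np.abs(np.triu(Vinv, k=1)))  # inverse stays lower triangular
diag_err = np.max(np.abs(np.diag(Vinv) - 1.0))   # and keeps a unit diagonal
```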
<p>These two things together lead to the somewhat unexpected situation where the upper triangle of <img src="https://latex.codecogs.com/png.latex?S%20=%20D%5E%7B-1%7DV%5E%7B-1%7D%20+%20(I-%20%20V%5ET)S"> defines a set of recursions for the upper triangle of <img src="https://latex.codecogs.com/png.latex?S">. (And, therefore, all of <img src="https://latex.codecogs.com/png.latex?S"> because <img src="https://latex.codecogs.com/png.latex?S"> is symmetric!) These are sometimes referred to as the Takahashi recursions.</p>
<p>But we don’t want the whole upper triangle of <img src="https://latex.codecogs.com/png.latex?S">, we just want the entries that correspond to the non-zero elements of <img src="https://latex.codecogs.com/png.latex?A">. Unfortunately, the recursions are not, in general, solvable using only that subset of <img src="https://latex.codecogs.com/png.latex?S">. But we are in luck: they are solvable using the elements of <img src="https://latex.codecogs.com/png.latex?S"> that correspond to the non-zeros of <img src="https://latex.codecogs.com/png.latex?L%20+%20L%5ET">, which, as we know from a few posts ago, is a superset of the non-zero elements of <img src="https://latex.codecogs.com/png.latex?A">!</p>
<p>From this, we get the recursions running over <img src="https://latex.codecogs.com/png.latex?i%20=%20n,%20%5Cldots,%201"> and <img src="https://latex.codecogs.com/png.latex?j%20=%20n,%20%5Cldots,%20i"> (the order is important!), for the indices with <img src="https://latex.codecogs.com/png.latex?L_%7Bji%7D%20%5Cneq%200">: <img src="https://latex.codecogs.com/png.latex?%0AS_%7Bji%7D%20=%20%20%20%5Cbegin%7Bcases%7D%0A%5Cfrac%7B1%7D%7BL_%7Bii%7D%5E2%7D%20-%20%5Cfrac%7B1%7D%7BL_%7Bii%7D%7D%5Csum_%7Bk=i+1%7D%5E%7Bn%7D%20L_%7Bki%7D%20S_%7Bkj%7D%20%5Cqquad&amp;%20%20%5Ctext%7Bif%20%7D%20i=j,%20%5C%5C%20%20%20%20%20%20%20%20%20%0A-%20%5Cfrac%7B1%7D%7BL_%7Bii%7D%7D%5Csum_%7Bk=i+1%7D%5E%7Bn%7D%20L_%7Bki%7D%20S_%7Bkj%7D%20%20&amp;%20%5Ctext%7Botherwise%7D.%0A%5Cend%7Bcases%7D%0A"></p>
<p>If you recall our discussion way back when about the way the non-zero structure of the <img src="https://latex.codecogs.com/png.latex?j"> th column of <img src="https://latex.codecogs.com/png.latex?L"> relates to the non-zero structure of the <img src="https://latex.codecogs.com/png.latex?i"> th column for <img src="https://latex.codecogs.com/png.latex?j%20%5Cgeq%20i">, it’s clear that we have computed enough<sup>44</sup> of <img src="https://latex.codecogs.com/png.latex?S"> at every step to complete the recursions.</p>
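<p>Before moving to the sparse version, here is a dense sketch of the Takahashi recursions (my own toy code, not the post's), computing the whole of <img src="https://latex.codecogs.com/png.latex?S"> so that the sweep order and the two cases are easy to see:</p>

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
B = rng.normal(size=(n, n))
A = B @ B.T + n * np.eye(n)
L = np.linalg.cholesky(A)

# Takahashi recursions, dense version: sweep from the bottom-right corner,
# i = n-1..0 and, for each i, j = n-1..i (this order matters: every S[k, j]
# we read below has already been filled in).
S = np.zeros((n, n))
for i in range(n - 1, -1, -1):
    for j in range(n - 1, i - 1, -1):
        acc = sum(L[k, i] * S[k, j] for k in range(i + 1, n)) / L[i, i]
        if i == j:
            S[i, i] = 1.0 / L[i, i]**2 - acc
        else:
            S[i, j] = S[j, i] = -acc

err = np.max(np.abs(S - np.linalg.inv(A)))
```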
<p>Now we just need to Python it. (And thanks to Finn Lindgren who helped me understand how to implement this, which he may or may not remember because it happened about five years ago.)</p>
<p>Actually, we need this to be JAX-traceable, so we are going to implement a very basic primitive. In particular, we don’t need to implement a derivative or anything like that, just an abstract evaluation and an implementation.</p>
<div class="cell" data-execution_count="15">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb18-1">sparse_partial_inverse_p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> core.Primitive(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sparse_partial_inverse"</span>)</span>
<span id="cb18-2"></span>
<span id="cb18-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_partial_inverse(L_indices, L_indptr, L_x, out_indices, out_indptr):</span>
<span id="cb18-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb18-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  Computes the elements (out_indices, out_indptr) of the inverse of a sparse matrix (A_indices, A_indptr, A_x)</span></span>
<span id="cb18-6"><span class="co" style="color: #5E5E5E;
background-color: null;
   with Chole">
font-style: inherit;">   with Cholesky factor (L_indices, L_indptr, L_x). (out_indices, out_indptr) is assumed to be either</span></span>
<span id="cb18-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">   the sparsity pattern of A or a subset of it in lower triangular form. </span></span>
<span id="cb18-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  """</span></span>
<span id="cb18-9">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_partial_inverse_p.bind(L_indices, L_indptr, L_x, out_indices, out_indptr)</span>
<span id="cb18-10"></span>
<span id="cb18-11"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@sparse_partial_inverse_p.def_abstract_eval</span></span>
<span id="cb18-12"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_partial_inverse_abstract_eval(L_indices, L_indptr, L_x, out_indices, out_indptr):</span>
<span id="cb18-13">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> abstract_arrays.ShapedArray(out_indices.shape, L_x.dtype)</span>
<span id="cb18-14"></span>
<span id="cb18-15"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@sparse_partial_inverse_p.def_impl</span></span>
<span id="cb18-16"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_partial_inverse_impl(L_indices, L_indptr, L_x, out_indices, out_indptr):</span>
<span id="cb18-17">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb18-18">  Linv <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.dok_array((n,n), dtype <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_x.dtype)</span>
<span id="cb18-19">  counter <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_x) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb18-20">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> col <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>):</span>
<span id="cb18-21">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> row <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> L_indices[L_indptr[col]:L_indptr[col<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]][::<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]:</span>
<span id="cb18-22">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> row <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> col:</span>
<span id="cb18-23">        Linv[row, col] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Linv[col, row] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span></span>
<span id="cb18-24">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb18-25">        Linv[row, col] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> L_x[L_indptr[col]]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb18-26">      L_col  <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_x[L_indptr[col]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:L_indptr[col<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> L_x[L_indptr[col]]</span>
<span id="cb18-27"> </span>
<span id="cb18-28">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> k, L_kcol <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(L_indices[L_indptr[col]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:L_indptr[col<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]], L_col):</span>
<span id="cb18-29">         Linv[col,row] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Linv[row,col] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>  Linv[row, col] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>  L_kcol <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> Linv[k, row]</span>
<span id="cb18-30">        </span>
<span id="cb18-31">  Linv_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.tril(Linv, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">format</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"csc"</span>).data</span>
<span id="cb18-32">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(out_indices) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_indices):</span>
<span id="cb18-33">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> Linv_x</span>
<span id="cb18-34"></span>
<span id="cb18-35">  out_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(out_indices))</span>
<span id="cb18-36">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> col <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n):</span>
<span id="cb18-37">    ind <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.nonzero(np.in1d(L_indices[L_indptr[col]:L_indptr[col<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]],</span>
<span id="cb18-38">      out_indices[out_indptr[col]:out_indptr[col<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]))[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb18-39">    out_x[out_indptr[col]:out_indptr[col<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Linv_x[L_indptr[col] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> ind]</span>
<span id="cb18-40">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> out_x</span></code></pre></div>
</div>
<p>The implementation makes use of the<sup>45</sup> <em>dictionary of keys</em> representation of a sparse matrix from <code>scipy.sparse</code>. This is an efficient storage scheme when you need to modify the sparsity structure (as we are doing here) or do a lot of indexing. It would definitely be possible to implement this directly on the CSC data structure, but it gets a little bit tricky to access the elements of <code>L_inv</code> that are above the diagonal. The resulting code is honestly a mess and there’s lots of non-local memory access anyway, so I implemented it this way.</p>
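<p>For a feel for the dictionary-of-keys format (a tiny illustrative sketch of my own): entries can be added or overwritten by plain indexing, and the matrix converted to CSC once the structure is settled.</p>

```python
from scipy import sparse

# DOK makes structure changes and random access cheap, unlike CSC.
M = sparse.dok_array((4, 4))
M[0, 0] = 1.0
M[2, 1] = -0.5
M[1, 2] = M[2, 1]        # symmetric assignment via easy random access

# Extract the lower triangle as CSC once we're done mutating.
tril = sparse.tril(M, format="csc")
```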
<p>But let’s be honest: this thing is crying out for a proper symmetric matrix class with sensible reverse iterators. But hey. Python.</p>
<p>The second chunk of the code is just the opposite of our <code>_structured_copy()</code> function. It takes a matrix with the sparsity pattern of <img src="https://latex.codecogs.com/png.latex?L"> and returns one with the sparsity pattern of <code>out</code> (which is assumed to be a subset, and is usually the sparsity pattern of <img src="https://latex.codecogs.com/png.latex?A"> or a diagonal matrix).</p>
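<p>That pattern-subsetting step can be illustrated on its own with toy CSC matrices (my own example; I use <code>np.isin</code>, the modern spelling of <code>np.in1d</code>):</p>

```python
import numpy as np
from scipy import sparse

# "big" plays the role of L's pattern; "small" is a subset of it (like A's).
big = sparse.csc_array(np.array([[1.0, 0.0, 0.0],
                                 [2.0, 3.0, 0.0],
                                 [4.0, 5.0, 6.0]]))
small = sparse.csc_array(np.array([[1.0, 0.0, 0.0],
                                   [0.0, 1.0, 0.0],
                                   [1.0, 0.0, 1.0]]))

# Copy big's values onto small's sparsity pattern, column by column.
out = np.zeros(small.nnz)
for col in range(3):
    big_rows = big.indices[big.indptr[col]:big.indptr[col + 1]]
    small_rows = small.indices[small.indptr[col]:small.indptr[col + 1]]
    # Positions within big's column whose row index also appears in small's column.
    ind = np.nonzero(np.isin(big_rows, small_rows))[0]
    out[small.indptr[col]:small.indptr[col + 1]] = big.data[big.indptr[col] + ind]
```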
<p>Let’s check that it works.</p>
<div class="cell" data-execution_count="16">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb19-1">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>)</span>
<span id="cb19-2">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb19-3"></span>
<span id="cb19-4"></span>
<span id="cb19-5">L_indices, L_indptr, L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_cholesky(A_indices, A_indptr, A_x)</span>
<span id="cb19-6"></span>
<span id="cb19-7">a_inv_L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_partial_inverse(L_indices, L_indptr, L_x, L_indices, L_indptr)</span>
<span id="cb19-8"></span>
<span id="cb19-9">col_counts_L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [L_indptr[i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> L_indptr[i] <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n)]</span>
<span id="cb19-10">cols_L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.repeat(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n), col_counts_L)</span>
<span id="cb19-11"></span>
<span id="cb19-12">true_inv <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.linalg.inv(A.todense())</span>
<span id="cb19-13">truth_L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> true_inv[L_indices, cols_L]</span>
<span id="cb19-14"></span>
<span id="cb19-15">a_inv_A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_partial_inverse(L_indices, L_indptr, L_x, A_indices, A_indptr)</span>
<span id="cb19-16">col_counts_A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [A_indptr[i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> A_indptr[i] <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n)]</span>
<span id="cb19-17">cols_A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.repeat(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n), col_counts_A)</span>
<span id="cb19-18">truth_A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> true_inv[A_indices, cols_A]</span>
<span id="cb19-19"></span>
<span id="cb19-20"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"""</span></span>
<span id="cb19-21"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Error in partial inverse (all of L): </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(a_inv_L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> truth_L)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb19-22"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Error in partial inverse (all of A): </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(a_inv_A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> truth_A)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb19-23"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Error in partial inverse (all of L):  1.57e-15
Error in partial inverse (all of A):  1.53e-15
</code></pre>
</div>
</div>
</section>
<section id="putting-the-log-determinant-together" class="level3">
<h3 class="anchored" data-anchor-id="putting-the-log-determinant-together">Putting the log-determinant together</h3>
<p>All of our bits are in place, so now all we need to do is implement the primitive for the log-determinant. One nice thing here is that we don’t need to implement a transposition rule, as the function is not structurally linear in any of its arguments. At this point we take our small wins where we can get them.</p>
<p>There isn’t anything particularly interesting in the implementation. But do note that the trace has been implemented in a way that’s aware that we’re only storing the lower triangle of <img src="https://latex.codecogs.com/png.latex?A">.</p>
<div class="cell" data-execution_count="17">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb21-1">sparse_log_det_p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> core.Primitive(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sparse_log_det"</span>)</span>
<span id="cb21-2"></span>
<span id="cb21-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_log_det(A_indices, A_indptr, A_x):</span>
<span id="cb21-4">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_log_det_p.bind(A_indices, A_indptr, A_x)</span>
<span id="cb21-5"></span>
<span id="cb21-6"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@sparse_log_det_p.def_impl</span></span>
<span id="cb21-7"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_log_det_impl(A_indices, A_indptr, A_x):</span>
<span id="cb21-8">  L_indices, L_indptr, L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_cholesky(A_indices, A_indptr, A_x)</span>
<span id="cb21-9">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jnp.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(jnp.log(L_x[L_indptr[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]))</span>
<span id="cb21-10"></span>
<span id="cb21-11"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@sparse_log_det_p.def_abstract_eval</span></span>
<span id="cb21-12"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_log_det_abstract_eval(A_indices, A_indptr, A_x):</span>
<span id="cb21-13">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> abstract_arrays.ShapedArray((<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,), A_x.dtype)</span>
<span id="cb21-14"></span>
<span id="cb21-15"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_log_det_value_and_jvp(arg_values, arg_tangent):</span>
<span id="cb21-16">  A_indices, A_indptr, A_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> arg_values</span>
<span id="cb21-17">  _, _, A_xt <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> arg_tangent</span>
<span id="cb21-18">  L_indices, L_indptr, L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_cholesky(A_indices, A_indptr, A_x)</span>
<span id="cb21-19">  value <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jnp.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(jnp.log(L_x[L_indptr[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]))</span>
<span id="cb21-20">  Ainv_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_partial_inverse(L_indices, L_indptr, L_x, A_indices, A_indptr)</span>
<span id="cb21-21">  jvp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(Ainv_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> A_xt) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(Ainv_x[A_indptr[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> A_xt[A_indptr[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]])</span>
<span id="cb21-22">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> value, jvp</span>
<span id="cb21-23"></span>
<span id="cb21-24">ad.primitive_jvps[sparse_log_det_p] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_log_det_value_and_jvp</span></code></pre></div>
</div>
<p>Finally, we can test it out.</p>
<div class="cell" data-execution_count="18">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb22-1">ld_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.log(np.linalg.det(A.todense())) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#np.sum(np.log(lu.U.diagonal()))</span></span>
<span id="cb22-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Error in log-determinant = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>ld_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> sparse_log_det(A_indices, A_indptr, A_x)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb22-3"></span>
<span id="cb22-4"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> f(theta):</span>
<span id="cb22-5">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> A_x) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> n</span>
<span id="cb22-6">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Ax_theta.at[A_indptr[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]].add(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb22-7">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_log_det(A_indices, A_indptr, Ax_theta)</span>
<span id="cb22-8"></span>
<span id="cb22-9"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> f_jax(theta):</span>
<span id="cb22-10">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> A.todense()) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> n </span>
<span id="cb22-11">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Ax_theta.at[np.arange(n),np.arange(n)].add(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb22-12">  L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.linalg.cholesky(Ax_theta)</span>
<span id="cb22-13">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>jnp.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(jnp.log(jnp.diag(L)))</span>
<span id="cb22-14"></span>
<span id="cb22-15">primal1, jvp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(f, (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span>]),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]),))</span>
<span id="cb22-16">primal2, jvp2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(f_jax, (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span>]),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]),))</span>
<span id="cb22-17"></span>
<span id="cb22-18">eps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1e-4</span></span>
<span id="cb22-19">jvp_fd <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (f(jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span>]) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> eps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]) ) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> f(jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span>]))) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> eps</span>
<span id="cb22-20"></span>
<span id="cb22-21">grad1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(f)(jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span>]))</span>
<span id="cb22-22">grad2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(f_jax)(jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span>]))</span>
<span id="cb22-23"></span>
<span id="cb22-24"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"""</span></span>
<span id="cb22-25"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Check the Derivatives!</span></span>
<span id="cb22-26"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Variable A:</span></span>
<span id="cb22-27"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Primal difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(primal1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> primal2)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb22-28"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  JVP difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(jvp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> jvp2)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb22-29"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  JVP difference (FD): </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(jvp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> jvp_fd)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb22-30"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Gradient difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(grad1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> grad2)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb22-31"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Error in log-determinant =  0.00e+00</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>
Check the Derivatives!
Variable A:
  Primal difference: 0.0
  JVP difference: 0.000885009765625
  JVP difference (FD): 0.221893310546875
  Gradient difference: 1.526623782410752e-05
</code></pre>
</div>
</div>
<p>I’m not going to lie, I am <em>not happy</em> with that JVP difference. I was somewhat concerned that there was a bug somewhere in my code. I did a little bit of exploring and the error got larger as the problem got larger. It also depended a little more than I was comfortable with on how I had implemented<sup>46</sup> the baseline dense version.</p>
<p>That second fact suggested to me that it might be a floating point problem. By default, JAX uses single precision (32-bit) floating point. Most modern systems that don’t try to run on GPUs use double precision (64-bit) floating point. So I tried it with double precision and lo and behold, the problem disappeared.</p>
<p>Matrix factorisations are bloody hard in single precision.</p>
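<p>To get a feel for how much single precision alone costs a Cholesky-based log-determinant, here is a small NumPy check of my own (scratch code, not from the post):</p>

```python
import numpy as np

# An arbitrary, well-conditioned SPD test matrix of my own.
rng = np.random.default_rng(42)
B = rng.standard_normal((200, 200))
A = B @ B.T + 200.0 * np.eye(200)

# Log-determinant via the Cholesky diagonal, in double and single precision.
ld64 = 2.0 * np.sum(np.log(np.diag(np.linalg.cholesky(A))))
ld32 = 2.0 * np.sum(np.log(np.diag(np.linalg.cholesky(A.astype(np.float32)))))

# The gap is pure floating point error: same matrix, same algorithm.
print(abs(ld64 - float(ld32)))
```

<p>Even on a matrix this friendly, the single-precision factor loses a handful of digits, and it only gets worse as the problem grows or the conditioning degrades.</p>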
<div class="cell" data-execution_count="19">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb25-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax.config <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> config</span>
<span id="cb25-2">config.update(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"jax_enable_x64"</span>, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb25-3"></span>
<span id="cb25-4">ld_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.log(np.linalg.det(A.todense())) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#np.sum(np.log(lu.U.diagonal()))</span></span>
<span id="cb25-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Error in log-determinant = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>ld_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> sparse_log_det(A_indices, A_indptr, A_x)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb25-6"></span>
<span id="cb25-7"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> f(theta):</span>
<span id="cb25-8">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> A_x, dtype <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.float64) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> n</span>
<span id="cb25-9">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Ax_theta.at[A_indptr[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]].add(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb25-10">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_log_det(A_indices, A_indptr, Ax_theta)</span>
<span id="cb25-11"></span>
<span id="cb25-12"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> f_jax(theta):</span>
<span id="cb25-13">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.array(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> A.todense(), dtype <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.float64) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> n </span>
<span id="cb25-14">  Ax_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Ax_theta.at[np.arange(n),np.arange(n)].add(theta[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb25-15">  L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.linalg.cholesky(Ax_theta)</span>
<span id="cb25-16">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>jnp.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(jnp.log(jnp.diag(L)))</span>
<span id="cb25-17"></span>
<span id="cb25-18">primal1, jvp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(f, (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span>], dtype <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.float64),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>], dtype <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.float64),))</span>
<span id="cb25-19">primal2, jvp2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jvp(f_jax, (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span>], dtype <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.float64),), (jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>], dtype <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.float64),))</span>
<span id="cb25-20"></span>
<span id="cb25-21">eps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1e-7</span></span>
<span id="cb25-22">jvp_fd <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (f(jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span>], dtype <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.float64) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> eps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>], dtype <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.float64) ) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> f(jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span>], dtype <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.float64))) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> eps</span>
<span id="cb25-23"></span>
<span id="cb25-24">grad1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(f)(jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span>], dtype <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.float64))</span>
<span id="cb25-25">grad2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(f_jax)(jnp.array([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.</span>], dtype <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.float64))</span>
<span id="cb25-26"></span>
<span id="cb25-27"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"""</span></span>
<span id="cb25-28"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Check the Derivatives!</span></span>
<span id="cb25-29"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Variable A:</span></span>
<span id="cb25-30"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Primal difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(primal1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> primal2)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb25-31"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  JVP difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(jvp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> jvp2)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb25-32"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  JVP difference (FD): </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(jvp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> jvp_fd)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb25-33"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Gradient difference: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>linalg<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>norm(grad1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> grad2)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb25-34"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Error in log-determinant =  0.00e+00</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>
Check the Derivatives!
Variable A:
  Primal difference: 0.0
  JVP difference: 8.526512829121202e-13
  JVP difference (FD): 4.171707900013644e-06
  Gradient difference: 8.881784197001252e-16
</code></pre>
</div>
</div>
<p>Much better!</p>
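<p>The identity doing the work in the JVP rule above is d log det(A) = tr(A⁻¹ dA). It is easy to sanity-check in dense land with <code>jax.custom_jvp</code>; this is my own sketch, not the post’s code, and it uses the full dense inverse where the sparse version gets away with the partial inverse:</p>

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)

@jax.custom_jvp
def dense_log_det(A):
    # Log-determinant of an SPD matrix via its Cholesky factor.
    L = jnp.linalg.cholesky(A)
    return 2.0 * jnp.sum(jnp.log(jnp.diag(L)))

@dense_log_det.defjvp
def dense_log_det_jvp(primals, tangents):
    (A,), (A_dot,) = primals, tangents
    value = dense_log_det(A)
    # d log det(A) = tr(A^{-1} dA); for symmetric A this is an
    # elementwise sum against the (dense) inverse.
    tangent_out = jnp.sum(jnp.linalg.inv(A) * A_dot)
    return value, tangent_out
```

<p>Because the tangent rule is linear in <code>A_dot</code>, JAX can transpose it automatically, so <code>jax.grad</code> of this function works and returns the (dense) inverse.</p>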
</section>
</section>
<section id="wrapping-up" class="level2">
<h2 class="anchored" data-anchor-id="wrapping-up">Wrapping up</h2>
<p>And that is where we will leave it for today. Next up, I’m probably going to need to do the autodiff for the Cholesky factorisation. It’s not <em>hard</em>, but it is tedious<sup>47</sup> and this post is already very long.</p>
<p>After that we need a few more things:</p>
<ol type="1">
<li><p>Compilation rules for all of these things. For the most part, we can just wrap the relevant parts of <a href="https://github.com/libigl/eigen">Eigen</a>. The only non-trivial code would be the partial inverse. That will allow us to JIT shit.</p></li>
<li><p>We need to beef up the sparse matrix class a little. In particular, we are going to need addition and scalar multiplication at the very minimum to make this useful.</p></li>
<li><p>Work out how <a href="https://aesara.readthedocs.io/en/latest/">Aesara</a> works so we can try to prototype a PyMC model.</p></li>
</ol>
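<p>For point 2, scalar multiplication just scales the value array, but adding two CSC matrices with different sparsity patterns needs a sorted merge within each column. A hypothetical sketch (these helpers are mine, not part of the post’s sparse class):</p>

```python
import numpy as np

def csc_scale(alpha, Ax):
    # Scalar multiplication only touches the stored values.
    return alpha * np.asarray(Ax)

def csc_add(n_col, Ai, Ap, Ax, Bi, Bp, Bx):
    # Add two CSC matrices column by column. Within a column, CSC row
    # indices are sorted, so this is a classic two-pointer merge.
    Ci, Cx, Cp = [], [], [0]
    sentinel = float("inf")
    for col in range(n_col):
        a, a_end = Ap[col], Ap[col + 1]
        b, b_end = Bp[col], Bp[col + 1]
        while a < a_end or b < b_end:
            ra = Ai[a] if a < a_end else sentinel
            rb = Bi[b] if b < b_end else sentinel
            if ra < rb:
                Ci.append(ra); Cx.append(Ax[a]); a += 1
            elif rb < ra:
                Ci.append(rb); Cx.append(Bx[b]); b += 1
            else:  # entry present in both matrices: values add
                Ci.append(ra); Cx.append(Ax[a] + Bx[b]); a += 1; b += 1
        Cp.append(len(Ci))
    return np.array(Ci), np.array(Cp), np.array(Cx)
```

<p>When both matrices share a pattern (as the shifted matrices in this post do), the merge degenerates to <code>Ax + Bx</code>, which is why the diagonal-shift trick above never needed any of this machinery.</p>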
<p>That will be <em>a lot</em> more blog posts. But I’m having fun. So why the hell not.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>I am sorry Cholesky factorisation, this blog is already too long and there is simply too much code I need to make nicer to even start on that journey. So it will happen in a later blog.↩︎</p></li>
<li id="fn2"><p>Which I have spent <em>zero</em> effort making pretty or taking to any level above scratch code↩︎</p></li>
<li id="fn3"><p>Like making it clear how this works for a <em>sparse</em> matrix compared to a general one↩︎</p></li>
<li id="fn4"><p>To the best of my knowledge, for example, we don’t know how to differentiate with respect to the order parameter <img src="https://latex.codecogs.com/png.latex?%5Cnu"> in the modified Bessel function of the second kind <img src="https://latex.codecogs.com/png.latex?K_%5Cnu(x)">. This is important in spatial statistics (and general GP stuff).↩︎</p></li>
<li id="fn5"><p><em>You</em> may need to convince yourself that this is possible. But it is. The cone of SPD matrices is very nice.↩︎</p></li>
<li id="fn6"><p>Don’t despair if you don’t recognise the third line, it’s the Neumann series, which gives an approximation to <img src="https://latex.codecogs.com/png.latex?(I%20+%20B)%5E%7B-1%7D"> whenever <img src="https://latex.codecogs.com/png.latex?%5C%7CB%5C%7C%20%5Cll%201">.↩︎</p></li>
<li id="fn7"><p>I recognise that I’ve not explained why everything needs to be JAX-traceable. Basically it’s because JAX does clever transformations to the Jacobian-vector product code to produce things like gradients. And the only way that can happen is if the JVP code can take abstract JAX types. So we need to make it traceable because we <em>really</em> want to have gradients!↩︎</p></li>
<li id="fn8"><p>Why not now, Daniel? Why not now? Well mostly because I might need to do some tweaking down the line, so I am not messing around until I am done.↩︎</p></li>
<li id="fn9"><p>This is the primary difference between implementing forward mode and reverse mode: there is only one output here. When we move onto reverse mode, we will output a tuple Jacobian-transpose-vector products, one for each input. You can see the structure of that reflected in the transposition rule we are going to write later.↩︎</p></li>
<li id="fn10"><p>Some things: Firstly your function needs to have the correct signature for this to work. Secondly, you could also use <code>ad.defjvp()</code> if you didn’t need to use the primal value to define the tangent (recall one of our tangents is <img src="https://latex.codecogs.com/png.latex?A%5E%7B-1%7D%5CDelta%20c">, where <img src="https://latex.codecogs.com/png.latex?c%20=%20A%5E%7B-1%7Db"> is the primal value).↩︎</p></li>
<li id="fn11"><p>This is because it is the efficient way of computing a gradient. Forward-mode autodiff chains together Jacobian-vector products in such a way that a single sweep of the entire function computes a single directional derivative. Reverse-mode autodiff chains together Jacobian-transpose-vector products (aka vector-Jacobian products) in such a way that a single sweep produces an entire gradient. (This happens at the cost of quite a bit of storage.) Depending on what you are trying to do, you usually want one or the other (or sometimes a clever combination of both).↩︎</p></li>
<li id="fn12"><p>or gradients or some sort of thing.↩︎</p></li>
<li id="fn13"><p>to be honest, in Stan we sometimes just don’t dick around with the forward-mode autodiff, because gradients are our bread and butter.↩︎</p></li>
<li id="fn14"><p>I mean, I love you, programming language people. But fuck me, this paper could’ve been written in Babylonian cuneiform for all I understood it.↩︎</p></li>
<li id="fn15"><p>That is, if you fix a value of <img src="https://latex.codecogs.com/png.latex?y">, <img src="https://latex.codecogs.com/png.latex?f_y(x)%20=%20f(x,%20y)"> is not an affine function.↩︎</p></li>
<li id="fn16"><p>Details bore me.↩︎</p></li>
<li id="fn17"><p>In general, there might need to be a little bit of reshaping, but it’s equivalent.↩︎</p></li>
<li id="fn18"><p>Have you noticed this is like the third name I’ve used for the same concept? Or the fourth? The code calls it a cotangent because that’s another damn synonym. I’m so very sorry.↩︎</p></li>
<li id="fn19"><p>not difficult, I’m just lazy and Mike does it better than I can. Read his paper.↩︎</p></li>
<li id="fn20"><p>For sparse matrices it’s just the non-zero mask of that.↩︎</p></li>
<li id="fn21"><p>Yes. I know. Central differences. I am what I am.↩︎</p></li>
<li id="fn22"><p>Some of the stuff I’ve done, like normalising all of the inputs, would help make these tests more stable. You should also just pick up Nick Higham’s backwards error analysis book to get some ideas of what your guarantees actually are in floating point, but I truly cannot be bothered. This is scratch code.↩︎</p></li>
<li id="fn23"><p>It should be slightly bigger; it isn’t.↩︎</p></li>
<li id="fn24"><p>The largest number <img src="https://latex.codecogs.com/png.latex?%5Cepsilon"> such that <code>float(1.0) == float(1.0 + machine_eps)</code> in single precision floating point.↩︎</p></li>
<li id="fn25"><p>Fun fact: I implemented this and the error never spawned, so I guess JAX is keeping the index arrays concrete, which is very nice of it!↩︎</p></li>
<li id="fn26"><p>actual damn numbers↩︎</p></li>
<li id="fn27"><p>We want that <a href="https://youtu.be/wrnUJoj14ag?t=288">auld triangle to go jingle bloody jangle</a>↩︎</p></li>
<li id="fn28"><p>We definitely do not want someone to write an eight hour, two part play that really seems to have the point of view that our Cholesky triangle deserved his downfall. Espoused while periodically reading deadshit tumblr posts. I mean, it would win a Tony. But we still do not want that.↩︎</p></li>
<li id="fn29"><p>There are more arguments. Read the help. This is what we need↩︎</p></li>
<li id="fn30"><p>What if I told you that this would work perfectly well if <img src="https://latex.codecogs.com/png.latex?A"> was a linear partial differential operator or an integral operator? Probably not much because why would you give a shit?↩︎</p></li>
<li id="fn31"><p>It can be more general, but it isn’t↩︎</p></li>
<li id="fn32"><p>I think there is a typo in the docs↩︎</p></li>
<li id="fn33"><p>Full disclosure: I screwed this up multiple times today and my tests caught it. What does that look like? The derivatives for <img src="https://latex.codecogs.com/png.latex?A"> being off, but everything else being good.↩︎</p></li>
<li id="fn34"><p>And some optional keyword arguments, but we don’t need to worry about those↩︎</p></li>
<li id="fn35"><p>This is not quite the same as, but similar to, something that functional programming people call <em>currying</em>, which was named after famous Australian Olympic swimmer Lisa Curry.↩︎</p></li>
<li id="fn36"><p>and a shitload simpler!↩︎</p></li>
<li id="fn37"><p>And we have to store a bunch more. This is less of a big deal when <img src="https://latex.codecogs.com/png.latex?L"> is sparse, but for an ordinary linear solve, we’d be hauling around an extra <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(n%5E2)"> floats containing tangents for no good reason.↩︎</p></li>
<li id="fn38"><p>If you are worrying about the suppressed constant, remember that <img src="https://latex.codecogs.com/png.latex?A"> (and therefore <img src="https://latex.codecogs.com/png.latex?n"> and <img src="https://latex.codecogs.com/png.latex?%5C%7CA%5C%7C">) is fixed.↩︎</p></li>
<li id="fn39"><p>I think I’ve made this mistake about four times already while writing this blog. So I am going to write it <em>out</em>.↩︎</p></li>
<li id="fn40"><p>Not to “some of my best friends are physicists”, but I do love them. I just wish a man would talk about me the way they talk about being coordinate free. Rather than with the same ambivalence physicists use when speaking about a specific atlas. I’ve been listening to lesbian folk music all evening. I’m having feelings.↩︎</p></li>
<li id="fn41"><p>pronoun on purpose↩︎</p></li>
<li id="fn42"><p>Takahashi, K., Fagan, J., Chen, M.S., 1973. Formation of a sparse bus impedance matrix and its application to short circuit study. In: Eighth PICA Conference Proceedings. IEEE Power Engineering Society, pp.&nbsp;63–69 (Papers Presented at the 1973 Power Industry Computer Application Conference in Minneapolis, MN).↩︎</p></li>
<li id="fn43"><p>Thanks to Jerzy Baranowski for finding a very very bad LaTeX error that made these questions quite wrong!↩︎</p></li>
<li id="fn44"><p>Indeed, in the notation of post two <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BL%7D_i%20%5Ccap%20%5C%7Bi+1,%20%5Cdots,%20n%5C%7D%20%5Csubseteq%20%5Cmathcal%7BL%7D_j"> for all <img src="https://latex.codecogs.com/png.latex?i%20%5Cleq%20j">, where <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BL%7D_i"> is the set of non-zeros in the <img src="https://latex.codecogs.com/png.latex?i">th column of <img src="https://latex.codecogs.com/png.latex?L">.↩︎</p></li>
<li id="fn45"><p>The sparse matrix is stored as a dictionary <code>{(i,j): value}</code>, which is a very natural way to build a sparse matrix, even if it’s quite inefficient to do anything with it in that form.↩︎</p></li>
<li id="fn46"><p>You can’t just use <code>jnp.linalg.det()</code> because there’s a tendency towards <code>nan</code>s. (The true value is something like <code>exp(250.49306761204593)</code>!)↩︎</p></li>
<li id="fn47"><p>Would it be less tedious if my implementation of the Cholesky was less shit? Yes. But hey. It was the first non-trivial piece of python code I’d written in more than a decade (or maybe ever?) so it is what it is. Anyway. I’m gonna run into the same problem I had in <a href="https://dansblog.netlify.app/posts/2022-05-14-jax-ing-a-sparse-cholesky-factorisation-part-3-in-an-ongoing-journey/">Part 3</a>↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2022,
  author = {Simpson, Dan},
  title = {Sparse Matrices 6: {To} Catch a Derivative, First You’ve Got
    to Think Like a Derivative},
  date = {2022-05-30},
  url = {https://dansblog.netlify.app/to-catch-a-derivative-first-youve-got-to-think-like-a-derivative},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2022" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2022. <span>“Sparse Matrices 6: To Catch a Derivative,
First You’ve Got to Think Like a Derivative.”</span> May 30, 2022. <a href="https://dansblog.netlify.app/to-catch-a-derivative-first-youve-got-to-think-like-a-derivative">https://dansblog.netlify.app/to-catch-a-derivative-first-youve-got-to-think-like-a-derivative</a>.
</div></div></section></div> ]]></description>
  <category>JAX</category>
  <category>Sparse matrices</category>
  <category>Autodiff</category>
  <guid>https://dansblog.netlify.app/posts/2022-05-20-to-catch-a-derivative-first-youve-got-to-think-like-a-derivative/to-catch-a-derivative-first-youve-got-to-think-like-a-derivative.html</guid>
  <pubDate>Sun, 29 May 2022 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2022-05-20-to-catch-a-derivative-first-youve-got-to-think-like-a-derivative/sob.JPG" medium="image"/>
</item>
<item>
  <title>Sparse Matrices 5: I bind you Nancy</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2022-05-18-sparse4-some-primatives/sparse4-some-primatives.html</link>
  <description><![CDATA[ 





<p>This is part <em>five</em> of our <a href="https://dansblog.netlify.app/posts/2022-03-22-a-linear-mixed-effects-model/">ongoing</a> <a href="https://dansblog.netlify.app/posts/2022-03-23-getting-jax-to-love-sparse-matrices/">series</a> <a href="https://dansblog.netlify.app/posts/2022-05-14-jax-ing-a-sparse-cholesky-factorisation-part-3-in-an-ongoing-journey/">on</a> <a href="https://dansblog.netlify.app/posts/2022-05-16-design-is-my-passion-sparse-matrices-part-four/">implementing</a> differentiable sparse linear algebra in JAX. In some sense this is the last boring post before we get to the derivatives. Was this post going to include the derivatives? It sure was, but then I realised that a different choice was to go to bed so I could get up nice and early in the morning and vote in our election.</p>
<p>It goes without saying that before I split the posts, it was more than twice as long and I was nowhere near finished. So probably the split was a good choice.</p>
<section id="but-how-do-you-add-a-primative-to-jax" class="level2">
<h2 class="anchored" data-anchor-id="but-how-do-you-add-a-primative-to-jax">But how do you add a primitive to JAX?</h2>
<p>Well, the first step is you <a href="https://jax.readthedocs.io/en/latest/notebooks/How_JAX_primitives_work.html">read the docs.</a></p>
<p>They tell you that you need to implement a few things:</p>
<ul>
<li>An implementation of the call with “abstract types”</li>
<li>An implementation of the call with concrete types (aka evaluating the damn function)</li>
</ul>
<p>Then,</p>
<ul>
<li><p>if you want your primitive to be JIT-able, you need to implement a compilation rule.</p></li>
<li><p>if you want your primitive to be batch-able, you need to implement a batching rule.</p></li>
<li><p>if you want your primitive to be differentiable, you need to implement the derivatives in a way that allows them to be propagated appropriately.</p></li>
</ul>
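<p>Before we do that for the sparse routines, the registration pattern is worth seeing in isolation. Here is a minimal sketch with a toy primitive of my own (a hypothetical <code>double</code> that multiplies by two): it registers only the concrete implementation and the abstract evaluation rule, so eager calls and tracing work but nothing else does. The imports match the JAX version this post was written against; newer releases move <code>Primitive</code> to <code>jax.extend.core</code>, which the <code>try</code>/<code>except</code> covers.</p>

```python
import numpy as np
import jax
from jax import core  # for ShapedArray

try:  # newer JAX releases moved Primitive here
    from jax.extend.core import Primitive
except ImportError:  # older JAX, as used in this post
    from jax.core import Primitive

# A toy primitive that doubles its input (hypothetical; not part of JAX).
double_p = Primitive("double")

def double(x):
    """User-facing wrapper: hand the call off to JAX."""
    return double_p.bind(x)

@double_p.def_impl
def double_impl(x):
    # Concrete evaluation. This does not need to be JAX-traceable;
    # plain numpy is fine.
    return 2.0 * np.asarray(x)

@double_p.def_abstract_eval
def double_abstract_eval(x):
    # Abstract evaluation: propagate shape and dtype only, no values.
    return core.ShapedArray(x.shape, x.dtype)

out = double(np.arange(3.0))                 # eager call hits double_impl
jaxpr = jax.make_jaxpr(double)(np.ones(3))   # tracing hits the abstract rule
```

<p>Anything beyond this (jit, vmap, derivatives) will fail until the corresponding rules are registered, which is exactly the to-do list above.</p>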
<p>In this post, we are going to do the first task: we are going to register JAX-traceable versions of the four main primitives we are going to need for our task. For the most part, the implementations here will be replaced with C++ bindings (because only a fool writes their own linear algebra code). But this is the beginning<sup>1</sup> of our serious journey into JAX.</p>
</section>
<section id="first-things-first-some-primitives" class="level2">
<h2 class="anchored" data-anchor-id="first-things-first-some-primitives">First things first, some primitives</h2>
<p>In JAX-speak, a primitive is a function that is JAX-traceable<sup>2</sup>. It is not necessary for every possible transformation to be implemented. In fact, today I’m not going to implement <em>any</em> transformations. That is a problem for future Dan.</p>
<p>We have enough today problems.</p>
<p>Because today we need to write four new primitives.</p>
<p>But first of all, let’s build up a test matrix so we can at least check that this code runs. This is the same example from <a href="https://dansblog.netlify.app/posts/2022-05-14-jax-ing-a-sparse-cholesky-factorisation-part-3-in-an-ongoing-journey/">blog 3</a>. You can tell my PhD was in numerical analysis because I fucking love a 2D Laplacian.</p>
<div class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> scipy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> sparse</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> make_matrix(n):</span>
<span id="cb1-5">    one_d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.diags([[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>n, [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)], [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb1-6">    A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (sparse.kronsum(one_d, one_d) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> sparse.eye(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>n)).tocsc()</span>
<span id="cb1-7">    A_lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.tril(A, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">format</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"csc"</span>)</span>
<span id="cb1-8">    A_index <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_lower.indices</span>
<span id="cb1-9">    A_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_lower.indptr</span>
<span id="cb1-10">    A_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_lower.data</span>
<span id="cb1-11">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (A_index, A_indptr, A_x, A)</span>
<span id="cb1-12"></span>
<span id="cb1-13">A_indices, A_indptr, A_x, A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span></code></pre></div>
</div>
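<p>In case the earlier posts in the series are a distant memory: the three returned arrays are just standard CSC storage. Here is what they look like on a tiny 3 × 3 example of my own (not the Laplacian above).</p>

```python
import numpy as np
from scipy import sparse

# A small lower-triangular matrix, stored in CSC format.
M = sparse.csc_array(np.array([[2.0, 0.0, 0.0],
                               [1.0, 3.0, 0.0],
                               [0.0, 0.0, 4.0]]))

# data holds the non-zeros column by column, indices holds their row
# numbers, and indptr[j]:indptr[j+1] delimits column j in both arrays.
print(M.data)     # [2. 1. 3. 4.]
print(M.indices)  # [0 1 1 2]
print(M.indptr)   # [0 2 3 4]
```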
<section id="primitive-one-a-1b" class="level3">
<h3 class="anchored" data-anchor-id="primitive-one-a-1b">Primitive one: <img src="https://latex.codecogs.com/png.latex?A%5E%7B-1%7Db"></h3>
<p>Because I’m feeling lazy today and we don’t actually need the Cholesky directly for any of this, I’m going to just use scipy. Why? Well, honestly, just because I’m lazy. But also so I can prove an important point: the implementation of the primitive <em>does not</em> need to be JAX traceable. So I’m implementing it in a way that is not now and will likely never be JAX traceable<sup>3</sup>.</p>
<p>First off, we need to write the solve function and bind it<sup>4</sup> to JAX. Specific information about what exactly some of these commands are doing can be found <a href="https://jax.readthedocs.io/en/latest/notebooks/How_JAX_primitives_work.html#primal-evaluation-rules">in the docs</a>, but the key thing is that there is <em>no reason</em> to dick around with JAX types in any of these implementation functions. They are only ever called using (essentially) numpy<sup>5</sup> arrays. So we can just program like normal human beings.</p>
<div class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> jnp</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> core</span>
<span id="cb2-3"></span>
<span id="cb2-4">sparse_solve_p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> core.Primitive(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sparse_solve"</span>)</span>
<span id="cb2-5"></span>
<span id="cb2-6"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_solve(A_indices, A_indptr, A_x, b):</span>
<span id="cb2-7">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""A JAX traceable sparse solve"""</span></span>
<span id="cb2-8">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_solve_p.bind(A_indices, A_indptr, A_x, b)</span>
<span id="cb2-9"></span>
<span id="cb2-10"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@sparse_solve_p.def_impl</span></span>
<span id="cb2-11"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_solve_impl(A_indices, A_indptr, A_x, b):</span>
<span id="cb2-12">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""The implementation of the sparse solve. This is not JAX traceable."""</span></span>
<span id="cb2-13">  A_lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.csc_array((A_x, A_indices, A_indptr)) </span>
<span id="cb2-14">  </span>
<span id="cb2-15">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> A_lower.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> A_lower.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb2-16">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> A_lower.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> b.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb2-17">  </span>
<span id="cb2-18">  A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> A_lower.T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> sparse.diags(A_lower.diagonal())</span>
<span id="cb2-19">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse.linalg.spsolve(A, b)</span>
<span id="cb2-20"></span>
<span id="cb2-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Check it works</span></span>
<span id="cb2-22">b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.ones(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb2-23">x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_solve(A_indices, A_indptr, A_x, b)</span>
<span id="cb2-24"></span>
<span id="cb2-25"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"The error in the sparse solve is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> x))<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>The error in the sparse solve is  0.00e+00</code></pre>
</div>
</div>
<p>In order to facilitate its transformations, JAX will occasionally<sup>6</sup> call functions using <em>abstract</em> data types. These data types know the shape of the inputs and their data type. So our next step is to specialise the <code>sparse_solve</code> function for this case. We might as well do some shape checking while we’re just hanging around. But the essential part of this function is just saying that the output of <img src="https://latex.codecogs.com/png.latex?A%5E%7B-1%7Db"> is the same shape as <img src="https://latex.codecogs.com/png.latex?b"> (which is usually a vector, but the code is no more complex if it’s a [dense] matrix).</p>
<div class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax._src <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> abstract_arrays</span>
<span id="cb4-2"></span>
<span id="cb4-3"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@sparse_solve_p.def_abstract_eval</span></span>
<span id="cb4-4"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_solve_abstract_eval(A_indices, A_indptr, A_x, b):</span>
<span id="cb4-5">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> A_indices.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> A_x.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb4-6">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> b.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> A_indptr.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb4-7">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> abstract_arrays.ShapedArray(b.shape, b.dtype)</span></code></pre></div>
</div>
</section>
</section>
<section id="primitive-two-the-triangular-solve" class="level2">
<h2 class="anchored" data-anchor-id="primitive-two-the-triangular-solve">Primitive two: The triangular solve</h2>
<p>This is very similar. We need to have a function that computes <img src="https://latex.codecogs.com/png.latex?L%5E%7B-1%7Db"> and <img src="https://latex.codecogs.com/png.latex?L%5E%7B-T%7Db">. The extra wrinkle from the last time around is that we need to pass a keyword argument <code>transpose</code> to indicate which system should be solved.</p>
<p>Once again, we are going to use the appropriate <code>scipy</code> function (in this case <code>sparse.linalg.spsolve_triangular</code>). There’s a little bit of casting between sparse matrix types here as <code>sparse.linalg.spsolve_triangular</code> assumes the matrix is in CSR format.</p>
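<p>That casting is less arbitrary than it looks: transposing a CSC matrix just reinterprets the same index arrays as CSR, so <code>L.T</code> is already in the format <code>spsolve_triangular</code> wants, while the non-transposed solve needs an explicit <code>.tocsr()</code>. A tiny sketch (a 2 × 2 example of my own):</p>

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve_triangular

# A small lower-triangular matrix stored in CSC, like the post's matrices.
L = sparse.csc_array(np.array([[2.0, 0.0],
                               [1.0, 3.0]]))
b = np.array([2.0, 7.0])

# Forward solve L x = b: spsolve_triangular expects CSR, so convert.
x = spsolve_triangular(L.tocsr(), b, lower=True)

# Transpose solve L^T y = b: L.T is an (upper-triangular) CSR matrix
# "for free", so no conversion is needed.
y = spsolve_triangular(L.T, b, lower=False)
```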
<div class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb5-1">sparse_triangular_solve_p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> core.Primitive(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sparse_triangular_solve"</span>)</span>
<span id="cb5-2"></span>
<span id="cb5-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_triangular_solve(L_indices, L_indptr, L_x, b, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>, transpose: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">bool</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>):</span>
<span id="cb5-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""A JAX traceable sparse  triangular solve"""</span></span>
<span id="cb5-5">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_triangular_solve_p.bind(L_indices, L_indptr, L_x, b, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> transpose)</span>
<span id="cb5-6"></span>
<span id="cb5-7"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@sparse_triangular_solve_p.def_impl</span></span>
<span id="cb5-8"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_triangular_solve_impl(L_indices, L_indptr, L_x, b, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>):</span>
<span id="cb5-9">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""The implementation of the sparse triangular solve. This is not JAX traceable."""</span></span>
<span id="cb5-10">  L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.csc_array((L_x, L_indices, L_indptr)) </span>
<span id="cb5-11">  </span>
<span id="cb5-12">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> L.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb5-13">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> L.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> b.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb5-14">  </span>
<span id="cb5-15">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> transpose:</span>
<span id="cb5-16">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse.linalg.spsolve_triangular(L.T, b, lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb5-17">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span>:</span>
<span id="cb5-18">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse.linalg.spsolve_triangular(L.tocsr(), b, lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span></code></pre></div>
</div>
<p>Now we can check if it works. We can use the fact that our matrix <code>(A_indices, A_indptr, A_x)</code> is lower-triangular (because we only store the lower triangle) to make our test case.</p>
<div class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">## Check if it works</span></span>
<span id="cb6-2">b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.standard_normal(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb6-3">x1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve(A_indices, A_indptr, A_x, b)</span>
<span id="cb6-4">x2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_triangular_solve(A_indices, A_indptr, A_x, b, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb6-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"""Error in triangular solve: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> sparse.tril(A) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> x1))<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span></span>
<span id="cb6-6"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Error in triangular transpose solve: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> sparse.triu(A) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> x2))<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Error in triangular solve:  3.53e-15
Error in triangular transpose solve:  5.08e-15</code></pre>
</div>
</div>
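<p>As a sanity check on the test above, SciPy's built-in triangular solver should behave the same way. Here's a hedged cross-check on a made-up random SPD matrix (the names <code>M</code> and <code>n</code> and the density are illustrative, not from the post):</p>

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve_triangular

# Hypothetical SPD test matrix (illustrative, not the post's A).
rng = np.random.default_rng(42)
n = 10
M = sparse.random(n, n, density=0.3, random_state=rng).tocsc()
A = (M @ M.T + n * sparse.eye(n)).tocsc()
b = rng.standard_normal(n)

# Solve against the lower triangle, and against its transpose
# (which, since A is symmetric, is just triu(A)).
x1 = spsolve_triangular(sparse.tril(A).tocsr(), b, lower=True)
x2 = spsolve_triangular(sparse.triu(A).tocsr(), b, lower=False)

err1 = np.sum(np.abs(b - sparse.tril(A) @ x1))
err2 = np.sum(np.abs(b - sparse.triu(A) @ x2))
```

Both errors should be at machine-precision level, just like the hand-rolled solver's.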
<p>And we can also do the abstract evaluation.</p>
<div class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@sparse_triangular_solve_p.def_abstract_eval</span></span>
<span id="cb8-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_triangular_solve_abstract_eval(L_indices, L_indptr, L_x, b, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>, transpose <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>):</span>
<span id="cb8-3">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> L_indices.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_x.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb8-4">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> b.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_indptr.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb8-5">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> abstract_arrays.ShapedArray(b.shape, b.dtype)</span></code></pre></div>
</div>
<p>Great! Now on to the next one!</p>
<section id="primitive-three-the-sparse-cholesky" class="level3">
<h3 class="anchored" data-anchor-id="primitive-three-the-sparse-cholesky">Primitive three: The sparse Cholesky</h3>
<p>Ok. This one is gonna be a pain in the arse. But we need to do it. Why? Because we are going to need a JAX-traceable version further on down the track.</p>
<p>The issue here is that the non-zero pattern of the Cholesky decomposition is computed <em>on the fly</em>. This is absolutely not allowed in JAX. It <em>must</em> know the shape of all things at the moment it is called.</p>
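<p>A minimal illustration of that constraint (my example, not from the post): <code>jnp.nonzero</code> has a value-dependent output shape, so under <code>jit</code> it requires a static <code>size</code> argument.</p>

```python
import jax
import jax.numpy as jnp

x = jnp.array([0.0, 1.0, 0.0, 2.0])

# jax.jit(jnp.nonzero)(x) fails: the output length depends on the
# *values* of x, which are abstract at trace time. Supplying a static
# `size` makes the output shape known up front.
idx = jax.jit(lambda v: jnp.nonzero(v, size=2)[0])(x)
# idx == [1, 3]
```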
<p>This is going to make for a somewhat shitty user experience for this function. It’s unavoidable with JAX designed<sup>7</sup> the way it is.</p>
<p>The code in <code>jax.experimental.sparse.bcoo.fromdense</code> has this exact problem. In their case, they are turning a dense matrix into a sparse matrix and they can’t know until they see the dense matrix how many non-zeros there are. So they do the sensible thing and ask the user to specify it. They do this using the <code>nse</code> keyword parameter. If you’re curious what <code>nse</code> stands for, it turns out it’s not “non-standard evaluation” but rather “number of specified entries”. Most other systems use the abbreviation <code>nnz</code> for “number of non-zeros”, but I’m going to stick with the JAX notation.</p>
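<p>For the curious, here's roughly what that looks like with <code>BCOO</code> (a sketch on a made-up two-by-two matrix):</p>

```python
import jax
import jax.numpy as jnp
from jax.experimental import sparse as jsparse

M = jnp.array([[1.0, 0.0],
               [0.0, 2.0]])

# Outside of jit, the number of specified entries can be counted
# directly from the concrete matrix.
A1 = jsparse.BCOO.fromdense(M)

# Under jit the entries of M are abstract, so nse has to be supplied
# as a static quantity (here we assume we know it's 2).
A2 = jax.jit(lambda m: jsparse.BCOO.fromdense(m, nse=2))(M)
```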
<p>The one little thing we need to add to this code is a guard to make sure that if the <code>sparse_cholesky</code> function is called without specifying <code>L_nse</code>, it still works when the inputs are concrete, but fails with an informative error when it's being traced abstractly.</p>
<div class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb9-1">sparse_cholesky_p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> core.Primitive(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sparse_cholesky"</span>)</span>
<span id="cb9-2"></span>
<span id="cb9-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_cholesky(A_indices, A_indptr, A_x, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>, L_nse: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>):</span>
<span id="cb9-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""A JAX traceable sparse cholesky decomposition"""</span></span>
<span id="cb9-5">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> L_nse <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb9-6">    err_string <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You need to pass a value to L_nse when doing fancy sparse_cholesky."</span></span>
<span id="cb9-7">    _ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> core.concrete_or_error(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, A_x, err_string)</span>
<span id="cb9-8">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_cholesky_p.bind(A_indices, A_indptr, A_x, L_nse <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_nse)</span>
<span id="cb9-9"></span>
<span id="cb9-10"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@sparse_cholesky_p.def_impl</span></span>
<span id="cb9-11"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_cholesky_impl(A_indices, A_indptr, A_x, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>, L_nse <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>):</span>
<span id="cb9-12">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""The implementation of the sparse Cholesky. This is not JAX traceable."""</span></span>
<span id="cb9-13">  </span>
<span id="cb9-14">  L_indices, L_indptr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _symbolic_factor(A_indices, A_indptr)</span>
<span id="cb9-15">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> L_nse <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb9-16">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">assert</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_indices) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_nse</span>
<span id="cb9-17">    </span>
<span id="cb9-18">  L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _structured_copy(A_indices, A_indptr, A_x, L_indices, L_indptr)</span>
<span id="cb9-19">  L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _sparse_cholesky_impl(L_indices, L_indptr, L_x)</span>
<span id="cb9-20">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_indices, L_indptr, L_x</span></code></pre></div>
</div>
<p>The rest of the code is just the sparse Cholesky code from <a href="https://dansblog.netlify.app/posts/2022-03-23-getting-jax-to-love-sparse-matrices/">blog 2</a> and I’ve hidden it under the fold. (You would think I would package this up properly, but I simply haven’t. Why not? Who knows<sup>8</sup>.)</p>
<details>
<summary>
Click here to see the implementation
</summary>
<div class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb10-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> _symbolic_factor(A_indices, A_indptr):</span>
<span id="cb10-2">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Assumes A_indices and A_indptr index the lower triangle of $A$ ONLY.</span></span>
<span id="cb10-3">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb10-4">  L_sym <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [np.array([], dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n)]</span>
<span id="cb10-5">  children <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [np.array([], dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n)]</span>
<span id="cb10-6">  </span>
<span id="cb10-7">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n):</span>
<span id="cb10-8">    L_sym[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_indices[A_indptr[j]:A_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb10-9">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> child <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> children[j]:</span>
<span id="cb10-10">      tmp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_sym[child][L_sym[child] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> j]</span>
<span id="cb10-11">      L_sym[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.unique(np.append(L_sym[j], tmp))</span>
<span id="cb10-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_sym[j]) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:</span>
<span id="cb10-13">      p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_sym[j][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb10-14">      children[p] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.append(children[p], j)</span>
<span id="cb10-15">        </span>
<span id="cb10-16">  L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb10-17">  L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum([<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> L_sym])</span>
<span id="cb10-18">  L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.concatenate(L_sym)</span>
<span id="cb10-19">  </span>
<span id="cb10-20">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_indices, L_indptr</span>
<span id="cb10-21"></span>
<span id="cb10-22"></span>
<span id="cb10-23"></span>
<span id="cb10-24"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> _structured_copy(A_indices, A_indptr, A_x, L_indices, L_indptr):</span>
<span id="cb10-25">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb10-26">  L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_indices))</span>
<span id="cb10-27">  </span>
<span id="cb10-28">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n):</span>
<span id="cb10-29">    copy_idx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.nonzero(np.in1d(L_indices[L_indptr[j]:L_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]],</span>
<span id="cb10-30">                                  A_indices[A_indptr[j]:A_indptr[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]))[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb10-31">    L_x[L_indptr[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> copy_idx] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_x[A_indptr[j]:A_indptr[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb10-32">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_x</span>
<span id="cb10-33"></span>
<span id="cb10-34"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> _sparse_cholesky_impl(L_indices, L_indptr, L_x):</span>
<span id="cb10-35">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb10-36">  descendant <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [[] <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n)]</span>
<span id="cb10-37">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n):</span>
<span id="cb10-38">    tmp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_x[L_indptr[j]:L_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb10-39">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> bebe <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> descendant[j]:</span>
<span id="cb10-40">      k <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bebe[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb10-41">      Ljk<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_x[bebe[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb10-42">      pad <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.nonzero(                                                       <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb10-43">          L_indices[L_indptr[k]:L_indptr[k<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_indices[L_indptr[j]])[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb10-44">      update_idx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.nonzero(np.in1d(                                        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb10-45">                    L_indices[L_indptr[j]:L_indptr[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]],                     <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb10-46">                    L_indices[(L_indptr[k] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> pad):L_indptr[k<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]))[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb10-47">      tmp[update_idx] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tmp[update_idx] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>                                     <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb10-48">                        Ljk <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> L_x[(L_indptr[k] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> pad):L_indptr[k <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb10-49">            </span>
<span id="cb10-50">    diag <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.sqrt(tmp[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb10-51">    L_x[L_indptr[j]] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> diag</span>
<span id="cb10-52">    L_x[(L_indptr[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>):L_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tmp[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> diag</span>
<span id="cb10-53">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> idx <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(L_indptr[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, L_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]):</span>
<span id="cb10-54">      descendant[L_indices[idx]].append((j, idx))</span>
<span id="cb10-55">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_x</span></code></pre></div>
</div>
</details>
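<p>One reason <code>L_nse</code> can't simply be read off <code>A</code>'s sparsity pattern, by the way, is <em>fill-in</em>: the Cholesky factor generally has more non-zeros than the lower triangle of <code>A</code>. A small dense sketch (made-up matrix, checked with NumPy's dense Cholesky):</p>

```python
import numpy as np

# A sparse SPD matrix whose Cholesky factor gains a non-zero at (2, 1),
# even though A[2, 1] == 0.
A = np.array([[4.0, 1.0, 1.0, 0.0],
              [1.0, 4.0, 0.0, 1.0],
              [1.0, 0.0, 4.0, 1.0],
              [0.0, 1.0, 1.0, 4.0]])
L = np.linalg.cholesky(A)

nnz_tril_A = np.count_nonzero(np.tril(A))   # 8
nnz_L = np.count_nonzero(np.round(L, 12))   # 9: fill-in at L[2, 1]
```

This is exactly why <code>_symbolic_factor</code> has to walk the elimination tree to find the factor's pattern before any numbers get computed.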
<p>Once again, we can check to see if this worked!</p>
<div class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb11-1">L_indices, L_indptr, L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_cholesky(A_indices, A_indptr, A_x)</span>
<span id="cb11-2">L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.csc_array((L_x, L_indices, L_indptr))</span>
<span id="cb11-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"The error in the sparse cholesky is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>((A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> L.T).todense()))<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>The error in the sparse cholesky is  1.02e-13</code></pre>
</div>
</div>
<p>And, of course, we can do abstract evaluation. Here is where we actually need to use <code>L_nse</code> to work out the dimension of our output.</p>
<div class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb13-1"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@sparse_cholesky_p.def_abstract_eval</span></span>
<span id="cb13-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_cholesky_abstract_eval(A_indices, A_indptr, A_x, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>, L_nse):</span>
<span id="cb13-3">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> core.ShapedArray((L_nse,), A_indices.dtype),                   <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb13-4">         core.ShapedArray(A_indptr.shape, A_indptr.dtype),             <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb13-5">         core.ShapedArray((L_nse,), A_x.dtype)</span></code></pre></div>
</div>
</section>
</section>
<section id="primitive-four-loga" class="level2">
<h2 class="anchored" data-anchor-id="primitive-four-loga">Primitive four: <img src="https://latex.codecogs.com/png.latex?%5Clog(%7CA%7C)"></h2>
<p>And now we have our final primitive: the log determinant! Wow. So much binding. For this one, we compute the Cholesky factorisation and note that <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%7CA%7C%20=%20%7CLL%5ET%7C%20=%20%7CL%7C%7CL%5ET%7C%20=%20%7CL%7C%5E2.%0A%5Cend%7Balign*%7D"> If we successfully remember that the determinant of a triangular matrix is the product of its diagonal entries, we have a formula we can implement.</p>
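<p>In dense form, that identity looks like this (a sketch on a made-up SPD matrix, cross-checked against <code>np.linalg.slogdet</code>):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5.0 * np.eye(5)     # symmetric positive definite

# log|A| = log(|L|^2) = 2 * sum(log(diag(L)))
L = np.linalg.cholesky(A)
log_det = 2.0 * np.sum(np.log(np.diag(L)))

sign, ref = np.linalg.slogdet(A)
```

The two values agree (and <code>sign</code> is <code>1.0</code>, as it must be for an SPD matrix).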
<p>Same deal as last time.</p>
<div class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb14-1">sparse_log_det_p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> core.Primitive(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sparse_log_det"</span>)</span>
<span id="cb14-2"></span>
<span id="cb14-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_log_det(A_indices, A_indptr, A_x):</span>
<span id="cb14-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""A JAX traceable sparse log-determinant"""</span></span>
<span id="cb14-5">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> sparse_log_det_p.bind(A_indices, A_indptr, A_x)</span>
<span id="cb14-6"></span>
<span id="cb14-7"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@sparse_log_det_p.def_impl</span></span>
<span id="cb14-8"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_log_det_impl(A_indices, A_indptr, A_x):</span>
<span id="cb14-9">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""The implementation of the sparse log-determinant. This is not JAX traceable.</span></span>
<span id="cb14-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  """</span></span>
<span id="cb14-11">  L_indices, L_indptr, L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_cholesky_impl(A_indices, A_indptr, A_x)</span>
<span id="cb14-12">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(np.log(L_x[L_indptr[:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]))</span></code></pre></div>
</div>
<p>A canny reader may notice that I’m assuming that the first stored element in each column is the diagonal. This will be true as long as the row indices within each column are sorted and the diagonal elements of <img src="https://latex.codecogs.com/png.latex?L"> are non-zero; the latter holds whenever <img src="https://latex.codecogs.com/png.latex?A"> is symmetric positive definite.</p>
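<p>A tiny SciPy illustration of that storage assumption, using a hypothetical 3×3 lower triangle rather than the matrix used in this post:</p>

```python
import numpy as np
from scipy import sparse

# Lower triangle of a small SPD matrix, stored in CSC format.
A_low = sparse.csc_matrix(np.array([[4.0, 0.0, 0.0],
                                    [1.0, 3.0, 0.0],
                                    [0.0, 1.0, 5.0]]))
A_low.sort_indices()

# With sorted indices and a non-zero diagonal, the first stored entry in
# each column is the diagonal element, so indptr[:-1] indexes the diagonal.
first_rows = A_low.indices[A_low.indptr[:-1]]
assert np.array_equal(first_rows, np.arange(3))
assert np.allclose(A_low.data[A_low.indptr[:-1]], [4.0, 3.0, 5.0])
```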
<p>Let’s test<sup>9</sup> it out.</p>
<div class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb15-1">ld <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_log_det(A_indices, A_indptr, A_x)</span>
<span id="cb15-2">LU <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.linalg.splu(A)</span>
<span id="cb15-3">ld_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(np.log(LU.U.diagonal()))</span>
<span id="cb15-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"The error in the log-determinant is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>ld <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> ld_true<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: .2e}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>The error in the log-determinant is  0.00e+00</code></pre>
</div>
</div>
<p>Finally, we can do the abstract evaluation.</p>
<div class="cell" data-execution_count="13">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb17-1"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@sparse_log_det_p.def_abstract_eval</span></span>
<span id="cb17-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_log_det_abstract_eval(A_indices, A_indptr, A_x):</span>
<span id="cb17-3">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> core.ShapedArray((<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,), A_x.dtype)</span></code></pre></div>
</div>
</section>
<section id="where-are-we-now-but-nowhere" class="level2">
<h2 class="anchored" data-anchor-id="where-are-we-now-but-nowhere">Where are we now but nowhere?</h2>
<p>So we are done for today. Our next step will be to implement all of the bits that are needed to make the derivatives work. So in the next instalment we will differentiate log-determinants, Cholesky decompositions, and all kinds of other fun things.</p>
<p>It should be a blast.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>The second half of this post is half written but, to be honest, I want to go to bed more than I want to implement more derivatives, so I’m splitting the post.↩︎</p></li>
<li id="fn2"><p>aka JAX can map out how the pieces of the function go together and it can then use that map to make its weird transformations↩︎</p></li>
<li id="fn3"><p>But mostly because although I’m going to have to implement the Cholesky and triangular solves later on down the line, I’m writing this in order and I don’t wanna.↩︎</p></li>
<li id="fn4"><p>The JAX docs don’t use decorators for their bindings but I use decorators because I like decorators.↩︎</p></li>
<li id="fn5"><p>Something something duck type. They’re arrays with numbers in them that work in numpy and scipy. Get off my arse.↩︎</p></li>
<li id="fn6"><p>This is mostly for JIT, so it’s not necessary today, but to be very honest it’s the only easy thing to do here and I’m not above giving myself a participation trophy.↩︎</p></li>
<li id="fn7"><p>This is a … fringe problem in JAX-land, so it makes sense that there is a less than beautiful solution to the problem. I think this would be less of a design problem in Stan, where it’s possible to make the number of unknowns in the autodiff tree depend on <code>int</code> arrays is a complex way.↩︎</p></li>
<li id="fn8"><p>Well, me. I’m who knows. I’m still treating this like scratch code in a notepad. Although we are moving towards the point where I’m going to have to set everything out properly. Maybe that’s the next post?↩︎</p></li>
<li id="fn9"><p>Full disclosure: first time out I forgot to multiply by two. This is why we test.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2022,
  author = {Simpson, Dan},
  title = {Sparse {Matrices} 5: {I} Bind You {Nancy}},
  date = {2022-05-20},
  url = {https://dansblog.netlify.app/2022-05-18-sparse4-some-primatives},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2022" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2022. <span>“Sparse Matrices 5: I Bind You Nancy.”</span>
May 20, 2022. <a href="https://dansblog.netlify.app/2022-05-18-sparse4-some-primatives">https://dansblog.netlify.app/2022-05-18-sparse4-some-primatives</a>.
</div></div></section></div> ]]></description>
  <category>Sparse matrices</category>
  <category>Sparse Cholesky factorisation</category>
  <category>Python</category>
  <category>JAX</category>
  <guid>https://dansblog.netlify.app/posts/2022-05-18-sparse4-some-primatives/sparse4-some-primatives.html</guid>
  <pubDate>Thu, 19 May 2022 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2022-05-18-sparse4-some-primatives/nancy.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Sparse Matrices 4: Design is my passion</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2022-05-16-design-is-my-passion-sparse-matrices-part-four/design-is-my-passion-sparse-matrices-part-four.html</link>
  <description><![CDATA[ 





<p>This is the fourth post in a series where I try to squeeze autodiffable sparse matrices into JAX with the aim of speeding up some model classes in PyMC. So far, I have:</p>
<ul>
<li>Outlined the problem <a href="https://dansblog.netlify.app/posts/2022-03-22-a-linear-mixed-effects-model/">Post 1</a></li>
<li>Worked through a basic python implementation of a sparse Cholesky decomposition <a href="https://dansblog.netlify.app/posts/2022-03-23-getting-jax-to-love-sparse-matrices/">Post 2</a></li>
<li>Failed to get JAX to transform some numpy code into efficient, JIT-compileable code <a href="https://dansblog.netlify.app/posts/2022-05-14-jax-ing-a-sparse-cholesky-factorisation-part-3-in-an-ongoing-journey/">Post 3</a></li>
</ul>
<p>I am in the process of writing a blog on building new primitives<sup>1</sup> into JAX, but as I was doing it I accidentally wrote a long section about options for exposing sparse matrices. It really didn’t fit very well into that blog, so here it is.</p>
<section id="what-are-we-trying-to-do-here" class="level2">
<h2 class="anchored" data-anchor-id="what-are-we-trying-to-do-here">What are we trying to do here?</h2>
<p>If you recall from <a href="https://dansblog.netlify.app/posts/2022-03-22-a-linear-mixed-effects-model/">the first blog</a>, we need to be able to compute the value and gradients of the (un-normalised) log-posterior <img src="https://latex.codecogs.com/png.latex?%0A%5Clog(p(%5Ctheta%20%5Cmid%20y))%20=%20%5Cfrac%7B1%7D%7B2%7D%20%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%5ETA%5ETW%5E%7B-1%7Dy%20+%20%5Cfrac%7B1%7D%7B2%7D%20%5Clog(%7CQ(%5Ctheta)%7C)%20-%20%5Cfrac%7B1%7D%7B2%7D%5Clog(%7CQ_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%7C)%20+%20%5Ctext%7Bconst%7D,%0A"> where <img src="https://latex.codecogs.com/png.latex?Q(%5Ctheta)"> is a sparse matrix, and <img src="https://latex.codecogs.com/png.latex?%0A%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%20=%20%5Cfrac%7B1%7D%7B%5Csigma%5E2%7D%20Q_%7Bu%5Cmid%20y,%5Ctheta%7D(%5Ctheta)%5E%7B-1%7D%20A%5ETW%5E%7B-1%7Dy.%0A"></p>
<p>Overall, our task is to design a system where this un-normalised log-posterior can be evaluated and differentiated efficiently. As with all design problems, there are a lot of different ways that we can implement it. They share a bunch of similarities, so we will actually end up implementing the guts of all of the systems.</p>
<p>To that end, let’s think of all of the ways we can implement our target<sup>2</sup>.</p>
</section>
<section id="option-1-the-direct-design" class="level2">
<h2 class="anchored" data-anchor-id="option-1-the-direct-design">Option 1: The direct design</h2>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?A%20%5Crightarrow%20%5Clog(%7CA%7C)">, for a sparse, symmetric positive definite matrix <img src="https://latex.codecogs.com/png.latex?A"></li>
<li><img src="https://latex.codecogs.com/png.latex?(A,b)%20%5Crightarrow%20A%5E%7B-1%7Db">, for a sparse, symmetric positive definite matrix <img src="https://latex.codecogs.com/png.latex?A"> and a vector <img src="https://latex.codecogs.com/png.latex?b"></li>
</ul>
<p>This option is, in some sense, the most straightforward. We implement primitives for both of the major components of our target and combine them using existing JAX primitives (like addition, scalar multiplication, and dot products).</p>
<p>This is a bad idea.</p>
<p>The problem is that both primitives require the Cholesky decomposition of <img src="https://latex.codecogs.com/png.latex?A">, so if we take this route we might end up computing an extra Cholesky decomposition. And you may ask yourself: <em>what’s an extra Cholesky decomposition between friends?</em></p>
<p>Well, Jonathan, it’s the most expensive operation we are doing for these models, so perhaps we should avoid the 1/3 increase in running time!</p>
<p>There are some ways around this. We might implement sparse, symmetric positive definite matrices as a class that, upon instantiation, computes the Cholesky factorisation.</p>
<div class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">class</span> SPDSparse: </span>
<span id="cb1-2">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, A_indices, A_indptr, A_x):</span>
<span id="cb1-3">    <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>._perm, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>._iperm <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _find_perm(A_indices, A_indptr)</span>
<span id="cb1-4">    <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>._A_indices, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>._A_indptr, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>._A_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _twist(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>._perm, A_indices, A_indptr, A_x)</span>
<span id="cb1-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">try</span>:</span>
<span id="cb1-6">      <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>._L_indices, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>._L_indptr, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>._L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _compute_cholesky()</span>
<span id="cb1-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">except</span> SPDError:</span>
<span id="cb1-8">      <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Matrix is not symmetric positive definite to machine precision."</span>)</span>
<span id="cb1-9">  </span>
<span id="cb1-10">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> _find_perm(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, indices, indptr):</span>
<span id="cb1-11">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""Finds the best fill-reducing permutation"""</span></span>
<span id="cb1-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">raise</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">NotImplemented</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"_find_perm"</span>)</span>
<span id="cb1-13">  </span>
<span id="cb1-14">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> _twist(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, perm, indices, indptr, x):</span>
<span id="cb1-15">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""Returns A[perm, perm]"""</span></span>
<span id="cb1-16">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">raise</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">NotImplemented</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"_twist"</span>)</span>
<span id="cb1-17">  </span>
<span id="cb1-18">  <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> _compute_cholesky():</span>
<span id="cb1-19">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""Compute the Cholesky decomposition of the permuted matrix"""</span></span>
<span id="cb1-20">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">raise</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">NotImplemented</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"_compute_cholesky"</span>)</span>
<span id="cb1-21">  </span>
<span id="cb1-22">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Not pictured: a whole forest of gets</span></span></code></pre></div>
</div>
<p>In contexts where we need a Cholesky decomposition of every SPD matrix we instantiate, this design might be useful. It might also be useful to write a constructor that takes a <code>jax.experimental.CSCMatrix</code>, so that we could build a differentiable matrix and then just absolutely <em>slam</em> it into our filthy little Cholesky context<sup>3</sup>.</p>
<p>In order to use this type of pattern with JAX, we would need to register it as a Pytree class, which involves writing flatten and unflatten routines. The <a href="https://github.com/google/jax/blob/712ab66f2855acf8a3f3c3977f80edb4447e7644/jax/experimental/sparse/csr.py">CSCSparse class</a> is a good example of how to implement this type of thing. Some care would be needed to make sure the differentiation rules don’t try to do something stupid like differentiate with respect to <code>self.iperm</code> or <code>self.L_x</code>. This is beyond the extra <a href="https://github.com/google/jax/blob/712ab66f2855acf8a3f3c3977f80edb4447e7644/jax/experimental/sparse/ad.py">autodiff sugar</a> in the experimental sparse library.</p>
<p>Implementing this would be quite an undertaking, but it’s certainly an option. The most obvious downside of this pattern (plus a fully functional sparse matrix class) is that it may end up being quite delicate to have this volume of auxiliary information<sup>4</sup> in a pytree while making everything differentiate properly. This doesn’t seem to be how most parts of JAX have been built. There are also a couple of <a href="https://jax.readthedocs.io/en/latest/pytrees.html#custom-pytrees-and-initialization">sharp corners</a> we could run into with instantiation.</p>
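<p>For concreteness, here is a minimal sketch of the pytree registration dance, with a hypothetical <code>SparseVals</code> class that is much simpler than anything we’d actually ship: the index arrays go in as static auxiliary data so that autodiff only ever sees the numeric entries.</p>

```python
import jax
import jax.numpy as jnp
import numpy as np

class SparseVals:
    """Immutable toy container: static index arrays plus numeric entries."""
    def __init__(self, indices, indptr, x):
        self.indices = indices  # static: part of the tree structure
        self.indptr = indptr    # static: part of the tree structure
        self.x = x              # dynamic: the only differentiable leaf

def _flatten(m):
    # children are the differentiable leaves; aux data must be hashable
    return (m.x,), (m.indices, m.indptr)

def _unflatten(aux, children):
    return SparseVals(aux[0], aux[1], children[0])

jax.tree_util.register_pytree_node(SparseVals, _flatten, _unflatten)

m = SparseVals((0, 1), (0, 1, 2), jnp.array([2.0, 3.0]))
# Gradients flow through x only; the index arrays are never differentiated.
g = jax.grad(lambda mat: jnp.sum(mat.x ** 2))(m)
assert np.allclose(np.asarray(g.x), [4.0, 6.0])
```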
<p>To close this out, it’s worth noting a variation on this pattern that comes up: the optional Cholesky. The idea is that rather than compute the permutations and the Cholesky factorisation on initialisation, we store a boolean flag in the class <code>is_cholesky</code> and, whenever we need a Cholesky factor we check <code>is_cholesky</code> and if it’s <code>True</code> we use the computed Cholesky factor and otherwise we compute it and set <code>is_cholesky = True</code>.</p>
<p>This pattern introduces state to the object: it is no longer <em>set and forget</em>. This will not work within JAX<sup>5</sup>, where objects need to be immutable. It’s also not a great pattern in general: it is considerably easier to debug code with stateless objects.</p>
</section>
<section id="option-2-implement-all-of-the-combinations-of-functions-that-we-need" class="level2">
<h2 class="anchored" data-anchor-id="option-2-implement-all-of-the-combinations-of-functions-that-we-need">Option 2: Implement all of the combinations of functions that we need</h2>
<p>Rather than dicking around with classes, we could just implement primitives that compute</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?A%20%5Crightarrow%20%5Clog(%7CA%7C)">, for a sparse, symmetric positive definite matrix <img src="https://latex.codecogs.com/png.latex?A"></li>
<li><img src="https://latex.codecogs.com/png.latex?(A,b,%20c)%20%5Crightarrow%20%5Clog(%7CA%7C)%20+%20c%5ETA%5E%7B-1%7Db">, for a sparse, symmetric positive definite matrix <img src="https://latex.codecogs.com/png.latex?A"> and vectors <img src="https://latex.codecogs.com/png.latex?b"> and <img src="https://latex.codecogs.com/png.latex?c">.</li>
</ul>
<p>This is exactly what we need to do our task and nothing more. It won’t result in any unnecessary Cholesky factors. It doesn’t need us to store computed Cholesky factors. We can simply eat, pray, love.</p>
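<p>A dense sketch of what that fused second primitive computes, to make the point that a single Cholesky factor serves both terms:</p>

```python
import numpy as np
from scipy.linalg import solve_triangular

def logdet_plus_quad(A, b, c):
    """Dense version of the fused map (A, b, c) -> log|A| + c^T A^{-1} b.

    One Cholesky factor is shared between the two terms, which is the
    whole point of fusing them into a single primitive.
    """
    L = np.linalg.cholesky(A)
    log_det = 2.0 * np.log(np.diag(L)).sum()
    # c^T A^{-1} b = (L^{-1} c)^T (L^{-1} b)
    y = solve_triangular(L, b, lower=True)
    z = solve_triangular(L, c, lower=True)
    return log_det + z @ y

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4.0 * np.eye(4)
b, c = rng.standard_normal(4), rng.standard_normal(4)
ref = np.linalg.slogdet(A)[1] + c @ np.linalg.solve(A, b)
assert np.allclose(logdet_plus_quad(A, b, c), ref)
```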
<p>The obvious downside to this option is that it’s going to massively expand the codebase if there are more things that we want to do. It’s also not obvious why we would do this instead of just making <img src="https://latex.codecogs.com/png.latex?%5Clog%20p(%5Ctheta%20%5Cmid%20y)"> a primitive<sup>6</sup>.</p>
</section>
<section id="option-3-just-compute-the-cholesky" class="level2">
<h2 class="anchored" data-anchor-id="option-3-just-compute-the-cholesky">Option 3: Just compute the Cholesky</h2>
<p>Our third option is to simply compute (and differentiate) the Cholesky factor directly. We can then compute <img src="https://latex.codecogs.com/png.latex?%5Clog(%7CA%7C)"> and <img src="https://latex.codecogs.com/png.latex?A%5E%7B-1%7Db"> through a combination of differentiable operations on the elements of the Cholesky factor (for <img src="https://latex.codecogs.com/png.latex?%5Clog(%7CA%7C)">) and triangular linear solves <img src="https://latex.codecogs.com/png.latex?L%5E%7B-1%7Db"> and <img src="https://latex.codecogs.com/png.latex?L%5E%7B-T%7Dc"> (for <img src="https://latex.codecogs.com/png.latex?A%5E%7B-1%7Db">).</p>
<p>Hence we require the following two<sup>7</sup> JAX primitives:</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?A%20%5Crightarrow%20%5Coperatorname%7Bchol%7D(A)">, where <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7Bchol%7D(A)"> is the Cholesky factor of <img src="https://latex.codecogs.com/png.latex?A">,</li>
<li><img src="https://latex.codecogs.com/png.latex?(L,%20b)%20%5Crightarrow%20L%5E%7B-1%7D%20b"> and <img src="https://latex.codecogs.com/png.latex?(L,%20b)%20%5Crightarrow%20L%5E%7B-T%7Db"> for lower-triangular sparse matrix <img src="https://latex.codecogs.com/png.latex?L">.</li>
</ul>
<p>This is pretty close to how the dense version of this function would be implemented.</p>
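<p>In dense form, those two primitives compose like this (NumPy/SciPy rather than JAX, purely for illustration):</p>

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4.0 * np.eye(4)
b = rng.standard_normal(4)

L = np.linalg.cholesky(A)                  # primitive 1: A -> chol(A)
y = solve_triangular(L, b, lower=True)     # primitive 2a: (L, b) -> L^{-1} b
x = solve_triangular(L.T, y, lower=False)  # primitive 2b: (L, y) -> L^{-T} y

# Together they give A^{-1} b, and diag(L) gives log|A| essentially for free.
assert np.allclose(A @ x, b)
```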
<p>There are two little challenges with this pattern:</p>
<ol type="1">
<li><p>We are adding another large-ish node <img src="https://latex.codecogs.com/png.latex?L"> to our autodiff tree. As we saw in other patterns, this is unnecessary storage for our problem at hand.</p></li>
<li><p>The number of non-zeros in <img src="https://latex.codecogs.com/png.latex?L"> is a function of the non-zero pattern of <img src="https://latex.codecogs.com/png.latex?A">. This means the Cholesky will need to be implemented very carefully to ensure that it’s traceable enough.</p></li>
</ol>
<p>The second point here might actually be an issue. To be honest, I have no idea. I think maybe it’s fine? But I need to do a close read on <a href="https://jax.readthedocs.io/en/latest/notebooks/How_JAX_primitives_work.html#reverse-differentiation">the adding primitives doc</a>. Essentially, as long as the abstract traces only need shapes, and not the concrete values, we should be ok.</p>
<p>For adding this to something like Stan, however, we will likely need to do some extra work to make sure we know the number of parameters.</p>
<p>The advantage of this type of design pattern is that it gives users the flexibility to do whatever perverted thing they want to do with the Cholesky triangle. For example, they might want to do a centring/non-centring transformation. In Option 1, we would need to write explicit functions to let them do that (not difficult, but there’s a lot of code to write, which has the annoying tendency to increase the maintenance burden).</p>
</section>
<section id="option-4-functors" class="level2">
<h2 class="anchored" data-anchor-id="option-4-functors">Option 4: Functors!</h2>
<p>A slightly wilder design pattern would be to abandon sparse matrices and just make functions <code>A(theta, ...)</code> that return a sparse matrix. If that function is differentiable wrt its first argument, then we can build this whole thing up that way.</p>
<p>In reality, the only way I can think of to implement this pattern would be to implement a whole differentiable sparse matrix arithmetic (make operations like <code>alpha * A + beta * B</code>, <code>C * D</code> work for sparse matrices). At which point, we’ve basically just recreated option 1.</p>
<p>I’m really only bringing up functors because unlike sparse matrices, it is actually a pretty good model for implementing Gaussian Processes with general covariance functions. There’s a little bit of the idea in <a href="https://github.com/stan-dev/math/issues/1011">this Stan issue</a> that, to my knowledge, hasn’t gone anywhere. More recently, a variant has been used successfully in the (as yet un-merged) <a href="https://github.com/stan-dev/math/tree/try-laplace_student/stan/math/laplace">Laplace approximation feature</a> in Stan.</p>
</section>
<section id="which-one-should-we-use" class="level2">
<h2 class="anchored" data-anchor-id="which-one-should-we-use">Which one should we use?</h2>
<p>We don’t really need to make that choice yet. So we won’t.</p>
<p>But personally, I like option 1. I expect everyone else on earth would prefer option 3. For densities that see a lot of action, it would make quite a bit of sense to consider making that density a primitive when it has a complex derivative (<em>à la</em> option 2).</p>
<p>But for now, let’s park this and start getting in on the implementations.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>functions that have explicit transformations written for them (eg explicit instruction on how to JIT or how to differentiate)↩︎</p></li>
<li id="fn2"><p>I get sick of typing “unnormalised log-posterior”↩︎</p></li>
<li id="fn3"><p>I am sorry. I have had some wine.↩︎</p></li>
<li id="fn4"><p>Permuations, cholesky, etc↩︎</p></li>
<li id="fn5"><p>This also won’t work in Stan, because all Stan objects are stateless.↩︎</p></li>
<li id="fn6"><p>This is actually what Stan has done for a bunch of its <a href="https://mc-stan.org/docs/2_29/functions-reference/poisson-log-glm.html">GLM-type models</a>. It’s very efficient and fast. But with a maintainance burden.↩︎</p></li>
<li id="fn7"><p>or three, but you can implement both triangular solves in one function↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2022,
  author = {Simpson, Dan},
  title = {Sparse {Matrices} 4: {Design} Is My Passion},
  date = {2022-05-16},
  url = {https://dansblog.netlify.app/2022-05-16-design-is-my-passion-sparse-matrices-part-four},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2022" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2022. <span>“Sparse Matrices 4: Design Is My
Passion.”</span> May 16, 2022. <a href="https://dansblog.netlify.app/2022-05-16-design-is-my-passion-sparse-matrices-part-four">https://dansblog.netlify.app/2022-05-16-design-is-my-passion-sparse-matrices-part-four</a>.
</div></div></section></div> ]]></description>
  <category>Sparse matrices</category>
  <category>Sparse Cholesky factorisation</category>
  <category>Python</category>
  <category>JAX</category>
  <guid>https://dansblog.netlify.app/posts/2022-05-16-design-is-my-passion-sparse-matrices-part-four/design-is-my-passion-sparse-matrices-part-four.html</guid>
  <pubDate>Sun, 15 May 2022 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2022-05-16-design-is-my-passion-sparse-matrices-part-four/scrod.JPG" medium="image"/>
</item>
<item>
  <title>Sparse Matrices 3: Failing at JAX</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2022-05-14-jax-ing-a-sparse-cholesky-factorisation-part-3-in-an-ongoing-journey/jax-ing-a-sparse-cholesky-factorisation-part-3-in-an-ongoing-journey.html</link>
  <description><![CDATA[ 





<p>This is part three of an ongoing exercise in hubris. <a href="https://dansblog.netlify.app/posts/2022-03-22-a-linear-mixed-effects-model/">Part one is here.</a> <a href="https://dansblog.netlify.app/posts/2022-03-23-getting-jax-to-love-sparse-matrices/">Part two is here.</a> The overall aim of this series of posts is to look at how sparse Cholesky factorisations work, how JAX works, and how to marry the two with the ultimate aim of putting a bit of sparse matrix support into PyMC, which should allow for faster inference in linear mixed models and Gaussian spatial models. And hopefully, if anyone ever gets around to putting the Laplace approximation in, in all sorts of GLMMs and non-Gaussian models with splines and spatial effects.</p>
<p>It’s been a couple of weeks since the last blog, but I’m going to just assume that you are fully on top of all of those details. To that end, let’s jump in.</p>
<section id="what-is-jax" class="level2">
<h2 class="anchored" data-anchor-id="what-is-jax">What is JAX?</h2>
<p><a href="https://jax.readthedocs.io/en/latest/index.html">JAX</a> is a minor miracle. It will take python+numpy code and make it cool. It will let you JIT<sup>1</sup> compile it! It will let you differentiate it! It will let you batch<sup>2</sup>. JAX refers to these three operations as <em>transformations</em>.</p>
<p>But, as The Mountain Goats tell us <a href="https://www.youtube.com/watch?v=-E4XeV33TvE"><em>God is present in the sweeping gesture, but the devil is in the details</em></a>. And oh boy are those details going to be really fucking important to us.</p>
<p>There are going to be two key things that will make our lives more difficult:</p>
<ol type="1">
<li><p>Not every operation can be handled by every transformation. For example, you can’t always JIT or take gradients of a <code>for</code> loop. This means that some things have to be re-written carefully to make sure it’s possible to get the advantages we need.</p></li>
<li><p>JAX arrays are <em>immutable</em>. That means that once an array is created it <em>cannot be changed</em>. This means that in-place updates like <code>a[0] = 1.0</code> are not allowed! If you’ve come from an R/Python/C/Fortran world, this is the weirdest thing to deal with.</p></li>
</ol>
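<p>The functional workarounds JAX provides for these restrictions look like this (a minimal sketch using <code>jax.numpy</code>’s indexed-update syntax and <code>lax.fori_loop</code>, not code from this series):</p>

```python
import jax.numpy as jnp
from jax import lax

a = jnp.zeros(3)
# In-place assignment (a[0] = 1.0) would raise: JAX arrays are immutable.
b = a.at[0].set(1.0)  # functional update: returns a new array
assert float(a[0]) == 0.0 and float(b[0]) == 1.0

# A traceable loop: accumulate 0 + 1 + 2 + 3 + 4 with lax.fori_loop.
total = lax.fori_loop(0, 5, lambda i, s: s + i, 0)
assert int(total) == 10
```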
<p>There are really excellent reasons for both of these restrictions. And looking into the reasons is fascinating. But not a topic for this blog<sup>3</sup>.</p>
<p>JAX has some pretty decent<sup>4</sup> documentation, a core piece of which outlines some of the <a href="https://jax.readthedocs.io/en/latest/notebooks/Common_Gotchas_in_JAX.html">sharp edges</a> you will run into. As you read through the documentation, the design choices become clearer.</p>
<p>So let’s go and find some sharp edges together!</p>
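<p>As an appetiser, here is a minimal sketch (a toy example of my own, not from the JAX docs) of the immutability sharp edge we will run into later:</p>

```python
import jax.numpy as jnp

x = jnp.zeros(3)

# In-place item assignment is forbidden on JAX arrays
try:
    x[0] = 1.0
    mutated = True
except TypeError:
    mutated = False

# The functional alternative: build a new array with the update applied
y = x.at[0].set(1.0)
```

<p>Note that <code>x</code> itself is untouched; <code>y</code> is a fresh array.</p>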
</section>
<section id="to-jax-or-not-to-jax" class="level2">
<h2 class="anchored" data-anchor-id="to-jax-or-not-to-jax">To JAX or not to JAX</h2>
<p>But first, we need to ask ourselves <em>which functions do we need to JAX</em>?</p>
<p>In the context of our problem we, so far, have three functions:</p>
<ol type="1">
<li><code>_symbolic_factor_csc(A_indices, A_indptr)</code>, which finds the non-zero indices of the sparse Cholesky factor and returns them in CSC format,</li>
<li><code>_deep_copy_csc(A_indices, A_indptr, A_x, L_indices, L_indptr)</code>, which takes the <em>entries</em> of the matrix <img src="https://latex.codecogs.com/png.latex?A"> and re-creates them so they can be indexed within the larger pattern of non-zero elements of <img src="https://latex.codecogs.com/png.latex?L">,</li>
<li><code>_sparse_cholesky_csc_impl(L_indices, L_indptr, L_x)</code>, which actually does the sparse Cholesky factorisation.</li>
</ol>
<p>Let’s take them piece by piece, which is also a good opportunity to remind everyone what the code looked like.</p>
</section>
<section id="symbolic-factorisation" class="level2">
<h2 class="anchored" data-anchor-id="symbolic-factorisation">Symbolic factorisation</h2>
<div class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> _symbolic_factor_csc(A_indices, A_indptr):</span>
<span id="cb1-2">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Assumes A_indices and A_indptr index the lower triangle of $A$ ONLY.</span></span>
<span id="cb1-3">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb1-4">  L_sym <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [np.array([], dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n)]</span>
<span id="cb1-5">  children <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [np.array([], dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n)]</span>
<span id="cb1-6">  </span>
<span id="cb1-7">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n):</span>
<span id="cb1-8">    L_sym[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_indices[A_indptr[j]:A_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb1-9">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> child <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> children[j]:</span>
<span id="cb1-10">      tmp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_sym[child][L_sym[child] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> j]</span>
<span id="cb1-11">      L_sym[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.unique(np.append(L_sym[j], tmp))</span>
<span id="cb1-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_sym[j]) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:</span>
<span id="cb1-13">      p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_sym[j][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb1-14">      children[p] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.append(children[p], j)</span>
<span id="cb1-15">        </span>
<span id="cb1-16">  L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb1-17">  L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum([<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> L_sym])</span>
<span id="cb1-18">  L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.concatenate(L_sym)</span>
<span id="cb1-19">  </span>
<span id="cb1-20">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_indices, L_indptr</span></code></pre></div>
</div>
<p>This function only needs to be computed once per non-zero pattern. In the applications I outlined in the first post, this non-zero pattern is <em>fixed</em>. This means that you only need to run this function <em>once</em> per analysis (unlike the others, which you will have to run once per iteration!).</p>
<p>As a general rule, if you only do something once, it isn’t all that necessary to devote <em>too much</em> time to optimising it. There are, however, some obvious things we could do.</p>
<p>It is, for instance, pretty easy to see how you would implement this with an explicit tree<sup>5</sup> structure instead of constantly <code>np.append</code>ing the <code>children</code> array. This is <em>far</em> better from a memory standpoint.</p>
<p>It’s also easy to imagine this as a two-pass algorithm, where you build the tree and count the number of non-zero elements in the first pass and then build and populate <code>L_indices</code> in the second pass.</p>
<p>The thing is, neither of these things fixes the core problem for using JAX to JIT this: the dimensions of the internal arrays depend on the <em>values</em> of the inputs, and JIT compilation requires every array shape to be known at trace time. This is simply not possible here.</p>
<p>It seems like this would be a huge limitation, but in reality it isn’t. Most functions aren’t like this one! And, if we remember that JAX is a domain-specific language focussing mainly on ML applications, value-dependent shapes are <em>very rarely</em> needed there. It is always good to remember context!</p>
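<p>To make the limitation concrete, here is a tiny sketch (my own toy example) of a function whose output shape depends on input <em>values</em>: it runs fine eagerly, but cannot be JIT compiled as written.</p>

```python
import jax
import jax.numpy as jnp

def values_above(x, t):
    # The length of the result depends on how many entries exceed t:
    # a value-dependent shape.
    return x[x > t]

x = jnp.array([1.0, 5.0, 3.0])
eager = values_above(x, 2.0)  # works outside of jit

try:
    jax.jit(values_above)(x, 2.0)
    jit_ok = True
except Exception:
    # JAX refuses: boolean-mask indexing needs a concrete output shape
    jit_ok = False
```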
<p>So what are our options? We have two.</p>
<ol type="1">
<li>Leave it in Python and just eat the speed.</li>
<li>Build a <a href="https://jax.readthedocs.io/en/latest/notebooks/How_JAX_primitives_work.html">new JAX primitive</a> and write the XLA compilation rule<sup>6</sup>.</li>
</ol>
<p>Today we are opting for the first option!</p>
</section>
<section id="the-structure-changing-copy" class="level2">
<h2 class="anchored" data-anchor-id="the-structure-changing-copy">The structure-changing copy</h2>
<div class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> _deep_copy_csc(A_indices, A_indptr, A_x, L_indices, L_indptr):</span>
<span id="cb2-2">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb2-3">  L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_indices))</span>
<span id="cb2-4">  </span>
<span id="cb2-5">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n):</span>
<span id="cb2-6">    copy_idx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.nonzero(np.in1d(L_indices[L_indptr[j]:L_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]],</span>
<span id="cb2-7">                                  A_indices[A_indptr[j]:A_indptr[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]))[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb2-8">    L_x[L_indptr[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> copy_idx] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_x[A_indptr[j]:A_indptr[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb2-9">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_x</span></code></pre></div>
</div>
<p>This is, fundamentally, a piece of bookkeeping. An annoyance of sparse matrices. Or, if you will, an explicit <em>cast</em> between different sparse matrix types<sup>7</sup>. This is a thing that we do actually need to be able to differentiate, so it needs to live in JAX.</p>
<p>So where are the potential problems? Let’s go line by line.</p>
<ol type="1">
<li><p><code>n = len(A_indptr) - 1</code>: This is lovely. <code>n</code> is used in a for loop later, but because it is a function of the <em>shape</em> of <code>A_indptr</code>, it is considered static and we will be able to JIT over it!</p></li>
<li><p><code>L_x = np.zeros(len(L_indices))</code>: Again, this is fine. Sizes are derived from shapes, life is peachy.</p></li>
<li><p><code>for j in range(0, n):</code>: This could be a problem if <code>n</code> were an argument or derived from the <em>values</em> of the arguments, but it’s derived from a shape so it is static. Praise be! Well, actually it’s a bit more involved than that.</p></li>
</ol>
<p>The problem with the <code>for</code> loop is what will happen when it is JIT’d.&nbsp;Essentially, the loop will be statically unrolled<sup>8</sup>. That is fine for small loops, but it’s a bit of a pain in the arse when <code>n</code> is large.</p>
<p>In this case, we might want to use the structured control flow in <code>jax.lax</code><sup>9</sup>. Here we would need <code>jax.lax.fori_loop(start, end, body_fun, init_value)</code>. This makes the code look less <em>pythonic</em>, but probably should make it faster. It is also, and I cannot stress this enough, an absolute dick to use.</p>
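<p>For the record, here is a minimal sketch of what <code>fori_loop</code> looks like in use (a toy sum-of-squares of my own, not the copy function itself):</p>

```python
import jax
import jax.numpy as jnp
from jax import lax

def sum_of_squares(x):
    n = x.shape[0]  # static: comes from the shape, so jit is happy

    def body_fun(j, acc):
        # One iteration of the loop; acc is the carried running total
        return acc + x[j] ** 2

    # Compiles to a single loop primitive rather than n unrolled copies
    return lax.fori_loop(0, n, body_fun, 0.0)

total = jax.jit(sum_of_squares)(jnp.arange(4.0))  # 0 + 1 + 4 + 9
```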
<p>(In actuality, we will see that we do not need this particular corner of the language here!)</p>
<ol start="4" type="1">
<li><code>copy_idx = np.nonzero(...)</code>: This looks like it’s going to be complicated, but actually it is a perfectly reasonable composition of <code>numpy</code> functions. Hence, we can use the same <code>jax.numpy</code> functions with minimal changes. The one change that we are going to need to make in order to end up with a JIT-able and differentiable function is that we need to tell JAX how many non-zero elements there are. Thankfully, we know this! Because the non-zero pattern of <img src="https://latex.codecogs.com/png.latex?A"> is a subset of the non-zero pattern of <img src="https://latex.codecogs.com/png.latex?L">, we know that</li>
</ol>
<div class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb3-1">np.in1d(L_indices[L_indptr[j]:L_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]], A_indices[A_indptr[j]:A_indptr[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]])</span></code></pre></div>
</div>
<p>will have exactly <code>len(A_indices[A_indptr[j]:A_indptr[j+1]])</code> <code>True</code> values, and so <code>np.nonzero(...)</code> will have that many. We can pass this information to <code>jnp.nonzero()</code> using the optional <code>size</code> argument.</p>
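<p>Here is a quick sketch of how that <code>size</code> argument behaves (with the optional <code>fill_value</code> used to make the padding visible):</p>

```python
import jax.numpy as jnp

mask = jnp.array([True, False, True, False])

# Eager: the result length depends on the values in mask
idx_eager = jnp.nonzero(mask)[0]

# With a static size, the result is padded out (or truncated) to that
# length; padded entries take fill_value
idx_sized = jnp.nonzero(mask, size=3, fill_value=7)[0]
```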
<p><strong>Oh no! We have a problem!</strong> This return size is <em>a function of the values</em> of <code>A_indptr</code> rather than a function of the shape. This means we’re a bit fucked.</p>
<p>There are two routes out:</p>
<ol type="1">
<li>Declare <code>A_indptr</code> to be a static parameter, or</li>
<li>Change the representation from CSC to something more convenient.</li>
</ol>
<p>In this case we could do either of these things, but I’m going to opt for the second option, as it’s going to be more useful going forward.</p>
<p>But before we do that, let’s look at the final line in the code.</p>
<ol start="5" type="1">
<li><code>L_x[L_indptr[j] + copy_idx] = A_x[A_indptr[j]:A_indptr[j+1]]</code>: The final non-trivial line of the code is also a problem. The issue is that these arrays are <em>immutable</em> and we are asking to change the values! That is not allowed!</li>
</ol>
<p>The solution here is to use a clunkier syntax. In JAX, we need to replace</p>
<div class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb4-1">x[ind] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> a</span></code></pre></div>
</div>
<p>with the less pleasant</p>
<div class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb5-1">x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> x.at[ind].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(a)</span></code></pre></div>
</div>
<p>What is going on under the hood to make the second option ok while the first is an error is well beyond the scope of this little post. But the important thing is that they <em>compile down</em> to an in-place<sup>10</sup> update, which is all we really care about.</p>
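<p>A small sketch (my own) of the idiom inside a JIT’d function; the input array is left untouched and the update appears in the returned array:</p>

```python
import jax
import jax.numpy as jnp

@jax.jit
def scatter_set(x, ind, a):
    # Functionally pure from the outside; XLA is free to perform the
    # update in place on its internal buffer
    return x.at[ind].set(a)

x = jnp.zeros(4)
y = scatter_set(x, jnp.array([1, 3]), 5.0)
```

<p>The <code>.at</code> property also supports <code>add</code>, <code>mul</code>, <code>min</code>, <code>max</code>, and friends for accumulating updates.</p>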
</section>
<section id="re-doing-the-data-structure." class="level2">
<h2 class="anchored" data-anchor-id="re-doing-the-data-structure.">Re-doing the data structure.</h2>
<p>Ok. So we need a new data structure. That’s annoying. The rule, I guess, is always that if you need to innovate, you should innovate very little if you can get away with it, or a lot if you have to.</p>
<p>We are going to innovate only the tiniest of bits.</p>
<p>The idea is to keep the core structure of the CSC data structure, but to replace the <code>indptr</code> array with explicitly storing the row indices and row values as a <em>list</em> of <code>np.arrays</code>. So <code>A_index</code> will now be a <em>list</em> of <code>n</code> arrays that contain the row indices of the non-zero elements of <img src="https://latex.codecogs.com/png.latex?A">, while <code>A_x</code> will now be a <em>list</em> of <code>n</code> arrays that contain the values of the non-zero elements of <img src="https://latex.codecogs.com/png.latex?A">.</p>
<p>This means that the matrix <img src="https://latex.codecogs.com/png.latex?%0AB%20=%20%5Cbegin%7Bpmatrix%7D%0A1%20&amp;&amp;5%20%5C%5C%0A2&amp;3&amp;%20%5C%5C%0A&amp;4&amp;6%0A%5Cend%7Bpmatrix%7D%0A"> would be stored as</p>
<div class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb6-1">B_index <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [np.array([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]), np.array([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]), np.array([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>])]</span>
<span id="cb6-2">B_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [np.array([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]), np.array([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>]), np.array([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>])]</span></code></pre></div>
</div>
<p>This is a considerably more <em>pythonic</em><sup>11</sup> version of CSC. So I guess that’s an advantage.</p>
<p>We can easily go from CSC storage to this modified storage.</p>
<div class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb7-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> to_pythonic_csc(indices, indptr, x):</span>
<span id="cb7-2">  index <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.split(indices, indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb7-3">  x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.split(x, indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb7-4">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> index, x</span></code></pre></div>
</div>
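<p>As a sanity check, here is the conversion applied to the matrix <code>B</code> from above (a self-contained sketch that restates the splitting logic inline):</p>

```python
import numpy as np
from scipy import sparse

# B in ordinary CSC form
B = sparse.csc_matrix(np.array([[1., 0., 5.],
                                [2., 3., 0.],
                                [0., 4., 6.]]))

# Split the indices and values at the interior column pointers
B_index = np.split(B.indices, B.indptr[1:-1])
B_x = np.split(B.data, B.indptr[1:-1])
```

<p>This reproduces exactly the <code>B_index</code> and <code>B_x</code> lists written out by hand above.</p>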
</section>
<section id="a-jax-tracable-structure-changing-copy" class="level2">
<h2 class="anchored" data-anchor-id="a-jax-tracable-structure-changing-copy">A JAX-tracable structure-changing copy</h2>
<p>So now it’s time to come back to that damn <code>for</code> loop. As flagged earlier, <code>for</code> loops can be a bit picky in JAX. If we use them <em>as is</em>, then the code that is generated and then compiled is <em>unrolled</em>. You can think of this as if the JIT compiler automatically writes a C++ program and then compiles it. If you were to examine that code, the for loop would be replaced by <code>n</code> almost identical blocks of code with only the index <code>j</code> changing between them. This leads to a potentially very large program to compile<sup>12</sup> and it limits the compiler’s ability to do clever things to make the compiled code run faster<sup>13</sup>.</p>
<p>The <code>lax.fori_loop()</code> function, on the other hand, compiles down to the equivalent of a single operation<sup>14</sup>. This lets the compiler be super clever.</p>
<p>But we don’t actually need this here. Because if you take a look at the original for loop, we are just applying the same two lines of code to each triple of arrays from <code>A_index</code>, <code>A_x</code>, and <code>L_index</code> (in our new<sup>15</sup> data structure).</p>
<p>This just <em>screams</em> out for a map applying a single function independently to each column.</p>
<p>The challenge is to find the right map function. An obvious hope would be <code>jax.vmap</code>. Sadly, <code>jax.vmap</code> does not do that. (At least not without more padding<sup>16</sup> than a drag queen.) The problem here is a misunderstanding of what different parts of JAX are for. Functions like <code>jax.vmap</code> are made for applying the same function to arrays <em>of the same size</em>. This makes sense in their context. (JAX is, after all, made for machine learning and these shape assumptions fit really well in that paradigm. They just don’t fit here.)</p>
<p>And I won’t lie. After this point I went <em>wild</em>. <code>lax.map</code> did not help. And I honest to god tried <code>lax.scan</code>, which will solve the problem but <a href="https://www.youtube.com/watch?v=AOGzY9xShEI">at what cost?</a>.</p>
<p>But at some point, you read enough of the docs to find the answer.</p>
<p>The correct answer here is to use the JAX concept of a <code>pytree</code>. Pytrees are essentially<sup>17</sup> lists of arrays. They’re very flexible and they have a <code>jax.tree_map</code> function that lets you map over them! We are saved!</p>
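<p>To see why this fits, here is a tiny sketch (my own) of <code>tree_map</code> zipping a function over two ragged lists of arrays — exactly the shape mismatch that <code>vmap</code> chokes on:</p>

```python
import jax.numpy as jnp
from jax.tree_util import tree_map  # the stable home of jax.tree_map

# Two "columns" of different lengths: vmap would refuse this
A = [jnp.array([1., 2.]), jnp.array([3., 4., 5.])]
B = [jnp.array([10., 20.]), jnp.array([1., 1., 1.])]

# tree_map pairs up matching leaves and applies the function leaf-wise
sums = tree_map(lambda a, b: a + b, A, B)
```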
<div class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb8-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> jnp</span>
<span id="cb8-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> tree_map</span>
<span id="cb8-4"></span>
<span id="cb8-5"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> _structured_copy_csc(A_index, A_x, L_index):</span>
<span id="cb8-6">    <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> body_fun(A_rows, A_vals, L_rows):</span>
<span id="cb8-7">      out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.zeros(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_rows))</span>
<span id="cb8-8">      copy_idx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>  jnp.nonzero(jnp.in1d(L_rows, A_rows), size <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_rows))[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] </span>
<span id="cb8-9">      out <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> out.at[copy_idx].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(A_vals)</span>
<span id="cb8-10">      <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> out</span>
<span id="cb8-11">    L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tree_map(body_fun, A_index, A_x, L_index)</span>
<span id="cb8-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_x</span></code></pre></div>
</div>
<section id="testing-it-out" class="level3">
<h3 class="anchored" data-anchor-id="testing-it-out">Testing it out</h3>
<p>OK, so now let’s see if it works. To do that I’m going to define a very simple function <img src="https://latex.codecogs.com/png.latex?%0Af(A,%20%5Calpha,%20%5Cbeta)%20=%20%5C%7C%5Calpha%20I%20+%20%5Cbeta%20%5Coperatorname%7Btril%7D(A)%5C%7C_F%5E2,%0A"> that is the sum of the squares of all of the elements of <img src="https://latex.codecogs.com/png.latex?%5Calpha%20I%20+%20%5Cbeta%20%5Coperatorname%7Btril%7D(A)">. There’s obviously an easy way to do this, but I’m going to do it in a way that uses the function we just built.</p>
<div class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb9-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> test_func(A_index, A_x, params):</span>
<span id="cb9-2">  I_index <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [jnp.array([j]) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_index))]</span>
<span id="cb9-3">  I_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [jnp.array([params[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]]) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_index))]</span>
<span id="cb9-4">  I_x2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _structured_copy_csc(I_index, I_x, A_index)</span>
<span id="cb9-5">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> jnp.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>((jnp.concatenate(I_x2) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> params[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> jnp.concatenate(A_x))<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div>
</div>
<p>Next, we need a test case. Once again, we will use the 2D Laplacian on a regular <img src="https://latex.codecogs.com/png.latex?n%20%5Ctimes%20n"> grid (up to a scaling). This is a nice little function because it’s easy to make test problems of different sizes.</p>
<div class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb10-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> scipy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> sparse</span>
<span id="cb10-2"></span>
<span id="cb10-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> make_matrix(n):</span>
<span id="cb10-4">    one_d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.diags([[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>n, [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)], [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb10-5">    A_lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.tril(sparse.kronsum(one_d, one_d) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> sparse.eye(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>n), <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">format</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"csc"</span>)</span>
<span id="cb10-6">    A_index <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.split(jnp.array(A_lower.indices), A_lower.indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb10-7">    A_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jnp.split(jnp.array(A_lower.data), A_lower.indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb10-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> (A_index, A_x)</span></code></pre></div>
</div>
<p>With our test case in hand, we can check to see if JAX will differentiate for us!</p>
<div class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb11-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> grad, jit</span>
<span id="cb11-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> jax.test_util <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> check_grads</span>
<span id="cb11-3"></span>
<span id="cb11-4">grad_func <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> grad(test_func, argnums <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb11-5"></span>
<span id="cb11-6">A_index, A_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>)</span>
<span id="cb11-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"The value at (2.0, 2.0) is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>test_func(A_index, A_x, (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>))<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">."</span>)</span>
<span id="cb11-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"The gradient is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>array(grad_func(A_index, A_x, (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>)))<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">."</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>The value at (2.0, 2.0) is 379600.0.</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>The gradient is [ 60000. 319600.].</code></pre>
</div>
</div>
<p>Fabulous! That works!</p>
</section>
</section>
<section id="but-what-about-jit" class="level2">
<h2 class="anchored" data-anchor-id="but-what-about-jit">But what about JIT?</h2>
<p>JIT took fucking <em>ages</em>. I’m talking “it threw a message” amounts of time. I’m not even going to pretend that I understand why. But I can hazard a guess.</p>
<p>My running assumption, taken from the docs, is that as long as the function only relies on quantities that are derived from the <em>shapes</em> of the inputs (and not their values), then JAX will be able to trace through and JIT the function with ease.</p>
<p>This might not be true for <code>tree_map</code>s. The docs are, as far as I can tell, silent on this matter. And a cursory look through the github repo did not give me any hints as to how <code>tree_map()</code> is translated.</p>
<p>Let’s take a look to see if this is true.</p>
<div class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb14-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> timeit</span>
<span id="cb14-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> functools <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> partial</span>
<span id="cb14-3">jit_test_func <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jit(test_func)</span>
<span id="cb14-4"></span>
<span id="cb14-5">A_index, A_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb14-6">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(partial(jit_test_func, A_index, A_x, (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>)), number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb14-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = 5: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 5: [1.6695, 0.0001, 0.0, 0.0, 0.0]</code></pre>
</div>
</div>
<p>We can see that the first run includes compilation time, but after that it runs a bunch faster. This is how a JIT system is supposed to work! But the question is: will it recompile when we run it for a different matrix?</p>
<div class="cell" data-execution_count="13">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb16-1">_ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jit_test_func(A_index, A_x, (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>)) </span>
<span id="cb16-2">A_index, A_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_matrix(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>)</span>
<span id="cb16-3">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(partial(jit_test_func, A_index, A_x, (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>)), number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb16-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = 20: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 20: [38.5779, 0.0006, 0.0003, 0.0003, 0.0003]</code></pre>
</div>
</div>
<p>Damn. It recompiles. But, as we will see, it does not recompile if we only change <code>A_x</code>.</p>
<div class="cell" data-execution_count="14">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb18-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># What if we change A_x only</span></span>
<span id="cb18-2">_ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> jit_test_func(A_index, A_x, (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>)) </span>
<span id="cb18-3">A_x2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tree_map(<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">lambda</span> x: x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>, A_x)</span>
<span id="cb18-4">times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> timeit.repeat(partial(jit_test_func, A_index, A_x2, (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>)), number <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb18-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"n = 20, new A_x: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(t, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> times]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>n = 20, new A_x: [0.0006, 0.0007, 0.0005, 0.0003, 0.0003]</code></pre>
</div>
</div>
<p>This gives us some hope! This is because the <em>structure</em> of A (aka <code>A_index</code>) is fixed in our application, but the values <code>A_x</code> change. So as long as the initial JIT compilation is reasonable, we should be ok.</p>
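<p>To make the “compile once per structure” idea concrete, here is a toy model of a shape-keyed compilation cache. To be clear, this is <em>not</em> how JAX is implemented; it’s just an illustration of why changing <code>A_x</code> is cheap while changing <code>A_index</code> is not.</p>

```python
# Toy model of a compile cache keyed on input *structure*, not values.
# (An illustration only -- not JAX's actual mechanism.)
from functools import lru_cache

compile_count = 0

@lru_cache(maxsize=None)
def compile_for(structure):
    """Pretend to 'compile' a sum-of-squares kernel for one structure."""
    global compile_count
    compile_count += 1
    return lambda flat: sum(x * x for x in flat)

def run(list_of_arrays):
    structure = tuple(len(a) for a in list_of_arrays)  # shapes only
    flat = [x for a in list_of_arrays for x in a]
    return compile_for(structure)(flat)

a = [[1.0, 2.0], [3.0]]
b = [[4.0, 5.0], [6.0]]   # same structure, new values: cache hit
c = [[1.0], [2.0, 3.0]]   # new structure: triggers a "recompile"
run(a); run(b); run(c)
print(compile_count)  # 2
```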
<p>Unfortunately, there is something bad happening with the compilation. For <img src="https://latex.codecogs.com/png.latex?n=10">, it takes (on my machine) about 2 seconds for the initial compilation. For <img src="https://latex.codecogs.com/png.latex?n=20">, that increases to 16 seconds. Once <img src="https://latex.codecogs.com/png.latex?n%20=%2030">, this balloons up to 51 seconds. Once we reach the lofty peaks<sup>18</sup> of <img src="https://latex.codecogs.com/png.latex?n=40">, we are up at 149 seconds to compile.</p>
<p>This is not good. The function we are JIT-ing is <em>very</em> simple: just one <code>tree_map</code>. I do not know enough<sup>19</sup> about the internals of JAX, so I don’t want to speculate too wildly. But it seems like it might be unrolling the <code>tree_map</code> before compilation, which is … bad.</p>
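<p>One way to probe this guess (an experiment, not anything from the docs) is to look at the jaxpr that JAX traces out: if the <code>tree_map</code> is unrolled at trace time, the traced program should grow with the number of leaves in the pytree.</p>

```python
import jax
import jax.numpy as jnp
from jax.tree_util import tree_map

def sum_of_squares(xs):
    # One jnp.sum per leaf; the builtin sum() combines the per-leaf scalars.
    return sum(tree_map(lambda x: jnp.sum(x ** 2), xs))

small = [jnp.ones(3)] * 2
big = [jnp.ones(3)] * 20

# If tree_map unrolls, the jaxpr for the bigger pytree contains many
# more equations (roughly one square + one sum per leaf).
jaxpr_small = str(jax.make_jaxpr(sum_of_squares)(small))
jaxpr_big = str(jax.make_jaxpr(sum_of_squares)(big))
print(len(jaxpr_big) > len(jaxpr_small))  # True
```

If that holds, each leaf contributes its own equations to the traced program, which would be consistent with the compile times above growing with the size of the matrix.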
</section>
<section id="lets-admit-failure" class="level2">
<h2 class="anchored" data-anchor-id="lets-admit-failure">Let’s admit failure</h2>
<p>Ok. So that didn’t bloody work. I’m not going to make such broad statements as <em>you can’t use the JAX library in python to write a transformable sparse Cholesky factorisation</em>, but I am more than prepared to say that <em>I</em> cannot do such a thing.</p>
<p>But, if I’m totally honest, I’m not <em>enormously</em> surprised. Even in looking at the very simple operation we focussed on today, it’s pretty clear that the operations required to work on a sparse matrix don’t look an awful lot like the types of operations you need to do the types of machine learning work that is JAX’s <em>raison d’être</em>.</p>
<p>And it is <em>never</em> surprising to find that a library designed to do a fundamentally different thing does not easily adapt to whatever random task I decide to throw at it.</p>
<p>But there is a light: JAX is an extensible language. We can build a new JAX primitive (or, new JAX primitives) and manually write all of the transformations (batching, JIT, and autodiffing).</p>
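<p>As a taste of what that involves, here is the very first step from the “How JAX primitives work” notebook linked in the footnotes: registering a primitive and giving it a concrete implementation. A sketch only — a real sparse Cholesky primitive would also need the abstract evaluation, JVP/transpose, and batching rules.</p>

```python
import numpy as np

try:
    from jax.extend import core  # newer JAX versions
except ImportError:
    from jax import core

# Register a brand-new primitive that JAX knows nothing about.
multiply_add_p = core.Primitive("multiply_add")

def multiply_add(x, y, z):
    # User-facing function: bind the primitive to its arguments.
    return multiply_add_p.bind(x, y, z)

def multiply_add_impl(x, y, z):
    # Concrete (un-traced) evaluation falls back to numpy.
    return np.add(np.multiply(x, y), z)

multiply_add_p.def_impl(multiply_add_impl)

print(multiply_add(2.0, 3.0, 4.0))  # 10.0
```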
<p>And that is what we shall do next! It’s gonna be a blast!</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>If you’ve never come across this term before, you can Google it for actual details, but the squishy version is that it will <em>compile</em> your code so it runs fast (like C code) instead of slow (like python code). JIT stands for <em>just in time</em>, which means that the code is compiled when it’s needed rather than before everything else is run. It’s a good thing. It makes the machine go <em>bing</em> faster.↩︎</p></li>
<li id="fn2"><p>I give less of a shit about the third transformation in this context. I’m not completely sure what you would batch when you’re dealing with a linear mixed-ish model. But hey. Why not.↩︎</p></li>
<li id="fn3"><p>If you’ve ever spoken to a Scala advocate (or an advocate of any other pure functional language), you can probably see the edges of why the arrays need to be immutable.<img src="https://latex.codecogs.com/png.latex?%0A%5Cphantom%7Ba%7D%0A"> Restrictions to JIT-able control flow have to do with how it’s translated onto the XLA compiler, which involves <em>tracing</em> through the code with an abstract data type with the same shape as the one that it’s being called with. Because this abstract data type does not have any values, structural parts of the code that <em>require</em> knowledge of specific values of the arguments will be lost. You can get around this partially by declaring those important values to be <em>static</em>, which would make the JIT compiler re-compile the function each time that value changes. We are not going to do that. <img src="https://latex.codecogs.com/png.latex?%0A%5Cphantom%7Ba%7D%0A"> Restrictions to gradients have to do (I assume) with reverse-mode autodiff needing to construct the autodiff tree at compile time, which means you need to be able to compute the number of operations from the types and shapes of the input variables and not from their values.↩︎</p></li>
<li id="fn4"><p>Coverage is pretty good on the <em>using</em> bit, but, as is usual, the bits on extending the system are occasionally a bit … sparse. (What in the hairy Christ is a <a href="https://jax.readthedocs.io/en/latest/notebooks/How_JAX_primitives_work.html#transposition">transposition</a> rule actually supposed to do????)↩︎</p></li>
<li id="fn5"><p>Forest↩︎</p></li>
<li id="fn6"><p>aka implement the damn thing in C++ and then do some proper work on it.↩︎</p></li>
<li id="fn7"><p>It is useful to think of a sparse matrix type as the triple <code>(value_type, indices, indptr)</code>. This means that if we are going to do something like add sparse matrices, we need to first cast them both to have the same type. After the cast, addition of two different sparse matrices becomes the addition of their <code>x</code> attributes. The same holds for scalar multiplication. Sparse matrix-matrix multiplication is a bit different because you once again need to symbolically work out the sparsity structure (aka the type) of the product. ↩︎</p></li>
<li id="fn8"><p>I think. That’s certainly what’s implied <a href="https://jax.readthedocs.io/en/latest/notebooks/Common_Gotchas_in_JAX.html#python-control-flow-jit">by the docs</a>, but I don’t want to give the impression that I’m sure. Because this is <a href="https://www.youtube.com/watch?v=5NPBIwQyPWE">complicated.</a>↩︎</p></li>
<li id="fn9"><p>What is <code>jax.lax</code>? Oh honey you don’t want to know.↩︎</p></li>
<li id="fn10"><p>aka there’s no weird copying↩︎</p></li>
<li id="fn11"><p><a href="https://www.youtube.com/watch?v=1hRvQqyeI2g">Whatever that means anyway</a>↩︎</p></li>
<li id="fn12"><p>slowwwwww to compile↩︎</p></li>
<li id="fn13"><p>The XLA compiler does very clever things. Incidentally, loop unrolling is actually one of the optimisations that compilers have in their pocket. Just not one that’s usually used for loops as large as this.↩︎</p></li>
<li id="fn14"><p>Read about XLA High Level Operations (HLOs) <a href="https://www.tensorflow.org/xla/architecture">here</a>. The XLA documentation is not extensive, but there’s still a lot to read.↩︎</p></li>
<li id="fn15"><p>This is why we have a new data structure.↩︎</p></li>
<li id="fn16"><p>My kingdom for a ragged array.↩︎</p></li>
<li id="fn17"><p>Yes. They are more complicated than this. But for our purposes they are lists of arrays.↩︎</p></li>
<li id="fn18"><p><img src="https://latex.codecogs.com/png.latex?n=50"> takes so long it prints a message telling us what to do if we want to file a bug! Compilation eventually clocks in at 361 seconds.↩︎</p></li>
<li id="fn19"><p>aka I know sweet bugger all↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2022,
  author = {Simpson, Dan},
  title = {Sparse {Matrices} 3: {Failing} at {JAX}},
  date = {2022-05-14},
  url = {https://dansblog.netlify.app/2022-05-14-jax-ing-a-sparse-cholesky-factorisation-part-3-in-an-ongoing-journey},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2022" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2022. <span>“Sparse Matrices 3: Failing at JAX.”</span>
May 14, 2022. <a href="https://dansblog.netlify.app/2022-05-14-jax-ing-a-sparse-cholesky-factorisation-part-3-in-an-ongoing-journey">https://dansblog.netlify.app/2022-05-14-jax-ing-a-sparse-cholesky-factorisation-part-3-in-an-ongoing-journey</a>.
</div></div></section></div> ]]></description>
  <category>Sparse matrices</category>
  <category>Sparse Cholesky factorisation</category>
  <category>Python</category>
  <category>JAX</category>
  <guid>https://dansblog.netlify.app/posts/2022-05-14-jax-ing-a-sparse-cholesky-factorisation-part-3-in-an-ongoing-journey/jax-ing-a-sparse-cholesky-factorisation-part-3-in-an-ongoing-journey.html</guid>
  <pubDate>Fri, 13 May 2022 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2022-05-14-jax-ing-a-sparse-cholesky-factorisation-part-3-in-an-ongoing-journey/alien.JPG" medium="image"/>
</item>
<item>
  <title>Sparse Matrices 2: An invitation to a sparse Cholesky factorisation</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2022-03-23-getting-jax-to-love-sparse-matrices/getting-jax-to-love-sparse-matrices.html</link>
  <description><![CDATA[ 





<p>This is part two of an ongoing exercise in hubris. <a href="https://dansblog.netlify.app/posts/2022-03-22-a-linear-mixed-effects-model/">Part one is here.</a></p>
<section id="the-choleksy-factorisation" class="level1">
<h1>The Cholesky factorisation</h1>
<p>So first things first: Cholesky wasn’t Russian. I don’t know why I always thought he was, but you know. Sometimes you should do a little googling first. Cholesky was French and died in the First World War.</p>
<p>But now that’s out of the way, let’s talk about matrices. If <img src="https://latex.codecogs.com/png.latex?A"><sup>1</sup> is a symmetric positive definite matrix, then there is a unique lower-triangular matrix <img src="https://latex.codecogs.com/png.latex?L"> such that <img src="https://latex.codecogs.com/png.latex?A%20=%20LL%5ET">.</p>
<p>Like all good theorems in numerical linear algebra, the proof of the existence of the Cholesky decomposition gives a pretty clear algorithm for constructing <img src="https://latex.codecogs.com/png.latex?L">. To sketch<sup>2</sup> it, let us see what it looks like if we build up our Cholesky factorisation from left to right, so the first <img src="https://latex.codecogs.com/png.latex?j-1"> columns have been computed and we are looking at how to build the <img src="https://latex.codecogs.com/png.latex?j">th column. In order to make <img src="https://latex.codecogs.com/png.latex?L"> lower-triangular, we need the first <img src="https://latex.codecogs.com/png.latex?j-1"> elements of the <img src="https://latex.codecogs.com/png.latex?j">th column to be zero. Let’s see if we can work out what the other elements have to be.</p>
<p>Writing this as a matrix equation, we get <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Bpmatrix%7D%20A_%7B11%7D%20&amp;%20a_%7B12%7D%20&amp;%20A_%7B32%7D%5ET%20%5C%5C%0Aa_%7B12%7D%5ET%20&amp;%20a_%7B22%7D%20&amp;%20a_%7B32%7D%5ET%20%5C%5C%0AA_%7B31%7D%20&amp;%20a_%7B32%7D%20&amp;%20A_%7B33%7D%5Cend%7Bpmatrix%7D%20=%0A%5Cbegin%7Bpmatrix%7D%20L_%7B11%7D&amp;&amp;%20%5C%5C%0Al_%7B12%7D%5ET%20&amp;%20l_%7B22%7D&amp;%5C%5C%0AL_%7B31%7D%20&amp;%20l_%7B32%7D%20&amp;%20L_%7B33%7D%5Cend%7Bpmatrix%7D%0A%5Cbegin%7Bpmatrix%7DL_%7B11%7D%5ET%20%20&amp;l_%7B12%7D%20&amp;%20L_%7B31%7D%5ET%5C%5C%0A&amp;%20l_%7B22%7D&amp;l_%7B32%7D%5ET%5C%5C%0A&amp;%20%20&amp;%20L_%7B33%7D%5ET%5Cend%7Bpmatrix%7D,%0A"> where <img src="https://latex.codecogs.com/png.latex?L_%7B11%7D"> is lower-triangular (and <img src="https://latex.codecogs.com/png.latex?A_%7B11%7D%20=%20L_%7B11%7DL_%7B11%7D%5ET">) and lower-case letters are vectors<sup>3</sup> and everything is of the appropriate dimension to make <img src="https://latex.codecogs.com/png.latex?A_%7B11%7D"> the top-left <img src="https://latex.codecogs.com/png.latex?(j-1)%20%5Ctimes%20(j-1)"> submatrix of <img src="https://latex.codecogs.com/png.latex?A">.</p>
<p>If we can find equations for <img src="https://latex.codecogs.com/png.latex?l_%7B22%7D"> and <img src="https://latex.codecogs.com/png.latex?l_%7B32%7D"> that don’t depend on <img src="https://latex.codecogs.com/png.latex?L_%7B33%7D"> (i.e. we can express them in terms of things we already know), then we have found an algorithm that marches from the left of the matrix to the right leaving a Cholesky factorisation in its wake!</p>
<p>If we do our matrix multiplications, we get the following equation for <img src="https://latex.codecogs.com/png.latex?a_%7B22%7D%20=%20A_%7Bjj%7D">: <img src="https://latex.codecogs.com/png.latex?%0Aa_%7B22%7D%20=%20l_%7B12%7D%5ETl_%7B12%7D%20+%20l_%7B22%7D%5E2.%0A"> Rearranging, we get <img src="https://latex.codecogs.com/png.latex?%0Al_%7B22%7D%20%20=%20%5Csqrt%7Ba_%7B22%7D%20-%20l_%7B12%7D%5ETl_%7B12%7D%7D.%0A"> The canny amongst you will be asking “yes but is that a real number”. The answer turns out to be “yes” for all diagonals if and only if<sup>4</sup> <img src="https://latex.codecogs.com/png.latex?A"> is symmetric positive definite.</p>
<p>Ok! We have expressed <img src="https://latex.codecogs.com/png.latex?l_%7B22%7D"> in terms of things we know, so we are half way there. Now to attack the vector <img src="https://latex.codecogs.com/png.latex?l_%7B3,2%7D">. Looking at the (3,2) equation implied by the above block matrices, we get <img src="https://latex.codecogs.com/png.latex?%0Aa_%7B32%7D%20=%20L_%7B31%7Dl_%7B12%7D%20+%20l_%7B32%7D%20l_%7B22%7D.%0A"> Remembering that <img src="https://latex.codecogs.com/png.latex?l_%7B22%7D"> is a scalar (that we have already computed!), we get <img src="https://latex.codecogs.com/png.latex?%0Al_%7B32%7D%20=%20(a_%7B32%7D%20-%20L_%7B31%7Dl_%7B12%7D)%20/%20l_%7B22%7D.%0A"></p>
<p>Success!</p>
<p>This then gives us the<sup>5</sup> Cholesky factorisation<sup>6</sup>:</p>
<pre><code>for j in range(0, n)  (using python slicing notation because why not)
  L[j, j] = sqrt(A[j, j] - L[j, :j] * L[j, :j]')
  L[(j+1):n, j] = (A[(j+1):n, j] - L[(j+1):n, :j] * L[j, :j]') / L[j, j]</code></pre>
<p>Easy as.</p>
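<p>To sanity-check the algebra, the pseudocode above translates almost line-for-line into plain (dense, unoptimised) Python. A sketch for checking the derivation, not an efficient implementation.</p>

```python
import math

def cholesky(A):
    """Dense left-looking Cholesky: returns lower-triangular L with A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # L[j, j] = sqrt(A[j, j] - L[j, :j] . L[j, :j])
        L[j][j] = math.sqrt(A[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        # L[i, j] = (A[i, j] - L[i, :j] . L[j, :j]) / L[j, j]
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

A = [[4.0, 2.0], [2.0, 3.0]]
L = cholesky(A)
print(L[0][0], L[1][0])  # 2.0 1.0
```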
<p>When <img src="https://latex.codecogs.com/png.latex?A"> is a dense matrix, this costs <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(n%5E3)"> floating point operations<sup>7</sup>.</p>
<p>So how can we take advantage of the observation that most of the entries of <img src="https://latex.codecogs.com/png.latex?A"> are zero (aka <img src="https://latex.codecogs.com/png.latex?A"> is a sparse matrix)? Well. That is the topic of this post. In order, we are going to look at the following:</p>
<ol type="1">
<li>Storing a sparse matrix so it works with the algorithm</li>
<li>How sparse is a Cholesky factor?</li>
<li>Which elements of the Cholesky factor are non-zero (aka symbolic factorisation)</li>
<li>Computing the Cholesky factorisation</li>
<li><del>What about JAX? (or: fucking immutable arrays are trying to ruin my fucking life)</del> (This did not happen. Next time. The post is long enough.)</li>
</ol>
<section id="so-how-do-we-store-a-sparse-matrix" class="level2">
<h2 class="anchored" data-anchor-id="so-how-do-we-store-a-sparse-matrix">So how do we store a sparse matrix?</h2>
<p>If we look at the Cholesky algorithm, we notice that we are scanning through the matrix column-by-column. When a computer stores a matrix, it stores it as a long 1D array with some side information. How this array is constructed from the matrix depends on the language.</p>
<p>There are (roughly) two options: column-major or row-major storage. Column-major storage (used by Fortran<sup>8</sup>, R, Matlab, Julia, Eigen, etc) stacks a matrix column by column. A small example: <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Bpmatrix%7D1&amp;3&amp;5%5C%5C2&amp;4&amp;6%20%5Cend%7Bpmatrix%7D%20%5CRightarrow%20%5B1,2,3,4,5,6%5D.%0A"> Row-major ordering (C/C++ arrays, SAS, Pascal, numpy<sup>9</sup>) stores things row-by-row.</p>
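<p>The two orderings, on the small example above, in plain Python:</p>

```python
# Flatten the 2x3 example matrix both ways.
M = [[1, 3, 5],
     [2, 4, 6]]

row_major = [x for row in M for x in row]                   # C, numpy default
col_major = [M[i][j] for j in range(3) for i in range(2)]   # Fortran, R, Eigen

print(col_major)  # [1, 2, 3, 4, 5, 6]
print(row_major)  # [1, 3, 5, 2, 4, 6]
```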
<p>Which one do we use? Well. If you look at the Cholesky algorithm, it scans through the matrix column-by-column. It is much much much more memory efficient in this case to have the whole column available in one contiguous chunk of memory. So we are going to use column-major storage.</p>
<p>But there’s an extra wrinkle: most of the entries in our matrix are zero. It would be very inefficient to store all of those zeros. You may be sceptical about this, but it’s true. It helps to realize that even in the examples at the bottom of this post, which are not trying very hard to minimise the fill-in, only 3–4% of the potential elements in <img src="https://latex.codecogs.com/png.latex?L"> are non-zero.</p>
<p>It is far more efficient to just store the locations<sup>10</sup> of the non-zeros and their values. If only 4% of your matrix is non-zero, you are saving<sup>11</sup> a lot of memory!</p>
<p>The storage scheme we are inching towards is called <em>compressed sparse column (CSC)</em> storage. This stores the matrix in three arrays. The first array <code>indices</code> (which has as many entries as there are non-zeros) stores the row numbers for each non-zero element. So if <img src="https://latex.codecogs.com/png.latex?%0AB%20=%20%5Cbegin%7Bpmatrix%7D%0A1%20&amp;&amp;5%20%5C%5C%0A2&amp;3&amp;%20%5C%5C%0A&amp;4&amp;6%0A%5Cend%7Bpmatrix%7D%0A"> then (using zero-based indices because I’ve got to make this work in Python)</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb2-1">B_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]</span></code></pre></div>
</div>
<p>The second array <code>indptr</code> is an <img src="https://latex.codecogs.com/png.latex?n+1">-dimensional array that indexes the first element of each column. The final element of <code>indptr</code> is <code>nnz(B)</code><sup>12</sup>. This leads to</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb3-1">B_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>]</span></code></pre></div>
</div>
<p>This means that the entries in column<sup>13</sup> j have row numbers</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb4-1">B_indices[B_indptr[j]:B_indptr[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span></code></pre></div>
</div>
<p>The third and final array is <code>x</code>, which stores the <em>values</em> of the non-zero entries of <img src="https://latex.codecogs.com/png.latex?B"> <em>column-by-column</em>. This gives</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb5-1">B_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>]</span></code></pre></div>
</div>
<p>Using these three arrays we can get access to the values in the <code>j</code>th column of <img src="https://latex.codecogs.com/png.latex?B"> by accessing</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb6-1">B_x[B_indptr[j]:B_indptr[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span></code></pre></div>
</div>
<p>This storage scheme is very efficient for what we are about to do. But it is fundamentally a static scheme: it is <em>extremely</em> expensive to add a new non-zero element. There are other sparse matrix storage schemes that make this work better.</p>
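<p>As a quick sanity check (this snippet is mine, not from the original post): scipy’s <code>csc_matrix</code> uses exactly this storage scheme, so we can build the little matrix B above as a dense array and read the three arrays straight off it.</p>

```python
import numpy as np
from scipy.sparse import csc_matrix

# The matrix B from above, with the blanks written as explicit zeros
B = np.array([[1, 0, 5],
              [2, 3, 0],
              [0, 4, 6]])

B_csc = csc_matrix(B)
print(B_csc.indices)  # row numbers of the non-zeros: [0 1 1 2 0 2]
print(B_csc.indptr)   # start of each column:         [0 2 4 6]
print(B_csc.data)     # values, column-by-column:     [1 2 3 4 5 6]
```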
</section>
<section id="how-sparse-is-a-cholesky-factor-of-a-sparse-matrix" class="level2">
<h2 class="anchored" data-anchor-id="how-sparse-is-a-cholesky-factor-of-a-sparse-matrix">How sparse is a Cholesky factor of a sparse matrix?</h2>
<p>Ok. So now we’ve got that out of the way, we need to work out the sparsity structure of a Cholesky factorisation. At this point we need to close our eyes, pray, and start thinking about graphs.</p>
<p>Why graphs? I promise, it is not because I love discrete<sup>14</sup> maths. It is because symmetric sparse matrices are strongly related to graphs.</p>
<p>To remind people, a graph<sup>15</sup> (in a mathematical sense) <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D%20=%20(%5Cmathcal%7BV%7D,%20%5Cmathcal%7BE%7D)"> consists of two lists:</p>
<ol type="1">
<li>A list of vertices <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BV%7D"> numbered from <img src="https://latex.codecogs.com/png.latex?1"> to <img src="https://latex.codecogs.com/png.latex?n"><sup>16</sup>.</li>
<li>A list of edges <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BE%7D"> in the graph (aka all the pairs <img src="https://latex.codecogs.com/png.latex?(i,j)"> such that <img src="https://latex.codecogs.com/png.latex?i%3Cj"> and there is an edge between <img src="https://latex.codecogs.com/png.latex?i"> and <img src="https://latex.codecogs.com/png.latex?j">).</li>
</ol>
<p>Every symmetric sparse matrix <img src="https://latex.codecogs.com/png.latex?A"> has a graph naturally associated with it. The relationship is that <img src="https://latex.codecogs.com/png.latex?(i,j)"> (for <img src="https://latex.codecogs.com/png.latex?i%5Cneq%20j">) is an edge in <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D"> if and only if <img src="https://latex.codecogs.com/png.latex?A_%7Bij%7D%20%5Cneq%200">.</p>
<p>So, for instance, if <img src="https://latex.codecogs.com/png.latex?%0AA%20=%20%5Cbegin%7Bpmatrix%7D%0A1&amp;2&amp;&amp;8%20%5C%5C%0A2&amp;3&amp;&amp;%205%5C%5C%0A&amp;&amp;4&amp;6%20%5C%5C%0A8&amp;5&amp;6&amp;7%0A%5Cend%7Bpmatrix%7D,%0A"></p>
<p>then we can plot the associated graph, <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D">.</p>
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-03-23-getting-jax-to-love-sparse-matrices/getting-jax-to-love-sparse-matrices_files/figure-html/unnamed-chunk-7-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
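<p>For concreteness (this snippet is mine, not the post’s), here is the edge list of that graph read straight off the matrix, using zero-based vertex labels:</p>

```python
import numpy as np

# The matrix A from above, blanks written as zeros
A = np.array([[1, 2, 0, 8],
              [2, 3, 0, 5],
              [0, 0, 4, 6],
              [8, 5, 6, 7]])

n = A.shape[0]
# The edges are exactly the pairs (i, j) with i < j and A[i, j] != 0
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if A[i, j] != 0]
print(edges)  # [(0, 1), (0, 3), (1, 3), (2, 3)]
```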
<p>But why do we care about graphs?</p>
<p>We care because they let us answer our question for this section: <em>which elements of the Cholesky factor <img src="https://latex.codecogs.com/png.latex?L"> are non-zero?</em></p>
<p>It is useful to write the algorithm out for a second time<sup>17</sup>, but this time closer to how we will implement it.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb7-1">L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.tril(A)</span>
<span id="cb7-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n):</span>
<span id="cb7-3">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> k <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(j):</span>
<span id="cb7-4">    L[j:n, j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-=</span> L[j, k] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> L[j:n, k]</span>
<span id="cb7-5">  L[j,j]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.sqrt(L[j,j])</span>
<span id="cb7-6">  L[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:n, j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:n, j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> L[j, j]</span></code></pre></div>
</div>
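<p>That column-at-a-time pseudocode translates directly into a runnable function. Here is a self-contained sketch (mine, not the post’s eventual implementation), checked against <code>numpy.linalg.cholesky</code>:</p>

```python
import numpy as np

def col_cholesky(A):
    # Left-looking, column-at-a-time Cholesky
    n = A.shape[0]
    L = np.tril(A).astype(float)
    for j in range(n):
        for k in range(j):
            # subtract the contribution of every previous column
            L[j:n, j] -= L[j, k] * L[j:n, k]
        L[j, j] = np.sqrt(L[j, j])
        L[j + 1:n, j] /= L[j, j]
    return L

# Check against numpy on a random SPD matrix
rng = np.random.default_rng(42)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)
assert np.allclose(col_cholesky(A), np.linalg.cholesky(A))
```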
<p>If we stare at this long enough we can work out when <img src="https://latex.codecogs.com/png.latex?L_%7Bij%7D"> is going to be potentially non-zero.</p>
<p>And here is where we have to take a quick zoom out. We are <em>not</em> interested if the numerical entry <img src="https://latex.codecogs.com/png.latex?L_%7Bij%7D"> is <em>actually</em> non-zero. We are interested if it <em>could be</em> non-zero. Why? Because this will allow us to set up our storage scheme for the sparse Cholesky factor. And it will tell us exactly which bits of the above loops we actually need to do!</p>
<p>So with that motivation in mind, can we spot the non-zeros? Well. I’ll be honest with you. I struggle at this game. This is part of why I do not like thinking about graphs<sup>18</sup>. But with a piece of paper and a bit of time, I can convince myself that <img src="https://latex.codecogs.com/png.latex?L_%7Bij%7D"> is potentially non-zero (or a <em>structural</em> non-zero) if:</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?A_%7Bij%7D"> is non-zero (because <code>tmp[i-j]</code> is non-zero!), or</li>
<li><img src="https://latex.codecogs.com/png.latex?L_%7Bik%7D%20%5Cneq%200"> <em>and</em> <img src="https://latex.codecogs.com/png.latex?L_%7Bjk%7D%20%5Cneq%200"> for some <img src="https://latex.codecogs.com/png.latex?k%20%3C%20%5Cmin%5C%7Bi,%20j%5C%7D"> (because that is the only time an element of <code>tmp</code> is updated through <code>tmp[i] = tmp[i] - L[i, k] * L[j, k]</code>)</li>
</ul>
<p>If we dig into the second condition a bit more,<sup>19</sup> we notice that the second case can happen if and only if there is a path in <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D"><sup>20</sup> from node <img src="https://latex.codecogs.com/png.latex?i"> to node <img src="https://latex.codecogs.com/png.latex?j"> <img src="https://latex.codecogs.com/png.latex?%0Ai%20%5Crightarrow%20v_1%20%5Crightarrow%20v_2%20%5Crightarrow%20%5Cldots%20%5Crightarrow%20v_%7B%5Cell-1%7D%20%5Crightarrow%20j%0A"> with <img src="https://latex.codecogs.com/png.latex?v_1,%20%5Cldots%20v_%7B%5Cell-1%7D%20%3C%20%5Cmin%5C%7Bi,j%5C%7D">. The proof is an induction on <img src="https://latex.codecogs.com/png.latex?%5Cmin%5C%7Bi,j%5C%7D"> that I can’t be arsed typing out.</p>
<p>(As an aside, Theorem 2.8 in <a href="https://www.routledge.com/Gaussian-Markov-Random-Fields-Theory-and-Applications/Rue-Held/p/book/9781584884323">Rue and Held’s book</a> gives a very clean statistical proof of this result.)</p>
<p>This is enough to see that fill in patterns are going to be a complex thing.</p>
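<p>To make that path characterisation concrete, here is a brute-force sketch (a hypothetical helper of mine, hopeless at scale but fine for toy graphs, with zero-based vertex labels) that predicts the structural non-zeros directly from it:</p>

```python
def structural_nonzeros(n, edges):
    # Brute-force version of the rule above: L[i, j] (with i > j) is a
    # structural non-zero iff there is a path from j to i that only
    # passes through vertices numbered below min(i, j) = j.
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    def connected_below(i, j):
        # depth-first search from j, only stepping through vertices < j
        stack, seen = [j], {j}
        while stack:
            v = stack.pop()
            if v == i:
                return True
            for w in adj[v]:
                if w not in seen and (w == i or w < j):
                    seen.add(w)
                    stack.append(w)
        return False

    return {(i, j) for j in range(n) for i in range(j + 1, n)
            if connected_below(i, j)}

# A star graph with the hub labelled 0: everything fills in
hub_first = [(0, k) for k in range(1, 6)]
print(len(structural_nonzeros(6, hub_first)))  # 15: dense below the diagonal

# The same star with the hub labelled last: no fill at all
hub_last = [(5, k) for k in range(5)]
print(sorted(structural_nonzeros(6, hub_last)))
# [(5, 0), (5, 1), (5, 2), (5, 3), (5, 4)]
```

This is exactly the pattern the toy example below confirms numerically.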
<section id="a-toy-example" class="level3">
<h3 class="anchored" data-anchor-id="a-toy-example">A toy example</h3>
<p>Consider the following graph</p>
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-03-23-getting-jax-to-love-sparse-matrices/getting-jax-to-love-sparse-matrices_files/figure-html/unnamed-chunk-10-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>It’s pretty clear that there is a path between <img src="https://latex.codecogs.com/png.latex?(i,j)"> for every pair <img src="https://latex.codecogs.com/png.latex?(i,j)"> (the path goes through the fully connected vertex, which is labelled <code>1</code>).</p>
<p>And indeed, we can check this numerically<sup>21</sup></p>
<div class="cell">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(Matrix)</span>
<span id="cb8-2">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span></span>
<span id="cb8-3">A <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sparseMatrix</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">i =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>n, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,n)), </span>
<span id="cb8-4">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">j =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,n),<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>n), </span>
<span id="cb8-5">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>, </span>
<span id="cb8-6">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dims =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(n,n)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb8-7">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Diagonal</span>(n)</span>
<span id="cb8-8">A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#print the non-zero structure</span></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>6 x 6 sparse Matrix of class "lgCMatrix"
                
[1,] | | | | | |
[2,] | | . . . .
[3,] | . | . . .
[4,] | . . | . .
[5,] | . . . | .
[6,] | . . . . |</code></pre>
</div>
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb10-1">L <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chol</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.matrix</span>(A))) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># transpose is for R reasons</span></span>
<span id="cb10-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(L, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fully dense!</span></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]  0.8  0.0  0.0  0.0  0.0    0
[2,] -0.3  1.0  0.0  0.0  0.0    0
[3,] -0.3 -0.1  1.0  0.0  0.0    0
[4,] -0.3 -0.1 -0.1  1.0  0.0    0
[5,] -0.3 -0.1 -0.1 -0.1  1.0    0
[6,] -0.3 -0.1 -0.1 -0.1 -0.1    1</code></pre>
</div>
</div>
<p>But what if we changed the labels of our vertices? What is the fill in pattern implied by a labelling where the fully connected vertex is labelled last instead of first?</p>
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://dansblog.netlify.app/posts/2022-03-23-getting-jax-to-love-sparse-matrices/getting-jax-to-love-sparse-matrices_files/figure-html/unnamed-chunk-12-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>There are now <em>no paths</em> from <img src="https://latex.codecogs.com/png.latex?i"> to <img src="https://latex.codecogs.com/png.latex?j"> that only go through lower-numbered vertices. So there is no fill in! We can check this numerically!<sup>22</sup></p>
<div class="cell">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb12-1">A2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> A[n<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,n<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb12-2">L2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chol</span>(A2))</span>
<span id="cb12-3">L2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>6 x 6 sparse Matrix of class "ltCMatrix"
                
[1,] | . . . . .
[2,] . | . . . .
[3,] . . | . . .
[4,] . . . | . .
[5,] . . . . | .
[6,] | | | | | |</code></pre>
</div>
</div>
</section>
<section id="so-what-is-the-lesson-here" class="level3">
<h3 class="anchored" data-anchor-id="so-what-is-the-lesson-here">So what is the lesson here?</h3>
<p>The lesson is that the sparse Cholesky algorithm cares <em>deeply</em> about what order the rows and columns of the matrix are in. This is why, <a href="https://dansblog.netlify.app/posts/2022-03-22-a-linear-mixed-effects-model/">in the previous post</a>, we put the dense rows and columns of <img src="https://latex.codecogs.com/png.latex?Q_%7Bu%20%5Cmid%20y,%20%5Ctheta%7D"> at the <em>end</em> of the matrix!</p>
<p>Luckily, a lot of clever graph theorists got on the job a while back and found a number of good algorithms for finding decent<sup>23</sup> ways to reorder the vertices of a graph to minimise fill in. There are two particularly well-known reorderings: the approximate minimum degree (AMD) reordering and the nested-dissection reordering. Neither of these are easily available in Python<sup>24</sup>.</p>
<p>AMD is a bog-standard black-box greedy reordering: it tries to label the next vertex so that the graph you get after removing that vertex and adding edges between all of the nodes that connect to it isn’t too fucked.</p>
<p>Nested dissection tries to generalise the toy example above by finding nodes that separate the graph into two minimally connected components. The separator node is then labelled last. The process is repeated until you run out of nodes. This algorithm can be very efficient in some cases (eg if the graph is planar<sup>25</sup>, the sparse Cholesky algorithm using this reordering <a href="https://link.springer.com/article/10.1007/BF01396660">provably costs</a> at most <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(n%5E%7B3/2%7D)">).</p>
<p>Typically, you compute multiple reorderings<sup>26</sup> and pick the one that results in the least fill in.</p>
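<p>While AMD and nested dissection aren’t in scipy, it does ship one classical ordering: reverse Cuthill–McKee. It’s a bandwidth-reducing ordering rather than a fill-reducing one, but it shows the workflow: compute a permutation, permute, factorise, count the non-zeros. A sketch (mine, with a dense Cholesky and a hypothetical <code>chol_nnz</code> helper standing in for a real sparse factorisation):</p>

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

# The "arrow" matrix from the toy example: a hub connected to everything
n = 6
A = np.eye(n)
A[0, 1:] = A[1:, 0] = -0.2

def chol_nnz(M):
    # Count the (numerically) non-zero entries of the Cholesky factor
    return np.count_nonzero(np.round(np.linalg.cholesky(M), 10))

perm = reverse_cuthill_mckee(csc_matrix(A), symmetric_mode=True)
A_perm = A[np.ix_(perm, perm)]
print(chol_nnz(A), chol_nnz(A_perm))  # the ordering changes the fill!
```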
</section>
</section>
<section id="which-elements-of-the-cholesky-factor-are-non-zero-aka-symbolic-factorisation" class="level2">
<h2 class="anchored" data-anchor-id="which-elements-of-the-cholesky-factor-are-non-zero-aka-symbolic-factorisation">Which elements of the Cholesky factor are non-zero (aka symbolic factorisation)</h2>
<p>Ok. So I guess we’ve got to work out an algorithm for computing the non-zero structure of a sparse Cholesky factor. Naively, this seems easy: just use the Cholesky algorithm and mark which elements are non-zero.</p>
<p>But this is slow and inefficient. You’re not thinking like a programmer! Or a graph theorist. So let’s talk about how to do this efficiently.</p>
<section id="the-elimination-tree" class="level3">
<h3 class="anchored" data-anchor-id="the-elimination-tree">The elimination tree</h3>
<p>Let’s consider the graph <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D_L"> that contains the sparsity pattern of <img src="https://latex.codecogs.com/png.latex?L">. We <em>know</em> that the non-zero structure consists of all <img src="https://latex.codecogs.com/png.latex?(i,j)"> such that <img src="https://latex.codecogs.com/png.latex?i%20%3C%20j"> and there is a path in <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D"> from <img src="https://latex.codecogs.com/png.latex?i"> to <img src="https://latex.codecogs.com/png.latex?j"> that only passes through lower-numbered vertices. This means we could just compute that and make <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D_L">.</p>
<p>The thing that you should notice immediately is that there is a lot of redundancy in this structure. Remember that if <img src="https://latex.codecogs.com/png.latex?L_%7Bik%7D"> is non-zero and <img src="https://latex.codecogs.com/png.latex?L_%7Bjk%7D"> is also non-zero, then <img src="https://latex.codecogs.com/png.latex?L_%7Bij%7D"> is also non-zero.</p>
<p>This suggests that if we have <img src="https://latex.codecogs.com/png.latex?(i,k)"> and <img src="https://latex.codecogs.com/png.latex?(j,k)"> in the graph, we can remove the edge <img src="https://latex.codecogs.com/png.latex?(i,j)"> from <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D_L"> and still be able to work out that <img src="https://latex.codecogs.com/png.latex?L_%7Bij%7D"> is non-zero. This new graph is no longer the graph associated with <img src="https://latex.codecogs.com/png.latex?L"> but, for our purposes, it contains the same information.</p>
<p>If we continue pruning the graph this way, we are going to end up with a<sup>27</sup> rooted tree! From this tree, which is called the <em>elimination tree</em> of <img src="https://latex.codecogs.com/png.latex?A"><sup>28</sup>, we can easily work out the non-zero structure of <img src="https://latex.codecogs.com/png.latex?L">.</p>
<p>The elimination tree is the fundamental structure needed to build an efficient sparse Cholesky algorithm. We are not going to use it to its full potential, but it is very cheap to compute (roughly<sup>29</sup> <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(%5Coperatorname%7Bnnz%7D(A))"> operations).</p>
<p>Once we have the elimination tree, it’s cheap to compute properties of <img src="https://latex.codecogs.com/png.latex?L"> like the number of non-zeros in a column, the exact sparsity pattern of every column, which columns can be grouped together to form supernodes<sup>30</sup>, and the approximate minimum degree reordering.</p>
<p>All of those things would be necessary for a modern, industrial-strength sparse Cholesky factorisation. But, and I cannot stress this enough, fuck that shit.</p>
</section>
<section id="the-symbolic-factorisation" class="level3">
<h3 class="anchored" data-anchor-id="the-symbolic-factorisation">The symbolic factorisation</h3>
<p>We are doing the easy version. Which is to say I <em>refuse</em> to do anything here that couldn’t be easily done in the early 90s. Specifically, we are going to use the version of this that <a href="http://heath.cs.illinois.edu/courses/cs598mh/george_liu.pdf">George, Liu, and Ng</a> wrote about<sup>31</sup> in the 90s. Understanding this is, I think, enough to see how things like supernodal factorisations work, but it’s so much less to keep track of.</p>
<p>The nice thing about this method is that we compute the elimination tree implicitly as we go along.</p>
<p>Let <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BL%7D_j"> be the non-zero entries in the <img src="https://latex.codecogs.com/png.latex?j">th column of <img src="https://latex.codecogs.com/png.latex?L">. Then our discussion in the previous section tells us that we need to determine the <em>reach</em> of the node <img src="https://latex.codecogs.com/png.latex?j"> <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BReach%7D(j,%20S_j)%20=%20%5Cleft%5C%7Bi:%20%5Ctext%7Bthere%20is%20a%20path%20from%20%7D%20i%5Ctext%7B%20to%20%7Dj%5Ctext%7B%20through%20%7DS_j%5Cright%5C%7D,%0A"> where <img src="https://latex.codecogs.com/png.latex?S_j%20=%20%5C%7B1,%5Cldots,%20j-1%5C%7D">.</p>
<p>If we can compute the reach, then <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BL%7D_j%20%20=%20%5Ctext%7BReach%7D(j,%20S_j)%20%5Ccup%5C%7Bj%5C%7D">!</p>
<p>This is where the elimination tree comes in: it is an efficient representation of these sets. Indeed, <img src="https://latex.codecogs.com/png.latex?i%20%5Cin%20%5Ctext%7BReach%7D(j,%20S_j)"> <em>if and only if</em> there is a directed<sup>32</sup> path from <img src="https://latex.codecogs.com/png.latex?j"> to <img src="https://latex.codecogs.com/png.latex?i"> in the elimination tree! Now this tree is ordered<sup>33</sup> so that if <img src="https://latex.codecogs.com/png.latex?i"> is a child of <img src="https://latex.codecogs.com/png.latex?j"> (aka directly below it in the tree), then <img src="https://latex.codecogs.com/png.latex?i%20%3C%20j">. This means that its column in the Cholesky factorisation has already been computed. So all of the nodes that can be reached from <img src="https://latex.codecogs.com/png.latex?j"> by going through <img src="https://latex.codecogs.com/png.latex?i"> are in <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BL%7D_%7Bi%7D%20%5Ccap%20%5C%7Bj+1,%20%5Cldots,%20n%5C%7D">.</p>
<p>This means that we can compute the non-zeros of the <img src="https://latex.codecogs.com/png.latex?j">th column of <img src="https://latex.codecogs.com/png.latex?L"> efficiently from the non-zeros of all of the (very few, hopefully) columns associated with the child nodes of <img src="https://latex.codecogs.com/png.latex?j">.</p>
<p>So all that’s left is to ask “how can we find the child?” (as phones around the city start buzzing). Well, a little bit of thinking time should convince you that if <img src="https://latex.codecogs.com/png.latex?%0Ap%20=%20%5Cmin%5C%7Bi%20:%20i%20%5Cin%20%5Ctext%7BReach%7D(j,%20S_j)%20%5C%7D,%0A"> then <img src="https://latex.codecogs.com/png.latex?p"> is the parent of <img src="https://latex.codecogs.com/png.latex?j">. Or, the parent of column <img src="https://latex.codecogs.com/png.latex?j"> is the index of its first<sup>34</sup> non-zero below the diagonal.</p>
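<p>You can see this parent rule in action without any sparse machinery at all. A toy sketch (mine; it cheats by reading the pattern off a dense factor, which defeats the whole point, but it shows the rule):</p>

```python
import numpy as np

def etree_from_L(L, tol=1e-12):
    # parent[j] = row index of the first sub-diagonal non-zero in column j
    n = L.shape[0]
    parent = [-1] * n  # -1 marks a root of the elimination tree
    for j in range(n):
        below = np.nonzero(np.abs(L[j + 1:, j]) > tol)[0]
        if below.size > 0:
            parent[j] = int(j + 1 + below[0])
    return parent

# Arrow matrix with the hub labelled last: every leaf's parent is the hub
n = 6
A = np.eye(n)
A[5, :5] = A[:5, 5] = -0.2
L = np.linalg.cholesky(A)
print(etree_from_L(L))  # [5, 5, 5, 5, 5, -1]
```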
<p>We can put all of these observations together into the following algorithm. We assume that we are given the non-zero structure of <code>tril(A)</code> (aka the lower-triangle of <img src="https://latex.codecogs.com/png.latex?A">).</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb14-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb14-2"></span>
<span id="cb14-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> _symbolic_factor_csc(A_indices, A_indptr):</span>
<span id="cb14-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Assumes A_indices and A_indptr index the lower triangle of $A$ ONLY.</span></span>
<span id="cb14-5">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb14-6">  L_sym <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [np.array([], dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n)]</span>
<span id="cb14-7">  children <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [np.array([], dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n)]</span>
<span id="cb14-8">  </span>
<span id="cb14-9">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n):</span>
<span id="cb14-10">    L_sym[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_indices[A_indptr[j]:A_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb14-11">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> child <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> children[j]:</span>
<span id="cb14-12">      tmp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_sym[child][L_sym[child] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> j]</span>
<span id="cb14-13">      L_sym[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.unique(np.append(L_sym[j], tmp))</span>
<span id="cb14-14">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_sym[j]) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:</span>
<span id="cb14-15">      p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_sym[j][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb14-16">      children[p] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.append(children[p], j)</span>
<span id="cb14-17">        </span>
<span id="cb14-18">  L_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb14-19">  L_indptr[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.cumsum([<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> L_sym])</span>
<span id="cb14-20">  L_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.concatenate(L_sym)</span>
<span id="cb14-21">  </span>
<span id="cb14-22">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_indices, L_indptr</span>
<span id="cb14-23">  </span></code></pre></div>
</div>
<p>This was the first piece of Python I’ve written in about 13 years<sup>35</sup>, so it’s a bit shit. Nevertheless, it works. It is possible to replace the <code>children</code> structure with a linked list implemented in a length-n integer array<sup>36</sup>, but why bother? This function is only run once.</p>
<p>It’s also worth noting that the <code>children</code> array expresses the elimination tree. If we were going to do something with it explicitly, we could just spit it out and reshape it into a more useful data structure.</p>
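<p>If you did want the elimination tree explicitly, the parent of node <code>j</code> is just the first off-diagonal row index in column <code>j</code> of <code>L</code>. A minimal sketch (not the post’s code, and assuming each column’s row indices are sorted, which the symbolic factorisation above guarantees):</p>

```python
import numpy as np

def etree_parents(L_indices, L_indptr):
    """Parent array of the elimination tree from the CSC pattern of L.

    Assumes each column's row indices are sorted, so the first
    off-diagonal entry of column j names its parent.
    """
    n = len(L_indptr) - 1
    parent = np.full(n, -1, dtype=int)  # roots keep parent -1
    for j in range(n):
        if L_indptr[j + 1] - L_indptr[j] > 1:
            parent[j] = L_indices[L_indptr[j] + 1]
    return parent

# Arrow-shaped pattern: every column hangs off the last one.
L_indptr = np.array([0, 2, 4, 6, 7])
L_indices = np.array([0, 3, 1, 3, 2, 3, 3])
print(etree_parents(L_indices, L_indptr))  # parent of the root is -1
```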
<p>There’s one more piece of tedium before we can get to the main event: we need to do a deep copy of <img src="https://latex.codecogs.com/png.latex?A"> into the data structure of <img src="https://latex.codecogs.com/png.latex?L">. There is no<sup>37</sup> avoiding this.</p>
<p>Here is the code.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb15-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> _deep_copy_csc(A_indices, A_indptr, A_x, L_indices, L_indptr):</span>
<span id="cb15-2">  n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb15-3">  L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_indices))</span>
<span id="cb15-4">  </span>
<span id="cb15-5">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n):</span>
<span id="cb15-6">    copy_idx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.nonzero(np.in1d(L_indices[L_indptr[j]:L_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]],</span>
<span id="cb15-7">                                  A_indices[A_indptr[j]:A_indptr[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]))[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb15-8">    L_x[L_indptr[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> copy_idx] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_x[A_indptr[j]:A_indptr[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb15-9">  <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_x</span></code></pre></div>
</div>
</section>
</section>
<section id="computing-the-cholesky-factorisation" class="level2">
<h2 class="anchored" data-anchor-id="computing-the-cholesky-factorisation">Computing the Cholesky factorisation</h2>
<p>It feels like we’ve been going for a really long time and we still don’t have a Cholesky factorisation. Mate. I feel your pain. Believe me.</p>
<p>But we are here now: everything is in place. We can now write down the Cholesky algorithm!</p>
<p>The algorithm is as it was before, with the main difference being that we now know two things:</p>
<ol type="1">
<li>We only need to update <code>tmp</code> with the descendants of <code>j</code> in the elimination tree.</li>
<li>That’s it. That is the only thing we know.</li>
</ol>
<p>Of course, we could use the elimination tree to do this very efficiently, but, <em>as per my last email</em>, I do not care. So we will simply build up an explicit list of all of the descendants. This will obviously be less efficient, but it’s fine for our purposes. Let’s face it, we’re all going to die eventually.</p>
<p>So here it goes.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb16-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> _sparse_cholesky_csc_impl(L_indices, L_indptr, L_x):</span>
<span id="cb16-2">    n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_indptr) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb16-3">    descendant <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [[] <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n)]</span>
<span id="cb16-4">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> j <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n):</span>
<span id="cb16-5">        tmp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_x[L_indptr[j]:L_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb16-6">        <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> bebe <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> descendant[j]:</span>
<span id="cb16-7">            k <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bebe[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb16-8">            Ljk<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> L_x[bebe[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb16-9">            pad <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.nonzero(                                                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb16-10">              L_indices[L_indptr[k]:L_indptr[k<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> L_indices[L_indptr[j]])[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb16-11">            update_idx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.nonzero(np.in1d(                                 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb16-12">              L_indices[L_indptr[j]:L_indptr[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]],                          <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb16-13">              L_indices[(L_indptr[k] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> pad):L_indptr[k<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]))[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb16-14">            tmp[update_idx] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tmp[update_idx] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>                              <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb16-15">              Ljk <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> L_x[(L_indptr[k] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> pad):L_indptr[k <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb16-16">            </span>
<span id="cb16-17">        diag <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.sqrt(tmp[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb16-18">        L_x[L_indptr[j]] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> diag</span>
<span id="cb16-19">        L_x[(L_indptr[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>):L_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tmp[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> diag</span>
<span id="cb16-20">        <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> idx <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(L_indptr[j] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, L_indptr[j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]):</span>
<span id="cb16-21">            descendant[L_indices[idx]].append((j, idx))</span>
<span id="cb16-22">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_x</span></code></pre></div>
</div>
<p>The one thing that you’ll note in this code<sup>38</sup> is that we are implicitly using things that we know about the sparsity structure of the <img src="https://latex.codecogs.com/png.latex?j">th column. In particular, we <em>know</em> that the sparsity structure of the <img src="https://latex.codecogs.com/png.latex?j">th column is the <em>union</em> of the relevant parts of the sparsity structure of its descendant columns. This is what allows a lot of our faster indexing to work.</p>
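<p>That union property is easy to spot-check on a pattern. A small sanity-check sketch (not part of the post’s pipeline): for every column <code>k</code> and every off-diagonal row <code>j</code> stored in it, the part of column <code>k</code>’s pattern at or below row <code>j</code> must be contained in column <code>j</code>’s pattern.</p>

```python
import numpy as np

def check_union_property(L_indices, L_indptr):
    """Check the fact used above: if L[j, k] != 0 for k < j, then the
    row pattern of column k at or below j is contained in that of
    column j. A sanity-check sketch, not part of the factorisation."""
    n = len(L_indptr) - 1
    cols = [set(L_indices[L_indptr[j]:L_indptr[j + 1]]) for j in range(n)]
    for k in range(n):
        for j in L_indices[L_indptr[k] + 1:L_indptr[k + 1]]:
            tail = {i for i in cols[k] if i >= j}
            if not tail <= cols[j]:
                return False
    return True

# The (already-filled) pattern of a valid Cholesky factor passes.
L_indptr = np.array([0, 3, 5, 6])
L_indices = np.array([0, 1, 2, 1, 2, 2])
print(check_union_property(L_indices, L_indptr))  # True
```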
<p>Finally, we can put it all together.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb17-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> sparse_cholesky_csc(A_indices, A_indptr, A_x):</span>
<span id="cb17-2">    L_indices, L_indptr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _symbolic_factor_csc(A_indices, A_indptr)</span>
<span id="cb17-3">    L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _deep_copy_csc(A_indices, A_indptr, A_x, L_indices, L_indptr)</span>
<span id="cb17-4">    L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _sparse_cholesky_csc_impl(L_indices, L_indptr, L_x)</span>
<span id="cb17-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> L_indices, L_indptr, L_x</span></code></pre></div>
</div>
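<p>For what it’s worth, once you have the factor you would typically use it to solve <code>A x = b</code> with a forward and a backward triangular solve. A sketch of that (not from the post, using a dense-Cholesky stand-in for the factor and <code>scipy</code>’s <code>spsolve_triangular</code>, which wants CSR input):</p>

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve_triangular

# With A = L L^T, solve A x = b by a forward and a backward solve.
rng = np.random.default_rng(42)
n = 6
M = sparse.random(n, n, density=0.4, random_state=rng)
A = (M @ M.T + n * sparse.eye(n)).tocsc()          # small SPD test matrix
L = sparse.csc_array(np.linalg.cholesky(A.toarray()))  # stand-in factor

b = rng.standard_normal(n)
y = spsolve_triangular(L.tocsr(), b, lower=True)    # forward solve: L y = b
x = spsolve_triangular(L.T.tocsr(), y, lower=False) # backward solve: L^T x = y

print(np.allclose(A @ x, b))  # True
```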
<p>Right. Let’s test it. We’re going to work on a particular<sup>39</sup> sparse matrix.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb18-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> scipy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> sparse</span>
<span id="cb18-2"></span>
<span id="cb18-3">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span></span>
<span id="cb18-4">one_d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.diags([[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>n, [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)], [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb18-5">A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.kronsum(one_d, one_d) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> sparse.eye(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>n)</span>
<span id="cb18-6">A_lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.tril(A, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">format</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"csc"</span>)</span>
<span id="cb18-7">A_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_lower.indices</span>
<span id="cb18-8">A_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_lower.indptr</span>
<span id="cb18-9">A_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_lower.data</span>
<span id="cb18-10"></span>
<span id="cb18-11">L_indices, L_indptr, L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_cholesky_csc(A_indices, A_indptr, A_x)</span>
<span id="cb18-12">L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.csc_array((L_x, L_indices, L_indptr), shape <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span>
<span id="cb18-13"></span>
<span id="cb18-14">err <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>((A <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> L.transpose()).todense()))</span>
<span id="cb18-15"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Error in Cholesky is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>err<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Error in Cholesky is 3.871041263071504e-12</code></pre>
</div>
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb20-1">nnz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_x)</span>
<span id="cb20-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Number of non-zeros is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>nnz<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> (fill in of </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_x) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_x)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Number of non-zeros is 125049 (fill in of 117649)</code></pre>
</div>
</div>
<p>Finally, let’s demonstrate that we can reduce the amount of fill-in with a reordering. The only built-in permutation in <code>scipy</code> is reverse Cuthill–McKee, which is designed to reduce bandwidth rather than fill-in, so we will not see much of a difference. But nevertheless. It’s there.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb22-1">perm <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.csgraph.reverse_cuthill_mckee(A, symmetric_mode<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb22-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(perm)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[2499 2498 2449 ...   50    1    0]</code></pre>
</div>
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb24-1">A_perm <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A[perm[:,<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>], perm]</span>
<span id="cb24-2">A_perm_lower <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.tril(A_perm, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">format</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"csc"</span>)</span>
<span id="cb24-3">A_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_perm_lower.indices</span>
<span id="cb24-4">A_indptr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_perm_lower.indptr</span>
<span id="cb24-5">A_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> A_perm_lower.data</span>
<span id="cb24-6"></span>
<span id="cb24-7">L_indices, L_indptr, L_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse_cholesky_csc(A_indices, A_indptr, A_x)</span>
<span id="cb24-8">L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sparse.csc_array((L_x, L_indices, L_indptr), shape <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span>
<span id="cb24-9">err <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>((A_perm <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> L <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> L.transpose()).todense()))</span>
<span id="cb24-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Error in Cholesky is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>err<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Error in Cholesky is 3.0580421951974465e-12</code></pre>
</div>
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb26-1">nnz_rcm <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_x)</span>
<span id="cb26-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Number of non-zeros is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>nnz_rcm<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> (fill in of </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(L_x) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(A_x)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">),</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">which is less than the unpermuted matrix, which had </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>nnz<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> non-zeros."</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Number of non-zeros is 87025 (fill in of 79625),
which is less than the unpermuted matrix, which had 125049 non-zeros.</code></pre>
</div>
</div>
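<p>As an aside: the reverse Cuthill-McKee permutation behind the <code>nnz_rcm</code> count above is available directly from <code>scipy.sparse.csgraph</code>. Here is a self-contained toy sketch (a small cycle-shaped matrix, not the Laplacian from this post) showing the permutation shrink the bandwidth:</p>

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

# Toy symmetric matrix: a path graph plus one long-range "wrap" edge,
# which gives the unpermuted matrix a huge bandwidth.
n = 50
A = sp.diags([np.full(n - 1, -1.0), np.full(n, 4.0), np.full(n - 1, -1.0)],
             [-1, 0, 1], format="csr")
A = A.tolil()
A[0, n - 1] = -1.0
A[n - 1, 0] = -1.0
A = A.tocsr()

# RCM returns a permutation; apply it symmetrically to rows and columns.
perm = reverse_cuthill_mckee(A, symmetric_mode=True)
A_rcm = A[perm, :][:, perm]

def bandwidth(M):
    # largest |i - j| over the non-zeros of M
    M = M.tocoo()
    return int(np.max(np.abs(M.row - M.col)))
```

<p>Bandwidth is a cruder measure than fill-in, but the Cholesky factor of a banded matrix fills in only within the band, so a smaller bandwidth also bounds the fill-in.</p>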
<p>And finally, let’s check that we’ve not made some fake non-zeros. To do this we need to wander back into <code>R</code> because <code>scipy</code> doesn’t have a sparse Cholesky<sup>40</sup> factorisation.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb28-1">ind <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> py<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>A_indices</span>
<span id="cb28-2">indptr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> py<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>A_indptr</span>
<span id="cb28-3">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(py<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>A_x)</span>
<span id="cb28-4">A <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sparseMatrix</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">i =</span> ind <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p =</span> indptr, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">symmetric =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb28-5"></span>
<span id="cb28-6">L <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chol</span>(A))</span>
<span id="cb28-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(L<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span>i <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> py<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>L_indices)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0</code></pre>
</div>
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb30-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(L<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span>p <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> py<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>L_indptr)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0</code></pre>
</div>
</div>
<p>Perfect.</p>
</section>
<section id="ok-we-are-done-for-today." class="level2">
<h2 class="anchored" data-anchor-id="ok-we-are-done-for-today.">Ok we are done for today.</h2>
<p>I was hoping that we were going to make it to the JAX implementation, but this is long enough now. And I suspect that there will be some <em>issues</em> that are going to come up.</p>
<p>If you want some references, I recommend:</p>
<ul>
<li><a href="http://heath.cs.illinois.edu/courses/cs598mh/george_liu.pdf">George, Liu, and Ng’s notes</a> (warning: FORTRAN).</li>
<li><a href="https://epubs.siam.org/doi/book/10.1137/1.9780898718881">Timothy Davis’ book</a> (warning: pure C).</li>
<li>Liu’s <a href="https://epubs.siam.org/doi/10.1137/0611010">survey paper about elimination trees</a> (warning: trees).</li>
<li><a href="https://www.routledge.com/Gaussian-Markov-Random-Fields-Theory-and-Applications/Rue-Held/p/book/9781584884323">Rue and Held’s book</a> (Statistically motivated).</li>
</ul>
<p>Obviously this is a massive area and I obviously did not do it justice in a single blog post. It’s well worth looking further into. It is very cool. And obviously, <em>I go through all this</em><sup>41</sup> to get a prototype that I can play with all of the bits of. For the love of god, use Cholmod or Eigen or MUMPS or literally anything else. The only reason to write these yourself is to learn how they work.</p>


</section>
</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>The old numerical linear algebra naming conventions: Symmetric letters are symmetric matrices, upper case is a matrix, lower case is a vector, etc etc etc. Obviously, all conventions in statistics go against this so who really cares. Burn it all down.↩︎</p></li>
<li id="fn2"><p>Go girl. Give us nothing.↩︎</p></li>
<li id="fn3"><p>or scalars↩︎</p></li>
<li id="fn4"><p>This is actually how you check if a matrix is SPD. Such a useful algorithm!↩︎</p></li>
<li id="fn5"><p>This variant is called the left-looking Cholesky. There are 6 distinct ways to rearrange these computations that lead to algorithms that are well-adapted to different structures. The left-looking algorithm is well adapted to matrices stored column-by-column. But it is not the only one! The variant of the sparse Cholesky in Matlab and Eigen is the upward-looking Cholesky. CHOLMOD uses the left-looking Cholesky (because that’s how you get supernodes). MUMPS uses the right-looking variant. Honestly this is a fucking fascinating wormhole you can fall down. A solid review of some of the possibilities is in Chapter 4 of Tim Davis’ book.↩︎</p></li>
<li id="fn6"><p>Here <code>A</code> is a <img src="https://latex.codecogs.com/png.latex?n%5Ctimes%20n"> matrix and <code>u'</code> is the transpose of the vector <code>u</code>.↩︎</p></li>
<li id="fn7"><p>You can also see that if <img src="https://latex.codecogs.com/png.latex?A"> is stored in memory by stacking the columns, this algorithm is set up to be fairly memory efficient. Of course, if you find yourself caring about what your cache is doing, you’ve gone astray somewhere. That is why professionals have coded this up (only a fool competes with LAPACK).↩︎</p></li>
<li id="fn8"><p>The ultimate language of scientific computing. Do not slide into my DMs and suggest Julia is.↩︎</p></li>
<li id="fn9"><p>You may be thinking <em>well surely we have to use a row-major ordering</em>. But honey let me tell you. We are building our own damn storage method, so we can order it however we bloody want. Also, somewhere down the line I’m going to do this in Eigen, which is column major by default.↩︎</p></li>
<li id="fn10"><p>If you look at the algorithm, you’ll see that we only need to store the diagonal and the entries below. This is enough (in general) because we know the matrix is symmetric!↩︎</p></li>
<li id="fn11"><p>CPU operations are a lot less memory-limited than they used to be, but nevertheless it piles up. GPU operations still very much are, but sparse matrix operations mostly don’t have the arithmetic intensity to be worth putting on a GPU.↩︎</p></li>
<li id="fn12"><p>(NB: zero-based indexing!) This is a superfluous entry (the information is available elsewhere), but having it in makes life just a million times easier because you don’t have to treat the final column separately!↩︎</p></li>
<li id="fn13"><p>ZERO BASED, PYTHON SLICES↩︎</p></li>
<li id="fn14"><p>I am not a headless torso that can’t host. I differentiate.↩︎</p></li>
<li id="fn15"><p>We only care about undirected graphs↩︎</p></li>
<li id="fn16"><p>Or from <img src="https://latex.codecogs.com/png.latex?0"> to <img src="https://latex.codecogs.com/png.latex?n-1"> if you have hate in your heart and darkness in your soul.↩︎</p></li>
<li id="fn17"><p>To get from the previous version of the algorithm to this, we unwound all of those beautiful vectorised matrix-vector products. This would be a terrible idea if we were doing a dense Cholesky, but as general rule if you are implementing your own dense Cholesky factorisation you have already committed to a terrible idea. (The same, to be honest, is true for sparse Choleskys. But nevertheless, she persisted.)↩︎</p></li>
<li id="fn18"><p>or trees or really any discrete structure.↩︎</p></li>
<li id="fn19"><p>Don’t kid yourself, <a href="https://epubs.siam.org/doi/10.1137/0205021">we look this shit up</a>.↩︎</p></li>
<li id="fn20"><p>This means that all of the pairs <img src="https://latex.codecogs.com/png.latex?(i,%20v_1)">, <img src="https://latex.codecogs.com/png.latex?(v_i,%20v_%7Bi+1%7D)"> and <img src="https://latex.codecogs.com/png.latex?(v_%7B%5Cell-1%7D,%20v_j)"> are all in the edge set <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BE%7D">↩︎</p></li>
<li id="fn21"><p>The specific choices made in building this matrix are there to make sure it’s positive definite. The transpose is there because in R, <code>R &lt;- chol(A)</code> returns an <em>upper</em> triangular matrix that satisfies <img src="https://latex.codecogs.com/png.latex?A%20=%20R%5ETR">. I assume this is because C has row-major storage, but I honestly don’t care enough to look it up.↩︎</p></li>
<li id="fn22"><p>Here the <code>pivot = FALSE</code> option is needed because the default for a sparse Cholesky decomposition in R is to re-order the vertices to try to minimise the fill-in. But that goes against the example!↩︎</p></li>
<li id="fn23"><p>Finding the minimum fill reordering is NP-hard, so everything is heuristic.↩︎</p></li>
<li id="fn24"><p>scipy has the reverse Cuthill-McKee reordering—which is shit—easily available. As far as I can tell, the easiest way to get AMD out is to factorise a sparse matrix in scipy and pull the reordering out. If I were less lazy, I’d probably just bind SuiteSparse’s AMD algorithm, which is permissively licensed. But nah. The standard nested-dissection implementation is in the METIS package, which used to have a shit license but is now Apache 2.0. Good on you METIS!↩︎</p></li>
<li id="fn25"><p>and some other cases↩︎</p></li>
<li id="fn26"><p>They are cheap to compute↩︎</p></li>
<li id="fn27"><p>Actually, you get a forest in general. You get a tree if <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D"> has a single connected component, otherwise you get a bunch of disjoint trees. But we still call it a tree because maths is wild.↩︎</p></li>
<li id="fn28"><p>Fun fact: it is the spanning tree of the graph of <img src="https://latex.codecogs.com/png.latex?L%20+%20L%5ET">. Was that fun? I don’t think that was fun.↩︎</p></li>
<li id="fn29"><p>This is morally but not actually true. There is a variant (slower in practice, faster asymptotically), that costs <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D%5Cleft(%5Coperatorname%7Bnnz%7D(A)%5Calpha(%5Coperatorname%7Bnnz%7D(A),%20n)%5Cright)">, where <img src="https://latex.codecogs.com/png.latex?%5Calpha(m,n)"> is the inverse Ackerman function, which is a very slowly growing function that is always equal to 4 for our purposes. The actual version that people use is technically <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(%5Coperatorname%7Bnnz%7D(A)%20%5Clog%20n)">, but is faster and the <img src="https://latex.codecogs.com/png.latex?%5Clog%20n"> is never seen in practice.↩︎</p></li>
<li id="fn30"><p>This is beyond the scope, but basically it’s trying to find groups of nodes that can be eliminated as a block using dense matrix operations. This leads to a much more efficient algorithm.↩︎</p></li>
<li id="fn31"><p>There is, of course, a typo in the algorithm we’re about to implement. We’re using the correct version from <a href="https://epubs.siam.org/doi/10.1137/0611010">here</a>.↩︎</p></li>
<li id="fn32"><p>from parent to child (aka in descending node order)↩︎</p></li>
<li id="fn33"><p>by construction↩︎</p></li>
<li id="fn34"><p>If there are no non-zeros below the diagonal, then we have a root of one of the trees in the forest!↩︎</p></li>
<li id="fn35"><p>I did not make it prettier because a) I think it’s useful to show bad code sometimes, and b) I can’t be arsed. The real file has some comments in it because I am not a monster, but in some sense this whole damn blog is a code comment.↩︎</p></li>
<li id="fn36"><p>The George, Liu, Ng book does that in FORTRAN. Enjoy decoding it.↩︎</p></li>
<li id="fn37"><p>Well, there is some avoiding this. If the amount of fill in is small, it may be more efficient to do insertions instead. But again, I am not going to bother. And anyway. If <code>A_x</code> is a JAX array, it’s going to be immutable and we are not going to be able to avoid the deep copy.↩︎</p></li>
<li id="fn38"><p>and in the deep copy code↩︎</p></li>
<li id="fn39"><p>This is the discretisation of a 2D laplacian on a square with some specific boundary conditions↩︎</p></li>
<li id="fn40"><p>Cholmod, which is the natural choice, is GPL’d, which basically means it can’t be used in something like Scipy. R does not have this problem.↩︎</p></li>
<li id="fn41"><p>Björk voice↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2022,
  author = {Simpson, Dan},
  title = {Sparse {Matrices} 2: {An} Invitation to a Sparse {Cholesky}
    Factorisation},
  date = {2022-03-31},
  url = {https://dansblog.netlify.app/2022-03-23-getting-jax-to-love-sparse-matrices},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2022" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2022. <span>“Sparse Matrices 2: An Invitation to a Sparse
Cholesky Factorisation.”</span> March 31, 2022. <a href="https://dansblog.netlify.app/2022-03-23-getting-jax-to-love-sparse-matrices">https://dansblog.netlify.app/2022-03-23-getting-jax-to-love-sparse-matrices</a>.
</div></div></section></div> ]]></description>
  <category>Sparse matrices</category>
  <category>Sparse Cholesky factorisation</category>
  <category>Python</category>
  <guid>https://dansblog.netlify.app/posts/2022-03-23-getting-jax-to-love-sparse-matrices/getting-jax-to-love-sparse-matrices.html</guid>
  <pubDate>Wed, 30 Mar 2022 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2022-03-23-getting-jax-to-love-sparse-matrices/tori.JPG" medium="image"/>
</item>
<item>
  <title>Sparse Matrices 1: The linear algebra of linear mixed effects models and their generalisations</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2022-03-22-a-linear-mixed-effects-model/a-linear-mixed-effects-model.html</link>
  <description><![CDATA[ 





<p>Back in the early days of the pandemic I thought “I’ll have a pandemic project”. I never did my pandemic project.</p>
<p>But I did think briefly about what it would be. I want to get the types of models I like to use in everyday life efficiently implemented inside Stan. These models encapsulate (generalised) linear mixed models<sup>1</sup>, (generalised) additive models, Markovian spatial models<sup>2</sup>, and other models. A good description of the types of models I’m talking about <a href="https://arxiv.org/abs/1604.00860">can be found here</a>.</p>
<p>Many of these models can be solved efficiently via <a href="https://www.r-inla.org/">INLA</a><sup>3</sup>, a great R package for fast posterior inference for an extremely useful set of Bayesian models. In focussing on a particular class of Bayesian models, INLA leverages a bunch of structural features to make a very very fast and accurate posterior approximation. I love this stuff. It’s where I started my stats career.</p>
<p>None of the popular MCMC packages really implement the lessons learnt from INLA to help speed up their inference. I want to change that.</p>
<p>The closest we’ve gotten so far is the <a href="https://arxiv.org/abs/2004.12550">nice work Charles Margossian has been doing</a> to get Laplace approximations into Stan.</p>
<p>But I want to focus on the other key tool in INLA: <em>using sparse linear algebra to make things fast and scalable</em>.</p>
<p>I usually work with Stan, but the scale of the C++ coding<sup>4</sup> required to even tell if these ideas are useful in Stan was honestly just too intimidating.</p>
<p>But the other day I remembered Python. Now I am a shit Python programmer<sup>5</sup> and I’m not fully convinced I ever achieved object permanence. So it took me a while to remember it existed. But eventually I realised that I could probably make a decent prototype<sup>6</sup> of this idea using some modern Python tools (specifically JAX). I checked with some PyMC devs and they pointed me at what the appropriate bindings would look like.</p>
<p>So I decided to go for it.</p>
<p>Of course, I’m pretty busy and these sorts of projects have a way of dying in the arse. So I’m motivating myself by blogging it. I do not know if these ideas will work<sup>7</sup>. I do not know if my coding skills are up to it<sup>8</sup>. I do not know if I will lose interest. But it should be fun to find out.</p>
<p>So today I’m going to do the easiest part: I’m going to scope out the project. Read on, MacDuff.</p>
<section id="a-generalised-linear-mixed-effects-ish-model" class="level2">
<h2 class="anchored" data-anchor-id="a-generalised-linear-mixed-effects-ish-model">A generalised linear mixed effects-ish model</h2>
<p>If you were to open the correct textbook, or the <a href="https://www.jstatsoft.org/article/view/v067i01">Bates, Mächler, Bolker, and Walker 2015 masterpiece paper</a> that describes the workings of <code>lme4</code>, you will see the linear mixed model written as <img src="https://latex.codecogs.com/png.latex?%0Ay%20=%20X%5Cbeta%20+%20Zb%20+%20%5Cepsilon,%0A"> where</p>
<ul>
<li>the columns of <img src="https://latex.codecogs.com/png.latex?X"> contain the covariates<sup>9</sup>,</li>
<li><img src="https://latex.codecogs.com/png.latex?%5Cbeta"> is a vector of unknown regression coefficients,</li>
<li><img src="https://latex.codecogs.com/png.latex?Z"> is a known matrix that describes the random effects (basically which observation is linked to which random effect),</li>
<li><img src="https://latex.codecogs.com/png.latex?b%20%5Csim%20N(0,%20%5CSigma_b)"> is the vector of random effects with some unknown covariance matrix <img src="https://latex.codecogs.com/png.latex?%5CSigma_b">,</li>
<li>and <img src="https://latex.codecogs.com/png.latex?%5Cepsilon%20%5Csim%20N(0%20,%5Csigma%5E2%20W)"> is the observation noise (here <img src="https://latex.codecogs.com/png.latex?W"> is a known diagonal matrix<sup>10</sup>).</li>
</ul>
<p>But unlike Doug Bates and his friends, my aim is to do Bayesian computation. In this situation, <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> <em>also</em> has a prior on it! In fact, I’m going to put a Gaussian prior <img src="https://latex.codecogs.com/png.latex?%5Cbeta%20%5Csim%20N(0,%20R)"> on it, for some typically known<sup>11</sup> matrix <img src="https://latex.codecogs.com/png.latex?R">.</p>
<p>This means that I can treat <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> and <img src="https://latex.codecogs.com/png.latex?b"> the same<sup>12</sup> way! And I’m going to do just that. I’m going to put them together into a vector <img src="https://latex.codecogs.com/png.latex?u%20=%20(%5Cbeta%5ET,%20b%5ET)%5ET">. Because the prior on <img src="https://latex.codecogs.com/png.latex?u"> is Gaussian<sup>13</sup>, I’m sometimes going to call <img src="https://latex.codecogs.com/png.latex?u"> the <em>Gaussian component</em> or even the <em>latent</em><sup>14</sup> Gaussian component.</p>
<p>Now that I’ve smooshed my fixed and random effects together, I don’t really need to keep <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Z"> separate. So I’m going to push them together into a rectangular matrix <img src="https://latex.codecogs.com/png.latex?%0AA%20=%20%5BX%20%5Cvdots%20Z%5D.%0A"></p>
<p>This allows us to re-write the model as <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Ay%20%5Cmid%20u,%20%5Csigma%20&amp;%20%5Csim%20N(A%20u,%20%5Csigma%5E2%20W)%5C%5C%0Au%20%5Cmid%20%5Ctheta%20&amp;%5Csim%20N(0,%20Q(%5Ctheta)%5E%7B-1%7D).%0A%5Cend%7Balign*%7D"></p>
<p><em>What the hell is <img src="https://latex.codecogs.com/png.latex?Q(%5Ctheta)"> and why are we suddenly parameterising a multivariate normal distribution by the inverse of its covariance matrix (which, if you’re curious, is known as a <em>precision</em> matrix)???</em></p>
<p>I will take your questions in reverse order.</p>
<p>We are parameterising by the precision<sup>15</sup> matrix because it will simplify our formulas and lead to faster computations. This will be a major topic for us later!</p>
<p>As to what <img src="https://latex.codecogs.com/png.latex?Q(%5Ctheta)"> is, it is the matrix <img src="https://latex.codecogs.com/png.latex?%0AQ(%5Ctheta)%20=%20%5Cbegin%7Bpmatrix%7D%20R%5E%7B-1%7D%20&amp;%200%20%5C%5C%200%20&amp;%20%5CSigma_b%5E%7B-1%7D%5Cend%7Bpmatrix%7D%0A"> and <img src="https://latex.codecogs.com/png.latex?%5Ctheta%20=%20(%5Csigma,%20%5CSigma_b)"> is the collection of all<sup>16</sup> non-Gaussian parameters in the model. Later, we will assume<sup>17</sup> that <img src="https://latex.codecogs.com/png.latex?%5CSigma_b"> has quite a lot of structure.</p>
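<p>To make the smooshing concrete, here is a minimal <code>scipy.sparse</code> sketch (hypothetical dimensions, one random effect per observation, and scalar stand-ins <code>tau_beta</code> and <code>tau_b</code> for the prior precisions of <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> and <img src="https://latex.codecogs.com/png.latex?b">):</p>

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(42)
n, p, k = 100, 3, 10   # observations, fixed effects, random effect levels

# Fixed-effects design matrix X, and a random-effects matrix Z that
# assigns each observation to one of k groups (one 1 per row).
X = rng.normal(size=(n, p))
groups = rng.integers(0, k, size=n)
Z = sp.csc_matrix((np.ones(n), (np.arange(n), groups)), shape=(n, k))

# Smoosh them together: A = [X Z], acting on u = (beta, b).
A = sp.hstack([sp.csc_matrix(X), Z], format="csc")

# Block-diagonal prior precision for u: one block for the regression
# coefficients, one for the random effects (scalars purely for illustration).
tau_beta, tau_b = 0.01, 2.0
Q = sp.block_diag([tau_beta * sp.eye(p), tau_b * sp.eye(k)], format="csc")
```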
<p>This is a <em>very</em> generic model. It happily contains things like</p>
<ul>
<li>Linear regression!</li>
<li>Linear regression with horseshoe priors!</li>
<li>Linear mixed effects models!</li>
<li>Linear regression with splines (smoothing or basis)!</li>
<li>Spatial models like <a href="https://arxiv.org/abs/1601.01180">ICARs, BYMs</a>, etc etc etc</li>
<li>Gaussian processes (with the caveat that we’re mostly focussing on those that can be formulated via precision matrices rather than covariance matrices. <a href="https://dansblog.netlify.app/posts/2021-11-24-getting-into-the-subspace/">A whole blog post, I have.</a>)</li>
<li>Any combination of these things!</li>
</ul>
<p>So if I manage to get this implemented efficiently, all of these models will become efficient too. All it will cost is a truly shithouse<sup>18</sup> interface.</p>
<p>The only downside of this degree of flexibility compared to just implementing a straight linear mixed model with <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Z"> and <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> and <img src="https://latex.codecogs.com/png.latex?b"> all living separately is that there are a couple of tricks<sup>19</sup> to improve numerical stability that we can’t use.</p>
</section>
<section id="lets-get-the-posterior" class="level2">
<h2 class="anchored" data-anchor-id="lets-get-the-posterior">Let’s get the posterior!</h2>
<p>The nice thing about this model is that it is a normal likelihood with a normal prior, so we can directly compute two key quantities:</p>
<ul>
<li><p>The “full conditional” distribution <img src="https://latex.codecogs.com/png.latex?p(u%20%5Cmid%20y,%20%5Ctheta)">, which is useful for getting posterior information about <img src="https://latex.codecogs.com/png.latex?b"> and <img src="https://latex.codecogs.com/png.latex?%5Cbeta">, and</p></li>
<li><p>The marginal posterior <img src="https://latex.codecogs.com/png.latex?p(%5Ctheta%20%5Cmid%20y)">.</p></li>
</ul>
<p>This means that we do not need to do MCMC on the joint space <img src="https://latex.codecogs.com/png.latex?(u,%20%5Ctheta)">! We can instead write a model to draw samples from <img src="https://latex.codecogs.com/png.latex?p(%5Ctheta%20%5Cmid%20y)">, which is much lower-dimensional and easier<sup>20</sup> to sample from, and then compute the joint posterior by sampling from the full conditional.</p>
<p>I talked a little about the mechanics of this in a <a href="https://dansblog.netlify.app/posts/2021-10-14-priors2/">previous blog post about conjugate priors</a>, but let’s do the derivations. Why? Because they’re not too hard and it’s useful to have them written out somewhere.</p>
<section id="the-full-conditional" class="level3">
<h3 class="anchored" data-anchor-id="the-full-conditional">The full conditional</h3>
<p>First we need to compute <img src="https://latex.codecogs.com/png.latex?p(u%20%5Cmid%20y%20,%20%5Ctheta)">. The first thing that we note is that conditional distributions are always proportional to the joint distribution (we’re literally just pretending some things are constant), so we get <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Ap(u%20%5Cmid%20y%20,%20%5Ctheta)%20&amp;%5Cpropto%20p(y%20%5Cmid%20u,%20%5Ctheta)%20p(u%20%5Cmid%20%5Ctheta)%20p(%5Ctheta)%20%5C%5C%0A&amp;%5Cpropto%20%5Cexp%5Cleft%5B-%5Cfrac%7B1%7D%7B2%5Csigma%5E2%7D%20(y%20-%20Au)%5ETW%5E%7B-1%7D(y-Au)%5Cright%5D%5Cexp%5Cleft%5B-%5Cfrac%7B1%7D%7B2%7Du%5ETQ(%5Ctheta)u%5Cright%5D.%0A%5Cend%7Balign*%7D"></p>
<p>Now we just need to expand things out and work out what the mean and the precision matrix of <img src="https://latex.codecogs.com/png.latex?p(u%20%5Cmid%20y,%20%5Ctheta%20)"> (which is Gaussian by conjugacy!) are.</p>
<p>Computing posterior distributions by hand is a dying<sup>21</sup> art. So my best and only advice to you: don’t be a hero. Just pattern match like the rest of us. To do this, we need to know what the density of a multivariate normal distribution looks like <em>deep</em> down in its soul.</p>
<p>Behold: the ugly <code>div</code> box!<sup>22</sup></p>
<div class="note">
<p>If <img src="https://latex.codecogs.com/png.latex?u%20%5Csim%20N(m,%20P%5E%7B-1%7D)">, then <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Ap(u)%20&amp;%5Cpropto%20%5Cexp%5Cleft%5B-%20%5Cfrac%7B1%7D%7B2%7D(u%20-%20m)%5ETP(u-m)%5Cright%5D%20%5C%5C%0A&amp;%5Cpropto%20%5Cexp%5Cleft%5B-%20%5Cfrac%7B1%7D%7B2%7Du%5ETPu%20+%20m%5ETPu%5Cright%5D,%0A%5Cend%7Balign*%7D"> where I just dropped all of the terms that didn’t involve <img src="https://latex.codecogs.com/png.latex?u">.</p>
</div>
<p>This means the plan is to</p>
<ol type="1">
<li>Expand out the quadratics in the exponential term so we get something that looks like <img src="https://latex.codecogs.com/png.latex?%5Cexp%5Cleft%5B-%5Cfrac%7B1%7D%7B2%7Du%5ETPu%20+%20z%5ETu%5Cright%5D"></li>
<li>The matrix <img src="https://latex.codecogs.com/png.latex?P"> will be the precision matrix of <img src="https://latex.codecogs.com/png.latex?u%20%5Cmid%20y,%20%5Ctheta">.</li>
<li>The mean of <img src="https://latex.codecogs.com/png.latex?u%20%5Cmid%20y,%20%5Ctheta"> is <img src="https://latex.codecogs.com/png.latex?P%5E%7B-1%7Dz">.</li>
</ol>
<p>So let’s do it!</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Ap(u%20%5Cmid%20y%20,%20%5Ctheta)%20&amp;%5Cpropto%20%5Cexp%5Cleft%5B-%5Cfrac%7B1%7D%7B2%5Csigma%5E2%7D%20u%5ETA%5ETW%5E%7B-1%7DAu%20+%20%5Cfrac%7B1%7D%7B%5Csigma%5E2%7D(A%5ETW%5E%7B-1%7Dy)%5ETu%5Cright%5D%5Cexp%5Cleft%5B-%5Cfrac%7B1%7D%7B2%7Du%5ETQ(%5Ctheta)u%5Cright%5D%20%5C%5C%0A&amp;%5Cpropto%20%5Cexp%5Cleft%5B-%5Cfrac%7B1%7D%7B2%7Du%5ET%5Cleft(Q%20+%20%5Cfrac%7B1%7D%7B%5Csigma%5E2%7DA%5ETW%5E%7B-1%7DA%5Cright)u%20+%20%20%5Cfrac%7B1%7D%7B%5Csigma%5E2%7D(A%5ETW%5E%7B-1%7Dy)%5ETu%5Cright%5D.%0A%5Cend%7Balign*%7D"></p>
<p>This means that <img src="https://latex.codecogs.com/png.latex?p(u%20%5Cmid%20y%20,%5Ctheta)"> is multivariate normal with</p>
<ul>
<li><p>precision matrix <img src="https://latex.codecogs.com/png.latex?Q_%7Bu%5Cmid%20y,%5Ctheta%7D(%5Ctheta)%20=%20%5Cleft(Q(%5Ctheta)%20+%20%5Cfrac%7B1%7D%7B%5Csigma%5E2%7DA%5ETW%5E%7B-1%7DA%5Cright)"> and</p></li>
<li><p>mean<sup>23</sup> <img src="https://latex.codecogs.com/png.latex?%5Cmu_%7Bu%5Cmid%20y,%5Ctheta%7D(%5Ctheta)%20=%20%5Cfrac%7B1%7D%7B%5Csigma%5E2%7D%20Q_%7Bu%5Cmid%20y,%5Ctheta%7D(%5Ctheta)%5E%7B-1%7D%20A%5ETW%5E%7B-1%7Dy">.</p></li>
</ul>
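<p>These two formulas drop straight into sparse linear algebra. A minimal sketch (toy dimensions and a random <code>A</code>, with <img src="https://latex.codecogs.com/png.latex?W"> taken to be the identity and an identity stand-in for <img src="https://latex.codecogs.com/png.latex?Q(%5Ctheta)"> — everything here is purely illustrative):</p>

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

rng = np.random.default_rng(7)
n, m = 200, 8        # observations, latent dimension (toy sizes)

A = sp.random(n, m, density=0.3, random_state=7, format="csc")
Q = sp.eye(m, format="csc")      # stand-in for the prior precision Q(theta)
W_inv = sp.eye(n, format="csc")  # W is known and diagonal; identity here
sigma = 0.5
y = rng.normal(size=n)

# Full-conditional precision: Q + sigma^{-2} A^T W^{-1} A
Q_post = (Q + (A.T @ W_inv @ A) / sigma**2).tocsc()

# Full-conditional mean: sigma^{-2} Q_post^{-1} A^T W^{-1} y
mu_post = spsolve(Q_post, (A.T @ (W_inv @ y)) / sigma**2)
```

<p>In practice you would factorise <code>Q_post</code> once with a sparse Cholesky and reuse the factor for both the mean solve and the simulation step, rather than calling <code>spsolve</code> afresh.</p>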
<p>This means if I build an MCMC scheme to give me <img src="https://latex.codecogs.com/png.latex?B"> samples <img src="https://latex.codecogs.com/png.latex?%5Ctheta_b%20%5Csim%20p(%5Ctheta%20%5Cmid%20y)">, <img src="https://latex.codecogs.com/png.latex?b%20=%201,%20%5Cldots,%20B">, then I can turn them into <img src="https://latex.codecogs.com/png.latex?B"> samples <img src="https://latex.codecogs.com/png.latex?(%5Ctheta_b,%20u_b)"> from <img src="https://latex.codecogs.com/png.latex?p(%5Ctheta,%20u%20%5Cmid%20y)"> by doing the following.</p>
<div class="note">
<p>For <img src="https://latex.codecogs.com/png.latex?b%20=%201,%20%5Cldots,%20B"></p>
<ul>
<li><p>Simulate <img src="https://latex.codecogs.com/png.latex?u_b%20%5Csim%20N%5Cleft(%5Cmu_%7Bu%5Cmid%20y,%5Ctheta%7D(%5Ctheta_b),%20Q_%7Bu%5Cmid%20y,%5Ctheta%7D(%5Ctheta_b)%5E%7B-1%7D%5Cright)"></p></li>
<li><p>Store the pair <img src="https://latex.codecogs.com/png.latex?(%5Ctheta_b,%20u_b)"></p></li>
</ul>
</div>
<p>Easy<sup>24</sup> as!</p>
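<p>The simulation step is usually done through the Cholesky factor of the <em>precision</em> matrix: if <img src="https://latex.codecogs.com/png.latex?Q%20=%20LL%5ET"> and <img src="https://latex.codecogs.com/png.latex?z%20%5Csim%20N(0,%20I)">, then <img src="https://latex.codecogs.com/png.latex?u%20=%20%5Cmu%20+%20L%5E%7B-T%7Dz"> has covariance <img src="https://latex.codecogs.com/png.latex?L%5E%7B-T%7DL%5E%7B-1%7D%20=%20Q%5E%7B-1%7D">, exactly as required. A dense toy sketch (made-up numbers; a real implementation would use a sparse Cholesky):</p>

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy full-conditional parameters (hypothetical small example):
Q_post = np.array([[2.0, 1.0, 0.0],
                   [1.0, 2.0, 1.0],
                   [0.0, 1.0, 2.0]])   # posterior precision (SPD)
mu_post = np.array([0.5, -1.0, 2.0])   # posterior mean

L = np.linalg.cholesky(Q_post)         # Q_post = L @ L.T, L lower triangular

def sample(n_draws):
    # u = mu + L^{-T} z, one draw per column of z
    z = rng.standard_normal((mu_post.size, n_draws))
    v = np.linalg.solve(L.T, z)        # back-substitution with L^T
    return (mu_post[:, None] + v).T

draws = sample(100_000)
```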
</section>
<section id="writing-down-ptheta-mid-y" class="level3">
<h3 class="anchored" data-anchor-id="writing-down-ptheta-mid-y">Writing down <img src="https://latex.codecogs.com/png.latex?p(%5Ctheta%20%5Cmid%20y)"></h3>
<p>So now we just<sup>25</sup> have to get the marginal posterior for the non-Gaussian parameters <img src="https://latex.codecogs.com/png.latex?%5Ctheta">. We only need it up to a constant of proportionality, so we can express the joint probability <img src="https://latex.codecogs.com/png.latex?p(y,%20u,%20%5Ctheta)"> in two equivalent ways to get <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Ap(y,%20u%20,%20%5Ctheta)%20&amp;=%20p(y,%20u,%20%5Ctheta)%20%5C%5C%0Ap(u%20%5Cmid%20%5Ctheta,%20y)%20p(%5Ctheta%20%5Cmid%20y)%20p(y)%20&amp;=%20p(y%20%5Cmid%20u,%20%5Ctheta)%20p(u%20%5Cmid%20%5Ctheta)p(%5Ctheta).%20%5C%5C%0A%5Cend%7Balign*%7D"></p>
<p>Rearranging, we get <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Ap(%5Ctheta%20%5Cmid%20y)%20&amp;=%20%5Cfrac%7Bp(y%20%5Cmid%20u,%20%5Ctheta)%20p(u%20%5Cmid%20%5Ctheta)p(%5Ctheta)%7D%7Bp(u%20%5Cmid%20%5Ctheta,%20y)p(y)%7D%20%5C%5C%0A&amp;%5Cpropto%20%5Cfrac%7Bp(y%20%5Cmid%20u,%20%5Ctheta)%20p(u%20%5Cmid%20%5Ctheta)p(%5Ctheta)%7D%7Bp(u%20%5Cmid%20%5Ctheta,%20y)%7D.%0A%5Cend%7Balign*%7D"></p>
<p>This is a very nice relationship between the functional forms of the various densities we happen to know and the density we are trying to compute. This means that if you have access to the full conditional distribution<sup>26</sup> for <img src="https://latex.codecogs.com/png.latex?u"> you can marginalise <img src="https://latex.codecogs.com/png.latex?u"> out. No weird integrals required.</p>
<p>But there’s one oddity: there is a <img src="https://latex.codecogs.com/png.latex?u"> on the right hand side, but no <img src="https://latex.codecogs.com/png.latex?u"> on the left hand side. What we have actually found is a whole continuum of functions that are proportional to <img src="https://latex.codecogs.com/png.latex?p(%5Ctheta%20%5Cmid%20y)">. It truly does not matter which one we choose.</p>
<p>But some choices make the algebra slightly nicer. (And remember, I’m gonna have to implement this later, so I should probably keep an eye on that.)</p>
<p>A good<sup>27</sup> generic choice is <img src="https://latex.codecogs.com/png.latex?u%20=%20%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)">.</p>
<p>The algebra here can be a bit tricky<sup>28</sup>, so let’s write out each function evaluated at <img src="https://latex.codecogs.com/png.latex?u%20=%20%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)">.</p>
<p>The bit from the likelihood is <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Ap(y%20%5Cmid%20u%20=%20%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta),%20%5Ctheta)%20&amp;%5Cpropto%20%5Csigma%5E%7B-n%7D%20%5Cexp%5Cleft%5B-%5Cfrac%7B1%7D%7B2%5Csigma%5E2%7D(y%20-%20A%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta))%5ETW%5E%7B-1%7D(y-%20%20A%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta))%5Cright%5D%5C%5C%0A&amp;%5Cpropto%20%5Csigma%5E%7B-n%7D%5Cexp%5Cleft%5B%5Cfrac%7B-1%7D%7B2%5Csigma%5E2%7D%20%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%5ETA%5ETW%5E%7B-1%7DA%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%20+%20%5Cfrac%7B1%7D%7B%5Csigma%5E2%7D%20y%5ET%20W%5E%7B-1%7DA%20%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%5Cright%5D,%0A%5Cend%7Balign*%7D"> where <img src="https://latex.codecogs.com/png.latex?n"> is the length of <img src="https://latex.codecogs.com/png.latex?y">.</p>
<p>The bit from the prior on <img src="https://latex.codecogs.com/png.latex?u"> is <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Ap(%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%20%5Cmid%20%5Ctheta%20)%0A%5Cpropto%20%7CQ(%5Ctheta)%7C%5E%7B1/2%7D%5Cexp%5Cleft%5B-%5Cfrac%7B1%7D%7B2%7D%20%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%5ETQ(%5Ctheta)%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%5Cright%5D.%0A%5Cend%7Balign*%7D"></p>
<p>Finally, we get that the denominator is <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%20%5Cmid%20y,%20%5Ctheta)%20%5Cpropto%20%7CQ_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%7C%5E%7B1/2%7D%0A"> as the exponential term<sup>29</sup> cancels!</p>
<p>Ok. Let’s finish this. (Incidentally, if you’re wondering why Bayesians love MCMC, this is why.)</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Ap(%5Ctheta%20%5Cmid%20y)%20&amp;%5Cpropto%20p(%5Ctheta)%20%5Cfrac%7B%7CQ(%5Ctheta)%7C%5E%7B1/2%7D%7D%7B%7CQ_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%7C%5E%7B1/2%7D%7D%20%5Cexp%5Cleft%5B-%5Cfrac%7B1%7D%7B2%7D%20%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%5ET(Q(%5Ctheta)%20+%20%5Cfrac%7B1%7D%7B%5Csigma%5E2%7DA%5ETW%5E%7B-1%7DA)%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%20+%20%5Cfrac%7B1%7D%7B%5Csigma%5E2%7D%20y%5ET%20W%5E%7B-1%7DA%20%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%5Cright%5D%20%5C%5C%0A&amp;=%20%20p(%5Ctheta)%20%5Cfrac%7B%7CQ(%5Ctheta)%7C%5E%7B1/2%7D%7D%7B%7CQ_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%7C%5E%7B1/2%7D%7D%20%5Cexp%5Cleft%5B-%5Cfrac%7B1%7D%7B2%7D%20%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%5ETQ_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%20+%20%5Cfrac%7B1%7D%7B%5Csigma%5E2%7D%20y%5ET%20W%5E%7B-1%7DA%20%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%5Cright%5D.%0A%5Cend%7Balign*%7D"></p>
<p>We can now use the fact that <img src="https://latex.codecogs.com/png.latex?Q_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%20=%20A%5ETW%5E%7B-1%7Dy"> to get</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0Ap(%5Ctheta%20%5Cmid%20y)%20&amp;%5Cpropto%20p(%5Ctheta)%20%5Cfrac%7B%7CQ(%5Ctheta)%7C%5E%7B1/2%7D%7D%7B%7CQ_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%7C%5E%7B1/2%7D%7D%20%5Cexp%5Cleft%5B-%5Cfrac%7B1%7D%7B2%7D%20%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%5ETA%5ETW%5E%7B-1%7Dy%20+%20%5Cfrac%7B1%7D%7B%5Csigma%5E2%7D%20y%5ET%20W%5E%7B-1%7DA%20%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%5Cright%5D%20%5C%5C%0A&amp;=%20p(%5Ctheta)%5Cfrac%7B%7CQ(%5Ctheta)%7C%5E%7B1/2%7D%7D%7B%7CQ_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%7C%5E%7B1/2%7D%7D%20%5Cexp%5Cleft%5B%5Cfrac%7B1%7D%7B2%7D%20%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%5ETA%5ETW%5E%7B-1%7Dy%20%5Cright%5D%20.%0A%5Cend%7Balign*%7D"></p>
<p>For those who just love a log-density, this is <img src="https://latex.codecogs.com/png.latex?%0A%5Clog(p(%5Ctheta%20%5Cmid%20y))%20=%20%5Clog(p(%5Ctheta))%20+%20%5Cfrac%7B1%7D%7B2%7D%20%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%5ETA%5ETW%5E%7B-1%7Dy%20+%5Cfrac%7B1%7D%7B2%7D%20%5Clog(%7CQ(%5Ctheta)%7C)%20-%20%5Cfrac%7B1%7D%7B2%7D%5Clog(%7CQ_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%7C)%20+%20%5Ctext%7Bconst%7D.%0A"> A fairly simple expression<sup>30</sup> for all of that work.</p>
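<p>If you want the whole evaluation as code, here is a hedged dense-matrix sketch of my own (sparse versions of these operations are the whole point of what follows; the function returns the log-density up to the log-prior and additive constants, with the <code>1/sigma^2</code> bookkeeping done one consistent way):</p>

```python
import numpy as np
from scipy.linalg import cho_solve

def log_marginal_posterior(Q_prior, A, W_inv_diag, y, sigma2):
    # Dense stand-in for the sparse pipeline. Returns log p(theta | y)
    # up to log p(theta) and theta-independent constants. W is assumed
    # diagonal and is passed as the diagonal of its inverse.
    AtWinv = A.T * W_inv_diag                   # A^T W^{-1}
    b = AtWinv @ y / sigma2
    Q_cond = Q_prior + (AtWinv @ A) / sigma2    # Q_{u|y,theta}
    L_prior = np.linalg.cholesky(Q_prior)       # gives log|Q(theta)|
    L_cond = np.linalg.cholesky(Q_cond)         # shared by mean and logdet
    mu = cho_solve((L_cond, True), b)           # mu_{u|y,theta}
    logdet_prior = 2.0 * np.log(np.diag(L_prior)).sum()
    logdet_cond = 2.0 * np.log(np.diag(L_cond)).sum()
    return 0.5 * mu @ b + 0.5 * logdet_prior - 0.5 * logdet_cond
```

<p>Note that a single Cholesky factor of the conditional precision serves both the mean solve and its log-determinant, which is exactly the structure the sparse implementation will need to preserve.</p>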
</section>
</section>
<section id="so-why-isnt-this-just-a-gaussian-process" class="level2">
<h2 class="anchored" data-anchor-id="so-why-isnt-this-just-a-gaussian-process">So why isn’t this just a Gaussian process?</h2>
<p>These days, people<sup>31</sup> are more than passingly familiar<sup>32</sup> with Gaussian processes. And so they’re quite possibly wondering why this isn’t all just an extremely inconvenient way to do the exact same computations you do with a GP.</p>
<p>Let me tell you. It is <em>all</em> about <img src="https://latex.codecogs.com/png.latex?Q(%5Ctheta)"> and <img src="https://latex.codecogs.com/png.latex?A">.</p>
<p>The prior precision matrix <img src="https://latex.codecogs.com/png.latex?Q(%5Ctheta)"> is typically block diagonal. This special structure makes it pretty easy to compute the <img src="https://latex.codecogs.com/png.latex?%7CQ(%5Ctheta)%7C"> term<sup>33</sup>. But, of course, there’s more going on here.</p>
<p>In linear mixed effects models, the blocks on the diagonal are typically fairly small (their size is controlled by the number of levels in the variable you’re stratifying by). Moreover, the matrices on the diagonal of <img src="https://latex.codecogs.com/png.latex?Q(%5Ctheta)"> are the inverses of either diagonal or block diagonal matrices that themselves have quite small blocks<sup>34</sup>.</p>
<p>In models that have more structured random effects<sup>35</sup>, the diagonal blocks of <img src="https://latex.codecogs.com/png.latex?Q(%5Ctheta)"> can get quite large<sup>36</sup>. Moreover, the matrices on these blocks are usually not block diagonal.</p>
<p>Thankfully, these prior precision matrices do have something going for them: most of their entries are zero. We refer to these types of matrices as <em>sparse matrices</em>. There are some marvelous algorithms for factorising sparse matrices that are usually a lot more efficient<sup>37</sup> than algorithms for dense matrices.</p>
<p>Moreover, the formulation here decouples the dimension of the latent Gaussian component from the number of observations. The data only enters the posterior through the reduction <img src="https://latex.codecogs.com/png.latex?A%5ETy">, so if the number of observations is much larger than the number of latent variables<sup>38</sup> and <img src="https://latex.codecogs.com/png.latex?A"> is sparse<sup>39</sup>, the operation scales <em>linearly</em> in the number of observations (and superlinearly<sup>40</sup> in the column-dimension of <img src="https://latex.codecogs.com/png.latex?A">, that is, in the number of latent variables).</p>
<p>So the prior precision<sup>41</sup> is a sparse matrix. What about the precision matrix of <img src="https://latex.codecogs.com/png.latex?%5Bu%20%5Cmid%20y,%20%5Ctheta%5D">?</p>
<p>It is also sparse! Recall that <img src="https://latex.codecogs.com/png.latex?A%20=%20%5BZ%20%5Cvdots%20X%5D">. This means that <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B1%7D%7B%5Csigma%5E2%7DA%5ETW%5E%7B-1%7DA%20=%20%5Cfrac%7B1%7D%7B%5Csigma%5E2%7D%5Cbegin%7Bpmatrix%7D%20Z%5ET%20W%5E%7B-1%7DZ%20&amp;%20Z%5ET%20W%5E%7B-1%7DX%20%5C%5C%20X%5ET%20W%5E%7B-1%7D%20Z%20&amp;%20X%5ETW%5E%7B-1%7DX%20%5Cend%7Bpmatrix%7D.%0A"> <img src="https://latex.codecogs.com/png.latex?Z"> is a matrix that links the stacked vector of random effects <img src="https://latex.codecogs.com/png.latex?b"> to each observation. Typically, the likelihood <img src="https://latex.codecogs.com/png.latex?p(y_i%20%5Cmid%20%5Ctheta)"> will only depend on a small number of entries of <img src="https://latex.codecogs.com/png.latex?b">, which suggests that most elements in each row of <img src="https://latex.codecogs.com/png.latex?Z"> will be zero. This, in turn, implies that <img src="https://latex.codecogs.com/png.latex?Z"> is sparse and so is<sup>42</sup> <img src="https://latex.codecogs.com/png.latex?Z%5ETW%5E%7B-1%7DZ">.</p>
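<p>That sparsity claim is easy to check on a toy example (the design here is hypothetical: a single random intercept with made-up sizes):</p>

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n_obs, n_groups = 1000, 50
# Hypothetical random-intercept design: each observation belongs to
# exactly one group, so each row of Z has a single non-zero entry.
group = rng.integers(0, n_groups, size=n_obs)
Z = sp.csr_matrix((np.ones(n_obs), (np.arange(n_obs), group)),
                  shape=(n_obs, n_groups))
W_inv = sp.identity(n_obs, format="csr")   # W is diagonal (here: identity)
ZtWZ = (Z.T @ W_inv @ Z).tocsr()
# The groups partition the observations, so Z^T W^{-1} Z is diagonal:
# at most n_groups stored entries rather than n_groups^2.
```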
<p>On the other hand, the other three blocks are usually<sup>43</sup> fully dense. Thankfully, though, the usual situation is that <img src="https://latex.codecogs.com/png.latex?b"> has <em>far</em> more elements than <img src="https://latex.codecogs.com/png.latex?%5Cbeta">, which means that <img src="https://latex.codecogs.com/png.latex?A%5ETW%5E%7B-1%7DA"> is still sparse and we can still use our special algorithms<sup>44</sup>.</p>
<p>All of this suggests that, under usual operating conditions, <img src="https://latex.codecogs.com/png.latex?Q_%7Bu%5Cmid%20y,%20%5Ctheta%7D"> is <em>also</em> a sparse matrix.</p>
<p>And that’s <em>great</em> because that means that we can compute the log-posterior using only 3 main operations:</p>
<ol type="1">
<li><p>Computing <img src="https://latex.codecogs.com/png.latex?%5Clog(%7CQ(%5Ctheta)%7C)">. This matrix is block diagonal so you can just multiply together the determinants<sup>45</sup> of the diagonal blocks, which are relatively cheap to compute.</p></li>
<li><p>Computing <img src="https://latex.codecogs.com/png.latex?%5Cmu_%7Bu%20%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)">. This requires solving the sparse linear system <img src="https://latex.codecogs.com/png.latex?Q_%7Bu%20%5Cmid%20y,%20%5Ctheta%7D%20%5Cmu_%7Bu%20%5Cmid%20y,%20%5Ctheta%7D%20=%20%5Cfrac%7B1%7D%7B%5Csigma%5E2%7DA%5ETW%5E%7B-1%7Dy">. This is going to rely on some fancy pants sparse matrix algorithm.</p></li>
<li><p>Computing <img src="https://latex.codecogs.com/png.latex?%5Clog(%7CQ_%7Bu%20%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%7C)">. This is, thankfully, a by-product of the things we need to compute to solve the linear system in the previous task.</p></li>
</ol>
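<p>The second and third of those operations share a factorisation, which is the whole trick. As a sketch (a sparse LU via scipy standing in for the sparse Cholesky the post has in mind, e.g. CHOLMOD via scikit-sparse; <code>Equil=False</code> keeps the diagonal of <code>U</code> directly interpretable for the determinant):</p>

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def solve_and_logdet(Q, b):
    # One sparse factorisation gives both the solve and log|Q|: for a
    # symmetric positive definite Q, |det Q| is the product of the
    # absolute values of the U diagonal.
    lu = splu(sp.csc_matrix(Q), options=dict(Equil=False))
    logdet = np.sum(np.log(np.abs(lu.U.diagonal())))
    return lu.solve(b), logdet

# A small AR(1)-style tridiagonal precision matrix as a test case:
n, rho = 200, 0.6
main = np.full(n, 1.0 + rho**2)
main[0] = main[-1] = 1.0
Q = sp.diags([-rho * np.ones(n - 1), main, -rho * np.ones(n - 1)],
             offsets=[-1, 0, 1])
mu, logdet = solve_and_logdet(Q, np.ones(n))
```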
</section>
<section id="what-i-what-i-what-i-gotta-do-what-i-gotta-do-to-get-this-model-in-pymc" class="level2">
<h2 class="anchored" data-anchor-id="what-i-what-i-what-i-gotta-do-what-i-gotta-do-to-get-this-model-in-pymc">What I? What I? What I gotta do? <a href="https://www.youtube.com/watch?v=fqTSaMR75ns">What I gotta do to get this model in PyMC?</a></h2>
<p>So this is where shit gets real.</p>
<p>Essentially, I want to implement a new distribution in PyMC that will take appropriate inputs and output the log-density and its gradient. There are two ways to do this:</p>
<ul>
<li>Panic</li>
<li>Pray</li>
</ul>
<p>For the first option, you write a C++<sup>46</sup> backend and register it as an Aesara node. This is how, for example, differential equation solvers migrated into PyMC.</p>
<p>For the second option, which is going to be our goal, we light our Sinead O’Connor votive candle and program up the model using JAX. JAX is a glorious feat of engineering that makes Python code compilable and autodiff-able. In a lot of cases, it seamlessly lets you shift from CPUs to GPUs and is all around quite cool.</p>
<p>It also has approximately zero useful sparse matrix support. (It will let you do <em>very</em> basic things<sup>47</sup> but nothing as complicated as we are going to need.)</p>
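<p>To be concrete about “very basic things”, this is roughly the current state of play with <code>jax.experimental.sparse</code>:</p>

```python
import jax.numpy as jnp
from jax.experimental import sparse

# JAX can store a matrix in BCOO format and multiply with it
# (and jit/vmap/grad through the product)...
A = sparse.BCOO.fromdense(jnp.array([[2.0, 0.0, 0.0],
                                     [0.0, 3.0, 1.0],
                                     [0.0, 0.0, 4.0]]))
y = A @ jnp.ones(3)
# ...but there is no sparse Cholesky (or any sparse factorisation),
# which is exactly the operation this whole scheme lives and dies on.
```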
<p>So why am I taking this route? Well firstly I’m curious to see how well it works. So I am going to write JAX code to do all of my sparse matrix operations and see how efficiently it autodiffs it.</p>
<p>Now I’m going to pre-register my expectations. I expect it to be a little bit shit. Or, at least, I expect to be able to make it do better.</p>
<p>The problem is that computing a gradient requires a single reverse-mode<sup>48</sup> autodiff sweep. This does not seem like a problem until you look at how this sort of thing needs to be implemented and you realise that every gradient call is going to need to generate <em>and store</em> the entire damn autodiff tree for the log-density evaluation. And that autodiff tree is going to be <em>large</em>. So I am expecting the memory scaling on this to be truly shite.</p>
<p>Thankfully there are two ways to fix this. One of them is to implement a custom <em>Jacobian-vector product</em><sup>49</sup> and register it with JAX so it knows <em>most</em> of how to do the derivative. The other way is to implement this shit in C++ and register it as a JAX primitive. And to be honest I’m very tempted. But that is not where I am starting.</p>
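<p>In miniature, the custom-derivative escape hatch looks like this (a toy dense <code>logdet</code> standing in for the sparse one; the identity <code>d log|Q| = tr(Q^{-1} dQ)</code> does the work, and a real sparse version would reuse the Cholesky factor rather than forming an inverse):</p>

```python
import jax
import jax.numpy as jnp

@jax.custom_vjp
def logdet(Q):
    # Primal computation; JAX will never trace through its internals
    # when differentiating.
    return jnp.linalg.slogdet(Q)[1]

def logdet_fwd(Q):
    return logdet(Q), Q             # save Q as the residual

def logdet_bwd(Q, ct):
    # For symmetric positive definite Q the vjp is just ct * Q^{-1};
    # the sparse version would do triangular solves here instead.
    return (ct * jnp.linalg.inv(Q).T,)

logdet.defvjp(logdet_fwd, logdet_bwd)
```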
<p>The other problem is going to be exposing this to users. The internal interface is going to be an absolute shit to use. So we are gonna have to get our Def Leppard on and sprinkle some syntactical sugar all over it.</p>
<p>I’m honestly less concerned about this challenge. It’s important but I am not expecting to produce anything good enough to put into PyMC (or any other package). But I do think it’s a good idea to keep this sort of question in mind: it can help you make cleaner, more useful code.</p>
<section id="what-comes-next" class="level3">
<h3 class="anchored" data-anchor-id="what-comes-next">What comes next?</h3>
<p>Well you will not get a solution today. This blog post is more than long enough.</p>
<p>My plan is to do three things (plus a maybe).</p>
<ol type="1">
<li><p>Implement the relevant sparse matrix solver in a JAX-able form. (This is mostly gonna be me trying to remember how to do something I haven’t done in a very long time.)</p></li>
<li><p>Bind<sup>50</sup> the (probably) inefficient version into PyMC to see how that process works.</p></li>
<li><p>Try the custom <code>jvp</code> and <code>vjp</code> interfaces in JAX to see if they speed things up relative to just autodiffing through my for loops.</p></li>
<li><p>(Maybe) Look into whether hand-rolling some C++ is worth the effort.</p></li>
</ol>
<p>Will I get all of this done? I mean, I’m skeptical. But hey. If I do it’ll be nice.</p>


</section>
</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>aka linear multilevel models↩︎</p></li>
<li id="fn2"><p>Popular in epidemiology↩︎</p></li>
<li id="fn3"><p>INLA = Laplace approximations + sparse linear algebra to do fast, fairly scalable, and accurate Bayesian inference on a variety of Bayesian models. It’s particularly good at things like spatial models.↩︎</p></li>
<li id="fn4"><p>In its guts, Stan is a fully templated C++ autodiff library, so I would need to add specific sparse matrix support. And then there’d be some truly gross stuff with the Stan language and its existing types. And so on and so on and honestly it just broke my damn brain. So I started a few times but never finished.↩︎</p></li>
<li id="fn5"><p>I just don’t ever use it. I semi-regularly read and debug other people’s code, but I don’t typically write very much myself. I use R because that’s what my job needs me to use. So a shadow aim here is to just put some time into my Python. By the end of this I’ll be like Britney doing I’m a Slave 4 U.↩︎</p></li>
<li id="fn6"><p>Or maybe more, but let’s not be too ambitious.↩︎</p></li>
<li id="fn7"><p>I’m pretty sure they will.↩︎</p></li>
<li id="fn8"><p>My sparse matrix data structures are <em>rusty</em> as fuck.↩︎</p></li>
<li id="fn9"><p>and the intercept if it’s needed↩︎</p></li>
<li id="fn10"><p>Really this costs me nothing and can be useful with multiple observations.↩︎</p></li>
<li id="fn11"><p>Default options include the identity matrix or some multiple of the identity matrix.↩︎</p></li>
<li id="fn12"><p>REML heads don’t dismay. You can do all kinds of weird shit by choosing some of these matrices in certain ways. I’m not gonna stop you. I love and support you. Good vibes only.↩︎</p></li>
<li id="fn13"><p>The priors on <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> and <img src="https://latex.codecogs.com/png.latex?b"> are independent Gaussian so it has to be.↩︎</p></li>
<li id="fn14"><p>homosexual↩︎</p></li>
<li id="fn15"><p>Inverse correlation matrix↩︎</p></li>
<li id="fn16"><p>excluding the fixed ones, like <img src="https://latex.codecogs.com/png.latex?W"> and <img src="https://latex.codecogs.com/png.latex?A"> and <img src="https://latex.codecogs.com/png.latex?R">. ↩︎</p></li>
<li id="fn17"><p>Such a dirty word. For all of the models we care about, this is block diagonal. So this assumption is our restriction to a specific class of models.↩︎</p></li>
<li id="fn18"><p>I would suggest a lot of syntactic sugar if you were ever going to expose this stuff to users.↩︎</p></li>
<li id="fn19"><p>See the Bates <em>et al.</em> paper. Their formulation is fabulous but doesn’t extend nicely to the situations I care about! Basically they optimise for the situation where <img src="https://latex.codecogs.com/png.latex?%5CSigma_b"> can be singular, which is an issue when you’re doing optimisation. But I’m not doing optimisation and I care about the case where the precision matrix is defined as a singular matrix (and therefore <img src="https://latex.codecogs.com/png.latex?%5CSigma_b"> does not exist). This seems like a truly wild idea, but it occurs quite naturally in many important models like smoothing splines and ICAR models (which are extremely popular in spatial epidemiology).↩︎</p></li>
<li id="fn20"><p>It’s easier in two ways. Firstly, MCMC likes lower-dimensional targets. They are typically easier to sample from! Secondly, the posterior geometry of <img src="https://latex.codecogs.com/png.latex?p(%5Ctheta%20%5Cmid%20y)"> is usually pretty simple, while the joint posterior <img src="https://latex.codecogs.com/png.latex?p(%5Ctheta,%20u%20%5Cmid%20y)"> has an annoying tendency to have a funnel in it, which forces us to do all kinds of annoying reparameterisation tricks to stop the sampler from shitting the bed.↩︎</p></li>
<li id="fn21"><p>Computers!↩︎</p></li>
<li id="fn22"><p>CSS is my passion.↩︎</p></li>
<li id="fn23"><p>It’s possible to rearrange things to lose that <img src="https://latex.codecogs.com/png.latex?%5Cfrac%7B1%7D%7B%5Csigma%5E2%7D">, which I admit looks a bit weird. It cancels out down the line.↩︎</p></li>
<li id="fn24"><p>I have, historically, not had the greatest grip on whether or not things are easy.↩︎</p></li>
<li id="fn25"><p>See previous footnote.↩︎</p></li>
<li id="fn26"><p>Or a good approximation to it. Laplace approximations work very well for this, and they extend everything we’re doing here from a linear mixed-ish model to a generalised linear mixed-ish model.↩︎</p></li>
<li id="fn27"><p>This is actually a bit dangerous on the face of it because it depends on <img src="https://latex.codecogs.com/png.latex?%5Ctheta">. You can convince yourself it’s ok. Choosing <img src="https://latex.codecogs.com/png.latex?u=0"> is less stress inducing, but I wanted to bring out the parallel to using a Laplace approximation to <img src="https://latex.codecogs.com/png.latex?p(u%20%5Cmid%20%5Ctheta,%20y)">, in which case we really want to evaluate the ratio at the point where the approximation is the best (aka the conditional mean).↩︎</p></li>
<li id="fn28"><p>A common mistake is to forget the parameter dependent proportionality constants from the normal distribution. You didn’t need them before because you were conditioning on <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> so they were all constant. But now <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> is unknown and if we forget them an angel will cry.↩︎</p></li>
<li id="fn29"><p>Honest footnote: This started as <img src="https://latex.codecogs.com/png.latex?p(%5Cmu_%7Bu%5Cmid%20y,%20%5Ctheta%7D(%5Ctheta)%20%5Cmid%20y,%20%5Ctheta)%20%5Cpropto%201"> because I don’t read my own warnings.↩︎</p></li>
<li id="fn30"><p>The brave or foolish amongst you might want to convince yourselves that this collapses to <em>exactly</em> the marginal likelihood we would’ve gotten from Rasmussen and Williams had we made a sequence of different life choices. In particular if <img src="https://latex.codecogs.com/png.latex?A%20=%20I"> and <img src="https://latex.codecogs.com/png.latex?Q(%5Ctheta)%20=%20%5CSigma(%5Ctheta)%5E%7B-1%7D">.↩︎</p></li>
<li id="fn31"><p>Or, at least, people who have made it this far into the post.↩︎</p></li>
<li id="fn32"><p>You like GPs bro? <a href="https://dansblog.netlify.app/posts/2021-11-03-yes-but-what-is-a-gaussian-process-or-once-twice-three-times-a-definition-or-a-descent-into-madness/">Give me a sequence of increasingly abstract definitions.</a> I’m waiting.↩︎</p></li>
<li id="fn33"><p>Multiply the determinants of the matrices along the diagonal.↩︎</p></li>
<li id="fn34"><p>Look at the Bates et al paper. Specifically section 2.2. <code>lme4</code> is a really clever thing.↩︎</p></li>
<li id="fn35"><p>examples: smoothing splines, AR(p) models, areal spatial models, <a href="https://dansblog.netlify.app/posts/2021-11-24-getting-into-the-subspace/">some Gaussian processes if you’re careful</a>↩︎</p></li>
<li id="fn36"><p><img src="https://latex.codecogs.com/png.latex?10%5E4">–<img src="https://latex.codecogs.com/png.latex?10%5E6"> is not unheard of↩︎</p></li>
<li id="fn37"><p>A dense matrix factorisation of an <img src="https://latex.codecogs.com/png.latex?n%5Ctimes%20n"> matrix costs <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(n%5E3)">. The same factorisation of a sparse matrix can cost as little as <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(n)"> if you’re very lucky. More typically it clocks in a <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(n%5E%7B1.5%7D)">–<img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(n%5E%7B2%7D)">, which is still a substantial saving!↩︎</p></li>
<li id="fn38"><p>This happens for a lot of designs, or when a basis spline or a Markovian Gaussian process is being used↩︎</p></li>
<li id="fn39"><p>This happens a lot, but not always. For instance subset-of-regressors/predictive process-type models have a dense <img src="https://latex.codecogs.com/png.latex?A">. In this case, if <img src="https://latex.codecogs.com/png.latex?A"> has <img src="https://latex.codecogs.com/png.latex?m"> rows and <img src="https://latex.codecogs.com/png.latex?n"> columns, this is an <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BO%7D(mn)"> operation, which is more expensive than a sparse <img src="https://latex.codecogs.com/png.latex?A"> unless <img src="https://latex.codecogs.com/png.latex?A"> has roughly <img src="https://latex.codecogs.com/png.latex?n"> non-zeros per row.↩︎</p></li>
<li id="fn40"><p>but usually not cubically. See above footnote.↩︎</p></li>
<li id="fn41"><p>It’s important that we are talking about <em>precision</em> matrices here and not covariance matrices as the inverse of a sparse matrix is typically dense. For instance, an AR(1) prior with autocorrelation parameter <img src="https://latex.codecogs.com/png.latex?%5Crho"> has a sparse precision matrix that looks something like <img src="https://latex.codecogs.com/png.latex?%0AQ%20=%20%5Cfrac%7B1%7D%7B%5Ctau%5E2%7D%5Cbegin%7Bpmatrix%7D%0A1%20&amp;%20-%5Crho%20&amp;&amp;&amp;&amp;&amp;%20%5C%5C%0A-%5Crho&amp;1%20+%20%5Crho%5E2&amp;%20-%5Crho&amp;&amp;&amp;&amp;%20%5C%5C%0A&amp;-%5Crho&amp;%201%20+%20%5Crho%5E2%20&amp;-%20%5Crho&amp;&amp;&amp;%20%5C%5C%0A&amp;&amp;-%5Crho&amp;%201%20+%20%5Crho%5E2&amp;-%5Crho&amp;&amp;%20%5C%5C%0A&amp;&amp;&amp;-%5Crho&amp;1+%5Crho%5E2%20&amp;-%5Crho%20&amp;%20%5C%5C%0A&amp;&amp;&amp;&amp;-%5Crho&amp;1%20+%20%5Crho%5E2&amp;%20-%20%5Crho%20%5C%5C%0A&amp;&amp;&amp;&amp;&amp;-%5Crho&amp;1%0A%5Cend%7Bpmatrix%7D.%0A"> On the other hand, the <em>covariance matrix</em> is fully dense <img src="https://latex.codecogs.com/png.latex?%0AQ%5E%7B-1%7D%20%5Cpropto%20%5Ctau%5E2%5Cbegin%7Bpmatrix%7D%0A1&amp;%5Crho&amp;%5Crho%5E2&amp;%5Crho%5E3&amp;%5Crho%5E4&amp;%5Crho%5E5&amp;%5Crho%5E6%20%5C%5C%0A%5Crho&amp;1&amp;%5Crho&amp;%5Crho%5E2&amp;%5Crho%5E3&amp;%5Crho%5E4&amp;%5Crho%5E5%20%5C%5C%0A%5Crho%5E2&amp;%5Crho&amp;1&amp;%5Crho&amp;%5Crho%5E2&amp;%5Crho%5E3&amp;%5Crho%5E4%20%5C%5C%0A%5Crho%5E3&amp;%5Crho%5E2&amp;%5Crho&amp;1&amp;%5Crho&amp;%5Crho%5E2&amp;%5Crho%5E3%20%5C%5C%0A%5Crho%5E4&amp;%5Crho%5E3&amp;%5Crho%5E2&amp;%5Crho&amp;1&amp;%5Crho&amp;%5Crho%5E2%20%5C%5C%0A%5Crho%5E5&amp;%5Crho%5E4&amp;%5Crho%5E3&amp;%5Crho%5E2&amp;%5Crho&amp;1&amp;%5Crho%20%5C%5C%0A%5Crho%5E6&amp;%5Crho%5E5&amp;%5Crho%5E4&amp;%5Crho%5E3&amp;%5Crho%5E2&amp;%5Crho&amp;1%20%5C%5C%0A%5Cend%7Bpmatrix%7D.%0A"><br>
This is a generic property: the inverse of a sparse matrix is usually dense (more precisely, as long as the graph associated with the sparse matrix has a single connected component, there is a matrix with the same pattern of non-zeros that has a fully dense inverse) and the entries <a href="https://eudml.org/doc/130625">satisfy geometric decay bounds</a>.↩︎</p></li>
<li id="fn42"><p>Remember: <img src="https://latex.codecogs.com/png.latex?W"> is diagonal and known.↩︎</p></li>
<li id="fn43"><p>Not if you’re doing some wild dummy coding shit or modelling text, but typically.↩︎</p></li>
<li id="fn44"><p>You’d think that dense rows and columns would be a problem but they’re not. A little graph theory and a little numerical linear algebra says that as long as they are the last variables in the model, the algorithms will still be efficient. That said, if you want to <em>dig in</em>, it is possible to use supernodal (eg CHOLMOD) and multifrontal (eg MUMPS) methods to group the operations in such a way that it’s possible to use level-3 BLAS operations. CHOLMOD even spins this into a GPU acceleration scheme, which is fucking wild if you think about it: sparse linear algebra rarely has the arithmetic intensity or data locality required to make GPUs worthwhile (you spend all of your time communicating, which is great in a marriage, terrible in a GPU). But some clever load balancing, tree-based magic, and multithreading <a href="https://www.sciencedirect.com/science/article/pii/S1877750317312164">apparently makes it possible</a>. Like truly, I am blown away by this. We are not going to do <em>any</em> of this because absolutely fucking not. And anyway. It’s kinda rare to have a huge number of covariates in the sorts of models that use these complex random effects. (Or if you do, you better light your Sinead O’Connor votive candle because honestly you have a lot of problems and you’re gonna need healing.)↩︎</p></li>
<li id="fn45"><p>If you’ve been reading the footnotes, you’ll recall that sometimes one of these precision matrices on the diagonal will be singular. Sometimes that’s because you fucked up your programming. But other times it’s because you’re using something like an ICAR (intrinsic conditional autoregressive) prior on one of your components. The precision matrix for this model is <img src="https://latex.codecogs.com/png.latex?Q_%5Ctext%7BICAR%7D%20=%20%5Ctau%20%5Cleft(%5Coperatorname%7BDeg%7D(%5Cmathcal%7BG%7D)%20-%20%5Coperatorname%7BAdj%7D(%5Cmathcal%7BG%7D)%5Cright)">, where <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7BDeg%7D(%5Cmathcal%7BG%7D)"> is the diagonal matrix of vertex degrees and <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7BAdj%7D(%5Cmathcal%7BG%7D)"> is the adjacency matrix of some fixed graph <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D"> (typically describing something like which postcodes are next to each other). <a href="https://www.routledge.com/Gaussian-Markov-Random-Fields-Theory-and-Applications/Rue-Held/p/book/9781584884323">Some theory</a> suggests that if <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D"> has <img src="https://latex.codecogs.com/png.latex?d"> connected components, the zero determinant should be replaced with <img src="https://latex.codecogs.com/png.latex?%5Ctau%5E%7B(m%20-%20d)/2%7D">, where <img src="https://latex.codecogs.com/png.latex?m"> is the number of vertices in <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BG%7D">.↩︎</p></li>
<li id="fn46"><p>I guess there’s nothing really stopping you from writing in pure Python except a creeping sense of inadequacy.↩︎</p></li>
<li id="fn47"><p>eg build a sparse matrix↩︎</p></li>
<li id="fn48"><p>Honey, we do not have time. Understanding autodiff is not massively important in the grand scheme of this blogpost (or, you know, probably in real life unless you do some fairly specific things). <a href="https://arxiv.org/abs/1811.05031">I’ll let Charles explain it.</a>↩︎</p></li>
<li id="fn49"><p>Or, a custom vector-Jacobian product, which is not a symmetrical choice.↩︎</p></li>
<li id="fn50"><p>I bind you Nancy!↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2022,
  author = {Simpson, Dan},
  title = {Sparse {Matrices} 1: {The} Linear Algebra of Linear Mixed
    Effects Models and Their Generalisations},
  date = {2022-03-22},
  url = {https://dansblog.netlify.app/2022-03-22-a-linear-mixed-effects-model},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2022" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2022. <span>“Sparse Matrices 1: The Linear Algebra of
Linear Mixed Effects Models and Their Generalisations.”</span> March 22,
2022. <a href="https://dansblog.netlify.app/2022-03-22-a-linear-mixed-effects-model">https://dansblog.netlify.app/2022-03-22-a-linear-mixed-effects-model</a>.
</div></div></section></div> ]]></description>
  <category>Sparse matrices</category>
  <category>Linear mixed models</category>
  <guid>https://dansblog.netlify.app/posts/2022-03-22-a-linear-mixed-effects-model/a-linear-mixed-effects-model.html</guid>
  <pubDate>Mon, 21 Mar 2022 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2022-03-22-a-linear-mixed-effects-model/patti.JPG" medium="image"/>
</item>
<item>
  <title>Barry Gibb came fourth in a Barry Gibb look alike contest (Repost)</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2022-01-26-barry-gibb-came-fourth-in-a-barry-gibb-look-alike-contest-repost/barry-gibb-came-fourth-in-a-barry-gibb-look-alike-contest-repost.html</link>
  <description><![CDATA[ 





<blockquote class="blockquote">
<p><em>Every day a little death, in the parlour, in the bed. On the lips and in the eyes. In the curtains in the silver, in the buttons, in the bread, in the murmurs, in the pauses, in the gestures, in the sighs.</em> <a href="https://www.youtube.com/watch?v=Snru5gtCyWA">Sondheim</a></p>
</blockquote>
<p>The most horrible sound in the world is that of a reviewer asking you to compare your computational method to another, existing method. Like bombing countries in the name of peace, the purity of intent drowns out the voices of our better angels as they whisper: at what cost.</p>
<p>Before the unnecessary drama of that last sentence<sup>1</sup> sends you running back to the still-open browser tab documenting the world’s slow slide into a deeper, danker, more complete darkness than we’ve seen before, I should say that I understand that for most people this isn’t a problem. Most people don’t do research in computational statistics. Most people are happy<sup>2</sup>.</p>
<p>So why does someone asking for a comparison of two methods for allegedly computing the same thing fill me with the sort of dread usually reserved for climbing down the ladder into my basement to discover, by the light of a single, swinging, naked light bulb, that the evil clown I keep chained in the corner has escaped? Because it’s almost impossible to do well.</p>
<section id="i-go-through-all-this-before-you-wake-up-so-i-can-feel-happier-to-be-safe-again-with-you" class="level1">
<h1>I go through all this before you wake up so I can feel happier to be safe again with you</h1>
<p>Many many years ago, when I still had all my hair and thought it was impressive when people proved things, I did a PhD in numerical analysis. These all tend to have the same structure:</p>
<ol type="1">
<li><p>survey your chosen area with a simulation study comparing all the existing methods,</p></li>
<li><p>propose a new method that should be marginally better than the existing ones,</p></li>
<li><p>analyse the new method, show that it’s at least not worse than the existing ones (or worse in an interesting way),</p></li>
<li><p>construct a simulation study that shows the superiority of your method on a problem that hopefully doesn’t look too artificial,</p></li>
<li><p>write a long discussion blaming the inconsistencies between the maths and the simulations on “pre-asymptotic artefacts”.</p></li>
</ol>
<p>Which is to say, I’ve done my share of simulation studies comparing algorithms.</p>
<p>So what changed? When did I start to get <a href="https://www.youtube.com/watch?v=ykdtNuKlHiA">the fear</a> every time someone mentioned comparing algorithms?</p>
<p>Well, I left numerical analysis and moved to statistics and I learnt the one true thing that all people who come to statistics must learn: statistics is hard.</p>
<p>When I used to compare deterministic algorithms it was easy. I would know the correct answer and so I could compare algorithms by comparing the error in their approximate solutions (perhaps taking into account things like how long it took to compute the answer).</p>
<p>But in statistics, the truth is random. Or the truth is a high-dimensional joint distribution that you cannot possibly know. So how can you really compare your algorithms, except possibly by comparing your answer to some sort of “gold standard” method that may or may not work.</p>
</section>
<section id="inte-ner-för-ett-stup.-inte-ner-från-en-bro.-utan-från-vattentornets-topp." class="level1">
<h1>Inte ner för ett stup. Inte ner från en bro. Utan från vattentornets topp<sup>3</sup>.</h1>
<p>The first two statistical things I ever really worked on (in an office overlooking a fjord) were computationally tractable ways of approximating posterior distributions for specific types of models. The first of these was <a href="https://en.wikipedia.org/wiki/Irish_National_Liberation_Army">INLA</a><sup>4</sup>. For those of you who haven’t heard of it, INLA (and its popular R implementation <a href="https://www.r-inla.org">R-INLA</a>) is a method for doing approximate posterior computation for a lot of the sorts of models you can fit in <code>rstanarm</code> and <code>brms</code>. So random effect models, multilevel models, models with splines, and spatial effects.</p>
<p>At the time, Stan didn’t exist (later, it barely existed), so I would describe INLA as being Bayesian inference for people who lacked the ideological purity to wait 14 hours for a poorly mixing BUGS chain to run, instead choosing to spend 14 seconds to get a better “approximate” answer. These days, Stan exists in earnest and that 14 hours is 20 minutes for small-ish models with only a couple of thousand observations, and the answer that comes out of Stan is probably as good as INLA.</p>
<p>Working on INLA I learnt a new fear: the fear that someone else was going to publish a simulation study comparing INLA with something else without checking with us first.</p>
<p>Now obviously, we wanted people to run their comparisons past us so we could ruthlessly quash any dissent and hopefully exile the poor soul who thought to critique our perfect method to the academic equivalent of a Siberian work camp.</p>
<p>Or, more likely, because comparing statistical models is really hard, and we could usually make the comparison much better by asking some questions about how it was being done.</p>
<p>Sometimes, learning from well-constructed simulation studies how INLA was failing led to improvements in the method.</p>
<p>But nothing could be learned if, for instance, the simulation study was reporting runs from code that wasn’t doing what the authors thought it was<sup>5</sup>. And I don’t want to suggest that bad or unfair comparisons come from malice (for the most part, we’re all quite conscientious and fairly nice), but rather that they happen because comparing statistical algorithms is hard.</p>
<p>And comparing algorithms fairly where you don’t understand them equally well is almost impossible.</p>
</section>
<section id="well-did-you-hear-the-one-about-mr-ed-he-said-im-this-way-because-of-the-things-ive-seen" class="level1">
<h1>Well did you hear the one about Mr Ed? He said I’m this way because of the things I’ve seen</h1>
<p>Why am I bringing this up? It’s because of the second statistical thing that I worked on while I was living in sunny Trondheim (in between looking at the fjord and holding onto the sides of buildings for dear life because for 8 months of the year Trondheim is a very pretty mess of icy hills).</p>
<p>During that time, I worked with <a href="https://www.maths.ed.ac.uk/~flindgre/">Finn Lindgren</a> and <a href="https://www.kaust.edu.sa/en/study/faculty/haavard-rue">Håvard “INLA” Rue</a> on computationally efficient approximations to Gaussian random fields (which is what we’re supposed to call Gaussian Processes when the parameter space is more complex than just “time” [<em>shakes fist at passing cloud</em>]). Finn (with Håvard and Johan Lindström) had proposed a new method, cannily named the <a href="https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2011.00777.x">Stochastic Partial Differential Equation</a> (SPDE) method, for exploiting the continuous-space Markov property in higher dimensions. Which all sounds very maths-y, but it isn’t.</p>
<p>The guts of the method say “all of our problems with working computationally with Gaussian random fields come from the fact that the set of all possible functions is too big for a computer to deal with, so we should do something about that”. The “something” is to replace the continuous function with a piecewise linear one defined over a fairly fine triangulation on the domain of interest.</p>
</section>
<section id="but-why-am-i-talking-about-this" class="level1">
<h1>But why am I talking about this?</h1>
<p>(Sorry. One day I’ll write a short post.)</p>
<p>A <a href="https://arxiv.org/pdf/1710.05013.pdf">very exciting paper popped up on arXiv on Monday</a><sup>6</sup> comparing a fairly exhaustive collection of recent methods for making spatial Gaussian random fields more computationally efficient.</p>
<p>Why am I not cringing in fear? Because if you look at the author list, they have included an author from each of the projects they have compared! This means that the comparison will probably be as good as it can be. In particular, it won’t suffer from the usual problem of the authors understanding some methods they’re comparing better than others.</p>
<h1>The world is held together by the wind that blows through Gena Rowlands’ hair</h1>
<p>So how did they go? Well, actually, they did quite well. I like that</p>
<ul>
<li><p>They describe each problem quite well</p></li>
<li><p>The simulation study and the real data analysis use a collection of different evaluation metrics</p></li>
<li><p>Some of these are proper scoring rules, which is the correct framework for evaluating probabilistic predictions</p></li>
<li><p>They acknowledge that the wall clock timings are likely to be more a function of how hard a team worked to optimise performance on this one particular model than a true representation of how these methods would work in practice.</p></li>
</ul>
</section>
<section id="not-the-lovin-kind" class="level1">
<h1>Not the lovin’ kind</h1>
<p>But I’m an academic statistician. And our key feature, as a people, is that we loudly and publicly dislike each other’s work. Even the stuff we agree with. Why? Because people with our skills who also have impulse control tend to work for more money in the private sector.</p>
<p>So with that in mind, let’s have some fun.</p>
<p>(Although seriously, this is the best comparison of this type I’ve ever seen. So, really, I’m just wanting it to be even bester.)</p>
<p>So what’s wrong with it?</p>
</section>
<section id="its-gotta-be-big.-i-said-it-better-be-big" class="level1">
<h1>It’s gotta be big. I said it better be big</h1>
<p>The most obvious problem with the comparison is that the problem that these methods are being compared on is not particularly large or complex. You can see that from the timings. Almost none of these implementations are sweating, which is a sign that we are not anywhere near the sort of problem that would really allow us to differentiate between methods.</p>
<p>So how small is small? The problem had 105,569 observations and required prediction at, at most, 4,431 other locations. To be challenging, this data needed to be another order of magnitude bigger.</p>
</section>
<section id="god-knows-i-know-ive-thrown-away-those-graces" class="level1">
<h1>God knows I know I’ve thrown away those graces</h1>
<p>(Can you tell what I’m listening to?)</p>
<p>The second problem with the comparison is that the problem is tooooooo easy. As the data is modelled with Gaussian observation noise and a multivariate Gaussian latent random effect, it is a straightforward piece of algebra to eliminate all of the latent Gaussian variables from the model. This leads to a model with only a small number of parameters, which should make inference much easier.</p>
<p>How do you do that? Well, suppose the data is <img src="https://latex.codecogs.com/png.latex?y">, the Gaussian random field is <img src="https://latex.codecogs.com/png.latex?x">, and the hyperparameters are <img src="https://latex.codecogs.com/png.latex?%5Ctheta">. In this case, we can use conditional probability to write <img src="https://latex.codecogs.com/png.latex?%0Ap(%5Ctheta%20%5Cmid%20y)%20%5Cpropto%20%5Cfrac%7Bp(y,x,%5Ctheta)%7D%7Bp(x%20%5Cmid%20y,%20%5Ctheta)%7D,%0A"> which holds for every value of <img src="https://latex.codecogs.com/png.latex?x">, and in particular for <img src="https://latex.codecogs.com/png.latex?x=0">. Hence if you have a closed-form full conditional (which is the case when you have Gaussian observations), you can write the marginal posterior out exactly without having to do any integration.</p>
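<p>To make the trick concrete, here is a minimal numerical check of the identity in a toy one-dimensional conjugate model. The model and all numbers below are illustrative choices of mine, not anything from the paper:</p>

```python
import math

def npdf(x, mean, var):
    """Gaussian density N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Toy conjugate model (illustrative numbers): y | x ~ N(x, sigma2), x ~ N(0, theta).
sigma2, theta, y = 0.5, 2.0, 1.3

# The full conditional x | y, theta is Gaussian, by the standard conjugate update.
cond_var = 1.0 / (1.0 / sigma2 + 1.0 / theta)
cond_mean = cond_var * y / sigma2

x = 0.0  # the identity holds for every x; x = 0 is just the convenient choice
ratio = npdf(y, x, sigma2) * npdf(x, 0.0, theta) / npdf(x, cond_mean, cond_var)

# The ratio reproduces the marginal likelihood p(y | theta) = N(y; 0, sigma2 + theta)
# exactly, with no integration anywhere.
print(abs(ratio - npdf(y, 0.0, sigma2 + theta)) < 1e-12)  # True
```

<p>The same three densities appear in the general latent Gaussian case; only the conjugate update for the full conditional gets replaced by a sparse multivariate one.</p>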
<p>A much more challenging problem would have had Poisson or binomial data, where the full conditional doesn’t have a known form. In this case you cannot do this marginalisation analytically, so you put much more stress on your inference algorithm.</p>
<p>I guess there’s an argument to be made that some methods are really difficult to extend to non-Gaussian observations. But there’s also an argument to be made that I don’t care. Shit or get off the pot, as Americans would say.</p>
</section>
<section id="dont-take-me-back-to-the-range" class="level1">
<h1>Don’t take me back to the range</h1>
<p>The prediction quality is measured in terms of mean squared error and mean absolute error (which are fine), the continuous ranked probability score (CRPS) and the Interval Score (INT), both of which are proper scoring rules. <a href="https://sites.stat.washington.edu/raftery/Research/PDF/Gneiting2007jasa.pdf">Proper scoring rules</a> (and follow the link or google for more if you’ve never heard of them) are the correct way to compare probabilistic predictions, regardless of the statistical framework that’s used to make the predictions. So this is an excellent start!</p>
<p>But one of these measures does stand out: the prediction interval coverage (CVG) which is defined in the paper as “the percent of intervals containing the true predicted value”. I’m going to parse that as “the percent of prediction intervals containing the true value”. The paper suggests (through use of bold in the tables) that the correct value for CVG is 0.95. That is, the paper suggests the true value should lie within the 95% interval 95% of the time.</p>
<p><em>This is not true.</em></p>
<p>Or, at least, this is considerably more complex than the result suggests.</p>
<p>Or, at least, this is only true if you compute intervals that are specifically built to do this, which is mostly very hard to do. And you definitely don’t do it by providing a standard error (which is an option in this competition).</p>
<h1>Boys on my left side. Boys on my right side. Boys in the middle. And you’re not here.</h1>
<p>So what’s wrong with CVG?</p>
<p>Well, first of all, it’s a multiple testing problem. You are not testing the same interval multiple times; you are checking multiple intervals one time each. So it can only be meaningful if the prediction intervals were constructed jointly to solve this specific multiple testing problem.</p>
<p>Secondly, it’s extremely difficult to know what is considered random here. Coverage statements are statements about repeated tests, so how you repeat them<sup>7</sup> will affect whether or not a particular statement is true. It will also affect how you account for the multiple testing when building your prediction intervals. (Really, if anyone did opt to just return standard errors, nothing good is going to happen for them in this criterion!)</p>
<p>Thirdly, it’s already covered by the interval score. If your interval is <img src="https://latex.codecogs.com/png.latex?%5Bl,u%5D"> with nominal non-coverage level <img src="https://latex.codecogs.com/png.latex?%5Calpha">, the interval score for an observation <img src="https://latex.codecogs.com/png.latex?y"> is <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BINT%7D_%5Calpha(l,%20u,%20y)%20=%20u%20-%20l%20+%20%5Cfrac%7B2%7D%7B%5Calpha%7D(l-y)%20%5Cmathbf%7B1%7D%5C%7By%20%3C%20l%5C%7D%20+%20%5Cfrac%7B2%7D%7B%5Calpha%7D(y-u)%5Cmathbf%7B1%7D%5C%7By%3Eu%5C%7D.%0A"> This score (where smaller is better) rewards you for having a narrow prediction interval, but penalises you every time the data does not lie in the interval. The score is minimised when <img src="https://latex.codecogs.com/png.latex?%5CPr(y%20%5Cin%20%5Bl,u%5D)%20=%201%20-%20%5Calpha">. So this really is a good measure of how well the interval estimate is calibrated, and it checks more aspects of the interval than CVG (which lacks the width term) does.</p>
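<p>The interval score is a direct transcription of the formula above (the function name is mine; here <img src="https://latex.codecogs.com/png.latex?%5Calpha"> is the non-coverage, e.g.&nbsp;0.05 for a 95% interval):</p>

```python
def interval_score(l, u, y, alpha):
    """Interval score for a central (1 - alpha) prediction interval [l, u].

    Smaller is better: you pay the width up front, plus a 2/alpha charge
    for every miss on either side.
    """
    score = u - l
    if y < l:
        score += (2.0 / alpha) * (l - y)
    if y > u:
        score += (2.0 / alpha) * (y - u)
    return score

# A wide interval that contains the observation pays only its width...
print(interval_score(-1.96, 1.96, 0.5, alpha=0.05))  # 3.92
# ...while a narrow interval that misses pays dearly: 1.0 + 40 * 1.5.
print(interval_score(-0.5, 0.5, 2.0, alpha=0.05))    # 61.0
```

<p>In practice you average this over all prediction locations, which is exactly what the INT column in the paper reports.</p>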
</section>
<section id="theres-the-part-youve-braced-yourself-against-and-then-theres-the-other-part" class="level1">
<h1>There’s the part you’ve braced yourself against, and then there’s the other part</h1>
<p>Any conversation about how to evaluate the quality of an interval estimate really only makes sense in the situation where everyone has constructed their intervals the same way. The authors’ code is <a href="https://github.com/finnlindgren/heatoncomparison/">here</a>, but even without seeing it we know there are essentially four options:</p>
<ol type="1">
<li><p>Compute pointwise prediction means <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cmu%7D_i"> and standard errors <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Csigma%7D_i"> and build the pointwise intervals <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cmu%7D_i%20%5Cpm%201.96%5Chat%7B%5Csigma%7D_i">.</p></li>
<li><p>Compute the pointwise Bayesian prediction intervals, which are formed from the appropriate quantiles (or the HPD region if you are Tony O’Hagan) of <img src="https://latex.codecogs.com/png.latex?%5Cint%20%5Cint%20p(%5Chat%7By%7D%20%5Cmid%20x,%5Ctheta)%20p(x,%5Ctheta%20%5Cmid%20y)%5C,dx%20d%5Ctheta">.</p></li>
<li><p>An interval of the form <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cmu%7D_i%20%5Cpm%20c%5Chat%7B%5Csigma%7D_i">, where <img src="https://latex.codecogs.com/png.latex?c"> is chosen to ensure coverage.</p></li>
<li><p>Some sort of clever thing based on functional data analysis.</p></li>
</ol>
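<p>Given a bag of posterior predictive draws at one location, options 1 and 2 are both one-liners, and they can disagree badly. A sketch with synthetic draws (the skewed predictive is my own toy choice, nothing from the competition code):</p>

```python
import random
import statistics

random.seed(1)

# Synthetic, deliberately skewed posterior predictive draws at a single location.
draws = sorted(random.lognormvariate(0.0, 0.75) for _ in range(20000))

# Option 1: pointwise mean +/- 1.96 standard errors.
mu = statistics.fmean(draws)
se = statistics.stdev(draws)
opt1 = (mu - 1.96 * se, mu + 1.96 * se)

# Option 2: the central 95% interval read off the quantiles of the draws.
opt2 = (draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws)) - 1])

# For a skewed predictive the two disagree: option 1 dips below zero even
# though every draw is strictly positive.
print(opt1[0] < 0 < opt2[0])  # True
```

<p>Which of these a team submitted matters a lot for a coverage criterion, which is exactly the point.</p>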
<p>But how well these different options work will depend on how they’re being assessed (or what they’re being used for).</p>
<section id="option-1-we-want-to-fill-in-our-sparse-observation-by-predicting-at-more-and-more-points" class="level2">
<h2 class="anchored" data-anchor-id="option-1-we-want-to-fill-in-our-sparse-observation-by-predicting-at-more-and-more-points">Option 1: We want to fill in our sparse observation by predicting at more and more points</h2>
<p>(This is known as “in-fill asymptotics”). This type of question occurs when, for instance, we want to fill in the holes in satellite data (which are usually due to clouds).</p>
<p>This is the case that most closely resembles the design of the simulation study in this paper. In this case you refine your estimated coverage by computing more prediction intervals and checking if the true value lies within the interval.</p>
<p>Most of the easy-to-find results about coverage are from the 1D literature (specifically around smoothing splines and non-parametric regression). In these cases, it’s known that the first option is bad, the second option will <a href="https://projecteuclid.org/journals/annals-of-statistics/volume-21/issue-2/An-Analysis-of-Bayesian-Inference-for-Nonparametric-Regression/10.1214/aos/1176349157.full">lead to conservative regions</a> (the coverage will be too high), the third option involves <a href="https://link.springer.com/book/10.1007/978-0-387-48116-6">some sophisticated understanding of how Gaussian random fields work</a>, and the fourth is not something I know anything about.</p>
</section>
<section id="option-2-we-want-to-predict-at-one-point-where-the-field-will-be-monitored-multiple-times" class="level2">
<h2 class="anchored" data-anchor-id="option-2-we-want-to-predict-at-one-point-where-the-field-will-be-monitored-multiple-times">Option 2: We want to predict at one point, where the field will be monitored multiple times</h2>
<p>This second option comes up when we’re looking at a long-term monitoring network. This type of data is common in environmental science, where a long-term network of sensors is set up to monitor, for example, air pollution. The new observations are not independent of the previous ones (there’s usually some sort of temporal structure), but independence can often be assumed if the observations are distant enough in time.</p>
<p>In this case, as you are repeating observations at a single site, option 1 will be the right way to construct your interval, option 2 will probably still be a bit broad but might be ok, and options 3 and 4 will probably be too narrow if the underlying process is smooth.</p>
</section>
<section id="option-3-mixed-asymptotics-you-do-both-at-once" class="level2">
<h2 class="anchored" data-anchor-id="option-3-mixed-asymptotics-you-do-both-at-once">Option 3: Mixed asymptotics! You do both at once</h2>
<p>Simulation studies are the last refuge of the damned.</p>
</section>
</section>
<section id="i-see-the-sun-go-down.-i-see-the-sun-come-up.-i-see-a-light-beyond-the-frame." class="level1">
<h1>I see the sun go down. I see the sun come up. I see a light beyond the frame.</h1>
<p>So what are my suggestions for making this comparison better (other than making it bigger, harder, and dumping the weird CVG criterion)?</p>
<ol type="1">
<li><p>randomise</p></li>
<li><p>randomise</p></li>
<li><p>randomise</p></li>
</ol>
<p>What do I mean by that? Well, in the simulation study, the paper only considered one possible set of data simulated from the correct model. All of the results in their Table 2, which contains the scores and timings on the simulated data, depend on this particular realisation. And hence Table 2 is a realisation of a random variable that will have a mean and standard deviation.</p>
<p>This should <em>not</em> be taken as an endorsement of the frequentist view that the observed data is random and estimators should be evaluated by their average performance over different realisations of the data. <em>This is an acknowledgement of the fact that in this case the data is actually a realisation of a random variable.</em> Reporting the variation in Table 2 would give an idea of the variation in the performance of the method. And would lead to a more nuanced and realistic comparison of the methods. It is not difficult to imagine that for some of these criteria there is no clear winner when averaged over data sets.</p>
</section>
<section id="where-did-you-get-that-painter-in-your-pocket" class="level1">
<h1>Where did you get that painter in your pocket?</h1>
<p>I have very mixed feelings about the timings column in the results table. On one hand, an “order of magnitude” estimate of how long this will actually take to fit is probably a useful thing for a person considering using a method. On the other hand, there is just no way for these results not to be misleading. And the paper acknowledges this.</p>
<p>Similarly, the competition does not specify things like priors for the Bayesian solutions. This makes it difficult to really compare things like interval estimates, which can strongly depend on the specified priors. You could certainly improve your chances of winning on the CVG computation for the simulation study by choosing your priors carefully!</p>
</section>
<section id="what-is-this-six-stringed-instrument-but-an-adolescent-loom" class="level1">
<h1>What is this six-stringed instrument but an adolescent loom?</h1>
<p>I haven’t really talked about the real data performance yet. Part of this is because <a href="https://statmodeling.stat.columbia.edu/2019/10/15/a-heart-full-of-hatred-8-schools-edition/">I don’t think real data is particularly useful for evaluating algorithms</a>. More likely, you’re evaluating your chosen data set as much as, or even more than, you are evaluating your algorithm.</p>
<p>Why? Because real data doesn’t follow the model, so even if a particular method gives a terrible approximation to the inference you’d get from the “correct” model, it might do very very well on the particular data set. I’m not sure how you can draw any sort of meaningful conclusion from this type of situation.</p>
<p>I mean, I should be happy I guess because the method I work on “won” three of the scores, and did fairly well in the other two. But there’s no way to say that wasn’t just luck.</p>
<p>What does luck look like in this context? It could be that the SPDE approximation is a better model for the data than the “correct” Gaussian random field model. It could just be Finn appealing to the old Norse gods. It’s really hard to tell.</p>
<p>If any real data is to be used to make general claims about how well algorithms work, I think it’s necessary to use <em>a lot</em> of different data sets rather than just one.</p>
<p>Similarly, a range of different simulation study scenarios would give a broader picture of when different approximations behave better.</p>
</section>
<section id="dont-dream-its-over" class="level1">
<h1>Don’t dream it’s over</h1>
<p><a href="https://www.youtube.com/watch?v=OtvdZ47h8y4">One more kiss before we part</a>: This field is still alive and kicking. One of the really exciting new ideas in the field (that’s probably too new to be in the comparison) is that you can speed up the computation of the unnormalised log-posterior through <a href="https://arxiv.org/abs/1709.04419">hierarchical decompositions of the covariance matrix</a> (there is also code). This is a really neat method for solving the problem and a really exciting new idea in the field.</p>
<p>There are a bunch of other things that are probably worth looking at in this article, but I’ve run out of energy for the moment. Probably the most interesting thing for me is that a lot of the methods that did well (SPDEs, Predictive Processes, Fixed Rank Kriging, Multi-resolution Approximation, Lattice Krig, Nearest-Neighbour Predictive Processes) are cut from very similar cloth. It would be interesting to look deeper at the similarities and differences in an attempt to explain these results.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>2021: Oh my giddy aunt what even was that?!↩︎</p></li>
<li id="fn2"><p>2021: The extent to which these blog posts captured the variations in my mental state around that time is notable to me, but not interesting to others. So I’m sorry about that. But also they give a small glimpse at just how bleak my sense of humour can be.↩︎</p></li>
<li id="fn3"><p>No I don’t speak Swedish, but <a href="https://www.youtube.com/watch?v=oS2ExAcW-Z8">one of my favourite songwriters/lyricists</a> does. And sometimes I’m just that unbearable. Also the next part of this story takes place in Norway, which is near Sweden but produces worse music (<a href="https://www.youtube.com/watch?v=Y_lEXa7VWcA">Susanne Sundfør</a> and <a href="https://www.youtube.com/watch?v=ZCFlT_FYnEE">M2M</a> being notable exceptions).↩︎</p></li>
<li id="fn4"><p>I once gave a truly mortifying talk called INLA: Past, Present, and Future at a conference in Dublin.↩︎</p></li>
<li id="fn5"><p>Or, as happened one time, they compared computation for a different model with an algorithm that failed its convergence checks and assumed that all of the hyperparameters were fixed. All of that is bad but the last part is like saying <code>lm</code> is faster than <code>lme4::lmer</code> for fitting mixed effects models because we only checked when the almost always unknown variance parameters were assumed known.↩︎</p></li>
<li id="fn6"><p>In 2017. A long time ago.↩︎</p></li>
<li id="fn7"><p>Repeat the same test or make a new test for different data↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2022,
  author = {Simpson, Dan},
  title = {Barry {Gibb} Came Fourth in a {Barry} {Gibb} Look Alike
    Contest {(Repost)}},
  date = {2022-01-26},
  url = {https://dansblog.netlify.app/2022-01-26-barry-gibb-came-fourth-in-a-barry-gibb-look-alike-contest-repost},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2022" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2022. <span>“Barry Gibb Came Fourth in a Barry Gibb Look
Alike Contest (Repost).”</span> January 26, 2022. <a href="https://dansblog.netlify.app/2022-01-26-barry-gibb-came-fourth-in-a-barry-gibb-look-alike-contest-repost">https://dansblog.netlify.app/2022-01-26-barry-gibb-came-fourth-in-a-barry-gibb-look-alike-contest-repost</a>.
</div></div></section></div> ]]></description>
  <category>Computation</category>
  <category>Assessing algorithms</category>
  <guid>https://dansblog.netlify.app/posts/2022-01-26-barry-gibb-came-fourth-in-a-barry-gibb-look-alike-contest-repost/barry-gibb-came-fourth-in-a-barry-gibb-look-alike-contest-repost.html</guid>
  <pubDate>Tue, 25 Jan 2022 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2022-01-26-barry-gibb-came-fourth-in-a-barry-gibb-look-alike-contest-repost/yetta.JPG" medium="image"/>
</item>
<item>
  <title>Why won’t you cheat with me? (Repost)</title>
  <dc:creator>Dan Simpson</dc:creator>
  <link>https://dansblog.netlify.app/posts/2021-12-09-why-wont-you-cheat-with-me-repost/why-wont-you-cheat-with-me-repost.html</link>
  <description><![CDATA[ 





<blockquote class="blockquote">
<p>But I got some ground rules I’ve found to be sound rules<br>
and you’re not the one I’m exempting.<br>
Nonetheless, I confess it’s tempting.<br>
– <a href="https://www.youtube.com/watch?v=K2sPdIsr7jY">Jenny Toomey sings Franklin Bruno</a></p>
</blockquote>
<p>It turns out that I did something a little controversial in <a href="https://dansblog.netlify.app/posts/2021-12-08-the-king-must-die-repost/">last week’s</a><sup>1</sup> post. As these things always go, it wasn’t the thing I was expecting to get push back from, but rather what I thought was a fairly innocuous scaling of the prior. <a href="http://statmodeling.stat.columbia.edu/2017/11/02/king-must-die/#comment-601142">One commenter</a> (and a few other people on other communication channels) pointed out that the dependence of the prior on the design didn’t seem kosher. Of course, we (Andrew, Mike and I) wrote a paper that was sort of about this a <a href="http://www.stat.columbia.edu/~gelman/research/published/entropy-19-00555-v2.pdf">few months ago</a><sup>2</sup>, but it’s one of those really interesting topics that we can probably all deal with thinking more about.</p>
<p>So in this post, I’m going to go into a couple of situations where it makes sense to scale the prior based on fixed information about the experiment. (The emerging theme for these posts is “things I think are interesting and useful but are probably not publishable” interspersed with “weird digressions into musical theatre / the personal mythology of Patti LuPone”.)</p>
<p>If you haven’t clicked yet, this particular post is going to be drier than Eve Arden in Mildred Pierce. If you’d rather be entertained, I’d recommend <a href="https://open.spotify.com/album/2qY9GSG0nLoJdcQNmYxMGE">Tempting: Jenny Toomey sings the songs of Franklin Bruno</a>. (Franklin Bruno is today’s stand in for Patti, because I’m still sad that War Paint closed<sup>3</sup>. I only got to see it twice.)</p>
<p>(Jenny Toomey was one of the most exciting American indie musicians in the 90s both through her bands [Tsunami was the notable one, but there were others] and her work with Simple Machines, the label she co-founded. These days she’s working in musician advocacy and hasn’t released an album since the early 2000s. Bruno’s current band is called The Human Hearts. He has had a long solo career and was also in an excellent powerpop band called Nothing Painted Blue, who had an album called The Monte Carlo Method. And, now<sup>4</sup> that I live in Canada, I should say that that album has a fabulous cover of Mark Szabo’s I Should Be With You. To be honest, the only reason I work with Andrew and the Stan crew is that I figure if I’m in New York often enough I’ll eventually coincide with a Human Hearts concert<sup>5</sup>.)</p>
<section id="sparsity" class="level2">
<h2 class="anchored" data-anchor-id="sparsity">Sparsity</h2>
<blockquote class="blockquote">
<p>Why won’t you cheat with me? You and I both know you’ve done it before. – <a href="https://www.youtube.com/watch?v=dL-4ZQthJ5w">Jenny Toomey sings Franklin Bruno</a></p>
</blockquote>
<p>The first object of our affliction is priors that promote sparsity in high-dimensional models. There has been a lot of work on this topic, but the cheater’s guide is basically this:</p>
<blockquote class="blockquote">
<p>While spike-and-slab models can exactly represent sparsity and have excellent theoretical properties, they are basically useless from a computational point of view. So we use scale-mixture of normal priors (also known as local-global priors) to achieve approximate sparsity, and then use some sort of decision rule to take our approximately sparse signal and make it exactly sparse.</p>
</blockquote>
<p>What is a scale-mixture of normals? Well it has the general form <img src="https://latex.codecogs.com/png.latex?%0A%5Cbeta_j%20%5Csim%20N(0,%20%5Ctau%5E2%20%5Cpsi%5E2_j),%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Ctau"> is a global standard deviation parameter, controlling how large the <img src="https://latex.codecogs.com/png.latex?%5Cbeta_j"> parameters are in general<sup>6</sup>, while the local standard deviation parameters <img src="https://latex.codecogs.com/png.latex?%5Cpsi_j"> control how big <img src="https://latex.codecogs.com/png.latex?%5Cbeta_j"> is <em>relative</em> to the other <img src="https://latex.codecogs.com/png.latex?%5Cbeta">s.</p>
<p>The priors for <img src="https://latex.codecogs.com/png.latex?%5Ctau"> and the <img src="https://latex.codecogs.com/png.latex?%5Cpsi_j"> are typically set to be independent. A lot of theoretical work just treats <img src="https://latex.codecogs.com/png.latex?%5Ctau"> as fixed (or as otherwise less important than the local parameters), but <a href="https://arxiv.org/abs/1610.05559">this is wrong</a>.</p>
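<p>To see what this local–global structure does in practice, here is a minimal simulation sketch (mine, not from the original post; the constants are made up for illustration). It draws coefficients with half-Cauchy local scales and compares them with an iid normal draw at the same global scale: the heavy-tailed local scales squash most coefficients towards zero while letting a few escape to be very large.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
p, tau = 10_000, 0.1  # illustrative dimension and global scale

# Local-global prior: beta_j ~ N(0, tau^2 * psi_j^2) with half-Cauchy psi_j.
psi = np.abs(rng.standard_cauchy(p))
beta = tau * psi * rng.standard_normal(p)

# Comparison: iid normal with the same global scale tau (no local scales).
beta_normal = tau * rng.standard_normal(p)

print(np.median(np.abs(beta)))  # most coefficients are squashed near zero...
print(np.abs(beta).max())       # ...while a few are enormous
```
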
<p><em>Pedant’s corner:</em> Andrew likes to define mathematical statisticians as those who use <img src="https://latex.codecogs.com/png.latex?x"> for their data rather than <img src="https://latex.codecogs.com/png.latex?y">. I prefer to characterise them as those who think it’s a good idea to put a prior on variance (an un-elicitable quantity) rather than standard deviation (which is easy to have opinions about). Please, people, just stop doing this. You’re not helping yourselves!</p>
<p>Actually, maybe that last point isn’t for Pedant’s Corner after all. Because if you parameterise by standard deviation it’s pretty easy to work out what the marginal prior on <img src="https://latex.codecogs.com/png.latex?%5Cbeta_j"> (with <img src="https://latex.codecogs.com/png.latex?%5Ctau"> fixed) is.</p>
<p>This is quite useful because, with the notable exception of the “Bayesian” “Lasso” <a href="https://dansblog.netlify.app/posts/2021-12-08-the-king-must-die-repost/">which-does-not-work-but-will-never-die-because-it-was-inexplicably-published-in-the-leading-stats-journal-by-prominent-statisticians-and-has-the-word-Lasso-in-the-title-even-though-a-back-of-the-envelope-calculation-or-I-don’t-know-a-fairly-straightforward-simulation-by-the-reviewers-should-have-nixed-it</a> (to use its married name), we can’t compute the marginal prior for most scale-mixtures of normals.</p>
<p>The following result, which was killed by reviewers at some point during the PC prior paper’s long review process, but lives forever <a href="https://arxiv.org/abs/1403.4630v1">in the arXiv’d first version</a>, tells you everything you need to know. It’s a picture because, frankly, I’ve had a glass of wine and I’m not bloody typing it all again<sup>7</sup>.</p>
<div id="thm-prior" class="theorem">
<p><span class="theorem-title"><strong>Theorem 1</strong></span> Let <img src="https://latex.codecogs.com/png.latex?%5Cpi_d(r)"> be a prior on the standard deviation of <img src="https://latex.codecogs.com/png.latex?v%20%5Csim%0A%7B%5Cmathcal%20N%7D(0,r%5E2)">. The induced prior <img src="https://latex.codecogs.com/png.latex?%0A%5Cpi(v)%20=%20%5Cint_0%5E%5Cinfty%0A%5Cfrac%7B1%7D%7B2%5Cpi%20r%7D%5Cexp%5Cleft(%7B-%5Cfrac%7Bv%5E2%7D%7B2r%5E2%7D%7D%5Cright)%5Cpi_d(r)%5C,dr%0A"> has the following properties. Fix <img src="https://latex.codecogs.com/png.latex?%5Cdelta%3E%200">.</p>
<ol type="1">
<li><p>If <img src="https://latex.codecogs.com/png.latex?%5Cpi_d(r)%20%5Cleq%20Cr%5Et"> for all <img src="https://latex.codecogs.com/png.latex?r%20%5Cin%20%5B0,%5Cdelta%5D"> and for some <img src="https://latex.codecogs.com/png.latex?C,t%20%3E0">, then <img src="https://latex.codecogs.com/png.latex?%5Cpi(v)"> is finite at <img src="https://latex.codecogs.com/png.latex?v=0">.</p></li>
<li><p>If <img src="https://latex.codecogs.com/png.latex?%5Cpi_d(r)%20%5Cin%20(0,%5Cinfty)"> for every <img src="https://latex.codecogs.com/png.latex?r%20%5Cin%0A%5B0,%5Cdelta%5D">, then <img src="https://latex.codecogs.com/png.latex?%5Cpi(v)"> has a weak logarithmic spike at zero, that is <img src="https://latex.codecogs.com/png.latex?%0A%5Cpi(v)%20=%20%5Cmathcal%7BO%7D%5Cleft%5B%5Clog%5Cleft(1%20+%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%5Cfrac%7B%5Cdelta%5E2%7D%7Bv%5E2%7D%5Cright)%5Cright%5D,%20%5Cqquad%20v%20%5Crightarrow%200.%0A"></p></li>
<li><p>If <img src="https://latex.codecogs.com/png.latex?%5Cint_0%5E%5Cdelta%20%5Cfrac%7B1%7D%7B2%5Cpi%0A%20%20r%7D%5Cexp%5Cleft(%7B-%5Cfrac%7Bv%5E2%7D%7B2r%5E2%7D%7D%5Cright)%5Cpi_d(r)%5C,dr%20%3C%0A%20%20%5Cinfty">, then <img src="https://latex.codecogs.com/png.latex?%0A%5Cpi(v)%20%5Cgeq%0A%5Cmathcal%7BO%7D%5Cleft(v%5E%7B-2%7D%5Cexp%5Cleft(-%5Cfrac%7Bv%5E2%7D%7B2%5Cdelta%5E2%7D%5Cright)%5Cright),%0A%5Cqquad%20%7Cv%7C%20%5Crightarrow%20%5Cinfty.%0A"></p></li>
<li><p>If <img src="https://latex.codecogs.com/png.latex?%5Cpi_d(r)%20%7B%5Cleq%7D(%7B%5Cgeq%7D)%20Cr%5E%7B-t%7D"> for all <img src="https://latex.codecogs.com/png.latex?r%20%5Cin%20%5B0,%5Cdelta%5D"> and for some <img src="https://latex.codecogs.com/png.latex?C,t%20%3E0">, then <img src="https://latex.codecogs.com/png.latex?%0A%5Cpi(v)%0A%7B%5Cleq%7D(%7B%5Cgeq%7D)%20%5Cmathcal%7BO%7D(%7Cv%7C%5E%7B-t%7D),%5Cqquad%20v%20%5Crightarrow%200.%0A"></p></li>
<li><p>If <img src="https://latex.codecogs.com/png.latex?%5Cpi_d(r)%20%7B%5Cleq%7D(%7B%5Cgeq%7D)%20Cr%5E%7B-t%7D"> for all <img src="https://latex.codecogs.com/png.latex?r%20%3E%5Cdelta"> and for some <img src="https://latex.codecogs.com/png.latex?C,t%20%3E0">, then <img src="https://latex.codecogs.com/png.latex?%0A%5Cpi(v)%0A%7B%5Cleq%7D(%7B%5Cgeq%7D)%20%5Cmathcal%7BO%7D(%7Cv%7C%5E%7B-t%7D),%5Cqquad%20%7Cv%7C%20%5Crightarrow%0A%5Cinfty.%0A"></p></li>
</ol>
</div>
<details>
<summary>
The proof is here.
</summary>
<p>For any <img src="https://latex.codecogs.com/png.latex?%5Cdelta%20%3E%200">, <img src="https://latex.codecogs.com/png.latex?%0A%5Cpi(v)%20=%0A%5Cint_0%5E%5Cdelta%5Cfrac%7B1%7D%7B2%5Cpi%20r%7D%0A%5Cexp%5Cleft(%7B-%5Cfrac%7Bv%5E2%7D%7B2r%5E2%7D%7D%5Cright)%0A%5Cpi_d(r)%5C,dr%20+%0A%5Cint_%5Cdelta%5E%5Cinfty%5Cfrac%7B1%7D%7B2%5Cpi%0Ar%7D%5Cexp%5Cleft(%7B-%5Cfrac%7Bv%5E2%7D%7B2r%5E2%7D%7D%5Cright)%0A%5Cpi_d(r)%5C,dr%20=%20I_1%20+%20I_2.%0A"> Examining this splitting, we note that <img src="https://latex.codecogs.com/png.latex?I_1"> will control the behaviour of <img src="https://latex.codecogs.com/png.latex?%5Cpi(v)"> near zero, while <img src="https://latex.codecogs.com/png.latex?I_2"> will control the tails.</p>
<p>Assuming that <img src="https://latex.codecogs.com/png.latex?%5Cint_%5Cdelta%5E%5Cinfty%20r%5E%7B-1%7D%5Cpi_d(r)%5C,dr%20%3C%20%5Cinfty">, we can bound <img src="https://latex.codecogs.com/png.latex?I_2"> as <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B1%7D%7B2%5Cpi%20%7D%5Cexp%5Cleft(%7B-%5Cfrac%7Bv%5E2%7D%7B2%5Cdelta%5E2%7D%7D%5Cright)%0A%5Cint_%5Cdelta%5E%5Cinfty%20r%5E%7B-1%7D%5Cpi_d(r)%5C,dr%20%5Cleq%20I_2%20%5Cleq%20%5Cfrac%7B1%7D%7B2%5Cpi%7D%0A%5Cint_%5Cdelta%5E%5Cinfty%20r%5E%7B-1%7D%5Cpi_d(r)%5C,dr.%0A"></p>
<p>To prove part 1, let <img src="https://latex.codecogs.com/png.latex?%5Cpi_d(r)%20%5Cleq%20Cr%5Et">, <img src="https://latex.codecogs.com/png.latex?r%20%5Cin%0A%5B0,%5Cdelta%5D"> for some <img src="https://latex.codecogs.com/png.latex?t%3E0">. Substituting this into <img src="https://latex.codecogs.com/png.latex?I_1"> and computing the resulting integral using Maple<sup>8</sup>, we get <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0AI_1%20&amp;%5Cleq%20-%20%5Cfrac%7BC%7D%7B2%5Cpi%20t%7D%5Cleft(%20%7B2%7D%5E%7B-1/2%5C,t%7D%7B%7Cv%7C%7D%5E%7Bt%7D%5CGamma%0A%5Cleft(%201-1/2%5C,t,1/2%5C,%7B%5Cfrac%20%7Bv%5E2%7D%7B%7B%5Cdelta%7D%5E%7B2%7D%7D%7D%20%5Cright)%20-%7B%7B%5Crm%0Ae%7D%5E%7B-1/2%5C,%7B%5Cfrac%20%7Bv%5E2%7D%7B%7B%5Cdelta%7D%5E%7B2%7D%7D%7D%7D%20%7D%7B%5Cdelta%7D%5E%7Bt%7D%0A%5Cright)%20=%20%5Cmathcal%7BO%7D(1),%0A%5Cend%7Balign*%7D"> where <img src="https://latex.codecogs.com/png.latex?%5CGamma(a,x)%20=%20%5Cint_x%5E%5Cinfty%0A%5Cexp%5Cleft(%7B-t%7D%5Cright)t%5E%7Ba-1%7D%5C,dt"> is the incomplete Gamma function.</p>
<p>To prove parts 2 and 3, we bound <img src="https://latex.codecogs.com/png.latex?I_1"> as follows. <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Cleft(%5Cinf_%7Br%5Cin%5B0,%5Cdelta%5D%7D%20%5Cpi_d(r)%0A%5Cright)%5Cint_0%5E%5Cdelta%5Cfrac%7B1%7D%7B2%5Cpi%0Ar%7D%5Cexp%5Cleft(%7B-%5Cfrac%7Bv%5E2%7D%7B2r%5E2%7D%7D%5Cright)%20%5C,dr%20&amp;%5Cleq%20I_1%20%5Cleq%0A%5Cleft(%5Csup_%7Br%5Cin%5B0,%5Cdelta%5D%7D%20%5Cpi_d(r)%0A%5Cright)%5Cint_0%5E%5Cdelta%5Cfrac%7B1%7D%7B2%5Cpi%20r%7D%5Cexp%5Cleft(%7B-%5Cfrac%7Bv%5E2%7D%7B2r%5E2%7D%7D%5Cright)%20%5C%5C%0A%5Cfrac%7B1%7D%7B4%5Cpi%7D%5Cleft(%5Cinf_%7Br%5Cin%5B0,%5Cdelta%5D%7D%20%5Cpi_d(r)%5Cright)%0A%5Ctext%7BE%7D_1%5Cleft(%5Cfrac%7Bv%5E2%7D%7B2%5Cdelta%5E2%7D%5Cright)%20&amp;%20%5Cleq%20I_1%20%5Cleq%0A%5Cfrac%7B1%7D%7B4%5Cpi%7D%5Cleft(%5Csup_%7Br%5Cin%5B0,%5Cdelta%5D%7D%20%5Cpi_d(r)%5Cright)%0A%5Ctext%7BE%7D_1%5Cleft(%5Cfrac%7Bv%5E2%7D%7B2%5Cdelta%5E2%7D%5Cright)%20%5C%5C%0A%5Cfrac%7B1%7D%7B8%5Cpi%7D%5Cleft(%5Cinf_%7Br%5Cin%5B0,%5Cdelta%5D%7D%20%5Cpi_d(r)%5Cright)%0A%5Cexp%5Cleft(%7B-%5Cfrac%7Bv%5E2%7D%7B2%5Cdelta%5E2%7D%7D%5Cright)%5Clog%5Cleft(%201%20+%0A%5Cfrac%7B4%5Cdelta%5E2%7D%7Bv%5E2%7D%5Cright)%20&amp;%5Cleq%20I_1%0A%5Cleq%5Cfrac%7B1%7D%7B4%5Cpi%7D%5Cleft(%5Csup_%7Br%5Cin%5B0,%5Cdelta%5D%7D%20%5Cpi_d(r)%5Cright)%0A%5Cexp%5Cleft(%7B-%5Cfrac%7Bv%5E2%7D%7B2%5Cdelta%5E2%7D%7D%5Cright)%5Clog%5Cleft(%201%20+%0A%5Cfrac%7B2%5Cdelta%5E2%7D%7Bv%5E2%7D%5Cright),%0A%5Cend%7Balign*%7D"> where <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BE%7D_1(x)%20=%20%5Cint_1%5E%5Cinfty%20t%5E%7B-1%7D%5Cexp%5Cleft(%7B-tx%7D%5Cright)%5C,dt"> and the third line of inequalities follows using standard bounds in the exponential integral<sup>9</sup>.</p>
<p>Combining the lower and upper bounds, it follows that if <img src="https://latex.codecogs.com/png.latex?0%0A%3C%5Cinf_%7Br%5Cin%5B0,%5Cdelta%5D%7D%20%5Cpi_d(r)%20%5Cleq%20%5Csup_%7Br%5Cin%5B0,%5Cdelta%5D%7D%0A%5Cpi_d(r)%20%3C%20%5Cinfty">, then <img src="https://latex.codecogs.com/png.latex?%5Cpi(v)"> has a logarithmic spike near zero. Similarly, the lower bounds show that <img src="https://latex.codecogs.com/png.latex?%5Cpi(v)%20%5Cgeq%20C%0Av%5E%7B-2%7D%5Cexp%5Cleft(-%5Cfrac%7Bv%5E2%7D%7B2%5Cdelta%5E2%7D%5Cright)"> as <img src="https://latex.codecogs.com/png.latex?v%5Crightarrow%20%5Cinfty">.</p>
<p>To prove part 4, let <img src="https://latex.codecogs.com/png.latex?%5Cpi_d(r)%20=%20Cr%5E%7B-t%7D">, <img src="https://latex.codecogs.com/png.latex?r%20%5Cin%20%5B0,%5Cdelta%5D"> for some <img src="https://latex.codecogs.com/png.latex?t%3E0">. Substituting this into <img src="https://latex.codecogs.com/png.latex?I_1"> and computing the resulting integral using Maple, we get <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0AI_1%20&amp;%20=%20%5Cfrac%7BC%7D%7B2%5Cpi%20t%7D%5Cleft(%20%7B%7Cv%7C%7D%5E%7B-t%7D%5CGamma%20%5Cleft(%0A1+1/2%5C,t,1/2%5C,%7B%5Cfrac%20%7Bv%5E2%7D%7B%7B%5Cdelta%7D%5E%7B2%7D%7D%7D%20%5Cright)%0A%7B2%7D%5E%7Bt/2%7D-%7B%5Cdelta%7D%5E%7B-t%7D%7B%7B%5Crm%20e%7D%5E%7B-1/2%5C,%7B%5Cfrac%0A%7Bv%5E2%7D%7B%7B%5Cdelta%7D%5E%7B2%7D%7D%7D%7D%7D%20%5Cright)%20%5Csim%0A%5Cmathcal%7BO%7D(v%5E%7B-t%7D)%0A%5Cend%7Balign*%7D"> as <img src="https://latex.codecogs.com/png.latex?v%20%5Crightarrow%200">. We note that <img src="https://latex.codecogs.com/png.latex?I_1%20=%0A%5Cmathcal%7BO%7D%5Cleft(%5Cexp%5Cleft(-v%5E2/(2%5Cdelta%5E2)%5Cright)%5Cright)"> as <img src="https://latex.codecogs.com/png.latex?%7Cv%7C%0A%5Crightarrow%20%5Cinfty">.</p>
<p>To prove part 5, let <img src="https://latex.codecogs.com/png.latex?%5Cpi_d(r)%20=%20Cr%5E%7B-t%7D">, <img src="https://latex.codecogs.com/png.latex?r%20%5Cin%0A(%5Cdelta,%5Cinfty)"> for some <img src="https://latex.codecogs.com/png.latex?t%3E0">. Substituting this into <img src="https://latex.codecogs.com/png.latex?I_2">, we get <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0AI_2%20=%20%5Cfrac%7BC%7D%7B8%5Cpi%5E2%7D%5C,%7B2%7D%5E%7B1/2%5C,t%7D%7B%7Cv%7C%7D%5E%7B-t%7D%20%5Cleft(%20%5CGamma%0A%5Cleft(%201/2%5C,t%20%5Cright)%20-%20%5CGamma%20%5Cleft(%201/2%5C,t,1/2%5C,%7B%5Cfrac%0A%7B%7Bv%7D%5E%7B2%7D%7D%7B%7B%5Cdelta%7D%5E%7B2%7D%7D%7D%20%5Cright)%20%5Cright)%20=%0A%5Cmathcal%7BO%7D(%7Cv%7C%5E%7B-t%7D),%0A%5Cend%7Balign*%7D"> where we used the identity <img src="https://latex.codecogs.com/png.latex?%0A%5CGamma%20%5Cleft(%201/2%5C,t%20%5Cright)%20-%20%5CGamma%0A%5Cleft(%201/2%5C,t,1/2%5C,%7B%5Cfrac%20%7B%7Bv%7D%5E%7B2%7D%7D%7B%7B%5Cdelta%7D%5E%7B2%7D%7D%7D%20%5Cright)%0A%5Crightarrow%20%5CGamma%5Cleft(%201/2%5C,t%20%5Cright)%0A"> as <img src="https://latex.codecogs.com/png.latex?%7Cv%7C%5Crightarrow%0A%5Cinfty">.</p>
<strong>Done.</strong>
</details>
<p>All of this basically says the following:</p>
<ul>
<li><p>If the density of the prior on the standard deviation is finite at zero, then the implied prior on <img src="https://latex.codecogs.com/png.latex?%5Cbeta_j"> has a logarithmic spike at zero.</p></li>
<li><p>If the density of the prior on the standard deviation has a polynomial tail, then the implied prior on <img src="https://latex.codecogs.com/png.latex?%5Cbeta_j"> has the same polynomial tail.</p></li>
<li><p>Not in the result, but computed at the time: if the prior on the standard deviation is exponential, the prior on <img src="https://latex.codecogs.com/png.latex?%5Cbeta_j"> still has Gaussian-ish tails. (I couldn’t work out what happens in the hinterland between exponential tails and polynomial tails; I suspect at some point the tail on the standard deviation does eventually get heavy enough to be seen in the marginal, but I can’t tell you when.)</p></li>
</ul>
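<p>None of this is hard to check numerically. The sketch below is mine (not from the post), and it uses the standard normal density for the mixture. For a half-normal prior on the standard deviation, which is finite and positive at zero, the marginal is the product-of-normals density <img src="https://latex.codecogs.com/png.latex?K_0(%7Cv%7C)/%5Cpi">, and we can watch the logarithmic spike appear:</p>

```python
import numpy as np
from scipy import integrate, stats

def marginal_density(v, prior_sd_density):
    """pi(v) = int_0^inf N(v; 0, r^2) pi_d(r) dr, on a log-spaced grid."""
    r = np.logspace(-8, 2, 4001)
    integrand = stats.norm.pdf(v, loc=0.0, scale=r) * prior_sd_density(r)
    return integrate.trapezoid(integrand, r)

# half-normal prior on the sd: finite and positive at r = 0, so the
# theorem predicts a weak logarithmic spike in pi(v) at v = 0
half_normal = stats.halfnorm.pdf

vs = np.array([1e-1, 1e-2, 1e-3, 1e-4])
dens = np.array([marginal_density(v, half_normal) for v in vs])

ratios = dens / np.log(1 + 1 / vs ** 2)
print(dens)    # increases as v -> 0 (the spike)...
print(ratios)  # ...but only logarithmically: the ratio stays bounded
```
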
<p>With this sort of information, you can compute the equivalent of the bounds that I did on the Laplace prior for the general case (or, actually, for the case that will have at least a little bit of a chance, which is the monotonically decreasing priors on the standard deviation).</p>
<p>In particular, <a href="https://dansblog.netlify.app/posts/2021-12-08-the-king-must-die-repost/">if you run the argument from the last post</a>, you see that you need quite a heavy tail on the standard deviation prior to get a reasonable prior on the implied sparsity. For instance, <a href="https://arxiv.org/pdf/1403.4630v4.pdf">we showed</a> that, applying this reasoning to the horseshoe prior, where the prior on the local standard deviation is half-Cauchy, there is a choice of global scale that puts <em>a priori</em> weight on <img src="https://latex.codecogs.com/png.latex?p%5E%7B-1%7D">-sparse signals, while also letting you have a few very large <img src="https://latex.codecogs.com/png.latex?%5Cbeta_j">s.</p>
<section id="the-design-scaling-in-these-priors-links-directly-to-an-implied-decision-process" class="level3">
<h3 class="anchored" data-anchor-id="the-design-scaling-in-these-priors-links-directly-to-an-implied-decision-process">The design-scaling in these priors links directly to an implied decision process</h3>
<blockquote class="blockquote">
<p>You’d look better if your shadow didn’t follow you around, but it looks as though you’re tethered to the ground, just like every pound of flesh I’ve ever found. – <a href="https://www.youtube.com/watch?v=mIp4X7_cA3g">Franklin Bruno in a sourceless light</a>.</p>
</blockquote>
<p>For a very simple decision process (the deterministic threshold process described in the previous post), you can work out exactly how the threshold needs to interact with the prior. In particular, we can see that if we’re trying to detect a true signal that is exactly zero (no components are active), then we know that <img src="https://latex.codecogs.com/png.latex?%5C%7C%20%5Cmathbf%7BX%7D%20%5Cboldsymbol%7B%5Cbeta%7D%20%5C%7C%20=%200">. This is not possible for these scale-mixture models, but we can require that in this case all of the components are at most <img src="https://latex.codecogs.com/png.latex?%5Cepsilon">, in which case <img src="https://latex.codecogs.com/png.latex?%0A%5C%7C%20%5Cmathbf%7BX%7D%5Cboldsymbol%7B%5Cbeta%7D%20%5C%7C%20%5Cleq%20%5Cepsilon%20%5C%7C%20%5Cmathbf%7BX%7D%20%5C%7C,%0A"> which suggests we want <img src="https://latex.codecogs.com/png.latex?%5Cepsilon%20%5Cll%20%5C%7C%20%5Cmathbf%7BX%7D%20%5C%7C_%5Cinfty%5E%7B-1%7D">. The calculation in the previous post shows that if we want this sort of almost zero signal to have any mass at all under the prior, we need to scale <img src="https://latex.codecogs.com/png.latex?%5Clambda"> using information about <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7BX%7D">.</p>
<p>Of course, this is a very very simple decision process. I have absolutely no idea how to repeat these arguments for actually good decision processes, like the predictive loss minimization <a href="https://arxiv.org/abs/1707.01694">favoured by Aki</a>. But I’d still expect that we’d need to make sure there was a priori enough mass in the areas of the parameter space where the decision process is firmly one way or another (as well as <a href="https://statmodeling.stat.columbia.edu/2017/10/29/contour-as-a-verb/">mass in the indeterminate region</a>). I doubt that the Bayesian Lasso would magically start to work under these more complex losses.</p>
</section>
</section>
<section id="models-specified-through-their-full-conditionals" class="level2">
<h2 class="anchored" data-anchor-id="models-specified-through-their-full-conditionals">Models specified through their full conditionals</h2>
<blockquote class="blockquote">
<p>Why won’t you cheat with me? You and I both know that he’s done the same. – <a href="https://www.youtube.com/watch?v=Ozsc2AQqYKw">Franklin Bruno</a></p>
</blockquote>
<p>So we can view the design dependence of sparsity priors as preparation for the forthcoming decision process. (Those of you who just mentally broke into <a href="https://www.youtube.com/watch?v=c1SiaCV26aQ">Prepare Ye The Way Of The Lord</a> from Godspell, please come to the front of the class. You are my people.) Now let’s talk about a case where this isn’t true.</p>
<p>To do this, we need to cast our minds back to a time when people really did have the original cast recording of Godspell on their mind. In particular, we need to think about <a href="https://www.youtube.com/watch?v=pqoeM18vCaU">Julian Besag</a> (who I’m sure was really into musicals about Jesus. I have no information to the contrary, so I’m just going to assume it’s true.) who wrote a series of important papers, one in <a href="https://www.jstor.org/stable/2984812">1974</a> and one in <a href="https://www.jstor.org/stable/2987782">1975</a> (and several before and after, but I can’t be arsed linking to them all. We all have google.) about specifying models through conditional independence relations.</p>
<p>These models have a special place in time series modelling (where we all know about discrete-time Markovian processes) and in spatial statistics. In particular, generalisations of Besag’s (Gaussian) conditional autoregressive (CAR) models are <a href="https://arxiv.org/abs/1601.01180">widely used in spatial epidemiology</a>.</p>
<p>Mathematically, Gaussian CAR models (and more generally <a href="https://www.routledge.com/Gaussian-Markov-Random-Fields-Theory-and-Applications/Rue-Held/p/book/9781584884323">Gaussian Markov random fields</a> on graphs) are defined through their precision matrix, that is the inverse of the covariance matrix as <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbf%7Bx%7D%20%5Csim%20N(%5Cmathbf%7B0%7D,%20%5Ctau%5E%7B-1%7D%5Cmathbf%7BQ%7D%5E%7B-1%7D).%0A"></p>
<p>For simple models, such as the popular CAR model, we assume <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7BQ%7D"> is fixed, known, and sparse (i.e.&nbsp;it has a lot of zeros) and we typically interpret <img src="https://latex.codecogs.com/png.latex?%5Ctau"> to be the inverse of the variance of <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7Bx%7D">.</p>
<p>This interpretation of <img src="https://latex.codecogs.com/png.latex?%5Ctau"> could not be more wrong.</p>
<p>Why? Well, let’s look at the marginal distribution <img src="https://latex.codecogs.com/png.latex?%0Ax_j%20%5Csim%20N%5Cleft(0,%20%5Ctau%5E%7B-1%7D%5BQ%5E%7B-1%7D%5D_%7Bjj%7D%5Cright).%0A"></p>
<p>To interpret <img src="https://latex.codecogs.com/png.latex?%5Ctau"> as the inverse variance, we need the diagonal elements of <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7BQ%7D%5E%7B-1%7D"> to all be around 1. <em>This is never the case.</em></p>
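<p>If you don’t believe me, compute one. The following sketch (mine; the chain graph and every constant in it are made up purely for illustration) builds a proper CAR-type precision matrix on a line graph with <img src="https://latex.codecogs.com/png.latex?%5Ctau%20=%201"> and looks at the diagonal of its inverse:</p>

```python
import numpy as np

# A proper CAR-type precision on a chain graph: Q = D - rho * W + jitter * I.
# All constants here are illustrative, not a recommendation.
n, rho, jitter = 50, 0.99, 0.01
W = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # chain adjacency
D = np.diag(W.sum(axis=1))
Q = D - rho * W + jitter * np.eye(n)

# tau = 1, so if the folk interpretation were right these would all be about 1
marg_var = np.diag(np.linalg.inv(Q))
print(marg_var.min(), marg_var.max())  # nowhere near 1, and not constant
```
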
<p>A simple, mathematically tractable example is the first order random walk on a one-dimensional lattice, which can be written in terms of the increment process as <img src="https://latex.codecogs.com/png.latex?%0Ax_%7Bj+1%7D%20-%20x_j%20%5Csim%20N(0,%20%5Ctau%5E%7B-1%7D),%20%5Cqquad%20j%20=%201,%20%5Cldots%20J-1.%0A"></p>
<p>Conditioned on a particular starting point, this process looks a lot like a discrete version of Brownian motion as you move the lattice points closer together. This is a useful model for rough non-linear random effects, such as the baseline hazard rate in a Cox proportional hazard model. A long and detailed (and quite general) discussion of these models can be found in <a href="https://www.routledge.com/Gaussian-Markov-Random-Fields-Theory-and-Applications/Rue-Held/p/book/9781584884323">Rue and Held’s book</a>.</p>
<p>I am bringing this case up because you can actually work out the size of the diagonal of <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7BQ%7D%5E%7B-1%7D">. <a href="https://www.sciencedirect.com/science/article/abs/pii/S2211675313000407">Sørbye and Rue</a> talk about this in detail, but for this model maybe the easiest way to understand it is this: imagine we had a fixed lattice with <img src="https://latex.codecogs.com/png.latex?n"> points and had carefully worked out a sensible prior for <img src="https://latex.codecogs.com/png.latex?%5Ctau">. Now imagine that we get some new data and, instead of only <img src="https://latex.codecogs.com/png.latex?n"> points in the lattice, we have information at a finer scale, so the same interval is now covered by <img src="https://latex.codecogs.com/png.latex?nk"> equally spaced nodes. We model this with the new first order random walk prior <img src="https://latex.codecogs.com/png.latex?%0Ax'_%7Bj+1%7D%20-%20x'_j%20%5Csim%20N(0,%5B%5Ctau'%5D%5E%7B-1%7D).%0A"></p>
<p>It turns out that we can relate the inverse variances of these two increment processes as <img src="https://latex.codecogs.com/png.latex?%5Ctau'%20=%20k%20%5Ctau">: refining the lattice by a factor of <img src="https://latex.codecogs.com/png.latex?k"> means the increment precision has to grow by the same factor if the process is to keep the same marginal scale.</p>
<p>This strongly suggests that we should not use the same prior for <img src="https://latex.codecogs.com/png.latex?%5Ctau"> as we should for <img src="https://latex.codecogs.com/png.latex?%5Ctau'">, but that the prior should actually know about how many nodes there are on the lattice. Concrete suggestions are in the Sørbye and Rue paper linked above.</p>
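<p>A back-of-the-envelope version of this calculation, for an RW1 anchored at its first node (the anchoring is my simplification, to dodge the rank deficiency of the intrinsic model, and all constants are illustrative):</p>

```python
import numpy as np

def rw1_marginal_sd(n_nodes, tau):
    """Marginal sds of a first-order random walk on n_nodes lattice points,
    anchored at x_1 = 0: x_j is a sum of (j - 1) iid N(0, 1/tau) increments."""
    steps = np.arange(n_nodes)
    return np.sqrt(steps / tau)

tau, k = 1.0, 10
coarse = rw1_marginal_sd(10, tau)               # 10 nodes on the interval
fine_same_tau = rw1_marginal_sd(10 * k, tau)    # 100 nodes, same tau
fine_scaled = rw1_marginal_sd(10 * k, k * tau)  # 100 nodes, tau' = k * tau

# keeping tau fixed blows up the scale of the effect as the lattice refines;
# rescaling the precision keeps the endpoint sd essentially unchanged
print(coarse[-1], fine_same_tau[-1], fine_scaled[-1])
```
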
<section id="design-dependence-for-markov-random-fields" class="level3">
<h3 class="anchored" data-anchor-id="design-dependence-for-markov-random-fields">Design dependence for Markov random fields</h3>
<blockquote class="blockquote">
<p>Not to coin a phrase, but play it as it lays – <a href="https://open.spotify.com/track/5gs7YbjVEjjKMNNO219iJe?si=e20a167de87a4dc4">Franklin Bruno in Nothing Painted Blue</a></p>
</blockquote>
<p>This type of design dependence is a general problem for multivariate Gaussian models specified through their precision (so-called Gaussian Markov random fields). The critical thing here is that, unlike the sparsity case, the design dependence does not come from some type of decision process. It comes from the gap between the parameterisation (in terms of <img src="https://latex.codecogs.com/png.latex?%5Ctau"> and <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7BQ%7D">) and the elicitable quantity (the scale of the random effect).</p>
<p>This is kinda a general lesson. <em>When specifying multivariate priors, you must always check the implications of your prior on the one- and two-dimensional quantities of interest. Because weird things happen in multivariate land!</em></p>
</section>
</section>
<section id="gaussian-process-models" class="level2">
<h2 class="anchored" data-anchor-id="gaussian-process-models">Gaussian process models</h2>
<blockquote class="blockquote">
<p>And it’s not like we’re tearing down a house of more than gingerbread. It’s not like we’re calling down the wrath of heaven on our heads. – <a href="https://www.youtube.com/watch?v=dL-4ZQthJ5w">Jenny Toomey sings Franklin Bruno</a></p>
</blockquote>
<p>So the design dependence doesn’t necessarily come in preparation for some kind of decision; it can also arise because we have constructed (and therefore parameterised) our process in an inconvenient way. Let’s see if we can knock out another one before my bottle of wine dies.</p>
<p><a href="https://dansblog.netlify.app/posts/2021-11-03-yes-but-what-is-a-gaussian-process-or-once-twice-three-times-a-definition-or-a-descent-into-madness/">Gaussian processes</a>, the least exciting tool in the machine learner’s toolbox, are another example where your priors need to be design dependent. It will probably surprise you not a single sausage that in this case the need for design dependence comes from a completely different place.</p>
<p>For simplicity let’s consider a Gaussian process <img src="https://latex.codecogs.com/png.latex?f(t)"> in one dimension with isotropic covariance function <img src="https://latex.codecogs.com/png.latex?%0Ac(s,t)%20=%5Csigma%5E2%20(%5Ckappa%7Cs-t%7C)%5E%5Cnu%20K_%5Cnu(%5Ckappa%7Cs-t%7C).%0A"></p>
<p>This is the commonly encountered Whittle-Matérn family of covariance functions. The distinguished members are the exponential covariance function when <img src="https://latex.codecogs.com/png.latex?%5Cnu%20=%200.5"> and the squared exponential function <img src="https://latex.codecogs.com/png.latex?%0Ac(s,t)=%20%5Csigma%5E2%5Cexp%5Cleft(-%5Ckappa%20%7Cs-t%7C%5E2%20%5Cright),%0A"></p>
<p>which is the limit as <img src="https://latex.codecogs.com/png.latex?%5Cnu%20%5Crightarrow%20%5Cinfty">.</p>
<p>One of the inconvenient features of Matérn models in 1-3 dimensions is that it is impossible to consistently recover all of the parameters by simply observing more and more of the random effect on a fixed interval. You need to see new replicates in order to properly pin these down<sup>10</sup>.</p>
<p>So one might expect that this non-identifiability would be the source of some problems.</p>
<p>One would be wrong.</p>
<p>The squared exponential covariance function does not have this pathology, but it’s still very very hard to fit. Why? Well, the problem is that you can interpret <img src="https://latex.codecogs.com/png.latex?%5Ckappa"> as an inverse-range parameter. Roughly, the interpretation is that if <img src="https://latex.codecogs.com/png.latex?%0A%7Cs%20-%20t%20%7C%20%3E%20%5Cfrac%7B%20%5Csqrt%7B%208%20%5Cnu%20%7D%20%7D%7B%5Ckappa%7D%0A"> then the value of <img src="https://latex.codecogs.com/png.latex?f(s)"> is approximately independent of the value of <img src="https://latex.codecogs.com/png.latex?f(t)">.</p>
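<p>This “practical range” heuristic is easy to check (my sketch, using SciPy’s modified Bessel function and the Matérn correlation normalised so that it is one at zero distance, which is an illustrative convention rather than anything from the post):</p>

```python
import numpy as np
from scipy.special import kv, gamma

def matern_corr(r, kappa, nu):
    """Matern correlation, normalised to tend to 1 as r -> 0:
    2^(1 - nu) / Gamma(nu) * (kappa r)^nu * K_nu(kappa r)."""
    x = kappa * np.asarray(r, dtype=float)
    return 2.0 ** (1.0 - nu) / gamma(nu) * x ** nu * kv(nu, x)

kappa = 2.0  # made-up inverse-range parameter
for nu in (0.5, 1.5, 2.5):
    r_practical = np.sqrt(8 * nu) / kappa
    # the correlation at the practical range is small (about 0.13-0.14)
    print(nu, matern_corr(r_practical, kappa, nu))
```
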
<p>This means that a fixed data set provides no information about <img src="https://latex.codecogs.com/png.latex?%5Ckappa"> in large parts of the parameter space. In particular if <img src="https://latex.codecogs.com/png.latex?%5Ckappa%5E%7B-1%7D"> is bigger than the range of the measurement locations, then the data has almost no information about the parameter.</p>
<p>Similarly, if <img src="https://latex.codecogs.com/png.latex?%5Ckappa%5E%7B-1%7D"> is smaller than the smallest distance between two data points (or for irregular data, this should be something like “smaller than some low quantile of the set of distances between points”), then the data will have nothing to say about the parameter.</p>
<p>Of these two scenarios, it turns out that the inference is much less sensitive to the prior on small values of <img src="https://latex.codecogs.com/png.latex?%5Ckappa"> (ie ranges longer than the data) than it is to the prior on large values of <img src="https://latex.codecogs.com/png.latex?%5Ckappa"> (ie ranges shorter than the data).</p>
<p>Currently, we have two recommendations: one based around <a href="https://arxiv.org/abs/1503.00256">PC priors</a> and a very similar one based around <a href="https://mc-stan.org/docs/2_28/stan-users-guide/fit-gp.html#priors-gp.section">inverse gamma priors</a>. But both of these require you to specify the design-dependent quantity of a “minimum length scale we expect this data set to be informative about”.</p>
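<p>For the inverse gamma version, the tuning amounts to solving for the two parameters that put a given (small) amount of prior mass below the minimum informative length scale and above the maximum one. The sketch below (mine, with made-up bounds standing in for the design-derived ones) does this with a root finder:</p>

```python
import numpy as np
from scipy import optimize, stats

# Illustrative design-derived bounds: shortest and longest length scales
# the data could plausibly inform (e.g. minimum spacing and domain width),
# plus the tail mass we allow outside them. All three are assumptions.
l_min, l_max, tail = 0.1, 10.0, 0.01

def tail_mismatch(log_params):
    a, b = np.exp(log_params)  # work on the log scale to keep both positive
    dist = stats.invgamma(a, scale=b)
    return [dist.cdf(l_min) - tail, dist.sf(l_max) - tail]

a, b = np.exp(optimize.fsolve(tail_mismatch, x0=np.log([2.0, 1.0])))
dist = stats.invgamma(a, scale=b)
print(a, b, dist.cdf(l_min), dist.sf(l_max))  # both tails close to 1%
```
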
<section id="design-for-gaussian-processes-id-say-designing-women-but-im-aware-of-the-demographics" class="level3">
<h3 class="anchored" data-anchor-id="design-for-gaussian-processes-id-say-designing-women-but-im-aware-of-the-demographics">Design for Gaussian processes (I’d say “Designing Women”, but I’m aware of the demographics)</h3>
<blockquote class="blockquote">
<p>I’m a disaster, you’re a disaster, we’re a disaster area. – Franklin Bruno in The Human Hearts (featuring alto extraordinaire and cabaret god Ms Molly Pope)</p>
</blockquote>
<p>So in this final example we hit our ultimate goal. A case where design dependent priors are needed not because of a hacky decision process, or an awkward multivariate specification, but due to the limits of the data. In this case, priors that do not recognise the limitation of the design of the experiment will lead to poorly behaving posteriors. This manifests as the Gaussian processes severely over-fitting the data.</p>
<p>This is the ultimate expression of the point that we tried to make in the Entropy paper: <a href="http://www.stat.columbia.edu/~gelman/research/published/entropy-19-00555-v2.pdf">The prior can often only be understood in the context of the likelihood</a>.</p>
</section>
</section>
<section id="principles-can-only-get-you-so-far" class="level2">
<h2 class="anchored" data-anchor-id="principles-can-only-get-you-so-far">Principles can only get you so far</h2>
<blockquote class="blockquote">
<p>I’m making scenes, you’re constructing dioramas – <a href="https://open.spotify.com/track/4ZsjitFg4P22jukvBCSxO8?si=4c7b551f72054d92">Franklin Bruno in Nothing Painted Blue</a></p>
</blockquote>
<p>Just to round this off, I guess I should mention that the strong likelihood principle really does suggest that certain details of the design are not relevant to a fully Bayesian analysis. In particular, if the design only pops up in the normalising constant of the likelihood, it should not be relevant to a Bayesian. This seems at odds with everything I’ve said so far.</p>
<p>But it’s not.</p>
<p>In each of these cases, the design was only invoked in order to deal with some external information. For sparsity, design was needed to properly infer a sparse signal and came in through the structure of the decision process.</p>
<p>For the CAR models, the external information was that the elicitable quantity was the marginal standard deviation, which was a complicated function of the design and the standard parameter.</p>
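<p>To make the CAR point concrete, here is a minimal sketch of that dependence, following the Sørbye–Rue recipe of scaling an intrinsic CAR precision matrix so the marginal variances have geometric mean one. The 4-cycle graph is purely illustrative; swap in your own adjacency structure.</p>

```python
import numpy as np

# Illustrative graph (a 4-cycle): adjacency matrix.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

# Intrinsic CAR precision: Q = D - A. It is singular, so marginal
# variances (under the sum-to-zero constraint) come from the diagonal
# of the Moore-Penrose pseudo-inverse.
Q = np.diag(A.sum(axis=1)) - A
marg_var = np.diag(np.linalg.pinv(Q))

# The elicitable quantity is the *marginal* standard deviation, which
# depends on the graph (the "design"). Rescaling Q so the geometric
# mean of the marginal variances is 1 makes a prior on the scale
# parameter mean the same thing on every graph.
scale = np.exp(np.mean(np.log(marg_var)))
Q_scaled = scale * Q

# Geometric mean of the scaled marginal variances is now 1.
print(np.exp(np.mean(np.log(np.diag(np.linalg.pinv(Q_scaled))))))  # approx 1.0
```

<p>The point of the sketch is that <code>marg_var</code> is a complicated function of the graph, so any prior stated directly on the precision scale parameter means something different for every design unless you do this scaling.</p>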
<p>For Gaussian processes, the same thing happened: the implicit decision criterion was that we wanted to make good predictions. The design told us which parts of the parameter space obstructed this goal, and a well specified prior removed the problem.</p>
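<p>And a minimal sketch of the Gaussian process case, using the PC prior on a Matérn-type range parameter: the data carry essentially no information about ranges below the smallest gap in the design, so the prior is chosen to put only a small probability <code>alpha</code> below that length scale. The design points and <code>alpha</code> here are illustrative.</p>

```python
import numpy as np

# Illustrative 1-d design: irregular observation locations.
x = np.sort(np.array([0.0, 0.05, 0.3, 0.45, 0.9, 1.0]))

# The smallest gap between design points: below this length scale
# the data cannot distinguish range values.
rho0 = np.min(np.diff(x))

# PC prior for the range: pi(rho) = (lam / rho**2) * exp(-lam / rho),
# with lam chosen so that P(rho < rho0) = alpha,
# i.e. lam = -rho0 * log(alpha).
alpha = 0.05
lam = -rho0 * np.log(alpha)

def pc_range_prior(rho):
    return (lam / rho**2) * np.exp(-lam / rho)

# Sanity check: the prior CDF at rho0 is exp(-lam / rho0) = alpha.
print(np.exp(-lam / rho0))  # approx 0.05
```

<p>This is exactly a prior that "removes the problem": the posterior is pushed away from the part of parameter space (tiny ranges) where the design makes the likelihood flat and over-fitting lives.</p>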
<p>There are also any number of cases in real practice where the decision at hand is stochastically dependent on the data gathering mechanism. This is why things like MRP exist.</p>
<p>I guess this is the tl;dr version of this post (because apparently I’m too wordy for some people. I suggest they read other things. Of course suggesting this in the final paragraph of such a wordy post is very me.):</p>
<p><em>Design matters even if you’re Bayesian. Especially if you want to do something with your posterior that’s more exciting than just sitting on it.</em></p>
<p><strong>Edited from an <a href="https://statmodeling.stat.columbia.edu/2017/11/05/why-wont-you-cheat-with-me/">original blog, posted November 2017</a>.</strong></p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Imagine it’s November 2017.↩︎</p></li>
<li id="fn2"><p>Again, 2017.↩︎</p></li>
<li id="fn3"><p>2021: I am still sad War Paint closed.↩︎</p></li>
<li id="fn4"><p>2017↩︎</p></li>
<li id="fn5"><p>I eventually did coincide with a Human Hearts concert and, to my extreme joy, Jenny Toomey did two songs with the band! They were supporting <a href="https://www.youtube.com/watch?v=oBwd4rAr3Rc">Gramercy Arms</a>, who I’d never heard before that night but have several perfect albums.↩︎</p></li>
<li id="fn6"><p>This is like the standard deviation we’d use in an iid normal prior for a non-sparse model.↩︎</p></li>
<li id="fn7"><p>2021: I did indeed type it all again. And a proof. Because why bother if you’re not going to do it well.↩︎</p></li>
<li id="fn8"><p>Yes. No open source for me!↩︎</p></li>
<li id="fn9"><p>Abramowitz, M. and Stegun, I. (1972). Handbook of Mathematical Functions. Formula 5.1.20↩︎</p></li>
<li id="fn10"><p>There’s a recent paper (2021) in JRSSB that says that these parameters are identifiable under infill asymptotics with a “nugget”, which is equivalent to observing with iid noise that magically stays independent as you observe locations closer and closer together. I will let you judge how relevant this case is to your practice. But regardless, for a <em>finite</em> set of data under any reasonable likelihood, you hit these identifiability problems. And in my personal experience, they persist even with a decent number of sites.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{simpson2021,
  author = {Simpson, Dan},
  title = {Why Won’t You Cheat with Me? {(Repost)}},
  date = {2021-12-09},
  url = {https://dansblog.netlify.app/2021-12-08-why-wont-you-cheat-with-me-repost/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-simpson2021" class="csl-entry quarto-appendix-citeas">
Simpson, Dan. 2021. <span>“Why Won’t You Cheat with Me?
(Repost).”</span> December 9, 2021. <a href="https://dansblog.netlify.app/2021-12-08-why-wont-you-cheat-with-me-repost/">https://dansblog.netlify.app/2021-12-08-why-wont-you-cheat-with-me-repost/</a>.
</div></div></section></div> ]]></description>
  <category>Prior distributions</category>
  <category>Fundamentals</category>
  <category>Design dependence</category>
  <guid>https://dansblog.netlify.app/posts/2021-12-09-why-wont-you-cheat-with-me-repost/why-wont-you-cheat-with-me-repost.html</guid>
  <pubDate>Wed, 08 Dec 2021 14:00:00 GMT</pubDate>
  <media:content url="https://dansblog.netlify.app/posts/2021-12-09-why-wont-you-cheat-with-me-repost/sylvia2.JPG" medium="image"/>
</item>
</channel>
</rss>
