Patching the CPython stdlib is deceptively easy

Let’s say you’re debugging something in Python, but the functionality you want to interact with is actually part of a C extension in the stdlib. Now what? You could use debugger wizardry, or you could just patch the stdlib.

Let’s imagine I want to print the address of the socket underlying the current SSL socket whenever it is read from.

diff --git a/Modules/_ssl.c b/Modules/_ssl.c
index f5113ddf7b2..27a40ad5467 100644
--- a/Modules/_ssl.c
+++ b/Modules/_ssl.c
@@ -2483,6 +2483,7 @@ _ssl__SSLSocket_read_impl(PySSLSocket *self, Py_ssize_t len,
     _PySSLError err;
     int nonblocking;
     PySocketSockObject *sock = GET_SOCKET(self);
+    fprintf(stderr, "%p\n", (void *)sock);
     _PyTime_t timeout, deadline = 0;
     int has_timeout;

It turns out to be quite easy to build many parts of the Python stdlib out of tree. It looks like:

# clone an ABI-compatible cpython
git clone -b 3.12 https://github.com/python/cpython.git

# Create a new module directory
mkdir myssl
cd myssl

# Give it a setup.py
cat > setup.py <<EOF
import glob
from setuptools import Extension, setup

sources = glob.glob("**/*.c", recursive=True)
sources.remove("_ssl/debughelpers.c")
module = Extension(
    "_ssl", sources=sources, extra_link_args=["-lssl", "-lcrypto"]
)
setup(name="myssl", ext_modules=[module])
EOF

# get the module I want to build
cp -r ../cpython/Modules/_ssl* .
cp ../cpython/Modules/socketmodule.h .
mkdir clinic && cp ../cpython/Modules/clinic/_ssl.c.h clinic/

# build it
python3.12 setup.py build_ext --inplace

# run it
python3.12 - <<EOF
import _ssl
print(_ssl.__file__)
import requests
print(requests.get("https://1.1.1.1"))
EOF

I say “deceptively” easy because the --inplace flag copies the built extension module into the current directory. When you run Python from that directory as above, the directory sits at the front of the module search path (sys.path), ahead of the stdlib, so the patched _ssl silently shadows the stock one.
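Here is a small sketch of that shadowing. It uses a throwaway pure-Python file as a stand-in for the built extension; the colorsys stand-in and the resolve_origin helper are illustrative choices, not part of the build above. It asks the import machinery which file a name resolves to, with and without an extra directory at the front of the search path:

```python
import importlib.machinery
import sys
import tempfile
from pathlib import Path


def resolve_origin(name, search_path):
    """Return the file that `import name` would load for this search path."""
    spec = importlib.machinery.PathFinder.find_spec(name, search_path)
    return None if spec is None else spec.origin


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the freshly built extension module.
    (Path(tmp) / "colorsys.py").write_text("PATCHED = True\n")

    # Resolution against the normal search path finds the stdlib copy.
    stock = resolve_origin("colorsys", sys.path)
    # Putting the directory first makes the local copy win.
    patched = resolve_origin("colorsys", [tmp] + sys.path)

    print("stock:  ", stock)
    print("patched:", patched)
```

Running the same kind of check against "_ssl" from the build directory is a quick way to confirm which copy a script will actually pick up.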

Why does it work? Because:

To quote the CPython documentation on C API stability (the “below” refers to that page, not this post): “CPython’s Application Binary Interface (ABI) is forward- and backwards-compatible across a minor release (if these are compiled the same way; see Platform Considerations below). So, code compiled for Python 3.10.0 will work on 3.10.8 and vice versa, but will need to be compiled separately for 3.9.x and 3.11.x.”

We’re calling the same C APIs that any other Python extension module would call, so we get the same compatibility guarantees. I have used this at work to ship a speedup for TLS downloads in multithreaded contexts.
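The compatibility rule can be sketched as a tiny check. This is a simplification: the helper names are made up for illustration, and it ignores the "compiled the same way" caveat (compiler, debug builds, platform).

```python
import sys


def minor_release(version):
    """Return (major, minor) from an 'X.Y.Z' version string."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def abi_compatible(built_for, running):
    """True when both versions belong to the same X.Y release series.

    Simplification: build-configuration differences are not considered.
    """
    return minor_release(built_for) == minor_release(running)


print(abi_compatible("3.10.0", "3.10.8"))  # True
print(abi_compatible("3.10.8", "3.11.0"))  # False

# Compare a module built for 3.12 against the interpreter running this code.
running = ".".join(str(part) for part in sys.version_info[:3])
print(f"built for 3.12.0, running {running}:", abi_compatible("3.12.0", running))
```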

You can also use CPython internal headers, which are not covered by any such guarantee. In that case you’d need to recompile your module for every CPython patch release (3.12.1, 3.12.2, and so on), and the result is unsafe to ship as a standard wheel, since wheel tags identify only the Python version (e.g. cp312), not the exact release. Compatibility issues aside, building a patched asyncio is even easier:

cp ../cpython/Modules/_asynciomodule.c .
mkdir -p clinic && cp ../cpython/Modules/clinic/_asynciomodule.c.h clinic/

cat > setup.py <<EOF
import sysconfig
from setuptools import Extension, setup

include_path = sysconfig.get_path("include")
module = Extension(
    "_asyncio",
    sources=["_asynciomodule.c"],
    include_dirs=[f"{include_path}/internal"],
)
setup(name="myasyncio", ext_modules=[module])
EOF
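Because internal headers can change between patch releases, one defensive option is to refuse to load a module built this way on any other release. A minimal sketch, assuming you record the exact version at build time; require_exact_release and BUILT_FOR are hypothetical names, not CPython or setuptools APIs:

```python
import sys

# Hypothetical: the exact release the extension was compiled against.
BUILT_FOR = (3, 12, 1)


def require_exact_release(built_for, running=None):
    """Raise RuntimeError unless `running` equals `built_for` exactly."""
    if running is None:
        running = sys.version_info[:3]
    if tuple(running) != tuple(built_for):
        raise RuntimeError(
            "built for %d.%d.%d, running %d.%d.%d" % (*built_for, *running)
        )


try:
    require_exact_release(BUILT_FOR)
    print("exact match, safe to import the patched module")
except RuntimeError as exc:
    print("refusing to load:", exc)
```

A check like this would run before the patched module is imported, for example in a small wrapper that owns the import.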