Selected posts

Let's walk amongst the trees for a bit.

…I don’t believe in the idea that there are a few peculiar people capable of understanding math and the rest of the world is normal. Math is a human discovery, and it’s no more complicated than humans can understand. I had a calculus book once that said, “What one fool can do, another fool can.”
–Richard Feynman

It’s a great time, maybe the best time in the short history of computer technology to date, to learn programming and data science. You’re surrounded by an incredible wealth of free tools and resources, and the excitement about these disciplines in society at large is palpable. But with that wealth comes danger: snake oil salesmen and the “quick fix”. The answer is the same as it always was.

Computational notebooks combine literate programming’s interleaving of source text with prose descriptions and multimedia output with image-based development’s interactivity and mutability. It’s the combination of the two ideas that kills notebooks as a practical tool for software engineering.

“There’s an old saying in Tennessee—I know it’s in Texas, probably in Tennessee—that says, ‘Fool me once, shame on…shame on you. Fool me—you can’t get fooled again.’”
–George W. Bush

It’s true, you can’t get fooled again! Not any more than you can open an already-open door. But does your type system know that?

Today, in honor of the recent release of Python 3.8, we’ll introduce a fun type-level programming trick well-known already in other language communities, which will let us automatically check these and other invariants.

Last time, we used a minimalist parser combinator library to build a parser for an oddly familiar language called OBAN. The problem with our previous parser is that it produces extremely unhelpful error messages. This is probably fine for a parser which runs as part of an automated toolchain and processes almost-always-valid input, but is completely unacceptable for a user-facing tool.

We’ll address this, while making only minimal changes to the parser’s structure, by tweaking the “base monad” on which it's built. In other words, we’ll change what it means to chain parsers together.

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.
–Jamie Zawinski

The fundamental problem with regular expressions is that they only recognize regular languages. This sounds, and is, tautological, but it has huge implications.

It’s important, as a rule of thumb, when operating or investing in a firm which produces a physical commodity, to have the ability to reliably quantify the expected future production of the commodity given the firm’s assets. In the oil and gas industry, we have many different ways to forecast future production from an oil or gas well. Some rely on detailed measurements and explicitly incorporate detailed mathematical models of flow physics. Others use whatever historical data we can scrape together and a bit of curve-fitting.

Today, we’re talking about the second kind.

A couple weekends ago, I found myself with the desire to fetch oil and gas production data for a specific county in New Mexico from the New Mexico Oil Conservation Division (OCD).

Fortunately, the OCD provides access to historical well production via an FTP server. The OCD doesn’t seem to provide a way to query a limited time- or area-based subset of production history data, so we’re stuck with a single ZIP file for “all of New Mexico since the dawn of time”. The result is a whopping 712MB ZIP file.

Here’s where I knew I was in trouble: the only thing inside was a single 38 GB file called wcproduction.xml.

If you wish to make an apple pie from scratch, you must first create the universe.
–Carl Sagan

What is object-oriented programming really about? What’s so special about “late binding”? And why do I have to pass self around everywhere in Python? We’ll take a meandering path in today’s post which will try to answer each of these questions, and build our own miniature object system along the way.

Oh, R. I can’t tell you why “data scientists” have switched, for the same reason I can’t tell you how Santa’s reindeer achieve lift: I simply don’t believe in them. But, I can tell you why I no longer use R for new projects.


This blog will never track you, collect your data, or serve ads. If you'd like to support the blog, consider our tip jars.

DOGE: DPZnAPuvYAfCfHZaBjnEkZkdBn4trJbKUK

BTC: bc1qjtdl3zdjfelgpru2y3epa56v5l42c6g9qylmjx

ETH: 0xAf93BaFF53b58D2CA4D2b0F3F5ec79339f6b59F7