Open Source Solves J.K. Rowling Mystery

This post was co-authored by Garrett Heath.

As OSCON, a global conference on open source software, got underway in Portland this week, the timing of the recent J.K. Rowling unmasking couldn’t have been better. As my colleague and co-author, Garrett Heath, tweeted from the conference, “Accio Open Source!” For the three people left on the planet who haven’t read a Harry Potter book, that’s a common summoning charm used among Rowling’s fictional wizards.

So now we know that first-time author Robert Galbraith’s mystery novel The Cuckoo’s Calling didn’t become an “instant” bestseller because the critics loved it, which they did. For the first few months after it came out in April, it sold fewer than 1,500 copies, a common fate for debut novels. When the UK’s Sunday Times cracked the case and prodded Ms. Rowling into a confession, we all watched and read about the uptick in sales. It is now the number one selling book on Amazon. Not only is this a testament to the power of brand marketing (J.K. Rowling is the Coca-Cola of fiction, after all) but also to the rising prevalence and power of open source software.

The events that led to the revelation of Rowling as Galbraith could have been ripped straight from the pages of a modern spy novel. A journalist at the Sunday Times gets an anonymous tip on Twitter claiming that Rowling is the real author of The Cuckoo’s Calling, but before any verification can happen, the tipper’s Twitter account is deleted. The newspaper then calls on two academics to act as literary sleuths: Peter Millican, who teaches philosophy and computing at Oxford University, and Patrick Juola, a computer science professor at Duquesne University in Pittsburgh. The newspaper provides the men with machine-readable texts of The Cuckoo’s Calling along with Rowling’s previous novel, The Casual Vacancy. It also provides them with a few crime novels by other British women writers, to be used as textual control groups.

The software that Juola and Millican used, the Java Graphical Authorship Attribution Program (JGAAP), is open source and freely available for download on GitHub. The academics studied the machine-readable text of The Cuckoo’s Calling and compared it to Rowling’s previous novel. In the course of doing so, they discovered a number of linguistic signatures that pointed to the author of Harry Potter. The software analyzes syntax, style and punctuation and, just as importantly, the distinctive use of prepositions and articles. It turns out writers can change sentence length and rhythm and can cater to a new audience, but they’re unlikely to change how they use “around” and “at” and “on.”
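To make the idea concrete, here is a minimal Python sketch of comparing texts by how often they use small function words. It is not the academics’ software or its actual method. The word list, the function names and the simple distance measure are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word list; real studies use far more words.
FUNCTION_WORDS = ["the", "a", "of", "at", "on", "around", "in", "to"]

def function_word_profile(text):
    """Return each function word's share of all word tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]

def profile_distance(text_a, text_b):
    """Sum of absolute differences between two profiles (smaller = more alike)."""
    profile_a = function_word_profile(text_a)
    profile_b = function_word_profile(text_b)
    return sum(abs(x - y) for x, y in zip(profile_a, profile_b))

def closest_author(unknown_text, candidates):
    """Return the candidate name whose sample text has the nearest profile.

    `candidates` maps an author name to a sample of that author's writing.
    """
    return min(
        candidates,
        key=lambda name: profile_distance(unknown_text, candidates[name]),
    )
```

Given a dictionary of known samples, `closest_author(mystery_text, samples)` returns the name with the most similar profile. With whole novels as input and a longer word list, this kind of comparison becomes a rough first pass, not proof of authorship.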

A word as simple and as “marked” as “whilst” can narrow down the field of possible authors. In the early 1960s, researchers studied the Federalist Papers, co-written by Alexander Hamilton, James Madison and John Jay during the creation of the U.S. Constitution. It turned out that Madison favored the more British “whilst” and preferred “on” over “upon” in his essays. Meanwhile, Hamilton tended to use “while” and “upon.” These linguistic markers allowed researchers to tell which essays were primarily written by Hamilton and which by Madison. They didn’t have the benefit of open source software, but it’s worth noting that their methodical techniques laid the groundwork for future literary, open source hackers.
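Counting marker words like these takes only a few lines of code today. The sketch below is an illustration, not a reconstruction of the 1960s study: the function name, the choice of four markers and the per-thousand-words scale are assumptions.

```python
import re
from collections import Counter

# Marker words mentioned above; a real study would test many more.
MARKERS = ("while", "whilst", "on", "upon")

def marker_rates(text, per=1000):
    """Return how often each marker word appears per `per` words of `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {marker: counts[marker] * per / total for marker in MARKERS}
```

Running `marker_rates` on each essay of known authorship and then on a disputed one shows which author’s habits the disputed essay resembles.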

Another case of literary unmasking with open source software occurred with Agatha Christie’s canon. In 2010, Ian Lancashire, an English professor at the University of Toronto, took 16 Agatha Christie novels, written over a 50-year period, and fed the text into a computer program. (Incidentally, the software, called TACT, is freely available for download and comes with a manual published by the Modern Language Association of America.) He wasn’t looking for the true identity of a pseudonymous author, however. He was just looking for notable trends across the course of a literary career. Did a master of suspense change her style or syntax across half a century?

But what he found had startling implications: in Christie’s 73rd novel, Elephants Can Remember, the incidence of “indefinite words” like “anything,” “thing” and “nothing” suddenly spiked. Meanwhile, the variety of words Christie used dropped by 20 percent. When Lancashire finally published his paper about his findings, he noted that the data supported a view that Agatha Christie had developed Alzheimer’s by the time she wrote this late novel. In fact, she’d already lost a fifth of her vocabulary by then.
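Both measurements, vocabulary variety and the share of indefinite words, are easy to approximate. The following Python sketch is not TACT and not Lancashire’s procedure. The function name and the fixed sample size are assumptions, and the word set holds only the three examples quoted above. The sample size matters because longer texts naturally contain more distinct words, so novels should be compared over equal-length samples.

```python
import re

# Only the three examples quoted above; the full list used in the study
# is not given here.
INDEFINITE_WORDS = {"thing", "anything", "nothing"}

def vocabulary_stats(text, sample_size=50000):
    """Measure word variety and indefinite-word use in the first
    `sample_size` words of `text` (the default size is an assumption)."""
    tokens = re.findall(r"[a-z']+", text.lower())[:sample_size]
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    indefinite = sum(token in INDEFINITE_WORDS for token in tokens)
    return {
        "distinct_words": len(set(tokens)),
        "indefinite_rate": indefinite / total,
    }
```

Calling `vocabulary_stats` on each novel in publication order and lining up the results would show whether the distinct-word count falls or the indefinite rate climbs late in a career.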

While these textual sleuthing examples come from the world of academia, open source software promises to democratize the flow and release of information. That was the promise three years ago when Rackspace co-founded OpenStack, an open source cloud computing platform. As OSCON says on its website, “Once considered a radical upstart, open source has moved from disruption to default.” But it’s good to know that it’s still doing its share of disruption as well.

Dominic Smith is a writer and content strategist. Before joining Rackspace Marketing, he worked for many years as a technical writer and freelance copywriter, covering software, innovation and customer success stories for companies big and small, from startups to the Fortune 100. He also moonlights as a novelist and has taught writing at several universities, including Rice and the University of Texas at Austin.


  1. If not for “branding”, this book would have lain dormant, like so many others, languishing on book shelves everywhere. What this tells me is that the book itself failed to impress those who picked it up and they passed on it. Just how good is this author, except for the so-called “branding”? I have no interest in “wizards”, so I have never read a word of the “Potter” series and have no intention of doing so.

    • I agree. I read some of “The Casual Vacancy” and was completely unimpressed, which would be why I only read some of the book. I don’t think I would have read as much as I did were it not for the fact I had enjoyed Rowling’s Harry Potter books.

  2. I must be the third unread Rowling member of society. Face it boys and girls, there are billions that have NOT read Rowlings cowlings.

  3. It solved the mystery of how Rowling can blatantly rip off Lord of the Rings without any backlash and rake in billions.

    What is this wizardry?

    • Oh, please. Have you ever read any fantasy other than LotR and Harry Potter? Are you even aware that other fantasy exists? Do you think that Tolkien invented wizards? Might as well say that Murder She Wrote is a ripoff of The Big Sleep because they both involve murder.

  4. This statement, “For the three people left on the planet who haven’t read a Harry Potter book”, would have made more sense with a book like “Fifty Shades of Grey”. With Harry Potter, I never touched a book; I just waited for and watched all the films.
    Don’t get me wrong, I am an avid reader, but when the first film came out, I wasn’t even aware of Harry Potter, so I was fine with just waiting for the films, as they evidently held true to the story line of the books.

