At the end of the day on Friday, Westlaw disclosed that it had found over 600 errors in the text of print reporters it had published. In the email alerting customers to this discovery, it included a list of all the affected cases. According to the email, the errors were “due to the introduction of an upgrade to our PDF conversion process in November 2014.”
Wait a minute…West is getting case law text by OCRing PDFs? That seems weird.
- Thought one: OCR is not a perfect science, and in law, where the exact language used is so important, I would have thought “the big guys” were doing something like sending the text overseas to be double keyed (that is, typed in twice independently so discrepancies can be found) and then processing it into online and print publications. I guess not. It would be nice if West took its pledge of transparency all the way and fully explained its process for obtaining case law.
- Thought two: Why are they OCRing at all? Are the courts sending them PDFs, or are they scraping the court websites for case law because another publisher is responsible for that state’s legal publishing needs? My hunch was that the latter was the real reason. And if so, that’s really bothersome, because states just post slip opinions without any indication of later changes. So I decided to see if I could prove that, but ended up finding something much more interesting.
I took the list of cases that had errors and started looking them up. I stuck to the regional reporters because that’s where the state material is published, and that’s my focus this year. The North Eastern Reporter errors were in Illinois and Ohio cases. I had previously contacted the Ohio Reporter of Decisions to ask how they distributed cases and was told that the publishers get the material from the website. But, in the name of due diligence, I decided to check who publishes the official Ohio case reporter. It turns out that it’s West, so there goes my theory that West relies on scraping because another publisher holds the contract to publish the state’s case law.
HERE’S THE FUN PART AND WHY I’M BLOGGING ABOUT THIS.
The Ohio Reporter of Decisions posted its contract with West on its website. (If for some reason this disappears from the web, I’ve downloaded a copy and will post it.) The path decisions take from state courts to the publisher is pretty murky, and here we have a bright, shining light into the process. It’s very exciting! Some interesting bits:
Contrary to what I was told, it turns out that the court does send West copies of its decisions. Presumably these are the final copies, not the possibly error-ridden slip opinions that the rest of us get on the web. It is unclear how they do this: just emailing PDFs, or using some special upload queue.
Why does the court do this? Well, of course someone has to publish the decisions. But perhaps it chose West because, it turns out, the Supreme Court gets a kickback in free books and greatly reduced Westlaw fees.
So, as a free law advocate, I now see that my ability to persuade courts to make their cases freely available to all is greatly diminished by the fact that I can’t throw in free goods and services as part of the deal. All I have on my side is the idea that the law should be freely accessible to all citizens.