PACER Sucks

I like to think of myself as a fairly mellow chick.  I can discuss controversial topics calmly and rationally with people I violently disagree with.

And then someone mentions PACER and all bets are off.

I start talking about PACER and before I know it I turn into a wild-eyed ranting and raving hillbilly threatening to “break my foot off in the ass of whatever government contractor made that site and keeps it running.”

No, seriously, I accidentally went off on a PACER tangent once during a presentation and that’s pretty much an exact quote.  To an audience of Canadians, no less.

PACER was down most of last Friday, although it’s unclear at this writing if it was a DDoS or an internal glitch. Unfortunately it came back, so I thought this might be a good time to write about it and get it out of my system.  Because Good God, y’all, I really hate PACER.

PACER, by the way, is the Public Access to Court Electronic Records system, which manages the electronic filings for the United States Federal Court System (but not the US Supreme Court).  The PACER acronym is funny not just because it is shared with one of the most ridiculous cars to come out of Detroit, but because the words it represents have ended up being pretty much the exact opposite of what it actually is.

Oh, wait, that’s not funny.  That’s fucking tragic.

Lemme break it down for you…

PUBLIC. I always assumed that the Public in PACER was the same as the Public in Public Libraries and that all the information contained within it was free to use.  That’s my librarian bias showing, I guess, but I still think it’s a valid interpretation.  However, it’s actually more like the Public in Public Records, which means everyone is free to look at them.

Having many civil servants in my family, I know government agencies aren’t rolling in cash, and there’s an argument to be made that charging for the cost of copying and accessing files keeps individuals from abusing the system and wasting resources.  There’s a counterargument to be made, of course: agencies should budget for these requests, and punishing everyone who wants to access records created with their tax dollars, instead of creating a finite system to weed out waste, is inherently unfair.

And then there’s PACER, which charges a $.10/page fee for merely looking at a document on a computer screen. Your computer screen.  Whether or not you decide to print it out on your paper on your copier using your toner.  To the tune of $150 million in PROFIT a year.  The Administrative Office of the U.S. Courts, the federal agency that runs PACER and the federal court system, tries to ameliorate this by not charging you if you stay under $15 of charges a quarter and by limiting per-document costs to $3.  (I’ll have a little more to say about this in the next section, ACCESS.)  Viewing actual court decisions is free, and there’s CourtWeb, which has some cases that some judges decide to put on there, without any clear indication of what it holds or how deep the coverage goes.  There’s also apparently a way to get all of your PACER fees waived (appearing in forma pauperis doesn’t guarantee it, interestingly), but I gave up on trying to find the procedure for that after about a half hour of looking.
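To make the arithmetic concrete, here’s a minimal sketch of that fee schedule in Python, using only the numbers quoted above.  The function names are made up for illustration, and I’m assuming the quarterly waiver wipes out the whole bill when the total stays strictly under $15; the exact boundary rule is an assumption, not something checked against the official fee schedule.

```python
from __future__ import annotations

from decimal import Decimal

PER_PAGE = Decimal("0.10")           # per-page fee quoted in the post
DOC_CAP = Decimal("3.00")            # per-document cap quoted in the post
QUARTERLY_WAIVER = Decimal("15.00")  # totals under this are not billed (assumed strict)


def document_fee(pages: int) -> Decimal:
    """Fee for viewing one document: $.10 a page, capped at $3."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return min(PER_PAGE * pages, DOC_CAP)


def quarterly_bill(page_counts: list[int]) -> Decimal:
    """Amount billed for a quarter of viewing; zero if the total stays under $15."""
    total = sum((document_fee(p) for p in page_counts), Decimal("0"))
    return Decimal("0") if total < QUARTERLY_WAIVER else total
```

Under these assumptions a 12-page filing costs $1.20, and anything 30 pages or longer hits the $3 cap.  Ten 10-page documents in a quarter come to $10 and are never billed; five 30-page documents come to exactly $15 and are.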

A few years ago there was a pilot project to make PACER free to use in government depository libraries, but then Aaron Swartz and Carl Malamud took them up on the offer and it was quickly closed down.  Around this same time RECAP was developed.  RECAP is a browser add-on that harvests the materials you look at on PACER and deposits them in the Internet Archive for all to use freely. (Of course, some courts make using RECAP a violation of their Terms of Service….) I applaud the efforts of the RECAP developers and contributors, although we must be realistic: it will take a huge buy-in from the public to create a usable database, and all it does is remove the cost aspect of PACER.  Which brings me to….

ACCESS.  Despite my government software induced outbursts, I am a well educated, comfortably middle class person. I am very comfortable using computers. One of my graduate degrees is actually IN managing information.  I’m also not under the stress of litigation in the federal court system.  Even still, with all that in my favor, I found the process of registering for a PACER account incredibly hard.

But there’s a larger problem.  Access to information isn’t really access if it’s not meaningful access.  The search functions on PACER are horrible. Incredibly useless.   I mean, look at the ADVANCED search options…

I can haz keyword search?


There is a box for “nature of the suit,” but all that does is limit your results to cases carrying a particular taxonomy tag, and there’s no obvious definition of what those tags mean.

Print research is different than electronic research.  In print research, you are looking FOR something.  You go to an index or table of contents and are directed to areas of information in the info container (probably a book or series of books).  A human being somewhere along the way has determined that these are the areas that have information actually about that subject.  Not every single mention of it, but actually about it. Even if the specific word that you are using for the subject isn’t used.  In a discipline like law, which hinges on the use of a language that has changed significantly in the 200 years of its existence, this human editorial assistance is hugely important.

In electronic research, you are not trying to find something.  You are trying to eliminate possibilities through search limiters.  Back in my legal research professor days, students trying out Lexis or Westlaw for the first time would come to me excited because their search had turned up so many hits.  I only felt slightly bad about crushing their hopes and dreams.  In electronic research, less is more.  Especially when you have to pay a fee to look at each document to determine if it’s actually relevant.  Which is why it’s incredibly unfair to have such a terrible and useless search interface for researchers to use.

(On a tangent, this is why I think an open legal taxonomy is needed and why full text searching of case law is currently not sufficient, although my colleague Elmer Masters has shown me some impressive “predictive cataloging” of case law using Solr.  One day it may work.  Not now, though.)

But here’s the thing.  PACER isn’t a research database.  It’s not America’s CanLII, as much as we would wish it to be. It’s a PDF dumping ground for papers filed in federal courts.  Every motion, brief, filing and ruling of every case in the federal courts, 99% of which is useless to the public at large.  And at that, it excels. Which brings me to…

COURT ELECTRONIC RECORDS.  PDFs aren’t electronic records, y’all.

Say it with me:  PDFs aren’t electronic records.

PDFs are a print artifact. They are pieces of paper that live in your computer.  Why does this matter?  Because of a little thing called <drumroll> METADATA.  You know it, you love it, you may not realize it, but it makes the world go ’round.  Or not, in the case of PACER.   When you are searching a commercial database and want to limit your search to the syllabus of the decisions, it’s not magic, and the computer doesn’t “just know” how to do it.  We haven’t reached the singularity….yet.  It knows what part of the decision is the syllabus because someone attached some metadata to it saying “HEY. HEY YOU THERE. THIS PART? THIS IS THE SYLLABUS OF THE DECISION.”

That’s not what the code actually looks like, by the way.   You know.  Just in case you were wondering…
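For the curious, here’s a rough sketch of what that kind of metadata might look like and how a search could use it.  Everything in it is made up for illustration: the element names, the case, and the search_section function are assumptions, not the schema or code of any real court system or commercial database.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the element names and the case are invented for illustration.
DECISION = """
<decision>
  <caption>Smith v. Jones</caption>
  <syllabus>Held: a landlord must return the security deposit within 30 days.</syllabus>
  <opinion>The tenant paid a deposit. The parties dispute the condition of the carpet.</opinion>
</decision>
"""


def search_section(xml_text: str, section: str, term: str) -> bool:
    """Return True if `term` appears inside the named section of the decision."""
    root = ET.fromstring(xml_text)
    return any(
        term.lower() in "".join(node.itertext()).lower()
        for node in root.iter(section)
    )
```

A search for “carpet” limited to the syllabus finds nothing here, while the same search limited to the opinion does.  A PDF that is just a picture of a page can’t support that distinction, because nothing in it says which part is which.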

There are some hardy souls out there who are taking these PDFs and using something called Optical Character Recognition (or OCR) to put them in a truly electronic format.  This will allow for metadata to be added to improve the searching capabilities of these materials.  It’s a band-aid fix for something that needs welding.  Which is not to say that they should stop, but we also need to recognize that OCRing can lead to inaccurate replication of text and, again, in a discipline as language-specific as law, this is ultimately unacceptable.

I know legal information and research isn’t sexy.  Currently most of the oxygen in the legal technology space is being consumed by people that want to “disrupt” it.  I would love to disrupt the legal information part of legal technology, but before we can even begin to get to that point, we need to fix it.  The inadequacies of PACER make it clear that the government is unable or unwilling to create usable legal information. Our only hope is that the courts realize this fact and make the information freely available, through projects like Court Cloud, so that others can take a stab at it.

Photo Credit: thesaltr via Compfight cc

6 Comments on PACER Sucks

  1. Kyle Mitchell
    January 26, 2014 at 10:54 pm (6 months ago)

    The situation with PDF is bad, and not just for legal work; the same malady afflicts oodles of critical financial data, such as municipal financial records. Marc Joffe recently coordinated a “PDF Hackathon” in association with Sunlight Labs animated by these latter frustrations.

    (Here comes a big chunk of text. It’s technical, but I take it this is of interest.)

    That being said, optical character recognition isn’t the state of the art in structured data extraction from all PDF documents. Certain documents, called “tagged” documents in the standard and Adobe’s software, contain embedded textual information, and can even embed structure. You’ve probably noticed that some PDFs have selectable text that can be copied and highlighted; these are “tagged”. Others, at least those created from scans alone, are literally just a series of (hopefully) compressed raster graphics superimposed upon successive, otherwise blank digital pages. As you say, just print artifacts.

    The advantage of tagged PDF text data is that the textual information is often produced by a program that had a structured, or at least accurate textual, source copy of the document, as typed by the author. Word, for instance, has not only a character-by-character copy, but the paragraph structure as well. While this information can be stored in the PDF in Battleship board game fashion—each character with a set of coordinates on the page—those coordinates are still far more regular and predictable than OCR (text blocks aren’t distorted or skewed by less-than-flat placement on a scanner), and don’t have the error rate that OCR’d characters do. (If you apply OCR in Acrobat, it imposes its guesses in Battleship form, but all over the place, and the guesses are often wrong. Recognizing indentations, centered lines, and even where words begin and end is a non-trivial software problem with all these inaccuracies.)

    Here’s the kicker: If the PDF-producing application is wise to tagging, it will also embed tags that make guessing where lines break and headings begin unnecessary. Essentially, it will embed HTML-like content markup in the PDF. This is not a kludge; this is what they had in mind when tags were rolled into PDF.

    So PDFs run the gamut from digital slideshow (and more like the analog kind) to containers for highly structured data that can also be printed in a canonical way. Non-tagged garbage in, garbage data out, maybe. Good PDF in, good data, well, already there. The latter has been a big help for, say, Section 508 compliance in the Executive.

    As with all judiciary IT issues that I’m familiar with, practices aren’t uniform. The Seventh Circuit’s slip opinions are attractive, tagged PDFs, as are the Supreme Court’s, though they don’t all embed structure. Some courts are “scanning to PDF”. Some practitioners are doing likewise. Others are writing briefs in LaTeX or using modern PDF printers or conversion tools that preserve structure. The CM/ECF transition to PDF/A standardizes on a format that at least supports good practices, though it doesn’t require them.

    Here’s a heretical thought: standardize on Word OpenXML (ECMA-376) instead. Problem solved?

  2. Bunnie Watson
    January 27, 2014 at 2:19 pm (6 months ago)

    Wow, what a whinefest. As a law librarian who has been in the biz for 30 years, I can testify to how much better PACER is today than when it started in the mid 1990s. That was before broadband and PDFs, so all you got for your frustrating modem connection(s) was a docket sheet. Then you still had to 1) call the surly clerk’s office in the court jurisdiction; 2) find out how many pages were in your document and how much they charged per page to photocopy it; 3) send your check to them via snail mail; and 4) wait for them to mail it back to you. Rinse and repeat. Plus they started with bankruptcy courts, not district or appeals circuits.

    PACER absolutely has a long way to go to be truly electronic and searchable. Lexis and West recognized that, and thus we have CourtLink and Court Express with full-text access. They also charge more than the 10 cents per page for the data they scrape from PACER, to cover the cost of massaging and digitizing no doubt. If I know what I am looking for – party names are a must, case file number and jurisdiction are nice-to-haves – PACER is my preferred go-to source.

    I agree that government information we’ve already paid for once should be free. Since when has the government really cared about the information needs of American citizens? Talk to a federal depository librarian at your local state or university library and you’ll get an earful comparable to this post. In this Congressional era of making every government function pay for itself (unemployment benefits, anyone?) it is not likely PACER will be generously funded and go back to being free. Congress won’t even approve and fund judges. I’m thankful we’ve gotten this far in twenty years. But don’t get ME started about state court records…

  3. Joanne Colvin
    February 2, 2014 at 6:57 pm (6 months ago)

    You mention the Nature of Suit (NOS) field in the advanced search interface. I’d be very skeptical of any search results that rely on that field. A few years ago I was looking for cases on a type of anti-trust litigation, and I was happy to see that there was an entry in the NOS taxonomy that fit the bill. My search results were so bizarre, though, that I called the Administrative Office of the Courts and talked to someone who worked on the PACER program. I was told that the NOS designation for a case was assigned by someone in the clerk’s office, not by the judge, and that there is no common set of standards for all of the courts. As a result of the arbitrary assignment of NOS codes, my search results had a lot of irrelevant hits. More importantly, the results did not include many of the cases that I knew were directly on point. PACER is definitely useful only when looking for known items.

  4. Courtney Minick
    February 4, 2014 at 6:52 pm (6 months ago)

    My biggest gripe about PACER is that the government treats it like a free, publicly accessible repository for court opinions. It is absolutely NOT that.

    First, you have to pay to SEARCH. That is ridiculous, straight up.

    Second, you can access some opinions for free, once you pay to search for them. That is, IF they have been properly marked as an opinion of public interest by the court or judge. Some judges mark everything, some mark nothing.

    Third, the opinions inside of PACER are slip opinions. They are not official, and they can’t be cited. These opinions are the ones going into FDSys, by the way.

    Yeah, this is whiny, but to be honest I’m tired of hearing about the poor government and how they’re underfunded and we should be happy with what we get – especially when they are running a huge surplus on this program, they purposely limit access when that surplus is threatened (see thumbdrive corps) and they give all the opinions away to a private publisher who charges us to read the law we paid for.

    One last thing: if you want a free search tool for PACER, use Justia Dockets. We crawl PACER and post most of the dockets. If it’s an important case, we pay to download and feature the documents; if it’s not, you’ll need to use your own PACER account to pull them. But at least it’s easier to search (and doesn’t cost anything).

  5. Michael Sander
    May 15, 2014 at 5:16 pm (2 months ago)

    PACER definitely has its drawbacks. Another problem is that the search functionality is very limited; you can basically only search by party name. If you want to search by judge, or a particular procedural issue, you’re out of luck.

    That’s part of the reason why I built Docket Alarm. It fills in many of the features that PACER is lacking, namely full text search and alert functions. You can also save money using it, as you can download all documents we have in our database without paying a PACER fee. Take a look at http://www.docketalarm.com.
