A series of interviews on

the mechanics of business and real-world applications of machine intelligence

CTL is a series of interviews with executives at the largest global businesses. We go behind the scenes, understand how these businesses operate, and explore the most interesting applications of AI.

We have a strict no-buzzwords, no hand-waving policy. It's about near-term tactical ROI, and how we get there.

Also posting on: LinkedIn, Twitter
The legal justice argument for GenAI to exist in law, and a tour of the existential disruption it's introduced
With Maura Grossman

Key Takeaways

What happens to evidence in the world of deepfakes? Does the NY Times have a reasonable complaint about OpenAI training on their content? How do you prevent someone maliciously generating cases at scale? Maura practiced law for 17 years, was a pioneer in eDiscovery, is now a professor of computer science, and is incredibly well-read and thoughtful on all these big questions.

Topics Covered

  • eDiscovery is a useful precedent for how the legal industry deals with new technology. You pioneered that effort; what can we learn from your experience?
    • In litigation you exchange evidence, and in the 2000s this went from a couple of bankers boxes to millions of emails and text messages
    • The question is how you find those 20 or 60 emails that shed some light on the dispute; eDiscovery is that process
    • Prior to the introduction of technology, this was a manual process; you had 20 boxes of documents and put green stickies on relevant documents and red on irrelevant
    • In the digital age, you then had a search engine, so you tried to come up with keywords to search for things, but the issue was that people would speak in code
    • In big cases like the World Trade Center litigation I worked on, this was just infeasible; 3 attorneys and 20 million documents, it just didn't work
    • "Technology created this problem; technology must be able to solve it"
    • I came across Gordon Cormack, who was a leading expert in spam detection; we realized this technology could be applied in discovery, tried it, and it worked much better than the keyword searches
    • In 2011 we published the results of our formal experiment in the Richmond Journal of Law & Technology; it demonstrated this performance, and the courts could then rely on it to say this tech had passed scientific muster
  • Why do you need to disclose that you use eDiscovery tools, and not e.g. Google search? Where is that line drawn?
    • Rule 26(g) in federal court requires the attorney for the party producing material to certify they've done a reasonable search
    • Judges don't get involved unless you can't agree on the search terms, or the search doesn't turn up documents you know should come up
    • If earlier in the process you say your plan is to throw all the documents down the stairs, produce the ones that land face up, and withhold the ones that land face down, that's where opposing counsel says that's absurd and challenges it
    • I've been called in as an expert in cases where someone has proposed to do something using technology that is facially inadequate
    • The courts hate these cases because it forces them to decide in the abstract if the technology is reasonable
  • Does this apply to corporate disputes as well? Can someone argue a contract isn't valid because of the tools used to prepare it?
    • It can come up about the validity of the evidence; for example you may have a dispute about which was the final version of the contract that was agreed upon
  • Is there a standard test by which people evaluate tools to determine if they should be approved or not?
    • In federal court, for someone to be considered an expert, Federal Rule of Evidence 702 says you have to show expertise in that area either by training or experience; this also applies when judges decide to admit evidence (is it reliable)
    • Radar guns, DUI checks, etc. all needed to pass this test
    • With discovery, we saw that this technology needed to be evaluated like any other, so that if it were evaluated under Rule 702, it would pass muster
  • Why was the case about ChatGPT-created citations actually worse than most perceive?
    • Case in Southern District of New York; six citations submitted didn't actually exist
    • Normally as an attorney you're expected to read the cases you cite to make sure they say what you think they say; in this case the attorney did not and just assumed they were proper
    • Opposing counsel said they couldn't find these cases, so they moved the court to order the lawyer to produce them; the lawyer who submitted them then went back to ChatGPT and asked it to find these cases; it generated them; the lawyer printed them out and submitted them to the court
  • How did you then see judges respond to this event?
    • Courts got alarmed; six courts in the US and Canada issued standing orders adding language to the effect of "if you use AI, you must disclose it"
    • It wasn't clear then what was in scope and what was out of scope; they're all inconsistent
    • A judge, a computer scientist, and I then wrote an article saying "this isn't the best way to approach this problem"...
      • 1: if you're going to make a rule, it should be consistent across all courts
      • 2: maybe you ought to consult with some people who actually know the difference between AI and generative AI
      • 3: by requiring disclosure, you're getting into legal strategy, which starts to intrude on material lawyers consider secret or protected
      • 4: you're going to discourage people who can't afford lawyers from using these tools
  • Where will the line be drawn between disallow entirely or just disclose?
    • There is something called Rule 11 that requires you to certify you have a legal and factual basis for what you are asserting, so this issue of generated citations is already covered under Rule 11
    • We also have ethics rules we are required to follow as members of the Bar
      • Rule 1.1: I have to be competent in my representation of you, and that includes being technologically competent
      • #2: I have an obligation to be candid with the court; citing fake cases or facts means I am not competent
    • The careers of the people that did this are "in the tank", but there won't be more than a handful of these because of the consequences, so let's give people notice, but this requirement of disclosure and certification to us seemed like overkill
    • But if you are going to do it, make sure it's very clear and consistent
  • It sounds like you're saying people should not be barred from using these genAI tools?
    • Yes, but it depends on how you use it and for what...
    • You could use it to find a bunch of opinions from "Judge Andrew" and from those determine what arguments are most appealing
    • Or: "Here's my brief; make it more concise"
    • These are all acceptable uses
    • "The world is completely different for the 80 percent of people who can't afford lawyers"
    • Are they better off using a tool that can generate a complaint or pleading/filing in the proper form, even if it's not perfect? At least they can pursue their rights.
    • "I'd like a complaint against Andrew, etc etc" and GPT can do this; we call this "access to justice"; this is something that will increase the ability of people to bring lawsuits or defend lawsuits that they couldn't before
    • This has the potential to massively increase the number of cases; how do you think about this future world?
    • It's a double-edged sword; I can draw up 50 complaints and file in every state; it can multiply mischievous or vexatious lawsuits as well as legitimate ones
    • I don't know what this does unless we have tools that can start to screen legitimate cases, to determine if a human should be allowed to read this
  • It's interesting to think about how you then start to litigate around the models that make those screening decisions
  • In the genAI curve of acceptance, where are we? Are a lot of other people saying "look, we really need to allow people access to these tools"? Is that likely to happen in the next 24 months?
    • It's all a cost-benefit calculation; it's frightening that so many people can't exercise their rights legitimately because they can't afford lawyers
    • Even if all of us did pro bono work, there still wouldn't be enough capacity
    • The technology will get better as well and that will drive adoption; the same thing happened with spam filters
    • When we get used to AI, we just call it software
  • Let's go to the topic of evidence and deepfakes
    • Deepfakes and the ability to defraud people are increasing exponentially; you had enough of my voice in the first minute to create a tape and call my bank with my voice and say "I'd like to withdraw $20k"
    • We're moving into this world where our eyes and ears are no longer going to be terribly good at assessing evidence, and deciding who's guilty and who's not
    • There was a recent experiment where a bunch of students were playing a gambling game, but after the game the participants were shown a deepfaked videotape of Andrew cheating
    • Afterwards they asked the participants "are you willing to sign an affidavit to swear you saw Andrew cheat"; more than half of them said yes
  • How do we deal with this?
    • Disclosure doesn't work because bad actors won't comply, and watermarking doesn't work because it can be manipulated
    • There are tools that can distinguish between what's written by bots and humans based on the grammar, but it turns out non-native English speakers get tagged as bots more often; there are a lot of technical challenges
  • It doesn't seem like we have a good solution. What will the legal profession look like in this new world?
    • There will be an ongoing arms race between generators and detectors
    • Ultimately we will just be living in a "trust but verify" world
    • Now you need to do a lot more work on each case, and the number of cases will increase massively, so the problem only gets larger
  • Do publishers have any recourse for models being trained on their data?
    • Fair use -- students shouldn't need to buy a book just to read a couple of paragraphs, and if you go look at all the Van Goghs in museums and then try to paint one in that style, that's all considered fair use
    • What's more problematic is what happened in the Warhol case; if I create something based on your work and mine now competes in the same market as yours, that's not fair
    • Courts will need to decide what is and isn't fair use; there are technical arguments; there are legal arguments; courts will need to sort through this
    • The counterargument to licensing demands from the NYT is: well, you posted material on the public internet; if you didn't want it to be seen, don't put it there; but it's not clear that everything the models were trained on was on the open internet