A series of interviews on the mechanics of business and real-world applications of machine intelligence

CTL is a series of interviews with executives at the largest global businesses. We go behind the scenes, understand how these businesses operate, and explore the most interesting applications of AI.

We have a strict no-buzzwords, no hand-waving policy. It's about near-term tactical ROI, and how we get there.

Also posting on: LinkedIn, Twitter
Inspecting mega-scale infrastructure
With Sam Tukra, Senior ML Researcher at Shell

Key Takeaways

A pipeline going down even 1% of the time means tens of millions of dollars in losses, plus serious health and safety concerns. A single refinery can span thousands of acres, so the scale over which inspection must happen is enormous. The availability of data to build inspection models is a key bottleneck, and new approaches in self-supervised learning show promise in solving it.

Topics Covered

  • Breaking down Shell's business & key concerns

    • Shell's business can be broken into three streams: upstream (finding new sources of energy), midstream (transporting it from point A to point B), and downstream (converting one product into another, e.g. crude oil into gasoline for cars)
    • Shell's goal is to achieve net zero carbon emissions by 2050, which means all hands must be on deck to make all of these processes as efficient as possible; that's the role of digital transformation today
    • All issues come down to corrosion; all equipment degrades over time and you need to predict failures before they happen, for reasons of both business and health & safety
    • Each region of Shell has its own machine learning teams, but each team's focus depends on where it sits, e.g. New York is more sales/business-oriented while Texas has a much deeper operational focus
  • The role of regulation

    • The role of regulation is significant; every component of every piece of infrastructure down to the valves has a set inspection schedule, and it's very costly for business to be halted because of failed regulatory compliance
    • Inspection standards are tracked in a central system according to the region-specific regulatory requirements, but there's also a stricter internal inspection schedule to stay ahead of any issues (a toy sketch of such a schedule record follows below)
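
A purely illustrative sketch of the kind of record such a tracking system might hold, written in Python; the field names, intervals, and component IDs here are hypothetical assumptions, not Shell's actual schema:

```python
# Hypothetical inspection-schedule record; every field name is an assumption.
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class InspectionItem:
    component_id: str
    regulatory_interval_days: int  # set by the region's regulator
    internal_interval_days: int    # stricter internal schedule
    last_inspected: date

    def next_due(self) -> date:
        # The internal schedule deliberately runs ahead of the regulatory one.
        interval = min(self.regulatory_interval_days, self.internal_interval_days)
        return self.last_inspected + timedelta(days=interval)

valve = InspectionItem("valve-7731", 365, 180, date(2024, 1, 15))
print(valve.next_due())  # 2024-07-13
```
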
  • Historical framing / what progress has already been made

    • We now have sensor data on pipes and valves, and cameras at every location, but it's not as if the problem has been solved; we just have higher and higher standards/thresholds, and now a huge amount more data available to solve problems with
    • The accuracy thresholds are on the order of 99.6%, so all the effort goes into 1) "how do we get that extra 0.5% or 1% of accuracy", or 2) if we can't get that extra 1%, can we build something that better guides the work so those failures are prevented in the future
  • The forefront of research in CV for inspections

    • Self-supervised learning
      • The bleeding edge of research in the industry comes in the world of self-supervised learning
      • "How do we create a method that continuously organically allows our models to evolve over time without us having to do this data collection and redeploying?"
      • The concept is very similar to how the human mind works; you aren't starting from scratch on every new problem
      • In the industrial case, we teach the model to reconstruct masked-out cracks from a set of images; in doing this, the model learns a generalized representation of a crack
      • When you fine-tune this model for a specific task, you get much better performance; this pretraining approach is known as "masked autoencoders" (a minimal sketch follows this sub-list)
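
A minimal sketch of the masked-autoencoder idea, assuming PyTorch; TinyMAE and all of its hyperparameters are illustrative stand-ins, not Shell's actual models:

```python
# Hedged sketch of masked-autoencoder (MAE) pretraining; TinyMAE and all
# hyperparameters here are illustrative assumptions, not a production model.
import torch
import torch.nn as nn

PATCH, DIM, MASK_RATIO = 16, 256, 0.75  # MAE famously masks ~75% of patches

class TinyMAE(nn.Module):
    def __init__(self, img_size=224, patch=PATCH, dim=DIM):
        super().__init__()
        self.n_patches = (img_size // patch) ** 2
        patch_pixels = 3 * patch * patch
        self.embed = nn.Linear(patch_pixels, dim)            # patch -> token
        self.pos = nn.Parameter(torch.zeros(1, self.n_patches, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.decoder = nn.Linear(dim, patch_pixels)          # token -> pixels

    def forward(self, patches):                              # (B, N, patch_pixels)
        B, N, _ = patches.shape
        keep = int(N * (1 - MASK_RATIO))
        # Randomly choose which patches stay visible; the rest are hidden.
        visible = torch.rand(B, N, device=patches.device).argsort(1)[:, :keep]
        tokens = self.embed(patches) + self.pos
        idx = visible.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        encoded = self.encoder(torch.gather(tokens, 1, idx))
        # Re-insert encoded tokens; masked slots get a learned mask token.
        full = self.mask_token.expand(B, N, -1).clone()
        full.scatter_(1, idx, encoded)
        return self.decoder(full + self.pos)                 # reconstructed pixels

# Toy pretraining step on random stand-in "inspection" images.
imgs = torch.rand(4, 3, 224, 224)
patches = imgs.unfold(2, PATCH, PATCH).unfold(3, PATCH, PATCH)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(4, -1, 3 * PATCH * PATCH)
recon = TinyMAE()(patches)
# Real MAE computes the loss only on masked patches; kept on all for brevity.
loss = nn.functional.mse_loss(recon, patches)
loss.backward()
```

Fine-tuning would then reuse the pretrained encoder with a small task-specific head, which is where the performance gains described above come from.
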
    • Catastrophic forgetting
      • The other key challenge in transferring knowledge is "catastrophic forgetting": in moving from task 1 to task 2, task-1 knowledge gets overwritten; when humans learn to ride a bike and then learn to drive a car, we don't forget how to ride the bike (one common mitigation is sketched after this list)
    • AI 2.0 will be the creation of specialized mini-models for specific tasks, but with far less problem-specific data
      • What's key here is a model that passes knowledge in both directions; it should be able to update the generalized knowledge as it updates the fine-tuned versions
      • This area of work around self-supervised learning and catastrophic forgetting is a very active new area of work; it's far from solved
      • The goal is to get the model to a point where it can decide if it wants to learn from a new piece of information or not
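
The interview doesn't prescribe a fix for catastrophic forgetting, but one widely used mitigation is elastic weight consolidation (EWC): estimate how important each weight was to task 1, then penalize drifting those weights while training on task 2. A hedged sketch, with toy data and a crude single-batch importance estimate:

```python
# Hedged sketch of elastic weight consolidation (EWC); the model, toy data,
# and single-batch Fisher estimate are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
LAMBDA = 10.0  # strength of the "don't forget task 1" penalty

# --- After task 1: estimate how important each weight was (crude Fisher). ---
x1, y1 = torch.randn(64, 10), torch.randint(0, 2, (64,))
model.zero_grad()
loss_fn(model(x1), y1).backward()
fisher = {n: p.grad.detach() ** 2 for n, p in model.named_parameters()}
anchor = {n: p.detach().clone() for n, p in model.named_parameters()}

# --- During task 2: learn the new task while penalizing drift on weights
# --- that mattered for task 1, instead of silently overwriting them.
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x2, y2 = torch.randn(64, 10), torch.randint(0, 2, (64,))
for _ in range(100):
    opt.zero_grad()
    task_loss = loss_fn(model(x2), y2)
    penalty = sum((fisher[n] * (p - anchor[n]) ** 2).sum()
                  for n, p in model.named_parameters())
    (task_loss + 0.5 * LAMBDA * penalty).backward()
    opt.step()
```
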
  • Commercial patterns that may emerge to help companies without in-house data science teams

    • What do small companies do in this situation? Shell has such a data advantage; do the small companies form a collective, or what's their response?
    • Either "inspections as a service" emerges as a commercial offering, or there's a common base model that is very efficient to fine-tune to specific situations
    • Is it too simplistic a view to think "inspection as a service" APIs can work? At some level will you always need an internal team to build these systems?
      • It's not inconceivable that both will exist; the answer is likely somewhere in the middle
  • The promise of multi-modal models

    • NLP has changed the game for computer vision; you no longer need the strict closed-set rules of "is this class A, B, or C"; you can probe the model's latent understanding with free-form language (see the sketch after this list)
    • Even though diffusion models haven't been trained specifically on cracks, if you ask one to generate a crack, it can do that
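
A minimal sketch of this kind of language probe, assuming the Hugging Face transformers library and the public CLIP checkpoint; the prompts and the blank placeholder image are illustrative:

```python
# Hedged sketch: zero-shot probing with the public CLIP checkpoint via the
# Hugging Face transformers library; prompts and placeholder image are toy.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224))  # stand-in for a real inspection photo
prompts = [
    "a photo of a corroded pipe with a visible crack",
    "a photo of a pipe in good condition",
]
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(prompts, probs[0].tolist())))
```
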
  • Key limitations that prevent models from being deployed today

    • The biggest issue is typically graduating from POC to production; your work needs to be packaged in a way that's usable and then deployed
    • Some of the refineries don't even have internet access; there are no model updates and there is no cloud
    • Even in Central London you can have bad connectivity, and you can't afford for the model's predictions to simply not happen; downtime carries real liability, so latency must be minimized as much as possible (a sketch of the offline-deployment pattern closes these notes)
    • How well-solved is this deployment problem?
      • It's an ongoing thing we constantly experiment with, and we have teams that deal with it
    • How do you deal with retraining the model post-deployment?
      • This is where self-supervised learning shines; no new active labels are needed, so no need to deal with retraining post-deployment
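
For the disconnected-site problem above, one common pattern (an assumption here, not necessarily Shell's stack) is to export the trained model once and run it entirely locally; a sketch using PyTorch's ONNX export and onnxruntime, with a placeholder model and file names:

```python
# Hedged sketch of one offline-deployment pattern: export once, run on-site
# with no cloud. The resnet18 stand-in and file names are assumptions.
import numpy as np
import torch
import torchvision

model = torchvision.models.resnet18(weights=None)  # placeholder inspection model
model.eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "inspection_model.onnx",
                  input_names=["image"], output_names=["logits"])

# On the disconnected site, only the .onnx file and onnxruntime are needed.
import onnxruntime as ort

session = ort.InferenceSession("inspection_model.onnx")
frame = np.random.rand(1, 3, 224, 224).astype(np.float32)  # stand-in camera frame
logits = session.run(["logits"], {"image": frame})[0]
```
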