An idea for collecting vehicle sensor and other information and using machine learning on AWS to predict and diagnose vehicle problems before they become serious.
Introduction
In recent years, the "connected car" has gone from a concept, to something that existed but was only occasionally encountered, to an option available in many cars. Even for cars that were manufactured without connectivity in mind, devices are available that connect to the vehicle's diagnostic port and can transmit real-time data on the vehicle's operation (something I've done before). They work with almost any car made in the past 25 years. For an individual vehicle, this information is interesting at most, but it is not something from which to draw widely applicable inferences. Across a fleet of vehicles, this information may have predictive value and can be useful in making decisions. Across much larger collections of similar vehicles, there is potential for rich inferences to be made.
Vehicle maintenance is usually based on recommendations and general rules. With additional data, maintenance cycles could be adjusted based on the actual state of the vehicle: scheduled earlier for vehicles that show higher failure risks for particular systems, or delayed if indicators show that maintenance on a system would be premature. Where possible, practices could also be implemented that extend the usable life of some systems within the car.
The desired outcome of this system is ultimately to save resources. Resources include fuel, repair parts, and the efforts involved in diagnosing problems that occur. It could also be useful in estimating the future costs of maintenance of a vehicle and making a decision on whether it is more advantageous to keep or replace a vehicle. The monetary cost would be one of the cost functions that we could consider when comparing resource costs.
Why are the Miles Driven Alone Not a Sufficient Metric?
The number of miles driven is an important metric, but it isn't necessarily a sufficient one. Let's look at a specific system within the vehicle: the starter. The starter is an electric motor that sets the parts of the engine in motion so that the engine can begin to generate its own power. Once the engine is running, the starter is not doing anything. Part of the wear and tear that the starter experiences comes from the number of times that the car has been started. Imagine two identical vehicles, one that normally takes 50-mile trips and another that normally takes 5-mile trips. Over 1,000 miles of usage, the vehicle that takes 50-mile trips has used its starter 20 times, while the vehicle that takes 5-mile trips has used its starter 200 times. Because vehicles are complex machines, there are many other ways in which two vehicles that have similar use according to one metric may differ in how much a component is stressed or worn.
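The starter arithmetic above can be written as a small worked example. This is only a sketch: the function name `starter_cycles` is made up for illustration, and it assumes exactly one engine start per trip and a fixed trip length, which real driving will not follow.

```python
import math


def starter_cycles(total_miles: float, trip_miles: float) -> int:
    """Estimate engine starts, assuming one start per trip of a fixed length."""
    if total_miles < 0:
        raise ValueError("total_miles must not be negative")
    if trip_miles <= 0:
        raise ValueError("trip_miles must be positive")
    # Round up: a final partial trip still requires a start.
    return math.ceil(total_miles / trip_miles)


if __name__ == "__main__":
    for trip in (50, 5):
        print(f"{trip}-mile trips over 1,000 miles: {starter_cycles(1000, trip)} starts")
```

Two vehicles with the same odometer reading end up with a tenfold difference in starter use, which is the kind of feature that mileage alone hides.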
What Metrics May be Worth Considering?
There are a number of conditions external to a vehicle that may have an impact on failures and that we may want to record. Temperature, road conditions, and exposure to the elements, such as rain or proximity to salty bodies of water, can affect various systems over a long period of time. Saving these external factors alongside operation data may be helpful. Since we don't know all of the factors that have an effect on a given part, we will let the ML model discover the relationships that exist.
For internal conditions, a car already has a number of sensors that will be useful. Some information would need to be captured manually, such as failures in a system that is not monitored by the car's computer (tires, suspension, and so on). There are also problem conditions that are not failures (the car can still be driven) but that should be addressed, both before they become failures and to remove the inefficiencies that they bring (consuming more fuel than normal, changes in the vehicle's emissions, higher operating temperatures).
The vehicle's sensors provide the measurements through which we know the status of various parts of the car. But a sensor can fail and give bad information too, and we want to be able to recognize when this happens. Some sensors have a known relationship with each other, so a sensor failure may be detectable when their measurements are inconsistent. For example, if the fuel level sensor begins reporting readings that increase significantly while the engine RPM and speed readings are non-zero, then there may be a problem with the fuel level sensor.
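The fuel level example can be sketched as a simple consistency rule. Everything named here is an assumption made for illustration: the `Reading` fields, the function name, and the 5-percentage-point threshold for what counts as a "significant" increase are not taken from any real vehicle interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    timestamp_s: float     # seconds since start of recording (hypothetical field)
    fuel_level_pct: float  # fuel level as a percentage of a full tank
    engine_rpm: float
    speed_kph: float


def suspicious_fuel_readings(readings: List[Reading], min_rise_pct: float = 5.0) -> List[int]:
    """Return indices where fuel level rose sharply while the car was being driven.

    min_rise_pct is an illustrative threshold, not a calibrated value.
    """
    flagged = []
    for i in range(1, len(readings)):
        prev, cur = readings[i - 1], readings[i]
        rise = cur.fuel_level_pct - prev.fuel_level_pct
        # Fuel should not increase while the engine is turning and the car is moving.
        driving = cur.engine_rpm > 0 and cur.speed_kph > 0
        if driving and rise >= min_rise_pct:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    trip = [
        Reading(0, 50.0, 2000, 60),
        Reading(1, 49.9, 2100, 62),
        Reading(2, 70.0, 2100, 61),  # jump while driving: suspicious
        Reading(3, 80.0, 0, 0),      # jump while stopped: plausibly refuelling
    ]
    print(suspicious_fuel_readings(trip))
```

A hand-written rule like this only covers relationships we already know about; it could serve as a sanity filter on the data before training, not as a replacement for learned relationships.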
Rather than throwing only raw sensor data at an ML algorithm, there are some other features that we might want to extract or calculate. There may also be data that, for some purposes, is noise. There’s a balancing act between wanting data that is high resolution enough to make inferences and wanting to be able to make those inferences, when possible, with a small data set. I lean towards capturing data at a resolution that may be higher than needed and reducing it before using it. Low-resolution data can be produced from high-resolution data, but not the other way around. If a resolution is selected and we later discover that it isn't sufficient, we still have the high-resolution data from which to make a different selection. Since it may be necessary to collect data over a long period of time before there is enough to train an ML model, I wouldn’t want to discover at the end of that period that the resolution isn’t high enough. It is easier to discard unneeded data than to recover from not having enough, especially when a significant amount of time is needed to collect that data.
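As a minimal sketch of reducing high-resolution data after the fact, the function below averages samples into fixed-width time buckets. The function name, the `(timestamp, value)` tuple layout, and the choice of averaging (rather than, say, taking the maximum) are assumptions for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple


def downsample(samples: List[Tuple[float, float]], bucket_s: float) -> List[Tuple[float, float]]:
    """Average (timestamp_s, value) samples into time buckets of width bucket_s.

    Returns one (bucket_start_s, mean_value) pair per non-empty bucket.
    """
    if bucket_s <= 0:
        raise ValueError("bucket_s must be positive")
    buckets: Dict[int, List[float]] = {}
    for timestamp_s, value in samples:
        # Integer bucket index for this timestamp.
        buckets.setdefault(int(timestamp_s // bucket_s), []).append(value)
    return [(index * bucket_s, mean(values)) for index, values in sorted(buckets.items())]


if __name__ == "__main__":
    # Hypothetical coolant temperature samples recorded at 2 Hz.
    raw = [(0.0, 10.0), (0.5, 20.0), (1.0, 30.0), (1.5, 50.0)]
    print(downsample(raw, bucket_s=1.0))
```

Because the raw samples are kept, the same data can be reduced again later with a different `bucket_s` if the first choice turns out to be too coarse.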
There may also be new information that can be calculated from recorded data through averages, sums, derivatives, and standard deviations. Rolling averages over some sensors could also be useful in cleaning some noise out of the data.
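The derived features mentioned above could be computed along these lines. This is a sketch using only the Python standard library; the function names and the use of a trailing window and a simple finite difference are illustrative choices, not a prescribed method.

```python
from collections import deque
from statistics import pstdev
from typing import List, Sequence


def rolling_mean(values: Sequence[float], window: int) -> List[float]:
    """Trailing rolling average; early outputs use however many samples exist so far."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)  # drops the oldest sample automatically
    smoothed = []
    for value in values:
        recent.append(value)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


def rate_of_change(times: Sequence[float], values: Sequence[float]) -> List[float]:
    """Finite-difference derivative between consecutive samples."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    return [
        (values[i] - values[i - 1]) / (times[i] - times[i - 1])
        for i in range(1, len(values))
    ]


def spread(values: Sequence[float]) -> float:
    """Population standard deviation of the samples."""
    return pstdev(values)


if __name__ == "__main__":
    print(rolling_mean([1, 2, 3, 4], window=2))
    print(rate_of_change([0, 1, 3], [10, 12, 18]))
    print(spread([2, 4, 4, 4, 5, 5, 7, 9]))
```

Note that `rate_of_change` assumes strictly increasing timestamps; duplicate timestamps would cause a division by zero and would need to be filtered out first.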
Data Labels
Our data needs to be labeled to be useful. For our initial labels, we'll need to rely on recommendations for what counts as a normal reading from some of the sensors. For many systems, "working" and "failed" are not the only suitable labels. There are also systems that could be working with degraded performance (such as a battery that doesn't hold as much of a charge). Many vehicles also label sensor readings themselves and save them internally when a problem is detected: when the check engine light turns on, the readings from the sensors are stored alongside a problem code.
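A threshold-based initial labeling for the battery example might look like the sketch below. The function name and both thresholds (80% and 50% of rated capacity) are placeholders chosen purely for illustration; real cut-offs would have to come from manufacturer recommendations or observed data.

```python
def label_battery(
    capacity_pct: float,
    degraded_below: float = 80.0,  # placeholder threshold, not a real specification
    failed_below: float = 50.0,    # placeholder threshold, not a real specification
) -> str:
    """Label a battery from its measured capacity as a percentage of rated capacity."""
    if capacity_pct < 0:
        raise ValueError("capacity_pct must not be negative")
    if capacity_pct < failed_below:
        return "failed"
    if capacity_pct < degraded_below:
        return "degraded"
    return "working"


if __name__ == "__main__":
    for capacity in (95.0, 70.0, 30.0):
        print(capacity, label_battery(capacity))
```

Labels produced this way are only a starting point; the snapshots that vehicles store with a problem code would supply labels that come from the vehicle itself.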
Additional Data
I haven't seen these in a vehicle before, but I think that vibration sensors could also provide useful information. Sometimes, a developing vehicle problem is identified by changes in the sounds that a vehicle makes (sounds themselves being vibrations). I believe that with a few well-placed vibration sensors, we could learn to associate some types of vibrations with some types of problems. Anticipated detectable problems include warped brake discs and imbalanced tires. I believe that problems with other rotating parts could be detectable too. This is a belief that would need to be validated by collecting vibration data that could be correlated with other sensor and performance data. As a real-world example of using vibration sensors to diagnose a problem, SpaceX was able to use information recorded from a vibration sensor to detect the cause of a failed launch: a part provided by a third party failed during launch. This information helped them narrow down the cause of the failure, and further testing showed that the parts from the specific supplier in question were not meeting the required specification.
Further Collection and Usage of the Data
Once an ML model has been trained well enough to be useful, one would want to continue training it, both to improve accuracy and to adjust for changes in vehicle behaviours over time. One way to accomplish this is to allow users to opt in to having data from their engine collected. A benefit that the user could get in return is diagnostic information at no additional cost. Even for those who don't feel comfortable sharing their engine data, it would be possible to ship a reduced form of the trained model in an application so that their phone could process and interpret the data for them.
History
- 1st November, 2019: Initial version