Behind the scenes, at locations around the world, automakers are running tests on autonomous cars for literally thousands of hours. The industry has poured more than $80 billion into R&D on autonomous cars over the last four years, so it is serious about making this happen.
Those of us working on these tests have one overwhelming challenge: how to manage all the data generated during the tests. One eight-hour shift can create more than 100 terabytes of data. In a week of testing multiple cars, we’re talking about petabytes of data. And often, at rural testing centers for example, Internet bandwidth is simply insufficient to get the data to our data centers in North America, Europe and Asia by the end of the test day.
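To put the bandwidth problem in numbers, here is a rough back-of-the-envelope sketch. It takes the 100-terabyte shift figure from above; the link speeds, the decimal units (1 TB = 10^12 bytes) and the idea of a fully used link with no overhead are assumptions for illustration only, and `upload_hours` is a made-up helper name.

```python
# Rough estimate: how long would one shift's data take to upload over a given link?
# Assumptions (illustrative, not measured): decimal units (1 TB = 10**12 bytes),
# a fully utilized link with no protocol overhead, and hypothetical link speeds.

TB = 10**12  # bytes in a decimal terabyte


def upload_hours(data_tb: float, link_gbit_per_s: float) -> float:
    """Return the hours needed to move data_tb terabytes over the given link."""
    bits = data_tb * TB * 8
    seconds = bits / (link_gbit_per_s * 10**9)
    return seconds / 3600


if __name__ == "__main__":
    shift_tb = 100  # "more than 100 terabytes" per eight-hour shift
    for gbit in (0.1, 1, 10):  # hypothetical link speeds in Gbit/s
        print(f"{gbit:>5} Gbit/s: {upload_hours(shift_tb, gbit):8.1f} hours")
```

Under these assumptions, even a dedicated 10 Gbit/s link would need roughly 22 hours for a single shift, which is longer than the gap before the next test day begins.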
Right now, we have two main ways to transport data back to a data center. Both are cumbersome, but each has its own pluses and minuses. Until advances in technology make these challenges easier to manage, here’s what we do today:
- Connect the car to the data center. Test cars generate about 28 terabytes of data an hour, and it takes 30 to 60 minutes to offload that data to the data center over a fiber-optic connection. While this is a time-consuming option, it remains viable in cases where the data gets processed in somewhat smaller increments.
- Take/ship the media to a special station. In many situations the data loads are too large, or fiber connections are unavailable (e.g., at geographically remote test locations such as deserts, ice lakes and rural areas), so the data cannot be uploaded directly from the car to the data center. In these cases we remove a plug-in disk from the car and take or ship it to a “Smart Ingest Station,” where the data is uploaded to a central data lake. Because it only takes a couple of minutes to swap out the disks, the car stays available for testing. The downside of this option is that we need several sets of disks, so compared with the first option we are buying time by spending money.
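The trade-off between the two options can be sketched with the figures quoted above. This is only an illustration: it assumes decimal units, that the car cannot test while it is offloading or having disks swapped, that one offload or swap follows each hour of driving, and that "a couple of minutes" means two. The function names are hypothetical.

```python
# Sketch comparing the two options using the figures quoted in the list above.
# Assumptions: decimal units, the car is idle while offloading or swapping disks,
# and one offload or swap follows each hour of driving.

TB = 10**12  # bytes in a decimal terabyte


def required_gbit_per_s(data_tb: float, minutes: float) -> float:
    """Sustained link rate needed to move data_tb terabytes in the given minutes."""
    return data_tb * TB * 8 / (minutes * 60) / 10**9


def utilization(drive_min: float, downtime_min: float) -> float:
    """Fraction of wall-clock time the car spends testing."""
    return drive_min / (drive_min + downtime_min)


if __name__ == "__main__":
    for minutes in (30, 60):
        rate = required_gbit_per_s(28, minutes)
        share = utilization(60, minutes)
        print(f"offload in {minutes} min: ~{rate:.0f} Gbit/s, testing {share:.0%} of the time")
    # "a couple of minutes" for a disk swap is taken as 2 minutes here
    print(f"disk swap: testing {utilization(60, 2):.0%} of the time")
```

Under these assumptions, the direct connection has to sustain somewhere between about 62 and 124 Gbit/s and the car is testing only half to two-thirds of the time, while a disk swap keeps it testing roughly 97 percent of the time.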
In three to five years we may get to the point where both options are made obsolete by advances in technology that allow the computers in the car to run the analysis and select only the data that is needed. If the test car could isolate just the video of, for example, right-hand turns at a stop light, it would no longer need to send terabytes of data back to the main data center, and testers could send these smaller data sets over the Internet.
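As a purely hypothetical illustration of that kind of in-car selection, the sketch below assumes each recorded clip has already been labeled with scenario tags and keeps only the clips that carry every requested tag. The `Segment` type, the tag names and the sizes are invented for the example and do not describe any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A recorded clip with hypothetical scenario tags assigned in the car."""
    start_s: float
    end_s: float
    size_gb: float
    tags: frozenset


def select_segments(segments: list, wanted: set) -> list:
    """Keep only the segments that carry every wanted tag."""
    return [s for s in segments if wanted <= s.tags]


if __name__ == "__main__":
    # Made-up recording: sizes and tags are illustrative only.
    recording = [
        Segment(0, 60, 450.0, frozenset({"highway"})),
        Segment(60, 75, 110.0, frozenset({"right_turn", "stop_light"})),
        Segment(75, 200, 900.0, frozenset({"parking"})),
        Segment(200, 215, 105.0, frozenset({"right_turn"})),
    ]
    keep = select_segments(recording, {"right_turn", "stop_light"})
    total_gb = sum(s.size_gb for s in recording)
    kept_gb = sum(s.size_gb for s in keep)
    print(f"upload {kept_gb:.0f} GB of {total_gb:.0f} GB")
```

The hard part, of course, is producing reliable tags in the car in the first place; the filter itself is trivial once they exist.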
Of course, we’re several years away from having such a capability. In the meantime, storage hardware is improving: in the past year, IBM and Sony have been working on a 330-terabyte tape cartridge that promises faster and more resilient data storage in a form factor that can fit in the palm of your hand. Once such products are commercialized, they should make our lives a bit easier.
Ultimately, we’d like to be able to move our equipment easily in and out of hotel rooms and carry it on plane trips in our pockets or briefcases. Today, the equipment is often clunky and hard to move around. While technology can help, we have to be realistic and understand that the data challenges surrounding autonomous cars are likely to increase exponentially. The challenges may grow, but at least sometime soon the gear we use won’t be so cumbersome that our muscles ache at the end of the day.