SD Times Open-Source Project of the Week: AvroTensorDataset


Earlier this week, LinkedIn announced it was open-sourcing AvroTensorDataset, which is a “TensorFlow dataset for reading, parsing, and processing Avro data.” Apache Avro is the primary storage format that LinkedIn uses for its training data.

According to LinkedIn, it was experiencing bottlenecks in its machine learning workloads that were caused by the need to read multiple terabytes of input data. AvroTensorDataset can speed up data preprocessing by multiple orders of magnitude, the company says.

The tool was built internally at LinkedIn, where it has been in production for over a year. The company wanted to open-source the project so that others could see the same large performance boosts in their training workloads.

LinkedIn says that with this tool it has been able to improve processing speed by 162x compared to existing solutions and has reduced overall training time by 66%.

“ATDSDataset is LinkedIn’s solution to efficiently read Avro data into TensorFlow. Through multiple performance improvements, we were able to speed up I/O throughput by orders of magnitude over existing Avro reader solutions. Our team at LinkedIn worked closely with the TensorFlow I/O community to open-source this feature, and we hope that by open-sourcing it, the TensorFlow community could benefit from these performance improvements,” Jonathan Hung, staff software engineer at LinkedIn, wrote in a blog post.
