Lightelligence has developed a photonic interposer technology that can connect cores on an electronic application-specific IC (ASIC) in arbitrary on-chip network topologies, including full mesh or toroidal configurations, offering performance advantages and simpler software compared with nearest-neighbor configurations, Mo Steinman, VP of engineering at Lightelligence, told EE Times.
The company developed its own 64-core AI inference accelerator ASIC, with cores connected in a topology that allows all-to-all broadcast via the company’s optical network-on-chip (oNoC) interposer technology, assembled in a system-in-package (SiP) it calls Hummingbird. This offers latency and power efficiency benefits, Steinman said, declining to reveal performance figures or benchmarks.

“This is a tool we can use for addressing interconnect challenges at favorable density and power characteristics, but also to simplify software development; it’s really to avoid the challenge of the scheduling problem,” he added.
Steinman described the scheduling problem that arises in many-core or chiplet designs where each core or chiplet can only communicate with its nearest neighbors.

“If I have to jump one, two, three or four [cores or chiplets] away, the electrical interface power characteristics and capabilities start to become a problem,” Steinman said. “But for optics, the definition of what is short reach and what is long reach are very different than for electronics… Even at, say, wafer scale, the attenuation [for photonics] is very manageable… The power and latency are fairly independent of that distance.”
Topologies like toroidal configurations are difficult to achieve with electrical interconnects.
“[With our oNoC technology] there isn’t necessarily a predisposed recipe to the kinds of topologies we can entertain,” he said. “So it’s a powerful tool that we can use to work with partners to solve their connectivity problems that are unique, not mapped to a preconceived topology; there’s a lot of flexibility there.”
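The hop-count penalty of nearest-neighbor interconnects can be illustrated with a toy model (not Lightelligence code; the 8×8 grid and hop metric are assumptions for illustration): in a 2D mesh, worst-case traffic crosses many electrical links, while an all-to-all optical NoC reaches any core in a single hop.

```python
# Illustrative sketch: worst-case hop counts for 64 cores arranged as an
# 8x8 nearest-neighbor mesh vs. an all-to-all optical network-on-chip.

def mesh_hops(src, dst, cols=8):
    """Manhattan distance between two cores in a 2D nearest-neighbor mesh."""
    sr, sc = divmod(src, cols)
    dr, dc = divmod(dst, cols)
    return abs(sr - dr) + abs(sc - dc)

def all_to_all_hops(src, dst):
    """With a direct optical path between every pair, any transfer is one hop."""
    return 0 if src == dst else 1

n = 64
worst_mesh = max(mesh_hops(a, b) for a in range(n) for b in range(n))
worst_optical = max(all_to_all_hops(a, b) for a in range(n) for b in range(n))
print(worst_mesh, worst_optical)  # 14 1
```

Corner-to-corner traffic in the mesh needs 14 electrical hops, each with its own power and scheduling cost; the optical all-to-all path is always one hop, which is the scheduling simplification Steinman describes.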

All-to-all broadcast
Lightelligence’s Hummingbird is a SiP combining its 64-core AI inference accelerator ASIC with the optical network-on-chip interposer. This is the first specific implementation of Lightelligence’s oNoC technology, which carries data, encoded onto light, in an all-to-all broadcast mode between the 64 cores.
“For convolution, which is a big part of AI, that allows us to do a very interesting mathematical function where each core is doing a piece of the work and then simultaneously blasting it to every other core every clock cycle,” Steinman said.
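A minimal sketch of that pattern (values and core count invented; this is a toy model, not the Hummingbird programming model): each core produces a partial result, one broadcast step delivers every partial to every core, and each core finishes the reduction locally.

```python
# Toy model of all-to-all broadcast for a distributed reduction.
# In a nearest-neighbor topology the partials would take many steps to
# propagate; an all-to-all broadcast delivers them all in one step.
import numpy as np

rng = np.random.default_rng(0)
partials = rng.standard_normal(64)      # one partial result per core

# One broadcast step: row i is the set of partials core i received.
received = np.tile(partials, (64, 1))
totals = received.sum(axis=1)           # each core reduces locally

# Every core now holds the same full result.
assert np.allclose(totals, partials.sum())
```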

Hummingbird’s accelerator is a SIMD (single instruction, multiple data) machine with a “fairly simple” proprietary instruction set, he said. Each of the identical cores has SRAM and compute for scalar and vector operations, plus embedded transmitter and receiver circuitry that converts between the electrical and optical domains.
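The SIMD model can be shown in miniature with NumPy (shapes and values are invented; Hummingbird’s actual instruction set is proprietary): a single instruction is applied across a whole vector of data lanes at once.

```python
# Illustrative only: one instruction (a multiply-add) operating on
# eight data lanes simultaneously, as a vector unit would.
import numpy as np

x = np.arange(8, dtype=np.float32)     # vector register of data
w = np.full(8, 2.0, dtype=np.float32)  # vector register of weights
b = 1.0                                # scalar operand

y = w * x + b   # single "instruction", eight lanes of data
print(y)
```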
There is an analog interface on the ASIC that is coupled to the photonic interposer. When light from a laser mounted on the interposer passes it, the circuitry on the ASIC alters the refractive index of the silicon waveguide beneath to modulate the light passing through (full darkness isn’t required for a zero; the light just has to be modulated enough to distinguish it from a one).
At the other end, a receiver photodiode converts incoming light pulses into an electrical current. This current is amplified, and analog circuitry performs threshold detection to convert the signal into a bitstream. Features like error correction code (ECC), framing, encoding and more can be layered on top, Steinman said.
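The receive path described above amounts to slicing an analog signal against a threshold. A hedged sketch (the sample values and threshold are invented for illustration, not measured figures):

```python
# Hypothetical sketch of threshold detection: amplified photodiode
# samples are compared against a threshold to recover a bitstream.

def detect_bits(samples, threshold=0.5):
    """Convert analog current samples into bits by threshold detection."""
    return [1 if s > threshold else 0 for s in samples]

# A "zero" is attenuated light, not full darkness; it only needs enough
# contrast with a "one" to fall on the other side of the threshold.
samples = [0.9, 0.2, 0.8, 0.3, 0.95, 0.1]
print(detect_bits(samples))  # [1, 0, 1, 0, 1, 0]
```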
The analog circuitry on the electronic die can be calibrated to account for process variabilities.
“[Refractive index] will vary from die to die and transmitter to transmitter, so our electronic circuitry is able to adjust to those characteristics,” he said. “One of the things we do early in the powerup sequence is to calibrate the design, running known patterns through it and seeing what the circuit response is, so we can adjust knobs on the analog side.”
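That powerup calibration can be sketched as a search over an analog knob (here modeled as the slicer threshold; the channel model and signal levels are invented for illustration): drive a known pattern through the link and keep the setting that decodes it cleanly.

```python
# Hedged sketch of known-pattern calibration. A toy "channel" maps bits
# to die-specific analog levels; calibration sweeps the threshold knob
# until the known pattern is recovered.

KNOWN_PATTERN = [1, 0, 1, 1, 0, 0, 1, 0]

def channel(bits, one_level=0.7, zero_level=0.35):
    """Toy transmitter/waveguide model with die-specific signal levels."""
    return [one_level if b else zero_level for b in bits]

def calibrate(received, pattern):
    """Sweep the threshold and return the first value that decodes cleanly."""
    for knob in (t / 100 for t in range(100)):
        if [1 if s > knob else 0 for s in received] == pattern:
            return knob
    return None

threshold = calibrate(channel(KNOWN_PATTERN), KNOWN_PATTERN)
print(threshold)
```

In hardware the adjustable knobs would be analog bias and gain settings rather than a software threshold, but the run-known-pattern-and-adjust loop is the same idea.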
While Lightelligence uses an optical NoC in its PACE optical compute product, the technology in Hummingbird is quite different, Steinman said.
“There’s a little bit of IP reuse, but it’s a little different because of the type of communication; this is high-speed digital, versus PACE, where there’s an analog computation, it’s not just ones and zeroes,” he said.
Proof points

Hummingbird is available on a PCIe card. Building a complete system, including an AI software stack, was necessary to work out all the kinks, according to Steinman.
“Our belief is that if we’re going to develop some new kind of interconnect, there are bound to be implications at every level,” he said. “In a computer system there’s digital design, in our case we also have analog and photonic design, there’s packaging, there’s system design, there’s software implications, and everything has some kind of implication or second- or third-order effect.”
One thing Lightelligence learned was that it needed another interposer layer, a laminate interposer, between the electronic and photonic dies to deliver power to the electronic chip. Future generations of the technology will enable direct connections between the two dies.

“3D technologies are cutting edge, and we didn’t want to wait for the full enablement of that to bring this out,” Steinman said. “We felt this was the way we could do a first implementation that will only get better when we have the 3D stacking, when we can eliminate [the laminate interposer] layer.”
Lightelligence also has a full AI software stack up and running, he said, which can run PyTorch models. The overall aim is to abstract away any “exotic” technologies, presenting simply a PCIe card with a software stack that can be used like any other AI accelerator.
The aim for Hummingbird is to prove out the software stack and get customer feedback on functionality, Steinman said.

“We don’t have any illusions that this is going to supplant Nvidia; it’s more about the possibilities of the technology. We need a legitimate, functioning proof point,” he added.
“We want to use Hummingbird primarily as a vehicle to enable conversations, to get to purpose-built semi-custom implementations with partners,” he said. “The next generation will probably be semi-custom implementations working with partners, then maybe we develop a standard interface template that’s a bit more generic. I think those first few adopters will want to do a very close collaboration, but we’re open to any model; we don’t want to presuppose the way people want to do business, and we’re flexible enough to do that at this point.”
Future generations of Hummingbird will use reticle-stitching technology (which etches a test pattern at the reticle boundary to check stepper alignment) to allow photonic interposers larger than the reticle limit, supporting many-chiplet architectures. Future technology generations could see separate photonic transmitter/receiver chiplets connected electrically to compute and memory chiplets, and/or licensed transmitter/receiver IP embedded into customer chiplets.
Hummingbird PCIe cards have been sampled to an early partner, with full availability of the card and the software development kit coming in Q3 2023.
