We’re Building the Future of Data Infrastructure

Archive for the 'AI' Category

  • June 12, 2024

    How AI Will Change the Building Blocks of Semis

    By Michael Kanellos, Head of Influencer Relations, Marvell

    Aaron Thean points to a slide featuring the downtown skylines of New York, Singapore and San Francisco along with a prototype of a 3D processor and asks, “Which one of these things is not like the other?”

    The answer? While most gravitate to the processor, San Francisco is a better answer. With a population well under 1 million, the city’s internal transportation and communications systems don’t come close to the level of complexity, performance and synchronization required by the other three.

    With future chips, “we’re talking about trillions of transistors on multiple substrates,” said Thean, the deputy president of the National University of Singapore and the director of SHINE, an initiative to expand Singapore’s role in the development of chiplets, during a one-day summit sponsored by Marvell and the university.

  • June 11, 2024

    How AI Will Drive Cloud Switch Innovation

    This article is part five in a series on talks delivered at Accelerated Infrastructure for the AI Era, a one-day symposium held by Marvell in April 2024. 

    AI has fundamentally changed the network switching landscape. AI requirements are driving foundational shifts in the industry roadmap, expanding the use cases for cloud switching semiconductors and creating opportunities to redefine the terrain.

    Here’s how AI will drive cloud switching innovation.

    A changing network requires a change in scale

    In a modern cloud data center, the compute servers are connected to one another and to the internet through a network of high-bandwidth switches. The approach is like that of the internet itself, allowing operators to build a network of any size while mixing and matching products from various vendors to create a network architecture specific to their needs.

    Such a high-bandwidth switching network is critical for AI applications, and a higher-performing network can lead to a more profitable deployment.

    However, expanding and extending the general-purpose cloud network to AI isn’t quite as simple as just adding more building blocks. In the world of general-purpose computing, one or more workloads can fit on a single server CPU. In contrast, AI’s large datasets don’t fit on a single processor, whether it’s a CPU, GPU or other accelerated compute device (XPU), making it necessary to distribute the workload across multiple processors. These accelerated processors must function as a single computing element.

    AI calls for enhanced cloud switch architecture

    AI requires accelerated infrastructure to split workloads across many processors.

  • June 06, 2024

    Silicon Photonics Comes of Age

    This article is part four in a series on talks delivered at Accelerated Infrastructure for the AI Era, a one-day symposium held by Marvell in April 2024. 

    Silicon photonics—the technology of manufacturing the hundreds of components required for optical communications with CMOS processes—has been employed to produce coherent optical modules for metro and long-distance communications for years. The increasing bandwidth demands brought on by AI are now opening the door for silicon photonics to come inside data centers to enhance their economics and capabilities.  

    What’s inside an optical module?

    As the previous posts in this series noted, critical semiconductors like digital signal processors (DSPs), transimpedance amplifiers (TIAs) and drivers for producing optical modules have steadily improved in terms of performance and efficiency with each new generation of chips thanks to Moore’s Law and other factors.

    The same is not true for optics. Modulators, multiplexers, lenses, waveguides and other devices for managing light impulses have historically been delivered as discrete components.

    “Optics pretty much uses piece parts,” said Loi Nguyen, executive vice president and general manager of cloud optics at Marvell. “It is very hard to scale.”

    Lasers have been particularly challenging, with module developers forced to choose among a wide variety of technologies. Electro-absorption-modulated lasers (EMLs) are currently the only commercially viable option capable of the 200 Gbps per lane speeds necessary to support AI models. Often used for longer links, EML is the laser of choice for 1.6T optical modules. Not only is fab capacity for EML lasers constrained, but the lasers are also incredibly expensive. Together, these factors make it difficult to scale at the rate AI requires.
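    The lane arithmetic behind those figures is simple; here is a quick illustrative sketch (the module and lane rates are the round numbers quoted above):

```python
# Number of optical lanes needed to hit a target module rate.
def lanes_needed(module_gbps: int, lane_gbps: int) -> int:
    if module_gbps % lane_gbps != 0:
        raise ValueError("module rate must be a multiple of the lane rate")
    return module_gbps // lane_gbps

# A 1.6T module built from 200G EML-driven lanes needs 8 lanes:
print(lanes_needed(1600, 200))  # 8
```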

  • June 02, 2024

    A Deep Dive into the Copper and Optical Interconnects Weaving AI Clusters Together

    This article is part three in a series on talks delivered at Accelerated Infrastructure for the AI Era, a one-day symposium held by Marvell in April 2024.

    Twenty-five years ago, network bandwidth ran at 100 Mbps, and it was aspirational to think about moving to 1 Gbps over optical. Today, links run at 1 Tbps over optical, or 10,000 times faster than the cutting-edge speeds of 25 years ago.
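    That growth factor is easy to verify, and it implies a striking compound rate; a back-of-the-envelope sketch using the round numbers above:

```python
# 100 Mbps then, 1 Tbps now.
then_bps = 100e6
now_bps = 1e12

factor = now_bps / then_bps
print(f"{factor:,.0f}x faster")  # 10,000x faster

# Spread over 25 years, that is roughly a 45% compound annual growth rate.
cagr = factor ** (1 / 25) - 1
print(f"~{cagr:.0%} per year")
```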

    Here’s another interesting fact: “Every single large language model today runs on compute clusters that are enabled by Marvell’s connectivity silicon,” said Achyut Shah, senior vice president and general manager of Connectivity at Marvell.

    To keep ahead of what customers need, Marvell continually seeks to boost capacity, speed, and performance of the digital signal processors (DSPs), transimpedance amplifiers or TIAs, drivers, firmware and other components inside interconnects. It’s an interdisciplinary endeavor involving expertise in high frequency analog, mixed signal, digital, firmware, software and other technologies. The following is a map to the different components and challenges shaping the future of interconnects and how that future will shape AI.

    Inside the Data Center

    From a high level, optical interconnects perform the task their name implies: they deliver data from one place to another while keeping errors from creeping in during transmission. Another important task, however, is enabling data center operators to scale quickly and reliably.

    “When our customers deploy networks, they don’t start deploying hundreds or thousands at a time,” said Shah. “They have these massive data center clusters—tens of thousands, hundreds of thousands and millions of (computing) units—that all need to work and come up at the exact same time. These are at multiple locations, across different data centers. The DSP helps ensure that they don’t have to fine tune every link by hand.”

    Optical Interconnect Module

     

  • May 23, 2024

    Scaling AI Means Scaling Interconnects

    This article is part two in a series on talks delivered at Accelerated Infrastructure for the AI Era, a one-day symposium held by Marvell in April 2024.

    Interconnects have played a key role in enabling technology since the dawn of computing. During World War II, Alan Turing’s electromechanical Bombe performed the computations that broke the Nazis’ Enigma code. This fast—at least at the time—machine relied on massive parallelism and numerous interconnects. Eighty years later, interconnects play a similar role for AI—providing a foundation for massively parallel problems. However, with the growth of AI comes unique networking challenges—and Marvell is poised to meet the needs of this ever-growing market.

    What’s driving interconnect growth?
    Before 2023, the interconnect world was a different place. Interconnect speeds were set by the pace of cloud data center server upgrades: servers were refreshed roughly every four years, so interconnect speeds doubled on the same four-year cadence. In 2023, generative AI took the interconnect wheel, and demand for AI is driving speeds to double every two years. And, while copper remains a viable technology for chip-to-chip and other short-reach connections, optical is the dominant medium for AI.
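    The difference in cadence compounds quickly. A minimal sketch (the 800G starting point is an illustrative assumption, not a figure from the article):

```python
# Project interconnect speed under a given doubling cadence.
def projected_gbps(base_gbps: float, years: float, doubling_period: float) -> float:
    return base_gbps * 2 ** (years / doubling_period)

# Eight years out from an 800G generation:
print(projected_gbps(800, 8, 4))  # old 4-year cadence: 3200.0
print(projected_gbps(800, 8, 2))  # AI-era 2-year cadence: 12800.0
```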

    “Optical is the only technology that can give you the bandwidth and reach needed to connect hundreds and thousands and tens of thousands of servers across the whole data center,” said Dr. Loi Nguyen, executive vice president and general manager of Cloud Optics at Marvell. “No other technology can do the job—except optical.”

    AI doubles interconnect speed in half the time

  • May 14, 2024

    The AI Opportunity at Marvell

    Two trillion dollars. That’s the GDP of Italy. It’s the rough market capitalization of Amazon, of Alphabet and of Nvidia. And, according to analyst firm Dell’Oro, it’s the amount of AI infrastructure CAPEX data center operators are expected to invest over the next five years. It’s a historically massive investment, which raises the question: does the return on AI justify the cost?

    The answer is a resounding yes.

    AI is fundamentally changing the way we live and work. Beyond chatbots, search results, and process automation, companies are using AI to manage risk, engage customers, and speed time to market. New use cases are continuously emerging in manufacturing, healthcare, engineering, financial services, and more. We’re at the beginning of a generational inflection point that, according to McKinsey, has the potential to generate $4.4 trillion in annual economic value. 

    In that light, two trillion dollars makes sense. It will be financed through massive gains in productivity and efficiency.

    Our view at Marvell is that the AI opportunity before us is on par with that of the internet, the PC, and cloud computing. “We’re as well positioned as any company in technology to take advantage of this,” said chairman and CEO Matt Murphy at the recent Marvell Accelerated Infrastructure for the AI Era investor event in April 2024.

  • January 25, 2024

    How PCIe Interconnect is Critical for the Emerging AI Era

    By Annie Liao, Product Management Director, Connectivity, Marvell

    PCIe has historically been used as the protocol for communication between the CPU and computer subsystems. Its speed has increased steadily since its debut in 2003, and after 20 years of development we are currently at PCIe Gen 5, with an I/O bandwidth of 32 Gbps per lane. Many factors are driving the PCIe speed increase; the most prominent are artificial intelligence (AI) and machine learning (ML). For CPUs and AI accelerators/GPUs to work together effectively on larger training models, the communication bandwidth of the PCIe-based interconnects between them needs to scale with the exponentially increasing size of the parameters and data sets used in AI models. Although the number of PCIe lanes supported increases with each generation, the physical constraints of package beachfront and PCB routing place a limit on the maximum number of lanes in a system. That leaves I/O speed increases as the only way to push more data transactions per second. The compute interconnect bandwidth demand fueled by AI and ML is driving a faster transition to the next generation of PCIe: PCIe Gen 6.
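    The generational doubling can be tabulated directly. A sketch using the per-lane signaling rates from the published PCIe specifications (raw GT/s, ignoring encoding and protocol overhead):

```python
# Per-lane signaling rate (GT/s) by PCIe generation.
PCIE_LANE_RATE = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0, 6: 64.0}

def link_rate(gen: int, lanes: int = 16) -> float:
    """Raw aggregate rate of a link in GT/s, per direction."""
    return PCIE_LANE_RATE[gen] * lanes

# Gen 6 doubles Gen 5 again: a x16 link goes from 512 to 1024 GT/s raw.
for gen in (4, 5, 6):
    print(f"Gen {gen} x16: {link_rate(gen):.0f} GT/s")
```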

    PCIe has used 2-level Non-Return-to-Zero (NRZ) modulation since its inception, and each speed increase up to Gen 5 was achieved by doubling the I/O rate. For Gen 6, PCI-SIG adopted 4-level Pulse-Amplitude Modulation (PAM4), in which each symbol takes one of four levels and encodes 2 bits of data (00, 01, 10, 11). The reduced margin resulting from the move from 2-level to 4-level signaling has also necessitated Forward Error Correction (FEC), a first for PCIe links. With the adoption of PAM4 signaling and FEC, Gen 6 marks an inflection point for PCIe from both the signaling and protocol-layer perspectives.
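    The PAM4 idea fits in a few lines. The sketch below uses the Gray-coded bit-to-level mapping commonly used for PAM4 links (shown for illustration; consult the PCIe 6.0 specification for the exact encoding details):

```python
# Gray-coded PAM4: each 2-bit pair maps to one of four amplitude levels,
# and adjacent levels differ by only one bit, so a small level error
# corrupts at most one bit.
GRAY_PAM4 = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}

def pam4_encode(bits):
    if len(bits) % 2:
        raise ValueError("bit stream length must be even")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

print(pam4_encode([0, 0, 0, 1, 1, 1, 1, 0]))  # [0, 1, 2, 3]
```

Because PAM4 carries 2 bits per symbol, it doubles throughput at the same symbol rate, at the cost of the reduced voltage margin that makes FEC necessary.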

    In addition to AI/ML, the disaggregation of memory and storage is an emerging trend in compute applications with a significant impact on PCIe-based interconnects. PCIe has historically been used on-board and for in-chassis interconnects; attaching front-facing NVMe SSDs is a common example. With the trend toward flexible resource allocation and the advancement of CXL technology, the server industry is now moving toward disaggregated, composable infrastructure. In this disaggregated architecture, PCIe endpoints sit in a different chassis from the PCIe root complex, requiring the PCIe link to travel out of the system chassis, typically over direct attach cables (DACs) that can reach 3-5 m.

  • October 19, 2023

    Shining a Light on Marvell Optical Technology and Innovation in the AI Era

    By Kristin Hehir, Senior Manager, PR and Marketing, Marvell

    The sheer volume of data traffic moving across networks daily is mind-boggling almost any way you look at it. During the past decade, global internet traffic grew by approximately 20x, according to the International Energy Agency. One contributing factor to this growth is the popularity of mobile devices and applications: smartphone users spend an average of 5 hours a day, or nearly a third of their waking hours, on their devices, up from three hours just a few years ago. The result is an incredible amount of data in the cloud that needs to be processed and moved. Around 70% of data traffic is east-west traffic, that is, traffic inside data centers. Generative AI, and the exponential growth in the size of the data sets needed to feed it, will invariably continue to push the curve upward.

    Yet, for more than a decade, total power consumption has stayed relatively flat thanks to innovations in storage, processing, networking and optical technology for data infrastructure. The debut of PAM4 digital signal processors (DSPs) for accelerating traffic inside data centers, and of coherent DSPs for pluggable modules, has played a large, but often quiet, role in paving the way for growth while reducing cost and power per bit.

    Marvell at ECOC 2023

    At Marvell, we’ve been gratified to see these technologies get more attention. At the recent European Conference on Optical Communication, Dr. Loi Nguyen, EVP and GM of Optical at Marvell, talked with Lightwave editor in chief, Sean Buckley, on how Marvell 800 Gbps and 1.6 Tbps technologies will enable AI to scale.   

  • September 05, 2023

    800G: An Inflection Point for Optical Networks

    By Samuel Liu, Senior Director, Product Line Management, Marvell

    Digital technology has what you could call a real estate problem. Hyperscale data centers now regularly exceed 100,000 square feet in size. Cloud service providers plan to build 50 to 100 edge data centers a year, and distributed applications like ChatGPT are further fueling the growth of data traffic between facilities. Similarly, this explosive surge in traffic means telecommunications carriers need to upgrade their wired and wireless networks, a complex and costly undertaking that will involve new equipment deployments in cities all over the world.

    Weaving all of these geographically dispersed facilities into a fast, efficient, scalable and economical infrastructure is now one of the dominant issues for our industry.

    Pluggable modules based on coherent digital signal processors (CDSPs) debuted in the last decade to replace the transponders and other equipment used to generate DWDM-compatible optical signals. These initial modular products did not match the performance of incumbent solutions: their large form factors limited density, and they could be deployed only in a narrow set of use cases that did not demand high-density transmission. Over time, technology advances improved the performance of pluggable modules, and CDSP speeds grew from 100 to 200 and then 400 Gbps. Continued innovation, and the development of an open ecosystem, helped expand the potential applications.

  • June 27, 2023

    Scaling AI Infrastructure with High-Speed Optical Connectivity

    By Suhas Nayak, Senior Director of Solutions Marketing, Marvell

     

    In the world of artificial intelligence (AI), where compute performance often steals the spotlight, there's an unsung hero working tirelessly behind the scenes. It's something that connects the dots and propels AI platforms to new frontiers. Welcome to the realm of optical connectivity, where data transfer becomes lightning-fast and AI's true potential is unleashed. But wait, before you dismiss the idea of optical connectivity as just another technical detail, let's pause and reflect. Think about it: every breakthrough in AI, every mind-bending innovation, is built on the shoulders of data—massive amounts of it. And to keep up with the insatiable appetite of AI workloads, we need more than just raw compute power. We need a seamless, high-speed highway that allows data to flow freely, powering AI platforms to conquer new challenges. 

    In this post, I’ll explain the importance of optical connectivity, particularly the role of DSP-based optical connectivity, in driving scalable AI platforms in the cloud. So, buckle up, get ready to embark on a journey where we unlock the true power of AI together. 

  • June 12, 2023

    AI and the Tectonic Shift Coming to Data Infrastructure

    By Michael Kanellos, Head of Influencer Relations, Marvell

    AI’s growth is unprecedented from any angle you look at it. The size of large training models is growing 10x per year. ChatGPT’s 173 million-plus users turn to the website an estimated 60 million times a day (compared to zero the year before). And daily, people are coming up with new applications and use cases.

    As a result, cloud service providers and others will have to transform their infrastructures in similarly dramatic ways to keep up, said Chris Koopmans, Chief Operations Officer at Marvell, in conversation with Futurum’s Daniel Newman during the Six Five Summit on June 8, 2023.

    “We are at the beginning of at least a decade-long trend and a tectonic shift in how data centers are architected and how data centers are built,” he said.  

    The transformation is already underway. AI training, and a growing percentage of cloud-based inference, has already shifted from running on two-socket servers based around general processors to systems containing eight or more GPUs or TPUs optimized to solve a smaller set of problems more quickly and efficiently.

  • May 22, 2023

    Are We Ready for Large-scale AI Workloads?

    By Noam Mizrahi, Executive Vice President, Chief Technology Officer, Marvell

    Originally published in Embedded

    ChatGPT has fired the world’s imagination about AI. The chatbot can write essays, compose music, and even converse in different languages. If you’ve read any ChatGPT poetry, you can see it doesn’t pass the Turing Test yet, but it’s a huge leap forward from what even experts expected from AI just three months ago. Over one million people became users in the first five days, shattering records for technology adoption.

    The groundswell also strengthens arguments that AI will have an outsized impact on how we live—with some predicting AI will contribute significantly to global GDP by 2030 by fine-tuning manufacturing, retail, healthcare, financial systems, security, and other daily processes.

    But the sudden success also shines a light on AI’s most urgent problem: our computing infrastructure isn’t built to handle the workloads AI will throw at it. The size of AI networks has grown by 10x per year over the last five years. By 2027, one in five Ethernet switch ports in data centers will be dedicated to AI, ML and accelerated computing.

  • March 10, 2023

    Introducing Nova, a 1.6T PAM4 DSP Optimized for High-Performance Fabrics in Next-Generation AI/ML Systems

    By Kevin Koski, Product Marketing Director, Marvell

    Last week, Marvell introduced Nova™, its latest, fourth-generation PAM4 DSP for optical modules. It features breakthrough 200G-per-lambda optical bandwidth, which enables the module ecosystem to bring 1.6 Tbps pluggable modules to market. You can read more about it in the press release and the product brief.

    In this post, I’ll explain why the optical modules enabled by Nova are the optimal solution to high-bandwidth connectivity in artificial intelligence and machine learning systems.

    Let’s begin with a look into the architecture of supercomputers, also known as high-performance computing (HPC).

    Historically, HPC has been realized using large-scale computer clusters interconnected by high-speed, low-latency communications networks to act as a single computer. Such systems are found in national or university laboratories and are used to simulate complex physics and chemistry to aid groundbreaking research in areas such as nuclear fusion, climate modeling and drug discovery. They consume megawatts of power.

    The introduction of graphics processing units (GPUs) has provided a more efficient way to complete specific types of computationally intensive workloads. GPUs allow for the use of massive, multi-core parallel processing, while central processing units (CPUs) execute serial processes within each core. GPUs have both improved HPC performance for scientific research purposes and enabled a machine learning (ML) renaissance of sorts. With these advances, artificial intelligence (AI) is being pursued in earnest.

  • March 02, 2023

    Introducing the 51.2T Teralynx 10, the Industry’s Lowest Latency Programmable Switch

    By Amit Sanyal, Senior Director, Product Marketing, Marvell

    If you’re one of the 100+ million monthly users of ChatGPT—or have dabbled with Google’s Bard or Microsoft’s Bing AI—you’re proof that AI has entered the mainstream consumer market.

    And what’s entered the consumer mass-market will inevitably make its way to the enterprise, an even larger market for AI. There are hundreds of generative AI startups racing to make it so. And those responsible for making these AI tools accessible—cloud data center operators—are investing heavily to keep up with current and anticipated demand.

    Of course, it’s not just the latest AI language models driving the coming infrastructure upgrade cycle. Operators will pay equal attention to improving general purpose cloud infrastructure too, as well as take steps to further automate and simplify operations.

    Teralynx 10

    To help operators meet their scaling and efficiency objectives, today Marvell introduces Teralynx® 10, a 51.2 Tbps programmable 5nm monolithic switch chip designed to address the operator bandwidth explosion while meeting stringent power- and cost-per-bit requirements. It’s intended for leaf and spine applications in next-generation data center networks, as well as AI/ML and high-performance computing (HPC) fabrics.

    A single Teralynx 10 replaces twelve chips of the 12.8 Tbps generation, the last to see widespread deployment: matching 51.2 Tbps of non-blocking capacity with 32-port 12.8 Tbps devices requires a two-tier leaf-spine fabric of roughly a dozen of them. The resulting savings are impressive: 80% power reduction for equivalent capacity.
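    The twelve-to-one figure comes from fabric math rather than a simple capacity ratio. A sketch of that arithmetic, assuming 400G ports (so a 12.8T chip has 32 ports and a 51.2T chip has 128) and a non-blocking two-tier leaf-spine built from the smaller chips:

```python
OLD_PORTS = 32    # 12.8 Tbps chip at 400G per port
NEW_PORTS = 128   # 51.2 Tbps chip at 400G per port

# Non-blocking leaf-spine: each leaf splits its ports half down, half up.
down_per_leaf = OLD_PORTS // 2            # 16 host-facing ports per leaf
leaves = NEW_PORTS // down_per_leaf       # 8 leaves to expose 128 ports
uplinks = leaves * (OLD_PORTS // 2)       # 128 uplinks to terminate
spines = uplinks // OLD_PORTS             # 4 spines
print(leaves + spines)                    # 12 chips replaced by one device
```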

  • February 14, 2023

    The Three Things Next-Generation Data Centers Need from Networking

    By Amit Sanyal, Senior Director, Product Marketing, Marvell

    Data centers are arguably the most important buildings in the world. Virtually everything we do—from ordinary business transactions to keeping in touch with relatives and friends—is accomplished, or at least assisted, by racks of equipment in large, low-slung facilities.

    And whether they know it or not, your family and friends are causing data center operators to spend more money. But it’s for a good cause: it allows your family and friends (and you) to continue their voracious consumption, purchasing and sharing of every kind of content—via the cloud.

    Of course, it’s not only the personal habits of your family and friends that are causing operators to spend. The enterprise is equally responsible. They’re collecting data like never before, storing it in data lakes and applying analytics and machine learning tools—both to improve user experience, via recommendations, for example, and to process and analyze that data for economic gain. This is on top of the relentless, expanding adoption of cloud services.

  • December 05, 2022

    Leading Lights Award Recognizes Deneb CDSP Leadership

    By Johnny Truong, Senior Manager, Public Relations, Marvell

    At this week’s Leading Lights Awards Ceremony, hosted by Light Reading, Editor-in-Chief Phil Harvey announced that the Marvell® Deneb™ Coherent Digital Signal Processor (CDSP) is the winner of the Most Innovative Service Provider Transport Solution category. This recognition is awarded to the optical systems or optical components vendor providing the most innovative optical transport solution for service provider customers.

    Driving the industry's largest standards-based ecosystem, the Marvell Deneb CDSP enables disaggregation, which is critical for carriers seeking to lower their CAPEX and OPEX as they increase network capacity. This recognition underscores Marvell’s success in bringing leading-edge density and performance optimization advantages to carrier networks.

    Now in its 18th year, Leading Lights is Light Reading’s flagship awards program, recognizing top companies and executives for their outstanding achievements in next-generation communications technology, applications, services, strategies, and innovations.

    Visit the Light Reading blog for a full list of categories, finalists and winners.

  • October 12, 2022

    The Evolution of Cloud Storage and Memory

    By Gary Kotzur, CTO, Storage Products Group, Marvell and Jon Haswell, SVP, Firmware, Marvell

    The nature of storage is changing much more rapidly than it ever has historically. This evolution is being driven by expanding amounts of enterprise data and the inexorable need for greater flexibility and scale to meet ever-higher performance demands.

    If you look back 10 or 20 years, there used to be a one-size-fits-all approach to storage. Today, however, there is the public cloud, the private cloud, and the hybrid cloud, which combines the two. All these clouds have different storage and infrastructure requirements. What’s more, the data center infrastructure of every hyperscaler and cloud provider is architecturally different and is moving toward a more composable architecture. All of this is driving the need for highly customized cloud storage solutions, as well as for comparable solutions in the memory domain.
