
Doug Fuller, VP of Software Engineering at Cornelis Networks

by Narnia

As Vice President of Software Engineering, Doug is responsible for all aspects of the Cornelis Networks software stack, including the Omni-Path Architecture drivers, messaging software, and embedded device control systems. Before joining Cornelis Networks, Doug led software engineering teams at Red Hat in cloud storage and data services. Doug’s career in HPC and cloud computing began at Ames National Laboratory’s Scalable Computing Laboratory. Following several roles in university research computing, Doug joined the US Department of Energy’s Oak Ridge National Laboratory in 2009, where he developed and integrated new technologies at the world-class Oak Ridge Leadership Computing Facility.

Cornelis Networks is a technology leader delivering purpose-built high-performance fabrics for High Performance Computing (HPC), High Performance Data Analytics (HPDA), and Artificial Intelligence (AI) to leading commercial, scientific, academic, and government organizations.

What initially attracted you to computer science?

I just seemed to enjoy working with technology. I enjoyed working with computers growing up; we had a modem at our school that let me try out the Internet, and I found it fascinating. As a freshman in college, I met a USDOE computational scientist while volunteering for the National Science Bowl. He invited me to tour his HPC lab, and I was hooked. I’ve been a supercomputer geek ever since.

You worked at Red Hat from 2015 to 2019. What were some of the projects you worked on, and what were your key takeaways from this experience?

My principal project at Red Hat was Ceph distributed storage. I’d previously focused exclusively on HPC, and this gave me an opportunity to work on technologies that were critical to cloud infrastructure. It rhymes. Many of the principles of scalability, manageability, and reliability are extremely similar, even though they’re aimed at solving slightly different problems. In terms of technology, my most important takeaway was that cloud and HPC have a lot to learn from one another. We’re increasingly building different projects with the same Lego set. It’s really helped me understand how the enabling technologies, including fabrics, can come to bear on HPC, cloud, and AI applications alike. It’s also where I really came to understand the value of Open Source and how to execute the Open Source, upstream-first software development philosophy that I brought with me to Cornelis Networks. Personally, Red Hat was where I really grew and matured as a leader.

You’re currently the Vice President of Software Engineering at Cornelis Networks. What are some of your responsibilities, and what does your average day look like?

As Vice President of Software Engineering, I’m responsible for all aspects of the Cornelis Networks software stack, including the Omni-Path Architecture drivers, messaging software, fabric management, and embedded device control systems. Cornelis Networks is an exciting place to be, especially in this moment and this market. Because of that, I’m not sure I have an “average” day. Some days I’m working with my team to solve the latest technology challenge. Other days I’m interacting with our hardware architects to make sure our next-generation products will deliver for our customers. I’m often in the field meeting with our amazing community of customers and collaborators, making sure we understand and anticipate their needs.

Cornelis Networks offers next-generation networking for High Performance Computing and AI applications. Could you share some details on the hardware that’s offered?

Our hardware consists of a high-performance switched network fabric solution. To that end, we provide all the necessary devices to fully integrate HPC, cloud, and AI fabrics. The Omni-Path Host-Fabric Interface (HFI) is a low-profile PCIe card for endpoint devices. We also produce a 48-port 1U “top-of-rack” switch. For larger deployments, we make two fully-integrated “director-class” switches: one that packs 288 ports into 7U, and an 1152-port, 20U device.

Can you discuss the software that manages this infrastructure and how it’s designed to decrease latency?

First, our embedded management platform provides easy installation and configuration, as well as access to a wide variety of performance and configuration metrics produced by our switch ASICs.

Our driver software is developed as part of the Linux kernel. In fact, we submit all our software patches to the Linux kernel community directly. That ensures that all of our customers enjoy maximum compatibility across Linux distributions and easy integration with other software such as Lustre. While not in the latency path, having an in-tree driver dramatically reduces installation complexity.

The Omni-Path fabric manager (FM) configures and routes an Omni-Path fabric. By optimizing traffic routes and recovering quickly from faults, the FM provides industry-leading performance and reliability on fabrics from tens to thousands of nodes.

Omni-Path Express (OPX) is our high-performance messaging software, released in November 2022. It was specifically designed to reduce latency compared to our previous messaging software. We ran cycle-accurate simulations of our send and receive code paths in order to minimize instruction count and cache utilization. This produced dramatic results: when you’re in the microsecond regime, every cycle counts!

We also integrated with the OpenFabrics Interfaces (OFI), an open standard produced by the OpenFabrics Alliance. OFI’s modular architecture helps minimize latency by allowing higher-level software, such as MPI, to leverage fabric features without additional function calls.

The entire network is also designed to increase scalability. Could you share some details on how it is able to scale so well?

Scalability is at the core of Omni-Path’s design principles. At the lowest levels, we use Cray link-layer technology to correct link errors with no latency impact. This affects fabrics at all scales but is particularly important for large-scale fabrics, which naturally experience more link errors. Our fabric manager is focused both on programming optimal routing tables and on doing so rapidly. This ensures that routing for even the largest fabrics can be completed in a minimal amount of time.

Scalability is also a critical component of OPX. Minimizing cache utilization improves scalability on individual nodes with large core counts. Minimizing latency also improves scalability by improving time to completion for collective algorithms. Using our host-fabric interface resources more efficiently allows each core to communicate with more remote peers. The strategic choice of libfabric allows us to leverage software features like scalable endpoints through standard interfaces.

Could you share some details on how AI is incorporated into some of the workflows at Cornelis Networks?

We’re not quite ready to talk externally about our internal uses of and plans for AI. That said, we do eat our own dog food, so we get to take advantage of the latency and scalability improvements we’ve made to Omni-Path in support of AI workloads. It makes us all the more excited to share those benefits with our customers and partners. We have certainly observed that, as in traditional HPC, scaling out infrastructure is the only path forward, but the challenge is that network performance is easily stifled by Ethernet and other traditional networks.

What are some changes that you foresee in the industry with the advent of generative AI?

First off, the use of generative AI will make people more productive; no technology in history has made human beings obsolete. Every technology evolution and revolution we’ve had, from the cotton gin to the automated loom to the telephone, the Internet, and beyond, has made certain jobs more efficient, but we haven’t worked humanity out of existence.

Through the application of generative AI, I believe companies will advance technologically at a faster rate, because those running the company will have more free time to focus on those advancements. For instance, if generative AI provides more accurate forecasting, reporting, planning, and so on, companies can focus on innovation in their field of expertise.

I specifically feel that AI will make each of us a multidisciplinary expert. For example, as a scalable software expert, I understand the connections between HPC, big data, cloud, and the AI applications that drive them toward solutions like Omni-Path. Equipped with a generative AI assistant, I can delve deeper into the meaning of the applications used by our customers. I have no doubt that this will help us design even more effective hardware and software for the markets and customers we serve.

I also foresee an overall improvement in software quality. AI can effectively function as “another set of eyes” to statically analyze code and develop insights into bugs and performance problems. This will be particularly interesting at large scales, where performance issues can be especially difficult to spot and expensive to reproduce.

Finally, I hope and believe that generative AI will help our industry train and onboard more software professionals without previous experience in AI and HPC. Our field can seem daunting to many, and it can take time to learn to “think in parallel.” Fundamentally, just as machines made it easier to manufacture things, generative AI will make it easier to consider and reason about concepts.

Is there anything else that you would like to share about your work or Cornelis Networks in general?

I’d like to encourage anyone with the interest to pursue a career in computing, especially in HPC and AI. In this field, we’re equipped with the most powerful computing resources ever built, and we bring them to bear against humanity’s greatest challenges. It’s an exciting place to be, and I’ve enjoyed it every step of the way. Generative AI brings our field to even newer heights as the demand for increasing capability grows dramatically. I can’t wait to see where we go next.

Thank you for the great interview. Readers who wish to learn more should visit Cornelis Networks.
